What makes Microsoft-Word-generated HTML documents so large in code?

Asked May 25, 2015
12 upvotes
Updated April 16, 2026

Question posted 2015 · +7 upvotes

Below is a simple page that passes W3C validation and prints “Hello World”:

<!DOCTYPE html>
<html>
<head>
<meta charset = "utf-8">
<title>Hello</title>
</head>
Hello World
</html> 

But when I do the same thing with MS Word, the generated code is 449 lines long. Why do all these extra lines appear in the code?

Accepted answer +12 upvotes

Word declares its own XML namespaces:

<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns:m="http://schemas.microsoft.com/office/2004/12/omml"
xmlns="http://www.w3.org/TR/REC-html40">

Word keeps the document's metadata:

<!--[if gte mso 9]><xml>
 <o:DocumentProperties>
  <o:Author>xxxxxx</o:Author>
  <o:LastAuthor>xxxxx</o:LastAuthor>
  <o:Revision>2</o:Revision>
  <o:TotalTime>0</o:TotalTime>
  <o:Created>2015-05-25T11:40:00Z</o:Created>
  <o:LastSaved>2015-05-25T11:40:00Z</o:LastSaved>
  <o:Pages>1</o:Pages>
  <o:Words>1</o:Words>
  <o:Characters>11</o:Characters>
  <o:Company>Sopra Group</o:Company>
  <o:Lines>1</o:Lines>
  <o:Paragraphs>1</o:Paragraphs>
  <o:CharactersWithSpaces>11</o:CharactersWithSpaces>
  <o:Version>12.00</o:Version>
 </o:DocumentProperties>
</xml><![endif]-->

Word adds a CSS style sheet:

<style>
<!--
 /* Font Definitions */
 @font-face
    {font-family:"Cambria Math";
    panose-1:2 4 5 3 5 4 6 3 2 4;
    mso-font-charset:0;
    mso-generic-font-family:roman;
    mso-font-pitch:variable;
    mso-font-signature:-536870145 1107305727 0 0 415 0;}
@font-face
    {font-family:Calibri;
    panose-1:2 15 5 2 2 2 4 3 2 4;
    mso-font-charset:0;
    mso-generic-font-family:swiss;
    mso-font-pitch:variable;
    mso-font-signature:-536870145 1073786111 1 0 415 0;}
 /* Style Definitions */
 p.MsoNormal, li.MsoNormal, div.MsoNormal
    {mso-style-unhide:no;
    mso-style-qformat:yes;
    mso-style-parent:"";
    margin-top:0cm;
    margin-right:0cm;
    margin-bottom:10.0pt;
    margin-left:0cm;
    line-height:115%;
    mso-pagination:widow-orphan;
    font-size:11.0pt;
    font-family:"Calibri","sans-serif";
    mso-ascii-font-family:Calibri;
    mso-ascii-theme-font:minor-latin;
    mso-fareast-font-family:Calibri;
    mso-fareast-theme-font:minor-latin;
    mso-hansi-font-family:Calibri;
    mso-hansi-theme-font:minor-latin;
    mso-bidi-font-family:"Times New Roman";
    mso-bidi-theme-font:minor-bidi;
    mso-fareast-language:EN-US;}
.MsoChpDefault
    {mso-style-type:export-only;
    mso-default-props:yes; ......

Word then applies that CSS style:

<p class=MsoNormal>Hello World</p>

Keep this information if you may need to edit the document in Word again later. If you are only doing a simple export, you can delete all of the metadata.
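
For a simple export, deleting that metadata can be scripted. The following is only a rough Python sketch, not a complete cleaner: `strip_word_markup` is a hypothetical helper name, and it assumes the markup looks like the snippets shown above (`<!--[if ...]> ... <![endif]-->` comments, a single `<style>` block, `xmlns` attributes, and `Mso*` class names). Regular expressions are fragile on real-world HTML, so check the result before relying on it.

```python
import re


def strip_word_markup(html: str) -> str:
    """Remove the Word-specific parts shown above (hypothetical helper)."""
    # Conditional comments such as <!--[if gte mso 9]><xml>...</xml><![endif]-->
    html = re.sub(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", "", html, flags=re.DOTALL)
    # The <style> block holding @font-face rules and mso-* properties
    html = re.sub(r"<style\b[^>]*>.*?</style>", "", html,
                  flags=re.DOTALL | re.IGNORECASE)
    # Namespace declarations on the <html> tag (xmlns:v, xmlns:o, xmlns, ...)
    html = re.sub(r'\s+xmlns(:\w+)?="[^"]*"', "", html)
    # Class attributes that only point at Word styles, e.g. class=MsoNormal
    html = re.sub(r'\s+class="?Mso\w+"?', "", html)
    # Collapse the blank lines left behind by the removals
    html = re.sub(r"\n\s*\n+", "\n", html)
    return html.strip()


if __name__ == "__main__":
    # Shortened sample in the shape of the Word output quoted above
    sample = """<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns="http://www.w3.org/TR/REC-html40">
<head>
<!--[if gte mso 9]><xml>
 <o:DocumentProperties>
  <o:Author>xxxxxx</o:Author>
 </o:DocumentProperties>
</xml><![endif]-->
<style>
<!--
 p.MsoNormal {margin-bottom:10.0pt;}
-->
</style>
</head>
<body>
<p class=MsoNormal>Hello World</p>
</body>
</html>"""
    print(strip_word_markup(sample))
```

Removing the `<style>` block also removes the fonts and margins Word defined, so the page will fall back to the browser's defaults, and the cleaned file will no longer carry the properties Word uses when it reopens the document.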
