What makes Microsoft-Word-generated HTML documents so large in code?

Asked May 25, 2015
12 upvotes
Updated April 14, 2026

Direct Answer

Word writes far more than the visible text: it declares its own XML namespaces (for example xmlns:w="urn:schemas-microsoft-com:office:word"), embeds document metadata, and adds font and style definitions so that the file can be reopened and edited in Word. This answer dates from 2015 and ranks #9 of 32 by community upvote score.


The Problem (Q-score 7, ranked #9 of 32 in the Word VBA archive)

The scenario as originally posted in 2015

Below is a simple, W3C-validated HTML document that prints “Hello World”:

<!DOCTYPE html>
<html>
<head>
<meta charset = "utf-8">
<title>Hello</title>
</head>
Hello World
</html> 

But when I do the same thing in MS Word, the generated code is 449 lines long. Why do all these extra lines appear in the code?

Why community consensus is tight on this one

Across the 32 Word VBA entries in the archive, the accepted answer here ranks as a strong answer (top 25%), meaning voters are unusually aligned on the right fix.


The Verified Solution: strong answer (top 25%) (+12)

What Word adds to the HTML

Word declares its own XML namespaces:

<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns:m="http://schemas.microsoft.com/office/2004/12/omml"
xmlns="http://www.w3.org/TR/REC-html40">

Word keeps document metadata:

<!--[if gte mso 9]><xml>
 <o:DocumentProperties>
  <o:Author>xxxxxx</o:Author>
  <o:LastAuthor>xxxxx</o:LastAuthor>
  <o:Revision>2</o:Revision>
  <o:TotalTime>0</o:TotalTime>
  <o:Created>2015-05-25T11:40:00Z</o:Created>
  <o:LastSaved>2015-05-25T11:40:00Z</o:LastSaved>
  <o:Pages>1</o:Pages>
  <o:Words>1</o:Words>
  <o:Characters>11</o:Characters>
  <o:Company>Sopra Group</o:Company>
  <o:Lines>1</o:Lines>
  <o:Paragraphs>1</o:Paragraphs>
  <o:CharactersWithSpaces>11</o:CharactersWithSpaces>
  <o:Version>12.00</o:Version>
 </o:DocumentProperties>
</xml><![endif]-->

Word adds a CSS style sheet:

<style>
<!--
 /* Font Definitions */
 @font-face
    {font-family:"Cambria Math";
    panose-1:2 4 5 3 5 4 6 3 2 4;
    mso-font-charset:0;
    mso-generic-font-family:roman;
    mso-font-pitch:variable;
    mso-font-signature:-536870145 1107305727 0 0 415 0;}
@font-face
    {font-family:Calibri;
    panose-1:2 15 5 2 2 2 4 3 2 4;
    mso-font-charset:0;
    mso-generic-font-family:swiss;
    mso-font-pitch:variable;
    mso-font-signature:-536870145 1073786111 1 0 415 0;}
 /* Style Definitions */
 p.MsoNormal, li.MsoNormal, div.MsoNormal
    {mso-style-unhide:no;
    mso-style-qformat:yes;
    mso-style-parent:"";
    margin-top:0cm;
    margin-right:0cm;
    margin-bottom:10.0pt;
    margin-left:0cm;
    line-height:115%;
    mso-pagination:widow-orphan;
    font-size:11.0pt;
    font-family:"Calibri","sans-serif";
    mso-ascii-font-family:Calibri;
    mso-ascii-theme-font:minor-latin;
    mso-fareast-font-family:Calibri;
    mso-fareast-theme-font:minor-latin;
    mso-hansi-font-family:Calibri;
    mso-hansi-theme-font:minor-latin;
    mso-bidi-font-family:"Times New Roman";
    mso-bidi-theme-font:minor-bidi;
    mso-fareast-language:EN-US;}
.MsoChpDefault
    {mso-style-type:export-only;
    mso-default-props:yes; ......

Word then applies the CSS style:

<p class=MsoNormal>Hello World</p>

Keep this information if you need to modify the document in Word in the future. If you are only doing a simple export, you can delete all of this metadata.
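
For a simple export, the deletion described above can be scripted. The following is a minimal Python sketch, not a complete cleaner: the function name strip_word_markup and the regular expressions are illustrative assumptions about what counts as Word-specific. It covers only the three things shown in this answer: conditional comments, xmlns declarations on the html tag, and mso-* declarations inside style blocks.

```python
import re

# Conditional comments of the form <!--[if gte mso 9]> ... <![endif]-->
_CONDITIONAL = re.compile(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.DOTALL)
# The opening <html ...> tag, where Word puts its namespace declarations
_HTML_TAG = re.compile(r"<html\b[^>]*>", re.IGNORECASE)
# A single xmlns or xmlns:prefix attribute with a double-quoted value
_XMLNS_ATTR = re.compile(r'\s+xmlns(?::\w+)?\s*=\s*"[^"]*"')
# A <style> element, split into opening tag, body, and closing tag
_STYLE_BLOCK = re.compile(r"(<style\b[^>]*>)(.*?)(</style>)",
                          re.DOTALL | re.IGNORECASE)
# One CSS declaration whose property name starts with "mso-"
_MSO_DECL = re.compile(r"(?<![\w-])mso-[\w-]+\s*:[^;}]*;?\s*")


def strip_word_markup(html: str) -> str:
    """Remove some Word-specific markup; leaves ordinary HTML and CSS alone."""
    # 1. Drop the metadata blocks wrapped in conditional comments.
    html = _CONDITIONAL.sub("", html)
    # 2. Drop namespace declarations, but only inside the <html> tag.
    html = _HTML_TAG.sub(lambda m: _XMLNS_ATTR.sub("", m.group(0)), html)
    # 3. Drop mso-* declarations, but only inside <style> blocks.
    html = _STYLE_BLOCK.sub(
        lambda m: m.group(1) + _MSO_DECL.sub("", m.group(2)) + m.group(3),
        html,
    )
    return html
```

Regular expressions are a blunt tool for HTML, so treat the result as a starting point and check it in a browser. The class=MsoNormal attribute is left in place because the remaining p.MsoNormal rule still styles it. Word's Save As dialog also offers a “Web Page, Filtered” format that omits most of the Office-specific tags at export time.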


When to Use It — classic (2013–2016)

A top-10 Word VBA pattern — why it still holds up

Ranks #9 of 32 in the Word VBA archive. The only pattern ranked immediately above it is “Word VBA – Eliminate Floating Object Tables”; compare both if you’re choosing between approaches.

What changed between 2015 and 2026

The answer is 11 years old. It contains no VBA, only the HTML that Word writes, so there is nothing to compile; the sample's o:Version value of 12.00 corresponds to Word 2007, and newer versions may emit somewhat different markup. If you automate the export with a macro, changes that might affect you are: 64-bit API declarations (use PtrSafe), blocked macros in downloaded files (Mark-of-the-Web), and the shift toward Office Scripts for web-first workflows.

Frequently Asked Questions

Why does this sit in the top quartile of Word VBA answers?

The answer scores +12 against a Word VBA archive median of about 4, which makes this a strong entry. Together with the 7 upvotes on the question itself, that means the asker and 11 subsequent voters all validated the approach.

Does the snippet run as-is in Office 2026?

There is nothing to run: the answer shows HTML markup that Word generates, not VBA code. If you automate the export with a macro, verify two things: (a) references under Tools → References match those in the code, and (b) any Declare statements use PtrSafe on 64-bit Office.

Published around 2015 — what’s changed since?

Published in 2015, which is 11 years before today’s Office 2026 build. The Word VBA object model has had no breaking changes in that window. Three things to re-test if you automate the export with a macro: (1) blocked macros on downloaded files (Mark-of-the-Web), (2) 64-bit API declarations (PtrSafe, LongPtr), (3) any shift toward Office Scripts for web scenarios.

Which Word VBA pattern ranks just above this one at #8?

The pattern one rank above is “Word VBA – Eliminate Floating Object Tables”. If your use case overlaps, compare both before committing.

Data source: Community-verified Q&A snapshot. Q-score 7, Answer-score 12, original post 2015, ranked #9 of 32 in the Word VBA archive. Last regenerated April 14, 2026.