The Problem (Q-score 4, ranked 26th of 32 in the Word VBA archive)
The scenario as originally posted in 2012
Possible Duplicate:
What is the best free way to clean up Word HTML?
PHP to clean-up pasted Microsoft input
I allow clients to enter notes in a rich text editor, and I only recently upgraded to CKEditor 3.x, which by default strips MS Word classes, styles, and comments when users paste into the editor. So moving forward I’m all set.
I’ve recently had a need to clean up five years’ worth of notes, some of which have MS Word-generated HTML embedded. I need to loop through this body of text and clean it.
I do not need to strip out all span tags, only those identified as written by Microsoft.
I’ve tried using HTMLCleaner, but it does not remove the MS-generated HTML. http://word2cleanhtml.com does exactly what I want; however, the developers are not currently offering the API for public use (as of July 9, 2012).
I’ve looked for such a class off and on for the last few weeks and am not having much luck. Have any of you found a useful class you’d like to share?
Why community consensus is tight on this one
Across the 32 Word VBA entries in the archive, the accepted answer here is rated a solid answer (above median), which means voters are unusually aligned on the right fix.
The Verified Solution — solid answer (above median) (+7)
Advisory answer — community consensus with reference links
Note: the verified answer below is a reference / advisory response rather than a copy-ready snippet.
This will do what you want.
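As a rough illustration of the cleanup the question describes (and not necessarily the tool this answer refers to), here is a minimal Python sketch built on the standard library's `html.parser`. It removes comments (including Word's conditional comments), drops Office-namespaced tags such as `<o:p>`, strips `Mso*` class names and `mso-*` style declarations, and unwraps only those `span` tags that are left with no attributes after that stripping. The names `WordHtmlCleaner` and `clean_word_html` are hypothetical, and the `Mso` / `mso-` prefixes are an assumed heuristic for "written by Microsoft"; check them against a sample of your own notes first.

```python
import html
import re
from html.parser import HTMLParser

# Assumed markers of Word-generated markup; adjust to what your data contains.
MSO_CLASS = re.compile(r"^mso", re.IGNORECASE)   # e.g. class="MsoNormal"
MSO_DECL = re.compile(r"^mso-", re.IGNORECASE)   # e.g. style="mso-spacerun:yes"


class WordHtmlCleaner(HTMLParser):
    """Re-emits an HTML fragment without Word-specific markup (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=False)  # keep &nbsp; etc. as written
        self.out = []
        self.dropped_spans = []  # one flag per open <span>: True if it was dropped

    def _clean_attrs(self, attrs):
        kept, had_mso = [], False
        for name, value in attrs:
            if name == "class" and value:
                names = value.split()
                clean = [c for c in names if not MSO_CLASS.match(c)]
                had_mso |= len(clean) < len(names)
                value = " ".join(clean)
            elif name == "style" and value:
                # Naive split: assumes no ';' inside quoted strings or url(...).
                decls = [d.strip() for d in value.split(";") if d.strip()]
                clean = [d for d in decls if not MSO_DECL.match(d)]
                had_mso |= len(clean) < len(decls)
                value = "; ".join(clean)
            # Drop class/style attributes that ended up empty; keep everything else.
            if value != "" or name not in ("class", "style"):
                kept.append((name, value))
        return kept, had_mso

    def _render(self, tag, attrs, close=""):
        parts = [tag]
        for name, value in attrs:
            parts.append(name if value is None else f'{name}="{html.escape(value)}"')
        return "<" + " ".join(parts) + close + ">"

    def handle_starttag(self, tag, attrs):
        if ":" in tag:  # Office namespaces such as <o:p>, <w:...>, <v:...>
            return
        kept, had_mso = self._clean_attrs(attrs)
        if tag == "span":
            # Unwrap only spans that carried Word markup and nothing else.
            drop = had_mso and not kept
            self.dropped_spans.append(drop)
            if drop:
                return
        self.out.append(self._render(tag, kept))

    def handle_startendtag(self, tag, attrs):
        if ":" not in tag:
            kept, _ = self._clean_attrs(attrs)
            self.out.append(self._render(tag, kept, " /"))

    def handle_endtag(self, tag):
        if ":" in tag:
            return
        if tag == "span" and self.dropped_spans and self.dropped_spans.pop():
            return  # closing tag of a span that was unwrapped
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    # handle_comment is deliberately not overridden: comments are discarded.


def clean_word_html(fragment):
    parser = WordHtmlCleaner()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    # Stand-in for the stored notes; in practice these would come from your database.
    notes = ['<p class="MsoNormal"><span style="mso-spacerun:yes">Hi</span><o:p></o:p></p>']
    for note in notes:
        print(clean_word_html(note))
```

Spans that carry their own classes or non-`mso-` styles are left in place, which matches the requirement of removing only Microsoft-generated spans rather than all of them. Because the approach is heuristic, it is worth running it on a copy of the data and comparing a sample of before/after notes before overwriting anything.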
When to Use It — vintage (14+ years old, pre-2013)
Ranked 26th in its category: specialized fit
This pattern sits in the 63% tail relative to the top answer. Reach for it when your scenario closely matches the question title; otherwise browse the Word VBA archive for a higher-consensus alternative.
What changed between 2012 and 2026
The answer is 14 years old. The Word VBA object model has been stable across Office 2013, 2016, 2019, 2021, 365, and 2024/2026 LTSC, so the pattern still compiles. Changes that might affect you: 64-bit API declarations (use PtrSafe), blocked macros in downloaded files (Mark-of-the-Web), and the shift toward Office Scripts for web-first workflows.