Question posted 2009 · +5 upvotes
I’m wondering how you clean the special characters that MS Word adds, such as em and en dashes and curly quotes?
I often find myself copying client content out of Word and pasting it into a static HTML page, but the content ends up with weird characters because the special characters are not converted to their correct ASCII codes and therefore show up as garbled text. (For these basic websites, I’m using Dreamweaver.)
I have seen a lot of similar problems when clients copy content from Word into text-only fields (mostly textareas). When I put this text into a PDF (through PHP), or when it is shown on the page, it is garbled too.
How do you deal with this? Is there a cleaning service or program you use?
Accepted answer +7 upvotes
With regard to clients posting copy/pasted text from Word in textareas:
The most reliable way to ensure that the client sends you text in a particular encoding (thus hopefully doing any conversion from CP-1252 [or whatever Word uses] for you) is to add the accept-charset="..." attribute to all your <form>s. E.g.:
<form ... accept-charset="UTF-8">
...
</form>
Most browsers will obey that and make sure any “Word-specific” characters are converted to the appropriate character set before the text gets to your website.
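To see why the conversion matters, here is a small Python sketch (Python is my choice for illustration, not something the question uses, and the sample string is made up). It shows that a curly apostrophe and an en dash have different byte sequences in CP-1252 and UTF-8, and that reading UTF-8 bytes as CP-1252 produces the kind of garbled text described in the question.

```python
# Sample text with a curly apostrophe (U+2019) and an en dash (U+2013),
# two characters Word commonly inserts. The string itself is hypothetical.
text = "It\u2019s \u2013 fine"

# The same characters are stored as different bytes in each encoding.
cp1252_bytes = text.encode("cp1252")  # one byte per special character
utf8_bytes = text.encode("utf-8")     # three bytes per special character

# Reading UTF-8 bytes as if they were CP-1252 yields garbled text:
# each special character turns into three unrelated characters.
garbled = utf8_bytes.decode("cp1252")

print(cp1252_bytes)
print(utf8_bytes)
print(garbled)
```

If the page, the form submission, and whatever renders the text all agree on one encoding, this mismatch does not arise.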
Once invalid text gets to your website, there’s very little you can do to fix it reliably, so it’s best to simply check all input for being valid in whatever character set you use, and discard any requests that have invalid text. This is necessary even with accept-charset, because undoubtedly there are some clients out there that will ignore it.
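The check described above could look roughly like this in Python. This is a minimal sketch assuming the site uses UTF-8 and that you have access to the raw submitted bytes; the function name `decode_utf8_or_reject` is hypothetical, and how you actually reject the request depends on your framework.

```python
from typing import Optional


def decode_utf8_or_reject(raw: bytes) -> Optional[str]:
    """Return the decoded text, or None if raw is not valid UTF-8.

    Hypothetical helper: the caller is expected to discard the
    request when None is returned.
    """
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Valid UTF-8 input (curly apostrophe encoded as three bytes) is accepted.
ok = decode_utf8_or_reject("It\u2019s".encode("utf-8"))

# The CP-1252 byte for the same apostrophe (0x92) is not valid UTF-8.
bad = decode_utf8_or_reject(b"It\x92s")
```

Note that this only tells you the bytes are well-formed UTF-8. It cannot tell you that the client meant what the bytes decode to, which is why rejecting is safer than guessing at a repair.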