Question posted 2011 · +15 upvotes
I have some old Microsoft Word files (probably Word 97) lying around and noticed that the standard Unix file utility identifies them as “CDF”. It is actually more precise than that, dumping detailed metadata, for example:
CDF V2 Document,
Little Endian,
Os: Windows,
Version 4.0,
Code page: 1252,
Title: ...,
Author: ...,
Template: Normal.dot,
Last Saved By: ...,
Revision Number: 1,
Name of Creating Application: Microsoft Word 8.0,
Create Time/Date: ...,
Last Saved Time/Date: ...,
Number of Pages: 1,
Number of Words: 95,
Number of Characters: 542,
Security: 0
What does CDF stand for here? Is it a general container format, like RIFF for media files? I can’t find anything useful on the web. “Channel Definition Format” and “Compound Document Format” are clearly not meant, as those Microsoft Word files are completely binary. For Common Data Format I can’t find a connection. I looked through the source code of the file utility (the version that ships with FreeBSD), but could only find that it has a dedicated readcdf.c which deals with this format.
Accepted answer +15 upvotes
CDF here refers to the Compound Document format, which is related to OLE/COM. It supports linking and embedding objects, for example Excel charts in Word documents.
See the historical (pre-XML) document specifications for MS Office; the specific file format description is the “Windows Compound Binary File Format Specification”.
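To check for yourself whether a file uses this container, you can look at its first bytes. The following is a minimal Python sketch, not what the file utility actually does internally. It assumes the header layout as I understand it from the published compound file specification: an 8-byte signature D0 CF 11 E0 A1 B1 1A E1, then a 16-byte class ID, then 16-bit little-endian fields for minor version, major version, byte-order mark and sector shift. The names CFB_MAGIC and parse_cfb_header are made up for this example.

```python
import struct

# Signature that starts every compound file (hypothetical constant name).
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def parse_cfb_header(data: bytes):
    """Return basic header fields, or None if data is not a compound file.

    Assumed layout: signature (8 bytes), class ID (16 bytes), then four
    little-endian 16-bit fields starting at offset 24.
    """
    if len(data) < 32 or data[:8] != CFB_MAGIC:
        return None
    minor, major, byte_order, sector_shift = struct.unpack_from("<HHHH", data, 24)
    return {
        "minor_version": minor,
        "major_version": major,
        # The byte-order mark reads as 0xFFFE in a little-endian file.
        "little_endian": byte_order == 0xFFFE,
        # The sector size is stored as a power of two.
        "sector_size": 1 << sector_shift,
    }
```

To try it, read the start of a document in binary mode, for example `parse_cfb_header(open("old.doc", "rb").read(512))`, where old.doc is a placeholder file name. A result of None means the file does not start with the compound file signature, which is what you would expect for the newer ZIP-based .docx files.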