Why does the file utility identify Microsoft Word files as CDF? What is this CDF?

Asked Feb 6, 2011
15 upvotes
Updated April 16, 2026

Question posted 2011 · +15 upvotes

I have some old Microsoft Word files (probably Word 97) lying around and noticed that the standard Unix file utility identifies them as “CDF”. The output is actually more detailed than that and dumps the document metadata, for example:

CDF V2 Document,
Little Endian, 
Os: Windows, 
Version 4.0, 
Code page: 1252, 
Title: ..., 
Author: ..., 
Template: Normal.dot, 
Last Saved By: ..., 
Revision Number: 1, 
Name of Creating Application: Microsoft Word 8.0, 
Create Time/Date: ..., 
Last Saved Time/Date: ..., 
Number of Pages: 1, 
Number of Words: 95, 
Number of Characters: 542, 
Security: 0

What does CDF stand for here? Is it a general container format, like RIFF for media files? I can’t find anything useful on the web. “Channel Definition Format” and “Compound Document Format” are clearly not meant, as those Microsoft Word files are completely binary. For “Common Data Format” I can’t find a connection. I looked through the source code of the file utility (the version that ships with FreeBSD), but all I could find out is that it has a dedicated readcdf.c which deals with this format.

Accepted answer +15 upvotes

CDF here is the Compound Document format, which is related to OLE/COM. The name refers to linking and embedding objects, for example Excel charts in Word documents; the file itself is a binary container.

See the historical (pre-XML) document specifications for MS Office; the specific file format description is titled “Windows Compound Binary File Format Specification”.
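To make the container idea concrete, here is a minimal Python sketch of how a tool might recognize such a file from its first bytes. It is not how file's readcdf.c is implemented. The eight-byte signature is the well-known one for compound files; the header offsets (versions at byte 24, byte-order mark at 28, sector shift at 30) are taken from my reading of the published specification and should be treated as assumptions. The function names parse_cdf_header and sniff_file are made up for this example.

```python
import struct

# Eight-byte signature that opens a compound (OLE2 structured storage) file.
CDF_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def parse_cdf_header(data: bytes):
    """Return basic header fields if data starts like a compound file, else None."""
    if len(data) < 32 or data[:8] != CDF_MAGIC:
        return None
    # Assumed layout: bytes 8..23 hold a class ID, then four little-endian
    # unsigned 16-bit fields: minor version, major version, byte-order mark,
    # and sector shift (sector size is 2 ** shift).
    minor, major, byte_order, sector_shift = struct.unpack_from("<4H", data, 24)
    return {
        "major_version": major,
        "minor_version": minor,
        "little_endian": byte_order == 0xFFFE,
        "sector_size": 1 << sector_shift,
    }


def sniff_file(path: str):
    """Read the first 32 bytes of the file at path (hypothetical) and parse them."""
    with open(path, "rb") as fh:
        return parse_cdf_header(fh.read(32))
```

Calling sniff_file on an old .doc, .xls, or .ppt file should return a dictionary, while a modern .docx (which is a ZIP archive) should return None.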
