How to parse MathML in the output of WordOpenXML?

Asked May 26, 2013
6 upvotes
Updated April 16, 2026

Question posted 2013 · +4 upvotes

I want to read only the XML that generates an equation, which I obtained through Paragraph.Range.WordOpenXML. However, the equation section is not MathML, even though I had read that Microsoft Word stores equations as MathML.

Do I need a special converter to get the desired XML, or is there another method?

Accepted answer +6 upvotes

You could use the OMML2MML.XSL file (located under %ProgramFiles%\Microsoft Office\Office15) to transform the Office Math Markup Language (OMML) equations in a Word document into MathML.
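Because the folder name depends on the installed Office version, it can help to probe a few candidates before loading the stylesheet. The following is a minimal sketch, not part of the original answer: the class and method names are hypothetical, and the only path pattern it assumes is the `<root>\Office<version>` layout mentioned above. Other installations may keep the file elsewhere, in which case the method simply returns null.

```csharp
using System;
using System.IO;

public static class OmmlStylesheetLocator
{
    // Hypothetical helper: returns the first existing OMML2MML.XSL found under
    // <officeRoot>\Office<version>, or null when none of the candidates exists.
    public static string FindStylesheet(string officeRoot, params string[] versions)
    {
        foreach (string version in versions)
        {
            string candidate = Path.Combine(officeRoot, "Office" + version, "OMML2MML.XSL");
            if (File.Exists(candidate))
            {
                return candidate;
            }
        }
        return null;
    }
}
```

A caller might pass Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.ProgramFiles), "Microsoft Office") as the root together with the versions "15" and "14", and fall back to a user-supplied path when the result is null.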

The code below shows how to transform the equations in a Word document into MathML using the following steps:

  1. Open the Word document using the Open XML SDK (version 2.5).
  2. Create an XslCompiledTransform and load the OMML2MML.XSL file.
  3. Transform the Word document by calling the Transform() method on the created XslCompiledTransform instance.
  4. Output the result of the transform (e.g. print it to the console or write it to a file).

I’ve tested the code below with a simple Word document containing two equations, text and pictures.

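using System.IO;
using System.Linq;  // First() in the single-equation query further down
using System.Text;  // Encoding
using System.Xml;
using System.Xml.Xsl;
using DocumentFormat.OpenXml.Packaging;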

public string GetWordDocumentAsMathML(string docFilePath, string officeVersion = "14")
{
    string officeML = string.Empty;
    using (WordprocessingDocument doc = WordprocessingDocument.Open(docFilePath, false))
    {
        string wordDocXml = doc.MainDocumentPart.Document.OuterXml;

        XslCompiledTransform xslTransform = new XslCompiledTransform();

        // The OMML2MML.XSL file is located under
        // %ProgramFiles%\Microsoft Office\Office<version> (e.g. Office15).
        xslTransform.Load(@"c:\Program Files\Microsoft Office\Office" + officeVersion + @"\OMML2MML.XSL");

        using (TextReader tr = new StringReader(wordDocXml))
        {
            // Load the xml of your main document part.
            using (XmlReader reader = XmlReader.Create(tr))
            {
                using (MemoryStream ms = new MemoryStream())
                {
                    XmlWriterSettings settings = xslTransform.OutputSettings.Clone();

                    // Configure xml writer to omit xml declaration.
                    settings.ConformanceLevel = ConformanceLevel.Fragment;
                    settings.OmitXmlDeclaration = true;

                    // Dispose the writer before reading so buffered output reaches the stream.
                    using (XmlWriter xw = XmlWriter.Create(ms, settings))
                    {
                        // Transform our OfficeMathML to MathML.
                        xslTransform.Transform(reader, xw);
                    }
                    ms.Seek(0, SeekOrigin.Begin);

                    using (StreamReader sr = new StreamReader(ms, Encoding.UTF8))
                    {
                        officeML = sr.ReadToEnd();
                        // Console.Out.WriteLine(officeML);
                    }
                }
            }
        }
    }
    return officeML;
}

To convert a single equation rather than the whole Word document, query for the desired Office Math paragraph (m:oMathPara) and use the OuterXml property of that node. The code below shows how to query for the first math paragraph:

string mathParagraphXml = 
      doc.MainDocumentPart.Document.Descendants<DocumentFormat.OpenXml.Math.Paragraph>().First().OuterXml;

Pass the returned XML to the StringReader (the TextReader in the code above) in place of wordDocXml.
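If the XML is already available as a string, for example from Paragraph.Range.WordOpenXML as in the question, the same idea can be sketched without the Open XML SDK. The class and method names below are hypothetical, and the sketch assumes the stylesheet has already been loaded into an XslCompiledTransform. It collects every m:oMathPara element with LINQ to XML and runs each fragment through the transform using the same writer settings as the code above.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

public static class OmmlFragments
{
    // Namespace bound to the "m:" prefix for Office Math elements.
    private static readonly XNamespace M =
        "http://schemas.openxmlformats.org/officeDocument/2006/math";

    // Returns the outer XML of every m:oMathPara element in the given XML string.
    public static List<string> ExtractMathParagraphs(string wordXml)
    {
        return XDocument.Parse(wordXml)
            .Descendants(M + "oMathPara")
            .Select(e => e.ToString(SaveOptions.DisableFormatting))
            .ToList();
    }

    // Runs an already loaded stylesheet over one XML fragment and returns the result.
    public static string Transform(XslCompiledTransform xslt, string fragmentXml)
    {
        XmlWriterSettings settings = xslt.OutputSettings.Clone();
        settings.ConformanceLevel = ConformanceLevel.Fragment;
        settings.OmitXmlDeclaration = true;

        using (XmlReader input = XmlReader.Create(new StringReader(fragmentXml)))
        using (StringWriter output = new StringWriter())
        {
            // Disposing the writer flushes its buffer into the StringWriter.
            using (XmlWriter writer = XmlWriter.Create(output, settings))
            {
                xslt.Transform(input, writer);
            }
            return output.ToString();
        }
    }
}
```

Equations that are not wrapped in a math paragraph appear as bare m:oMath elements, so the element name in ExtractMathParagraphs may need adjusting for inline equations.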
