How to convert Word to XML on the server side automatically?

Here is my problem: my organization wants users to upload Word documents to the server. On the server side, each Word document (with enforced styles) needs to be converted to an XML format file. Next, I need to use PHP to parse the Open XML format files and put the content into the database. Does anyone know how to convert Word to XML on the server side automatically? Is there any API or sample code for PHP to parse Open XML formats? Your suggestions are appreciated.

[580 byte] By [ecihewu] at [2008-2-15]
# 1

Hi,

What do you mean? Are your documents in the old binary format? If so, why? You can have the users save the document in the XML format, and then you will need no conversion. Word has supported saving as XML since Word 2003.

Wouter

WoutervanVugt at 2007-10-2 > top of Msdn Tech,Office Live Development,Microsoft SDK for Open XML Formats...
# 2
Hi, thanks for your reply. They are just regular .doc documents. However, we don't want users to save the documents in XML format. We want to convert each .doc file to XML format on the web server side after users submit it. Is there any way to programmatically convert a .doc file to XML format on the web server side?

ecihewu at 2007-10-2
# 3
Did you come up with a solution yet? I'm looking to do the same kind of thing.

barsh at 2007-10-2
# 4

If the web server can host the Microsoft Word process (Winword.exe), you can use a single instance of it to open each document and save it from .doc to .docx or .xml efficiently.

All you need to do is monitor the upload folder(s), open each new file, save it in the new format, and perhaps delete files that are no longer needed.

Opening and parsing the binary format yourself seems insane to me.
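
The monitor, open, save loop described above might be sketched like this in Python. This is only a rough outline, not a tested service: the names `pending_docs`, `convert_pending` and `watch` are made up for illustration, and the actual conversion is passed in as a `convert` callable (an assumption; it could wrap Word automation or a command-line converter) so nothing here depends on a specific converter's options.

```python
import time
from pathlib import Path
from typing import Callable, List


def pending_docs(inbox: Path) -> List[Path]:
    """Return .doc files in `inbox` that have no .docx next to them yet."""
    return sorted(
        doc for doc in inbox.glob("*.doc")
        if not doc.with_suffix(".docx").exists()
    )


def convert_pending(inbox: Path, convert: Callable[[Path, Path], None]) -> List[Path]:
    """Call convert(source, target) for each pending file; return targets that now exist."""
    written = []
    for doc in pending_docs(inbox):
        target = doc.with_suffix(".docx")
        convert(doc, target)  # hypothetical converter supplied by the caller
        if target.exists():
            written.append(target)
    return written


def watch(inbox: Path, convert: Callable[[Path, Path], None],
          interval_seconds: float = 5.0) -> None:
    """Poll the folder forever, converting whatever has arrived since the last pass."""
    while True:
        convert_pending(inbox, convert)
        time.sleep(interval_seconds)
```

Polling is used here only because it is simple and portable; a real service would probably also need to skip files that are still being uploaded and to log failed conversions.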

MauricioG at 2007-10-2
# 5

Funny, one of my company's clients has the same problem (there must be an epidemic), but I was directed to Doug Mahugh's blog post http://blogs.msdn.com/dmahugh/archive/2007/02/09/converting-office-documents-to-open-xml.aspx

wherein he describes how to use the OFC.exe command-line utility to convert from .doc to .docx format.

What I intend to do is write a Windows service that waits for .doc files to be delivered, intercepts them, runs the converter, and then uses a small utility to extract the document.xml file as text using the new SDK (a technique I gleaned from perusing Wouter's blog and his great book). Then I'll be able to do anything I like with the resulting XML. I know nothing about PHP, so I'd love to chat about that; in my case I'm going to use the XLinq capability of VB9.
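
For anyone not using the SDK, the extraction step can be sketched with nothing but a ZIP reader and an XML parser, since a .docx file is a ZIP package. The Python below is a minimal illustration under some assumptions: the main part is taken to live at the conventional path `word/document.xml` (strictly, it should be located through the package relationships), and `read_paragraphs` is a made-up helper name. The same approach should carry over to PHP with its ZIP and XML extensions.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import List, Tuple

# WordprocessingML main namespace used by document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def read_paragraphs(docx_path) -> List[Tuple[str, str]]:
    """Return (style_id, text) for every paragraph in the main document part."""
    with zipfile.ZipFile(docx_path) as package:
        # Assumption: the main part is at the conventional location.
        root = ET.fromstring(package.read("word/document.xml"))

    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        style = p.find("w:pPr/w:pStyle", NS)
        # Paragraphs without an explicit style get an empty style id.
        style_id = style.get(f"{{{W_NS}}}val") if style is not None else ""
        # A paragraph's text is split across runs; join all w:t elements.
        text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        paragraphs.append((style_id, text))
    return paragraphs
```

Because the documents have enforced styles, the style id of each paragraph could then decide which database column or table its text goes into.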

MDLohan at 2007-10-2
# 6

Great!

OFC would be an option, and a Windows service built with .NET should wrap it all up. :)

MauricioG at 2007-10-2