Who hasn't needed, at least once, to convert one or more MS Word DOC or DOCX files to PDF? Truth be told, it wasn't that trivial back in the day: until the release of Office 2010, when PDF appeared among the formats supported by the Save As... command, using Ghostscript-based software or installing a PDF printer driver was the only way to go.
After Office 2010 the problem was finally solved even for the average user, with the sole caveat that MS Office still has to be installed on the machine. Those who don't have it can continue to use the aforementioned free alternatives or purchase software that will take care of the job for them.
What about doing that programmatically? What if we are developing a C# application and need to convert some DOC or DOCX files to PDF, making them available for download without giving users the source document, possibly without having to spend an Office license on our web server or web publishing machine?
The answer, still MS-branded, goes by the name of Microsoft Office primary interop assemblies (PIAs), aka Microsoft Office Interop. Specifically, to work with Word files, you're going to need Microsoft.Office.Interop.Word.dll. If you're using Visual Studio, you can get it from NuGet and add it to your application using the NuGet Package Manager; otherwise you will have to download and install the official distribution package. Keep in mind that the interop assemblies only automate an installed copy of Word, so Word itself must still be present on the machine that runs the code.
As soon as you do that, you'll be able to open and edit any MS Word document from the file system or from a byte array, as explained in this post. Here's a brief example showing what you can do:
    // Namespace alias to avoid writing the full namespace every time
    using word = Microsoft.Office.Interop.Word;

    // [...]

    word.Application app = new word.Application();
    word.Document doc = app.Documents.Open(filePath);
    doc.SaveAs2("path-to-pdf-file.pdf", word.WdSaveFormat.wdFormatPDF);
    doc.Close();
    app.Quit();
Alternatively, if you don't like the SaveAs2 method, you can use the ExportAsFixedFormat() method instead and achieve a nearly identical result:
    // tmpFile is the path of the PDF file to write
    doc.ExportAsFixedFormat(tmpFile, word.WdExportFormat.wdExportFormatPDF);
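ExportAsFixedFormat() also accepts a number of optional parameters that SaveAs2 does not offer. The following is a minimal sketch, assuming C# 4 or later (for named arguments), an installed copy of Word, and the same word alias used above. The class name PdfExportSketch, the method ExportToPdf and the parameters docPath and pdfPath are hypothetical names chosen for illustration, and the parameter names should be checked against the interop version you reference.

```csharp
using System;
using word = Microsoft.Office.Interop.Word;

public static class PdfExportSketch
{
    // Hypothetical helper: exports the document at docPath to a PDF at pdfPath.
    public static void ExportToPdf(string docPath, string pdfPath)
    {
        word.Application app = null;
        word.Document doc = null;
        try
        {
            app = new word.Application();

            // Open read-only so the source document is never modified.
            doc = app.Documents.Open(docPath, ReadOnly: true);

            doc.ExportAsFixedFormat(
                OutputFileName: pdfPath,
                ExportFormat: word.WdExportFormat.wdExportFormatPDF,
                OpenAfterExport: false,
                OptimizeFor: word.WdExportOptimizeFor.wdExportOptimizeForPrint,
                // Turn Word headings into PDF bookmarks.
                CreateBookmarks: word.WdExportCreateBookmarks.wdExportCreateHeadingBookmarks);
        }
        finally
        {
            // Always close the document and quit Word, even if the export fails.
            if (doc != null) doc.Close(SaveChanges: false);
            if (app != null) app.Quit();
        }
    }
}
```

Named arguments keep the call readable, since every option you leave out falls back to its default.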
It's worth noting that everything we said about MS Word can also be done with the other applications in the MS Office bundle, such as MS Excel, MS PowerPoint and more.
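As an illustration, here is a rough sketch of the same conversion for an Excel workbook. It assumes that the Microsoft.Office.Interop.Excel assembly is referenced and that Excel is installed; ExcelPdfSketch, ExportWorkbookToPdf, xlsxPath and pdfPath are hypothetical names, not part of any library.

```csharp
using System;
using excel = Microsoft.Office.Interop.Excel;

public static class ExcelPdfSketch
{
    // Hypothetical helper: exports the workbook at xlsxPath to a PDF at pdfPath.
    public static void ExportWorkbookToPdf(string xlsxPath, string pdfPath)
    {
        excel.Application app = null;
        excel.Workbook book = null;
        try
        {
            app = new excel.Application();
            book = app.Workbooks.Open(xlsxPath);

            // Excel exposes its own ExportAsFixedFormat on the Workbook object.
            book.ExportAsFixedFormat(excel.XlFixedFormatType.xlTypePDF, pdfPath);
        }
        finally
        {
            // Close without saving, then shut down the Excel instance.
            if (book != null) book.Close(false);
            if (app != null) app.Quit();
        }
    }
}
```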
IMPORTANT: Do not underestimate the call to app.Quit()! If you skip it, the MS Word instance will be left running on your server (see this thread on StackOverflow for more info on that issue). If you want to be sure to avoid such a dreadful scenario entirely, you should strengthen the given implementation by adding a try/catch fallback strategy such as the following:
    word.Application app = null;
    word.Document doc = null;
    try
    {
        app = new word.Application();
        doc = app.Documents.Open(filePath);
        // ... do your stuff here ...
        doc.Close();
        app.Quit();
    }
    catch (Exception e)
    {
        // Something went wrong: make sure Word does not stay open
        if (doc != null) doc.Close();
        if (app != null) app.Quit();
    }
Unfortunately these objects don't implement IDisposable, otherwise it would've been even easier.
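One way around that limitation is to write a small wrapper of your own that does implement IDisposable. The sketch below is only one possible approach, under the same assumptions as before (Word installed, the word alias in place); WordSession is a hypothetical class name, and the calls to Marshal.ReleaseComObject are an extra precaution that the examples above do not take.

```csharp
using System;
using System.Runtime.InteropServices;
using word = Microsoft.Office.Interop.Word;

// Hypothetical wrapper: owns one Word instance and one open document.
public sealed class WordSession : IDisposable
{
    private word.Application _app;

    public word.Document Document { get; private set; }

    public WordSession(string filePath)
    {
        _app = new word.Application();
        try
        {
            Document = _app.Documents.Open(filePath, ReadOnly: true);
        }
        catch
        {
            // Do not leave Word running if the document cannot be opened.
            Dispose();
            throw;
        }
    }

    public void Dispose()
    {
        try
        {
            if (Document != null)
            {
                Document.Close(SaveChanges: false);
                Marshal.ReleaseComObject(Document);
                Document = null;
            }
        }
        finally
        {
            // Quit Word even if closing the document failed.
            if (_app != null)
            {
                _app.Quit();
                Marshal.ReleaseComObject(_app);
                _app = null;
            }
        }
    }
}
```

With such a wrapper in place, the conversion shrinks to a using block, and Word is shut down whether or not the save succeeds:

```csharp
using (var session = new WordSession(filePath))
{
    session.Document.SaveAs2("path-to-pdf-file.pdf", word.WdSaveFormat.wdFormatPDF);
}
```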
That's about it: happy converting!