February 1st, 2012, 07:59 PM
Microsoft Word Document Manipulation
I was wondering if there is a better way of working with .doc/.docx Microsoft Word documents in ColdFusion 9. I want to be able to replace wildcards (placeholders) inside the documents and also fill in checkboxes and such. I know all of this can be done with Microsoft Access using VBA.
I know how to read an .rtf document with a file read, find/replace the wildcards within it, and then output the document via cfheader/cfcontent. But this is extremely tedious, and converting the documents from .doc to .rtf also makes the files three times as big.
Also, does anybody know if ColdFusion X (10) will have better integration with Microsoft Office?
February 1st, 2012, 11:57 PM
ColdFusion Zeus is still in closed alpha, so I'm afraid I can't comment on what may be coming in the next version (partly because it isn't complete yet). But I don't believe CF 9 has this built in. As you mentioned, you can use the Apache POI library to do things like this. Java experience would help, but most of the effort would go into learning and using the POI API. There are also a number of .NET libraries that will do this, but as with POI, it would probably be difficult without at least a bit of C# experience and the ability to work with the library APIs.
February 2nd, 2012, 02:52 PM
...and aside from the learning curve, POI's Word package is not nearly as mature as its Excel package, so I probably would not recommend it for this task.
February 2nd, 2012, 07:58 PM
I think what I'm going to end up doing is converting all of these Word docs into HTML (mimicking their original format as best I can), then manipulating them with ColdFusion and shooting them out as PDF documents. This will probably take me a long time, but I don't see any better alternative.
I don't think this is ColdFusion's fault, since I don't think any server-side language really has built-in features for Office document manipulation besides the .NET languages (Office is in Microsoft's family of products, the way PDF is in Adobe's).
February 2nd, 2012, 08:38 PM
Just to expand on that, even in .NET you'll likely end up falling back to third-party libraries for this. I've actually done exactly what you're talking about in a .NET web application (using C#) and used Aspose.Words.
February 7th, 2012, 06:05 PM
Looks like converting doc/docx into HTML destroys the formatting. So can Apache POI manipulate Word docs (such as marking a checkbox as checked)? I was looking on their website and they do not have much information regarding HWPF/XWPF... even their project plan link is dated 2003. Are there any other open source libraries/APIs out there?
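Whatever library ends up doing it, checking a legacy Word form-field checkbox in a .docx comes down to a small XML change in word/document.xml: the field's w:checkBox element gets a w:checked child with w:val="1". This is a minimal sketch using only the JDK's DOM API, assuming the checkbox is a legacy form field identified by its bookmark name (for example "Check1", a hypothetical name). Newer Word 2010 content-control checkboxes use different w14 markup and also need the displayed glyph changed, which this sketch does not handle. The class and method names are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FormCheckBoxes {
    // WordprocessingML main namespace (the "w:" prefix in document.xml)
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Marks the legacy form-field checkbox named fieldName as checked.
    static String check(String documentXml, String fieldName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(documentXml)));
            NodeList boxes = doc.getElementsByTagNameNS(W, "checkBox");
            for (int i = 0; i < boxes.getLength(); i++) {
                Element box = (Element) boxes.item(i);
                // <w:checkBox> sits inside <w:ffData>, which also holds the field's <w:name>
                Element name = child((Element) box.getParentNode(), "name");
                if (name == null || !fieldName.equals(name.getAttributeNS(W, "val"))) {
                    continue;
                }
                Element checked = child(box, "checked");
                if (checked == null) {
                    // <w:checked> is the last element allowed in <w:checkBox>, so appending keeps schema order
                    checked = doc.createElementNS(W, "w:checked");
                    box.appendChild(checked);
                }
                checked.setAttributeNS(W, "w:val", "1");
            }
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not update document.xml", e);
        }
    }

    // Returns the first child element in the w: namespace with the given local name.
    static Element child(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && W.equals(n.getNamespaceURI())
                    && localName.equals(n.getLocalName())) {
                return (Element) n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String xml = "<w:document xmlns:w=\"" + W + "\"><w:body><w:p><w:r><w:fldChar w:fldCharType=\"begin\">"
                + "<w:ffData><w:name w:val=\"Check1\"/><w:checkBox><w:sizeAuto/><w:default w:val=\"0\"/>"
                + "</w:checkBox></w:ffData></w:fldChar></w:r></w:p></w:body></w:document>";
        System.out.println(check(xml, "Check1"));
    }
}
```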
Aspose.Words looks like it can do what I want, but no way would my company pay $3,000 for a Java library to be used on one project.
I wonder if it is possible to read/write placeholders if I convert the documents into PDF via CFDocument/CFPDF?
February 7th, 2012, 09:10 PM
I don't think CFPDF goes to that level of detail but I believe something like iText probably would.
February 10th, 2012, 09:45 AM
Looks like the open source Java library Docx4J can do what I need (it can even convert to PDF as well).
Does anybody know how I would go about using this Java library in ColdFusion? Or would I need to do most of it in Java and then load it up in ColdFusion? I'm completely new to using external Java libraries, and I'm willing to learn enough Java to get this working.
February 10th, 2012, 10:45 AM
I tried docx4J a while back and was never able to get it to work from CF. Nothing against the library, but it uses JAXB (which is already built into CF), and I could never get past all the class loader conflicts.
February 10th, 2012, 11:05 AM
Was this with ColdFusion 9?
If ColdFusion can't play well with Docx4J, and Docx4J is modeled on Microsoft's Open XML SDK (which I'm guessing is .NET), is there a way to use the Open XML SDK with ColdFusion instead?
February 10th, 2012, 11:52 AM
No, it was with CF8. I never bothered re-testing under CF9 because it includes JAXB too, and I decided I was not up for another round of jar hell.
I have only used OpenXML tangentially (old document viewer). So I do not know if it supports what you need to do.
February 10th, 2012, 01:41 PM
February 10th, 2012, 02:13 PM
Oh, believe me, I tried JavaLoader too, but no joy. If memory serves, the tricky part was that there were multiple versions of JAXB (1.x and 2.x) on the class path to begin with, something about it being pre-bundled with Sun's JVM after a certain point. I cannot remember all the details, just that it was not the typical class path conflict I was used to dealing with.
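When the same API can come from several jars (as with JAXB 1.x/2.x here), a reasonable first diagnostic is to ask the class loader every location where a class can be found. This Java sketch uses only the JDK's ClassLoader.getResources. The class name WhichJar and the default target javax.xml.bind.JAXBContext are illustrative choices. More than one URL in the output means duplicate copies are visible, and which one actually wins depends on the loader's delegation order. It only diagnoses the conflict; it does not resolve it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class WhichJar {
    // Lists every location visible to `loader` that provides the named class.
    // More than one entry means the class is duplicated on the path.
    static List<URL> locate(ClassLoader loader, String className) {
        String resource = className.replace('.', '/') + ".class";
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "javax.xml.bind.JAXBContext";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (URL url : locate(loader, name)) {
            System.out.println(url); // e.g. a jar: URL for each copy found
        }
    }
}
```

Running it with the same classpath the server uses approximates what the server sees. On newer JDKs that no longer ship javax.xml.bind, an empty result simply means no JAXB jar is present.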
February 10th, 2012, 06:48 PM
Yeah, that sounds like no fun. I think I'll go with the Open XML SDK. I've been reading MSDN and it looks like it can do everything I need (find/replace placeholders and check checkboxes). Aspose.Words covers the same ground, albeit with a simpler API.
CFObject can interact with .NET, but I'm not sure whether I'll have to do all the coding in C# and then load the compiled assembly inside ColdFusion. I've never done anything like this before, but I'm out of options.
February 11th, 2012, 08:13 PM
I am not sure; it depends on the code. In theory you can invoke most .NET code from CF, but there are some exceptions.
Yep. Since the newer .docx format is basically just a zip file, you could theoretically do it yourself with just file, string, and XML functions. But since the overall schema is so incredibly complicated, using a wrapper library makes more sense.
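To make the "just a zip file" point concrete, here is a minimal Java sketch using only java.util.zip. It copies a .docx entry by entry and does a literal find/replace inside word/document.xml. The %%KEY%% placeholder syntax and the class and method names are hypothetical. It also shows why a wrapper library is still the safer choice. Word frequently splits typed text across several w:r runs (spell-check marks, revision IDs), so a placeholder may not appear contiguously in the XML, and headers, footers, and other parts live in separate entries that this sketch ignores.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class DocxPlaceholders {
    // Escapes a value so it is literal text inside the XML of document.xml.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    // Copies every zip entry, replacing %%KEY%% placeholders in word/document.xml only.
    static byte[] fill(byte[] docx, Map<String, String> values) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(docx));
             ZipOutputStream out = new ZipOutputStream(buffer)) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                byte[] data = in.readAllBytes(); // reads just the current entry
                if (entry.getName().equals("word/document.xml")) {
                    String xml = new String(data, StandardCharsets.UTF_8);
                    for (Map.Entry<String, String> e : values.entrySet()) {
                        xml = xml.replace("%%" + e.getKey() + "%%", escapeXml(e.getValue()));
                    }
                    data = xml.getBytes(StandardCharsets.UTF_8);
                }
                // Fresh entry: the stored sizes of the old one may no longer match
                out.putNextEntry(new ZipEntry(entry.getName()));
                out.write(data);
                out.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Demo helper: a one-entry zip standing in for a real .docx.
    static byte[] zipOf(String name, String content) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buffer)) {
            out.putNextEntry(new ZipEntry(name));
            out.write(content.getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads one entry back as UTF-8 text, or returns null if it is absent.
    static String readEntry(byte[] zip, String name) {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (entry.getName().equals(name)) {
                    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] fake = zipOf("word/document.xml", "<w:t>Hello %%NAME%%</w:t>");
        byte[] filled = fill(fake, Map.of("NAME", "A & B"));
        System.out.println(readEntry(filled, "word/document.xml")); // <w:t>Hello A &amp; B</w:t>
    }
}
```

On a real file you would pass the bytes from Files.readAllBytes on the .docx and write the returned bytes back out with Files.write.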