November 23rd, 2011, 04:10 AM
Cfdocument Word to pdf missing text
I'm having a problem converting Word files to PDF files.
Sometimes, after converting certain .doc files, some text is missing from the PDF. As far as I've noticed, it's usually at the bottom of the page.
There is also a problem with the text overlapping the document's footer.
After some research, it turned out the problem has existed since CF7. There was a hotfix, but for some reason the problem reappeared in CF8 and CF9.
There is even an open bug report with Adobe that has no solution.
I'm posting some of the articles and posts I've found to help anyone not familiar with the problem understand it better.
So my questions are:
1. Is this bug really my problem, or is anyone sure it's 100% fixed and I should look somewhere else for a solution?
2. If the bug is not fixed, is there a way of converting Word files to PDF without using cfdocument?
November 23rd, 2011, 02:00 PM
CFDOCUMENT uses Apache POI to perform the conversion, so if there is a problem it would probably be in that library rather than CF itself. Generating PDFs can be tricky and almost all problems end up being related to the way the elements are styled or positioned. Is there anything unusual about the styling of this Word document? And are you using CF9 with the latest updates applied?
November 23rd, 2011, 06:09 PM
... Also, the ability to convert MS Word documents to PDF was only introduced in CF9 (and it utilizes OpenOffice for the conversion). So I have a feeling you are talking about converting HTML to PDF format, not MS Word. Can you clarify?
Originally Posted by Abnaxus
Last edited by cfSearching; November 23rd, 2011 at 06:13 PM.
November 24th, 2011, 01:04 AM
Sorry guys, I had some forgotten-password issues, and even though I tried a couple of times the forum refuses to send me a new one,
so I created a new account.
The only reason I mentioned CF7 was the forum post I linked in my first post.
I haven't used ColdFusion's previous versions, so I took the guy's post as true.
HTML to PDF works fine, and I'm using it in another part of the same site.
I am talking about straight Word to PDF conversion.
There are some tables, footers, and headers in the Word files.
Since they are for a client, I won't have any way of checking what they try to convert.
Since I can't link to the client's files, I'll try to create a Word file that replicates the problem and post it here along with the PDF output.
It's not easy, since it doesn't happen in every Word document they try to convert.
November 24th, 2011, 02:05 AM
OK guys, I've managed to reproduce the problem.
Original Word File
w w w.mediafire.com/?m1t4se5dz1c67e6
w w w.mediafire.com/?z36fi20ywfvkcw5
It seems the problem occurs when you have a table inside the footer and a table in the content that stretches over more than one page.
This causes some of the table content to disappear, and sometimes the content table overlaps the footer table.
November 24th, 2011, 03:02 PM
It may be true, but my reason for asking was that cfdocument uses a different technology for HTML => PDF conversions than for MS Word => PDF. The former uses Adobe and iText libraries, and the latter uses OpenOffice. So you are probably encountering a different issue altogether. Have you checked the OpenOffice site to see if this is a known OOo issue?
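To make the distinction concrete, here is a rough sketch of the two kinds of cfdocument call in CF9. The file names are placeholders I made up, and the Word variant assumes OpenOffice is installed and configured for the CF server.

```cfm
<!--- HTML => PDF: the tag body is rendered by CF's built-in PDF engine --->
<cfdocument format="pdf"
            filename="#expandPath('./report.pdf')#"
            overwrite="true">
    <h1>Report</h1>
    <p>Some HTML content</p>
</cfdocument>

<!--- MS Word => PDF (CF9 only): srcfile points at the .doc,
      and the conversion itself is handed off to OpenOffice --->
<cfdocument format="pdf"
            srcfile="#expandPath('./client.doc')#"
            filename="#expandPath('./client.pdf')#"
            overwrite="true">
</cfdocument>
```

If only the second form loses text, that points at the OpenOffice side rather than at the HTML rendering engine.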
Originally Posted by Abnaxus78
November 25th, 2011, 02:10 PM
It's also totally possible that, given the more complex formatting in the Word document, you may just be running into a limitation of POI or iText. It obviously works well in most situations, but since these aren't Microsoft products and are essentially reverse-engineering the Word document, it wouldn't surprise me to find cases that will trip it up.
Unfortunately, if you can't tweak the Word doc's formatting, the only other option would be to pull in the very latest versions of the POI and/or iText jars and try to manipulate them yourself. But it probably wouldn't be easy and will definitely give you a new perspective on just how much CF is doing under the hood to simplify interaction with those libraries.
Either way you could also create a new issue in the CF bug tracker and include these two files, which might help the CF engineers see if they can come up with a solution that could be included in CF 10.
Originally Posted by Abnaxus78
November 27th, 2011, 01:56 PM
Actually CF uses OpenOffice for MS Word conversions, not POI.
Originally Posted by kiteless
November 28th, 2011, 01:34 AM
Ah, right, I forgot that POI doesn't support Word very well for some reason (maybe the difficulty in getting it to work well in a range of situations).
November 28th, 2011, 01:47 AM
OK guys, thanks for the replies.
I knew it was going to be hard to find a solution. It always is when a problem occurs only sometimes.
And you're right: since I have no control over how the client formats the Word files, my hands are tied.
Thanks for your help.
I'll get back to you if I manage to find a solution or a different way of doing what I want to do.
November 30th, 2011, 07:13 AM
Guys, another question.
Is there a way to make ColdFusion run an application installed on the server in order to send it commands (like command prompt commands)?
In other words, can I run a different converter program through ColdFusion and pass it some parameters
(source file, destination, etc.) so that I get the same result as cfdocument?
If there is such a capability, which program would you recommend I use?
I've taken a look at http://w w w.verypdf.com/pdfcamp/convert-doc-pdf.html but I couldn't get it to work even straight from the command prompt. It kept telling me it's a trial version, without letting me create a single file to see whether the output is correct.
November 30th, 2011, 08:36 AM
November 30th, 2011, 08:04 PM
Checking the CF bug database first is always a good idea. But afaik, CF just passes the Word conversion request off to OpenOffice. Since it is an external program, you are far more likely to find issues/patches/updates on the OpenOffice site itself.
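On the earlier question about calling an external converter: cfexecute can run a command-line program on the server. A minimal sketch follows. The executable path, the file paths, and the "source then destination" argument order are all made-up placeholders for whatever converter you end up choosing, so check that tool's own documentation for its real syntax.

```cfm
<!--- Hypothetical paths: adjust all three to the actual server --->
<cfset converterExe = "C:\path\to\converter.exe">
<cfset sourceFile   = "C:\docs\input.doc">
<cfset targetFile   = "C:\docs\output.pdf">

<!--- Initialise the capture variables so they exist even if nothing is written --->
<cfset converterOutput = "">
<cfset converterError  = "">

<cftry>
    <!--- Argument order is an assumption about the converter, not a CF rule.
          timeout is in seconds; a value above 0 makes CF wait for the program. --->
    <cfexecute name="#converterExe#"
               arguments='"#sourceFile#" "#targetFile#"'
               timeout="60"
               variable="converterOutput"
               errorVariable="converterError" />

    <cfif fileExists(targetFile)>
        <cfoutput>Created #htmlEditFormat(targetFile)#</cfoutput>
    <cfelse>
        <cfoutput>No PDF was produced: #htmlEditFormat(converterError)#</cfoutput>
    </cfif>

    <cfcatch type="any">
        <!--- Raised, for example, when the executable cannot be found or the timeout expires --->
        <cfoutput>Could not run the converter: #htmlEditFormat(cfcatch.message)#</cfoutput>
    </cfcatch>
</cftry>
```

Checking for the output file afterwards is deliberate: many command-line tools report failure only on stderr, so the existence of the PDF is the more reliable signal.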
Originally Posted by Abnaxus78
Last edited by cfSearching; November 30th, 2011 at 08:25 PM.