#1
  1. Atomic Solution
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2009
    Posts
    1
    Rep Power
    0

    Perl PDF Convertor Need Help


    Need some help. We built an online web application in Perl, and its output is currently generated as JPEG or PNG. We need to get the final result into a 300 dpi, print-ready PDF. If anyone has done this or knows how, please email me.
    Last edited by Atomic Solution; July 11th, 2009 at 02:15 PM. Reason: typo
  2. #2
  3. wizard
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2009
    Location
    The Great White North
    Posts
    83
    Rep Power
    142
    The module to use for creating and manipulating PDFs is PDF::API2, available from CPAN.

    There are some auxiliary modules for it:

    There is a Yahoo! group, perl-text-pdf-modules, for discussions of PDF::API2 and related modules.
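A minimal sketch of how PDF::API2 could wrap an existing JPEG or PNG in a one-page PDF at 300 dpi. It assumes the image already has enough pixels for the intended print size; the sub names `px_to_pt` and `image_to_pdf` and the script name are made up for illustration. A PDF page is measured in points (72 per inch), so an image N pixels wide prints N / 300 inches wide when drawn N * 72 / 300 points wide. "Print ready" may also involve colour space and bleed requirements, which this does not address.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert a pixel count to PDF points (1 pt = 1/72 inch) at a given resolution.
sub px_to_pt {
    my ($pixels, $dpi) = @_;
    return $pixels * 72 / $dpi;
}

# Wrap one raster image in a single-page PDF whose page is exactly the
# image's physical size at $dpi (300 if not given).
sub image_to_pdf {
    my ($image_file, $pdf_file, $dpi) = @_;
    $dpi ||= 300;

    require PDF::API2;    # loaded lazily so px_to_pt() works without the module
    my $pdf = PDF::API2->new();

    # Pick the loader from the file extension (anything not .png is assumed JPEG).
    my $image = ($image_file =~ /\.png$/i)
        ? $pdf->image_png($image_file)
        : $pdf->image_jpeg($image_file);

    my $width  = px_to_pt($image->width,  $dpi);
    my $height = px_to_pt($image->height, $dpi);

    my $page = $pdf->page();
    $page->mediabox(0, 0, $width, $height);             # page size in points
    $page->gfx->image($image, 0, 0, $width, $height);   # fill the whole page

    $pdf->saveas($pdf_file);
    return ($width, $height);
}

# Usage (hypothetical script name): perl image2pdf.pl input.png output.pdf
image_to_pdf(@ARGV[0, 1]) if @ARGV == 2;
```

For instance, a 2550 x 3300 pixel image comes out as a 612 x 792 point page, which is US Letter.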
  4. #3
  5. Analog
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2006
    Posts
    147
    Rep Power
    125
    PDF::API2 is really good, but it's not easy, and you would have to construct the PDF from scratch.

    The others, like report (and I think table too), use PDF::API2, but they haven't been updated (and there have been a bunch of updates to PDF::API2 since). PDF::API2::Simple is pretty good, but I don't know if it will convert an image.

    I went through this myself because I wanted the PDF to be formatted a specific way, and the easy thing for me was to output HTML and then use HTMLDoc to convert it to PDF (http://www.htmldoc.org/). I've been pretty impressed at how easy htmldoc is to use and how well it works.
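A rough sketch of calling htmldoc from Perl, assuming the `htmldoc` binary is installed and on the PATH. The sub names are made up for illustration; the options used are `--webpage` (treat the input as a plain web page rather than a structured book), `-t pdf` (output format) and `-f` (output file).

```perl
use strict;
use warnings;

# Build the argument list for one htmldoc run.
sub htmldoc_command {
    my ($html_file, $pdf_file) = @_;
    return ('htmldoc', '--webpage', '-t', 'pdf', '-f', $pdf_file, $html_file);
}

# Convert an HTML file that the application has already written to disk.
sub html_to_pdf {
    my ($html_file, $pdf_file) = @_;
    my @command = htmldoc_command($html_file, $pdf_file);

    # List-form system() bypasses the shell, so file names need no quoting.
    system(@command) == 0
        or die "htmldoc failed: status $?\n";
    return $pdf_file;
}

# Usage: perl html2pdf.pl report.html report.pdf
html_to_pdf(@ARGV[0, 1]) if @ARGV == 2;
```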

    Comments on this post

    • keath agrees
  6. #4
  7. wizard
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2009
    Location
    The Great White North
    Posts
    83
    Rep Power
    142
    Originally Posted by Analog
    The others, like report (and I think table too), use PDF::API2, but they haven't been updated (and there have been a bunch of updates to PDF::API2 since). PDF::API2::Simple is pretty good, but I don't know if it will convert an image.
    PDF::API2::Simple is built on top of PDF::API2 and gives access to the PDF::API2 object via its pdf() method. See the documentation for details.

    Comments on this post

    • keath agrees
  8. #5
  9. !~ /m$/
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    May 2004
    Location
    Reno, NV
    Posts
    4,274
    Rep Power
    0
    PDF::API2::Simple does look like a big improvement over some of the other PDF modules I had tried in the past.

    There was a recent thread where the question was about the best way to split lines for PDF output. It was really a question about typesetting, because the basic PDF modules allow you to write to a page, but the typesetting problem remains for each programmer to solve.

    The problem is complicated. How big is a line? Depends on the font size, kerning and tracking. Where's the best place to split a line? What are the proper places to split a word? In order to really fill a page with text, the line spacing should be adjusted as well. The problem in many cases is not creating a PDF, but doing quality typesetting.
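To make the line-splitting problem concrete, here is a sketch of the simplest strategy, greedy (first-fit) wrapping. This is not what TeX does; TeX optimises breaks over the whole paragraph and also hyphenates. The sub name `wrap_greedy` and the fixed 6 pt character width are assumptions for illustration; in real use the measuring callback would come from the PDF library's font metrics at the chosen font size.

```perl
use strict;
use warnings;

# Greedy (first-fit) line breaking: keep adding words to the current line
# until the next word would make it wider than $max_width, then start a
# new line. $measure is a code ref returning a string's width in points.
sub wrap_greedy {
    my ($text, $max_width, $measure) = @_;
    my @lines;
    my $line = '';
    for my $word (split ' ', $text) {
        my $candidate = length($line) ? "$line $word" : $word;
        if (length($line) && $measure->($candidate) > $max_width) {
            push @lines, $line;    # current line is full
            $line = $word;         # the word opens the next line
        }
        else {
            $line = $candidate;    # a lone over-long word is kept as-is
        }
    }
    push @lines, $line if length($line);
    return @lines;
}

# Stand-in metric: every character is 6 pt wide (a hypothetical fixed-pitch font).
my $fixed_pitch = sub { 6 * length($_[0]) };

print "$_\n"
    for wrap_greedy('How big is a line? Depends on the font size.', 120, $fixed_pitch);
```

Even this toy version shows why the answer depends on the font: swap in a different measuring callback and the breaks move.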

    Once I recognized that, I made up my mind to learn about LaTeX, which had solved that problem about 25 years ago. I've only been using it a short while, but getting quality results quickly is not hard.

    It's a markup language, and though there are GUI tools available for LaTeX to help build pages, at its heart it's a command-line tool: you feed it a text file and receive PDF output. This works with Perl without needing a special LaTeX module (though there are a few). You can mark up a document template, add your text, and hand it off to the system to process.

    LaTeX scales up to book-size documents, but it also has built-in classes for articles and letters. You choose a document class, mark some parts up with sectioning commands if desired, and then just pour your text in. It will handle the layout.

    There are several TeX engines available. I am using XeTeX, because it is able to use the native fonts on my system. Another nice thing about XeTeX is that most of the command-line tools (XeLaTeX) output PDF directly (some other engines output DVI by default).
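A sketch of the "hand it off to the system" step, assuming `xelatex` is installed and on the PATH. The sub names, the `report` file name and the `--run` switch are made up for illustration; `-interaction=nonstopmode` keeps the engine from stopping at a prompt when the source has an error, and `-output-directory` says where the PDF and auxiliary files go.

```perl
use strict;
use warnings;
use File::Spec;

# Argument list for one non-interactive xelatex run.
sub xelatex_command {
    my ($tex_file, $out_dir) = @_;
    return ('xelatex', '-interaction=nonstopmode',
            "-output-directory=$out_dir", $tex_file);
}

# Write LaTeX source to $out_dir/$name.tex, compile it, return the PDF path.
sub latex_to_pdf {
    my ($source, $name, $out_dir) = @_;
    my $tex_file = File::Spec->catfile($out_dir, "$name.tex");

    open my $fh, '>:encoding(UTF-8)', $tex_file
        or die "Cannot write $tex_file: $!\n";
    print {$fh} $source;
    close $fh or die "Cannot close $tex_file: $!\n";

    my @command = xelatex_command($tex_file, $out_dir);
    system(@command) == 0
        or die "xelatex failed: status $?\n";
    return File::Spec->catfile($out_dir, "$name.pdf");
}

# A tiny document: choose a class, add a section, pour the text in.
my $source = <<'TEX';
\documentclass{article}
\begin{document}
\section{Report}
Text poured in from Perl.
\end{document}
TEX

# Usage: perl latex2pdf.pl --run
print latex_to_pdf($source, 'report', '.'), "\n"
    if @ARGV && $ARGV[0] eq '--run';
```

Documents with cross-references or a table of contents generally need a second run so the references resolve.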

    This is a really excellent tutorial site: Getting to grips with LaTeX

    LaTeX has a learning curve, no doubt. The basics are easy; it's just that it has so much more power, like internal references, bibliographies, table-of-contents generation, etc. I am satisfied with the basics for now, for simple reports and the like, and will grow in knowledge as the need arises.

    One of the neat things about TeX is CTAN. Yup, it's a package repository for TeX, and our own CPAN was modelled after it. You'll find additional document classes, alternate table environments, fonts, etc.

    I agree that something like HTMLDoc, which Analog mentioned, might be more along the lines of what Atomic Solution is looking for (though it is a separate component; therefore not truly atomic).

    Anyway, just wanted to throw this out there again because you won't beat the quality of the output, and it's free.

    If you don't want to bother learning to write your own LaTeX templates, google 'html to latex'.
    Last edited by keath; July 12th, 2009 at 12:47 PM.
  10. #6
  11. wizard
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2009
    Location
    The Great White North
    Posts
    83
    Rep Power
    142
    Yes, PDF::API2 is not a word processor.

    Some of the issues you raised have been answered by PDF::TextBlock. It uses in-line markup similar to XML to change style within a paragraph. It only allows one level of nesting and its documentation is sparse.

    I have written a module, PDF::Kit, that allows unbounded nesting, but it uses nested data structures, something most Perl programmers regard as a black art. It is EXPERIMENTAL, which means it is subject to change without notice. When I get its documentation completed to my satisfaction, I'll put it on CPAN.

    It is available for experimentation at http://sites.google.com/site/shawnhcorey/Home/downloads/PDF-Kit.tar.gz?attredirects=0

    It comes with a script, eg_format_paragraph.pl, as an example on how to use it.

    ###

    What the OP has not made clear is whether this process is fully automatic or some of it is done by hand. An automatic process can't compete with a human at creating an aesthetically pleasing layout, since it can't see and has no taste. If it's done by hand, output the text and use something like Scribus to do the layout.
  12. #7
  13. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2009
    Location
    Atlanta, GA
    Posts
    44
    Rep Power
    10
    Originally Posted by keath
    Once I recognized that, I made up my mind to learn about LaTeX, which had solved that problem about 25 years ago. I've only been using it a short while, but getting quality results quickly is not hard....
    So I'm curious whether you would recommend this as an alternate report format for my web-based Perl CGI application. I currently use an Excel module for reports. The customer likes this because they can directly copy the report data in a commonly used format. I do, however, have reports which lend themselves to a simple presentation format. I don't want to waste company dev time learning something that won't help. I have another web application which does use PDF format. So how difficult would it be to take my HTML-presented data and send it to a PDF?

    John
  14. #8
  15. !~ /m$/
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    May 2004
    Location
    Reno, NV
    Posts
    4,274
    Rep Power
    0
    I probably wouldn't use it in the case of the Excel report, though that is possible. I've never actually tried the HTML-to-LaTeX scripts either. (There are many.) I've looked at some examples.

    I'm not thinking in terms of taking HTML, and running a converter on it in order to get a printed facsimile of the page. Users could just print the browser window and get that much. I mean to create custom templates for printed reports and use those instead of the HTML.

    I'm only suggesting LaTeX as an approach to solve a few problems. First, it flows text wonderfully, so it handles layout difficulties for text-heavy documents, like letters.

    Another good source of introductory information is at Wikibooks. Check out the letter page there:
    http://en.wikibooks.org/wiki/LaTeX/Letters

    It should give you an idea that you could have a simple template, and just add your text or variable target data, and produce form letters, envelopes, etc.
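A sketch of that form-letter idea in Perl, using the standard `letter` class. The `<<name>>` placeholder syntax and the sub names `latex_escape` and `fill_template` are made up for illustration. Values are escaped before insertion so that characters LaTeX treats specially (such as `&` and `%`) print literally instead of breaking the run.

```perl
use strict;
use warnings;

# Escape the characters LaTeX treats specially so data prints literally.
# One pass over the text, so inserted backslashes are never re-escaped.
sub latex_escape {
    my ($text) = @_;
    my %named = (
        '\\' => '\\textbackslash{}',
        '~'  => '\\textasciitilde{}',
        '^'  => '\\textasciicircum{}',
    );
    $text =~ s{([\\~^&%\$#_{}])}{ $named{$1} // "\\$1" }ge;
    return $text;
}

# Replace each <<name>> placeholder with the escaped value of that name.
sub fill_template {
    my ($template, %value) = @_;
    $template =~ s{<<(\w+)>>}{
        defined $value{$1} ? latex_escape($value{$1})
                           : die "No value for placeholder '$1'\n"
    }ge;
    return $template;
}

my $letter = <<'TEX';
\documentclass{letter}
\signature{<<sender>>}
\begin{document}
\begin{letter}{<<recipient>>}
\opening{Dear <<recipient>>,}
<<body>>
\closing{Yours sincerely,}
\end{letter}
\end{document}
TEX

print fill_template(
    $letter,
    sender    => 'Reports Dept.',
    recipient => 'Smith & Sons',
    body      => 'Your balance is up 10% this quarter.',
);
```

The printed source can then be written to a file and handed to a TeX engine; looping over a list of recipients gives one letter each.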

    So it solves the problems of text flow, sectioning, page breaks, etc., in a much easier way than you will find in the current PDF modules. Another thing I like is that XeTeX is one of the few open-source projects I've seen that allows access to the native font capabilities of my OS.

    For an example, those curious can check out the variants on page 19 of fontspec.pdf, available here: fontspec. I'm not a graphic designer. It's not all that likely that I'll create anything beautiful using that package. I do appreciate that it is available. Maybe someday the agency will assign a graphic designer to work with me and create a nice header I can include at the top of my reports.

    Whether LaTeX is worth it for your company, I can't answer. What I would recommend for those interested is trying it out for yourself. I installed it at home. I wrote a few reports on the weekend using LaTeX as markup and submitted those at work. I rewrote my resume using LaTeX, and submitted a bid package for promotion where I created all the documents the same way.

    I'm enjoying learning and using it in the same way Perl is fun for me. I no longer use Microsoft Word at all.
