Sunday, April 6, 2014

Creating a paginated table with PDFBox

I recently needed to change a file export that was offered as CSV so that it produced PDF instead. Here are some of the challenges I ran into:

Finding the right API to do the heavy work
PDF is a complex format created to display documents. It supports text, graphics and a whole set of other features, and has a 747-page specification that can be bought here: ISO 32000-1 specification. One does not simply start writing to a PDF file: unlike text files, PDF files usually contain non-ASCII binary data and should always be treated as binary files.

Using an off-the-shelf API can greatly reduce the burden of creating or modifying that kind of file manually. After some quick research I realized that most of the libraries available on the web were paid. The most widely used one is iText, which needs a commercial license if it is used for commercial purposes (http://itextpdf.com/salesfaq). The best free solution I found was PDFBox, which immediately drew my attention for being an Apache project. It is currently at release 1.8.4, is stable, and has a fairly extensive amount of documentation on its website and forums.

Now the thing is, I went through the mailing lists and documentation, and it doesn't come with any ready-made feature for generating tables. That requires the developer to handle the drawing of the table's columns and rows. The following code performs that task and also handles paginating the table across multiple pages in case it doesn't fit on one.
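To make the pagination idea concrete, here is a minimal sketch of the arithmetic only. It is not the code from the repository and it deliberately uses no PDFBox calls. The class name `TablePaginationSketch`, the fixed row height, the page size and margin values, and the choice to repeat the header row on every page are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox or of the linked sample project.
public class TablePaginationSketch {

    /** Number of content rows that fit on one page, assuming a fixed row height. */
    static int rowsPerPage(float pageHeight, float margin, float rowHeight) {
        float usableHeight = pageHeight - 2 * margin;
        // Assumption: one row per page is reserved for a repeated header.
        return (int) Math.floor(usableHeight / rowHeight) - 1;
    }

    /** Splits the content rows into one chunk per page. */
    static List<List<String[]>> paginate(List<String[]> rows, int rowsPerPage) {
        if (rowsPerPage <= 0) {
            throw new IllegalArgumentException("Page is too small to hold any row");
        }
        List<List<String[]>> pages = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += rowsPerPage) {
            int end = Math.min(start + rowsPerPage, rows.size());
            pages.add(new ArrayList<>(rows.subList(start, end)));
        }
        return pages;
    }

    /**
     * Y coordinate of the top edge of a row (index 0 is the header).
     * PDF coordinates start at the bottom-left corner, so y decreases
     * as the table grows down the page.
     */
    static float rowTopY(float pageHeight, float margin, float rowHeight, int rowIndex) {
        return pageHeight - margin - rowIndex * rowHeight;
    }

    public static void main(String[] args) {
        float pageHeight = 792f; // assumed: US Letter height in points
        float margin = 50f;      // assumed margin
        float rowHeight = 20f;   // assumed fixed row height

        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            rows.add(new String[] {"Row " + i, "Value " + i});
        }

        int perPage = rowsPerPage(pageHeight, margin, rowHeight);
        List<List<String[]>> pages = paginate(rows, perPage);
        System.out.println(perPage + " rows per page, " + pages.size() + " pages");
    }
}
```

In a real PDFBox program, each chunk returned by `paginate` would be drawn on its own `PDPage`, with the grid lines and cell text placed at the y coordinates that `rowTopY` gives. The exact drawing calls differ between PDFBox versions, so they are left out of this sketch.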

Output sample:


I meant this only as a rough experiment, so there are many optimizations still to be done, but you can get some ideas from it and adapt them to your needs. (full working code: https://github.com/eduardohl/Paginated-PDFBox-Table-Sample)


10 comments:

  1. Saved me a lot of time, thank you very much indeed!

  2. Why not publish it as an open-source library that can be used as a dependency? I may be using it soon and will probably make some optimizations; if they are worthwhile and I am able to, I will contribute them back.

    Replies
    1. Hi,

      Unfortunately I don't have available time to dedicate to this project, but you can always fork it and commit to it at https://github.com/eduardohl/Paginated-PDFBox-Table-Sample

      Also, feel free to copy or branch it if you feel like it.

      By the way, if you want to collaborate with the official project: https://pdfbox.apache.org/

  3. Do you have any idea whether the text in the content can be wrapped? That is, if the content is wider than the row, how can it be made to fit?

    Replies
    1. I'm pretty sure it's doable; all you really need to do is iterate over the lines and move the y-axis pointers down.

  4. I tried to implement it, but somehow it didn't work for me. If you have any idea, could you please give me an example?

  5. Thank you very much for your code! It helped me a lot. Getting the line wrapping right was fiddly for me, but starting from your original code it was possible.

  6. Hi

    I downloaded the code from GitHub and got the exception below when I tried to run it. Any suggestions?

    Exception in thread "main" java.lang.IllegalAccessError: tried to access method org.apache.pdfbox.pdmodel.graphics.color.PDDeviceGray.<init>()V from class org.apache.pdfbox.pdmodel.edit.PDPageContentStream

  7. Ah, I see, so "Table" is your custom data type. For people getting errors: maybe you did not create all the classes this code needs to work? Here are all the necessary classes, from the GitHub link Eduardo gives at the end of the article above:
    https://github.com/eduardohl/Paginated-PDFBox-Table-Sample/tree/master/src/pdftablesample
