Sunday, April 6, 2014

Creating a paginated table with PDFBox

I recently needed to change a file export from CSV to PDF format. Here are some of the challenges I ran into:

Finding the right API to do the heavy work
PDF is a complex format created to display documents. It supports text, graphics and a whole set of other features, and it has a 747-page specification that can be bought here: ISO 32000-1 specification. One does not simply start writing to a PDF file: unlike text files, PDFs usually contain non-ASCII binary data and should always be treated as binary files.

Using an off-the-shelf API can greatly reduce the burden of creating or modifying that kind of file manually. After some quick research I realized that most of the libraries available on the web are paid. The most widely used one is iText, which needs a commercial license when used for commercial purposes (http://itextpdf.com/salesfaq). The best free solution I found was PDFBox, which immediately drew my attention for being an Apache project. It is currently on the 1.8.4 release, is stable, and has a fairly extensive amount of documentation on its website and forums.

The catch is that, after going through the mailing lists and documentation, I found that it doesn't come with any ready-made feature for generating tables. The developer has to handle drawing the table's columns and rows. The code in this post performs that task and also paginates the table across multiple pages when it doesn't fit on one.

Output sample:


I meant this only as an experiment, so there are many optimizations still to be done, but you can take some ideas from it and adapt them to your needs. (Full working code: https://github.com/eduardohl/Paginated-PDFBox-Table-Sample)
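The pagination arithmetic used by the listing can be looked at in isolation. The sketch below is plain Java with no PDFBox dependency; the class and method names (PaginationSketch, rowsPerPage, numberOfPages, pageContent) are hypothetical and only mirror the calculation, assuming one row per page is reserved for the column headers. It does not guard against a table height smaller than two rows.

```java
import java.util.Arrays;

// Hypothetical helper that mirrors the pagination math of the listing.
final class PaginationSketch {

    // Rows that fit on one page, minus one row reserved for the column headers
    static int rowsPerPage(float tableHeight, float rowHeight) {
        return (int) Math.floor(tableHeight / rowHeight) - 1;
    }

    // Pages needed to show every content row
    static int numberOfPages(int totalRows, int rowsPerPage) {
        return (int) Math.ceil((double) totalRows / rowsPerPage);
    }

    // Slice of the content that belongs on the given zero-based page
    static String[][] pageContent(String[][] content, int rowsPerPage, int pageIndex) {
        int start = pageIndex * rowsPerPage;
        int end = Math.min(start + rowsPerPage, content.length);
        return Arrays.copyOfRange(content, start, end);
    }

    public static void main(String[] args) {
        String[][] content = new String[7][];
        for (int i = 0; i < content.length; i++) {
            content[i] = new String[] {"row " + i};
        }
        int perPage = rowsPerPage(80f, 20f);                // 4 rows fit, 1 is the header
        int pages = numberOfPages(content.length, perPage); // 7 rows over 3 per page
        for (int p = 0; p < pages; p++) {
            System.out.println("page " + p + ": " + pageContent(content, perPage, p).length + " rows");
        }
    }
}
```

With 7 content rows and room for 3 per page, the last page carries the single leftover row, which is why the end index is clamped to the content length.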


package littleproject;

import java.io.IOException;
import java.util.Arrays;

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;

public class PDFTableGenerator {

    // Generates document from Table object
    public void generatePDF(Table table) throws IOException, COSVisitorException {
        PDDocument doc = null;
        try {
            doc = new PDDocument();
            drawTable(doc, table);
            doc.save("sample.pdf");
        } finally {
            if (doc != null) {
                doc.close();
            }
        }
    }

    // Configures basic setup for the table and draws it page by page
    public void drawTable(PDDocument doc, Table table) throws IOException {
        // Calculate pagination; subtract one row for the column headers drawn on every page
        int rowsPerPage = (int) Math.floor(table.getHeight() / table.getRowHeight()) - 1;
        int numberOfPages = (int) Math.ceil(table.getNumberOfRows().floatValue() / rowsPerPage);
        // Generate each page, get the content and draw it
        for (int pageCount = 0; pageCount < numberOfPages; pageCount++) {
            PDPage page = generatePage(doc, table);
            PDPageContentStream contentStream = generateContentStream(doc, page, table);
            String[][] currentPageContent = getContentForCurrentPage(table, rowsPerPage, pageCount);
            drawCurrentPage(table, currentPageContent, contentStream);
        }
    }

    // Draws current page table grid and border lines and content
    private void drawCurrentPage(Table table, String[][] currentPageContent, PDPageContentStream contentStream)
            throws IOException {
        float tableTopY = table.isLandscape()
                ? table.getPageSize().getWidth() - table.getMargin()
                : table.getPageSize().getHeight() - table.getMargin();
        // Draws grid and borders
        drawTableGrid(table, currentPageContent, contentStream, tableTopY);
        // Position cursor to start drawing content
        float nextTextX = table.getMargin() + table.getCellMargin();
        // Calculate center alignment for text in cell considering font height
        float nextTextY = tableTopY - (table.getRowHeight() / 2)
                - ((table.getTextFont().getFontDescriptor().getFontBoundingBox().getHeight() / 1000 * table.getFontSize()) / 4);
        // Write column headers
        writeContentLine(table.getColumnsNamesAsArray(), contentStream, nextTextX, nextTextY, table);
        nextTextY -= table.getRowHeight();
        nextTextX = table.getMargin() + table.getCellMargin();
        // Write content
        for (int i = 0; i < currentPageContent.length; i++) {
            writeContentLine(currentPageContent[i], contentStream, nextTextX, nextTextY, table);
            nextTextY -= table.getRowHeight();
            nextTextX = table.getMargin() + table.getCellMargin();
        }
        contentStream.close();
    }

    // Writes the content for one line
    private void writeContentLine(String[] lineContent, PDPageContentStream contentStream, float nextTextX, float nextTextY,
            Table table) throws IOException {
        for (int i = 0; i < table.getNumberOfColumns(); i++) {
            String text = lineContent[i];
            contentStream.beginText();
            contentStream.moveTextPositionByAmount(nextTextX, nextTextY);
            contentStream.drawString(text != null ? text : "");
            contentStream.endText();
            nextTextX += table.getColumns().get(i).getWidth();
        }
    }

    private void drawTableGrid(Table table, String[][] currentPageContent, PDPageContentStream contentStream, float tableTopY)
            throws IOException {
        // Draw row lines
        float nextY = tableTopY;
        for (int i = 0; i <= currentPageContent.length + 1; i++) {
            contentStream.drawLine(table.getMargin(), nextY, table.getMargin() + table.getWidth(), nextY);
            nextY -= table.getRowHeight();
        }
        // Draw column lines
        final float tableYLength = table.getRowHeight() + (table.getRowHeight() * currentPageContent.length);
        final float tableBottomY = tableTopY - tableYLength;
        float nextX = table.getMargin();
        for (int i = 0; i < table.getNumberOfColumns(); i++) {
            contentStream.drawLine(nextX, tableTopY, nextX, tableBottomY);
            nextX += table.getColumns().get(i).getWidth();
        }
        contentStream.drawLine(nextX, tableTopY, nextX, tableBottomY);
    }

    private String[][] getContentForCurrentPage(Table table, Integer rowsPerPage, int pageCount) {
        int startRange = pageCount * rowsPerPage;
        int endRange = (pageCount * rowsPerPage) + rowsPerPage;
        if (endRange > table.getNumberOfRows()) {
            endRange = table.getNumberOfRows();
        }
        return Arrays.copyOfRange(table.getContent(), startRange, endRange);
    }

    private PDPage generatePage(PDDocument doc, Table table) {
        PDPage page = new PDPage();
        page.setMediaBox(table.getPageSize());
        page.setRotation(table.isLandscape() ? 90 : 0);
        doc.addPage(page);
        return page;
    }

    private PDPageContentStream generateContentStream(PDDocument doc, PDPage page, Table table) throws IOException {
        PDPageContentStream contentStream = new PDPageContentStream(doc, page, false, false);
        // Use a transformation matrix to change the reference when drawing.
        // This is necessary for the landscape position to draw correctly
        if (table.isLandscape()) {
            contentStream.concatenate2CTM(0, 1, -1, 0, table.getPageSize().getWidth(), 0);
        }
        contentStream.setFont(table.getTextFont(), table.getFontSize());
        return contentStream;
    }
}
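
The landscape branch is probably the least obvious part of the listing, so here is a small sketch of what the matrix passed to concatenate2CTM does to a point. It assumes the standard PDF convention for a matrix [a b c d e f], where x' = a*x + c*y + e and y' = b*x + d*y + f. With (0, 1, -1, 0, pageWidth, 0) that amounts to a 90-degree counter-clockwise rotation followed by a shift of pageWidth along x. The class and method names are hypothetical, and the page width of 595 points is just an assumed A4-like value.

```java
// Hypothetical illustration of the landscape matrix; no PDFBox required.
final class LandscapeTransformSketch {

    // Applies a PDF-style matrix [a b c d e f] to the point (x, y)
    static float[] apply(float a, float b, float c, float d, float e, float f, float x, float y) {
        return new float[] { a * x + c * y + e, b * x + d * y + f };
    }

    // Same six values the listing passes for landscape pages
    static float[] toPageSpace(float pageWidth, float x, float y) {
        return apply(0, 1, -1, 0, pageWidth, 0, x, y);
    }

    public static void main(String[] args) {
        float pageWidth = 595f; // assumption: roughly A4 width in points
        float[] origin = toPageSpace(pageWidth, 0, 0);
        float[] point = toPageSpace(pageWidth, 100, 50);
        System.out.println("(0, 0)    -> (" + origin[0] + ", " + origin[1] + ")");
        System.out.println("(100, 50) -> (" + point[0] + ", " + point[1] + ")");
    }
}
```

In other words, a drawing-space point (x, y) lands at (pageWidth - y, x) on the unrotated page, so the drawing code can keep treating x as the long side once the page is displayed rotated by 90 degrees.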

10 comments:

  1. Saved me a lot of time, thank you very much indeed!

  2. Why not publish it as an open-source plugin that can be used as a dependency? I may be using it soon and will probably make some optimizations; if they are worth it and I am able to contribute them back, I will.

    1. Hi,

      Unfortunately I don't have available time to dedicate to this project, but you can always fork it and commit to it at https://github.com/eduardohl/Paginated-PDFBox-Table-Sample

      Also, feel free to copy or branch it if you feel like it.

      By the way, if you want to collaborate with the official project: https://pdfbox.apache.org/

  3. Do you have any idea whether the text in the content can be wrapped? That is, if the content is longer than the row width, how can it be made to fit?

    1. I'm pretty sure it's doable; all you really need to do is iterate and move the y-axis pointers down.

  4. I tried to implement it, but somehow it didn't work for me. If you have any idea, could you please give me an example?

  5. Thank you very much for your code! It helped me a lot. Doing the line wrapping was fiddly for me, but starting from your original code it was possible.

  6. Hi

    I downloaded the code from GitHub and got the exception below when I tried running it. Any suggestions?

    Exception in thread "main" java.lang.IllegalAccessError: tried to access method org.apache.pdfbox.pdmodel.graphics.color.PDDeviceGray.<init>()V from class org.apache.pdfbox.pdmodel.edit.PDPageContentStream

  7. Ah, I see, so "Table" is your custom data type. For people getting errors: maybe you didn't create all the classes this code needs to work? Here are all the necessary classes, from the GitHub link Eduardo gives at the end of the article above:
    https://github.com/eduardohl/Paginated-PDFBox-Table-Sample/tree/master/src/pdftablesample
