jPDFProcess

Code Sample: Getting links from a PDF

posted Sep 24, 2013, 11:52 AM by Leila Holmann   [ updated Dec 5, 2013, 7:00 AM ]

Links are saved as part of the annotations in a PDF document even though they are not really considered annotations.

Links in a PDF document do not have a destination as such. Instead, they are more general: each link holds a list of actions, and an action can be almost anything.

The actions that we support are defined in the com.qoppa.pdfViewer.actions package. You can get the list of actions for a link annotation by calling:

    annot.getActions();

This returns a List object that holds objects that extend

    com.qoppa.pdfViewer.actions.Action

Most links have a single action in the list, and it is usually one of these types:

  •   GotoPageAction - Go to a page within the same document
  •   GotoPageRemoteAction - Go to a page in another document
  •   URLAction - Open a web browser to a URL

Each of these classes has methods to get the relevant data. For example, the URLAction class has a getURL() method that returns the target URL.

The Java code sample below shows how to loop through the annotations in a PDF document and print the URL of every link found.

import java.util.List;
import java.util.Vector;

import com.qoppa.pdf.annotations.Annotation;
import com.qoppa.pdf.annotations.Link;
import com.qoppa.pdfProcess.PDFDocument;
import com.qoppa.pdfProcess.PDFPage;
import com.qoppa.pdfViewer.actions.Action;
import com.qoppa.pdfViewer.actions.URLAction;

...

            // Load the document
            PDFDocument pdfDoc = new PDFDocument("input.pdf", null);

            // Loop through pages
            for (int count = 0; count < pdfDoc.getPageCount(); count++)
            {
                PDFPage page = pdfDoc.getPage(count);
                if (page != null)
                {
                    // Get list of annotations
                    Vector annots = page.getAnnotations();

                    // Check each annotation on the page
                    if (annots != null)
                    {
                        for (int aCount = 0; aCount < annots.size(); ++aCount)
                        {
                            Annotation annot = (Annotation) annots.get(aCount);

                            // Only process link annotations (instanceof is also false for null)
                            if (annot instanceof Link)
                            {
                                List actions = ((Link) annot).getActions();
                                if (actions != null)
                                {
                                    // Print the target of every URL action on this link
                                    for (int j = 0; j < actions.size(); ++j)
                                    {
                                        Action action = (Action) actions.get(j);
                                        if (action instanceof URLAction)
                                        {
                                            String url = ((URLAction) action).getURL();
                                            System.out.println(url);
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }


          

How to add OCR to jPDFProcess

posted Aug 12, 2013, 8:08 AM by Leila Holmann   [ updated Nov 12, 2013, 6:24 PM ]

As of version 2013R2, jPDFProcess, Qoppa's Java PDF creation and manipulation library, offers an optional OCR feature. Please contact us about licensing this additional feature.

How to Activate / Implement OCR

To get started, you can download:

  • the latest jPDFProcess version from our standard download page
  • the JNI native bridge files from http://www.qoppa.com/files/pdfprocess/ocr/libtessjni.zip

The JNI zip file contains builds of the native libraries for Windows, Linux and Mac OS X, each in 32-bit and 64-bit versions. At runtime, these native libraries need to be on the machine that is running the software.

If you are running in an application, you can bundle the native libraries with your installation. If you are running in an applet, you will probably want to get these files from a server on demand: when the user chooses to use OCR, the applet can download the appropriate file for the OS and bitness from your web server to a local folder.

  • the OCR language files

The language zip file contains language files for English, German, French, Spanish and Italian. The files inside it come directly from the Tesseract project site: there is one archive per language, and you will need to uncompress each one so that jPDFProcess can use it.

The local machine only needs the language files for the languages you want to support. You will probably also want to install these on demand, by having the applet download the files from your server when necessary.
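When an applet or installer has to fetch the appropriate native library for the OS and bitness, the standard Java system properties os.name and os.arch are enough to decide which build to use. The sketch below is a hypothetical, JDK-only helper; the folder names it produces (such as "win64" or "linux32") are invented for illustration and are not necessarily how libtessjni.zip is laid out. Note that os.arch reports the architecture of the running JVM rather than of the operating system, which is what matters here: a 32-bit JVM can only load 32-bit native libraries.

```java
import java.util.Locale;

/**
 * Hypothetical helper: picks a platform folder name such as "win64" or "linux32"
 * so an application or applet can fetch the matching native OCR library.
 * The folder naming is an assumption for illustration only.
 */
public class NativeLibSelector {

    /** Maps an os.name value to a short platform prefix (hypothetical naming). */
    static String platform(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) return "win";
        if (os.startsWith("mac")) return "mac";
        if (os.startsWith("linux")) return "linux";
        throw new IllegalArgumentException("Unsupported OS: " + osName);
    }

    /** Maps an os.arch value (JVM architecture) to 32 or 64 bits. */
    static int bits(String osArch) {
        // Common 64-bit JVM names (amd64, x86_64, aarch64) contain "64"; others are treated as 32-bit
        return osArch.contains("64") ? 64 : 32;
    }

    /** Combines platform and bitness into a folder name, e.g. "mac64". */
    static String folderFor(String osName, String osArch) {
        return platform(osName) + bits(osArch);
    }

    public static void main(String[] args) {
        String folder = folderFor(System.getProperty("os.name"), System.getProperty("os.arch"));
        System.out.println("Native OCR library folder: " + folder);
    }
}
```

The resulting folder name could then be used to build the download URL on your web server and, once the files are saved locally, the libraryPath passed to OCRBridge.initialize().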


To activate the OCR functionality, call OCRBridge.initialize() with the paths to these directories:

    OCRBridge.initialize(String libraryPath, String dataPath);

  • libraryPath is the path to the folder where the native libraries are located
  • dataPath is the path to the folder where the (uncompressed) OCR language files are located

You can then call the OCR engine and feed the results to jPDFProcess, page by page:

TessJNI ocr = new TessJNI();
for (int count = 0; count < pdf.getPageCount(); ++count)
{
      String pageOCR = ocr.performOCR("eng", pdf.getPage(count), 300);
      pdf.getPage(count).insert_hOCR(pageOCR, true);
}



Download an OCR sample that shows this.


Additional Languages

Additional languages can be downloaded from http://code.google.com/p/tesseract-ocr/downloads/list. We currently only support Latin-script languages, so in addition to the languages above you may download Dutch and Portuguese.

Extract the archives and place all files for a language in the “tessdata” directory. Then add entries to languages.xml so that each language prefix is converted to a language name in the language combo box.
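Before adding entries to languages.xml, it can be useful to check which languages are actually installed. The JDK-only sketch below lists the language prefixes found in a tessdata folder. It assumes the usual Tesseract file naming, <prefix>.traineddata (for example eng.traineddata); the class and method names are hypothetical and not part of jPDFProcess.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical check: lists the language prefixes (e.g. "eng", "nld") whose
 * Tesseract data files are present in a tessdata folder.
 * Assumes files are named "<prefix>.traineddata".
 */
public class TessdataLanguages {

    static final String SUFFIX = ".traineddata";

    static List<String> installedLanguages(Path tessdataDir) throws IOException {
        List<String> prefixes = new ArrayList<>();
        // The glob only matches file names ending in .traineddata
        try (DirectoryStream<Path> files = Files.newDirectoryStream(tessdataDir, "*" + SUFFIX)) {
            for (Path file : files) {
                String name = file.getFileName().toString();
                prefixes.add(name.substring(0, name.length() - SUFFIX.length()));
            }
        }
        Collections.sort(prefixes); // directory order is unspecified, so sort for stable output
        return prefixes;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "tessdata");
        System.out.println("Installed OCR languages: " + installedLanguages(dir));
    }
}
```

Each prefix reported here (such as "nld" for Dutch) is one that would need a matching entry in languages.xml.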



Related Post:

Qoppa's Java PDF OCR Solution 


Streaming large PDF documents from a Java servlet one page at a time

posted Jun 7, 2012, 2:22 PM by Leila Holmann   [ updated Jun 7, 2012, 2:27 PM ]

Q: My primary goal for this library is to stream PDFs from a Java servlet to a JSP page. Many of the PDFs are large documents and contain TIFF images, so it takes a while for the user to see anything on the screen. Is there a way to send back the first pages of the PDF and display them on the user's screen immediately, while the rest of the PDF loads?

A:
Yes, you can do this with our PDF library jPDFProcess. You just need to tell jPDFProcess to save the document as a "linearized" document. Linearized documents contain all the information needed to render the first page at the top of the file. If the end user opens the document with Adobe Reader in the browser, Reader displays the first page as soon as its data has downloaded, and then continues downloading the rest of the document in the background.

To do this, instead of calling one of the saveDocument() methods, you will need to call the saveDocumentLinearized() method:               

PDFDocument.saveDocumentLinearized (OutputStream outStream)

Read more information about PDF linearization.

PDF overlay and imposition

posted Mar 27, 2012, 10:49 AM by Leila Holmann   [ updated Mar 27, 2012, 11:22 AM ]

Q: How can I generate a PDF document by overlaying one PDF document on another? We're trying to overlay our company's letterhead onto existing documents.

A: Our Java PDF library jPDFProcess can overlay one document on top of another.

The function to look at is called appendPageContent and is found at the page level.

Note that the same function can be used for many purposes:

  • overlay one page on top of another

  • draw, or impose, multiple pages on one large page (in the pre-press / print industry, this process is called imposition)


The following lines of code illustrate how to load two documents, overlay the first page of the second document onto the first page of the first document, and then save the resulting document:

            PDFDocument pdf1 = new PDFDocument("input1.pdf", null);
            PDFDocument pdf2 = new PDFDocument("input2.pdf", null);

            PDFPage page1 = pdf1.getPage(0);
            PDFPage page2 = pdf2.getPage(0);

            // Draw page2 on top of page1 at position (0, 0), with x and y scales of 1
            page1.appendPageContent(page2, 0, 0, 1, 1);

            pdf1.saveDocument("output.pdf");

After the source page, the arguments are the x and y position on the target page, followed by the x and y scale factors. You can find the details in the Javadoc documentation.
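For imposition, most of the work is computing the position and scale arguments for each page. The sketch below is a hypothetical, JDK-only helper for a 2-up layout: it fits two equally sized source pages side by side on one sheet, keeps their aspect ratio, and centers each page in its half of the sheet. Sizes are in PDF points (1/72 inch). Whether appendPageContent measures y from the top or the bottom of the page is not covered here, so check the Javadoc; because the helper centers pages vertically, the computed y is the same either way.

```java
/**
 * Hypothetical helper for 2-up imposition: computes where to draw two source
 * pages side by side on one larger sheet, and at what uniform scale.
 * All values are in PDF points (1/72 inch).
 */
public class TwoUpLayout {

    /** Position and uniform scale for one source page on the sheet. */
    static final class Placement {
        final double x, y, scale;

        Placement(double x, double y, double scale) {
            this.x = x;
            this.y = y;
            this.scale = scale;
        }
    }

    static Placement[] layout(double srcW, double srcH, double sheetW, double sheetH) {
        double cellW = sheetW / 2.0;                           // each page gets half the sheet width
        double scale = Math.min(cellW / srcW, sheetH / srcH);  // fit the cell, keep the aspect ratio
        double xPad = (cellW - srcW * scale) / 2.0;            // center horizontally within the cell
        double y = (sheetH - srcH * scale) / 2.0;              // center vertically on the sheet
        return new Placement[] {
            new Placement(xPad, y, scale),                     // left cell
            new Placement(cellW + xPad, y, scale)              // right cell
        };
    }

    public static void main(String[] args) {
        // Two US Letter pages (612 x 792 pt) on one landscape Tabloid sheet (1224 x 792 pt)
        for (Placement p : layout(612, 792, 1224, 792)) {
            System.out.printf("x=%.1f y=%.1f scale=%.3f%n", p.x, p.y, p.scale);
        }
    }
}
```

Each placement's x, y and scale would then be passed as the x, y, x-scale and y-scale arguments of appendPageContent on the page of the target sheet.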

Linearizing existing PDF documents

posted Mar 19, 2012, 12:04 PM by Leila Holmann   [ updated Oct 25, 2013, 8:39 AM ]

Q: Can your Java PDF library, jPDFProcess, create linearized PDF documents / save existing PDF documents as linearized?

A: Yes, as of version 4.70, released in March 2012, jPDFProcess can create linearized PDF documents.

To linearize a PDF document with jPDFProcess, 2 simple lines of code do the trick:

PDFDocument myPDF = new PDFDocument("file.pdf", null);
myPDF.saveDocumentLinearized(new FileOutputStream("linearizedfile.pdf"));

What is PDF linearization and why are PDF documents linearized?

A linearized PDF, sometimes also called a "Web Optimized" or "Fast Web View" enabled PDF, is a PDF file whose objects are ordered in a specific way, with a couple of additional special objects added. The linearized format is fully compatible with the regular PDF format: a viewer does not need to know anything about linearization to process a linearized PDF.

The purpose of linearization is to let a viewer that does understand the format display the first page of the document as quickly as possible over a potentially slow network connection, and then jump to any other page the user requests, again as quickly as possible, without ever downloading data that is only required for other pages.

A linearized PDF file starts with a linearization dictionary, a cross-reference table for all of the first-page objects, a special PDF stream object called the hint stream, and then all of the objects needed to render the first page. After that, the objects for all the other pages appear in the file, grouped by the page they belong to or, if they are used by more than one page, in the shared object group. Finally come any objects not needed for page rendering, followed by a cross-reference table for the non-first-page objects.
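Because the linearization dictionary comes first, a quick check for linearization only needs to look at the start of the file. The PDF specification (ISO 32000-1, Annex F) requires the linearization parameter dictionary to be contained entirely within the first 1024 bytes, and that dictionary contains the /Linearized key. The JDK-only sketch below uses this as a heuristic; it is not part of jPDFProcess, it does not validate the hint stream or the offsets, and a file that was linearized and later modified with an incremental update may still carry the marker even though it is no longer properly linearized.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/**
 * Heuristic check for a linearized PDF: looks for the /Linearized key in the
 * first 1024 bytes of the file. Does not validate the hint stream or offsets.
 */
public class LinearizationCheck {

    static final int HEADER_WINDOW = 1024;

    /** Returns true if the given leading bytes contain the /Linearized key. */
    static boolean looksLinearized(byte[] head) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data decodes safely
        String text = new String(head, StandardCharsets.ISO_8859_1);
        return text.contains("/Linearized");
    }

    /** Reads up to the first 1024 bytes of the file and applies the check above. */
    static boolean looksLinearized(Path pdf) throws IOException {
        byte[] head = new byte[HEADER_WINDOW];
        int total = 0;
        try (InputStream in = Files.newInputStream(pdf)) {
            int read;
            // read() may return fewer bytes than requested, so keep reading until full or EOF
            while (total < head.length && (read = in.read(head, total, head.length - total)) != -1) {
                total += read;
            }
        }
        return looksLinearized(Arrays.copyOf(head, total));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("Usage: java LinearizationCheck file.pdf");
            return;
        }
        Path pdf = Paths.get(args[0]);
        System.out.println(pdf + (looksLinearized(pdf) ? " looks linearized" : " does not look linearized"));
    }
}
```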

The hint stream is a compact table that can tell a linearization aware viewer which objects are required for any one page, and the file offsets of each of those objects. That way the viewer need only download the first part of the file up to the end of the hint stream, and then send download requests for specific file segments to a web server to be able to display any other page the user may wish to view (e.g. by following a link in the bookmarks, etc.)
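Requests for specific file segments like these are normally made with HTTP byte-range requests, so the web server has to support them. As a minimal sketch (Java 11 or later, JDK only), the following builds such a request with the standard Range header. The class is hypothetical, and the URL, offset and length are placeholders standing in for values a viewer would read from the hint stream; parsing the hint stream itself is not shown.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/**
 * Sketch of the byte-range request a linearization-aware viewer could send
 * after reading the hint stream. URL and offsets are placeholders.
 */
public class PageRangeRequest {

    /** HTTP byte ranges are inclusive, so the last byte is offset + length - 1. */
    static String rangeHeader(long offset, long length) {
        if (offset < 0 || length <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and length > 0");
        }
        return "bytes=" + offset + "-" + (offset + length - 1);
    }

    /** Builds (but does not send) a GET request for one segment of the file. */
    static HttpRequest requestFor(URI pdfUri, long offset, long length) {
        return HttpRequest.newBuilder(pdfUri)
                .header("Range", rangeHeader(offset, length))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Placeholder: suppose a page's objects occupy 4096 bytes starting at offset 20000
        HttpRequest req = requestFor(URI.create("https://example.com/doc.pdf"), 20000, 4096);
        System.out.println(req.headers().firstValue("Range").orElse("(none)"));
    }
}
```

The request could then be sent with java.net.http.HttpClient; a server that honors the range replies with status 206 (Partial Content) and only the requested bytes.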

Code Sample: Create PDF with embedded files

posted Jan 26, 2012, 8:10 AM by Leila Holmann   [ updated Feb 28, 2012, 7:48 AM ]

This Java program creates a single-page PDF document and adds 3 embedded files to it as file attachments, using jPDFProcess (http://www.qoppa.com/pdfprocess).

http://www.qoppa.com/pdfprocess/guide/sourcesamples/FileAttachments.java

jPDFProcess java API

posted Nov 10, 2011, 7:20 AM by Leila Holmann   [ updated Jul 19, 2013, 9:39 AM ]

Q: Where can I find jPDFProcess javadoc API?

A: You can find the API specification for the latest version of our library jPDFProcess on our website at this link. jPDFProcess is a Java library to modify and manipulate PDF documents.

Add a Table of Content (TOC) to a PDF document with Java

posted Aug 23, 2011, 12:56 PM by Leila Holmann

This Java program uses Qoppa's PDF library jPDFProcess to create a table of contents at the beginning of a document. The sample inserts a new first page and then, for every page in the document, adds a page title and a link to that page.

Click here to view the java code.

Code Sample: Add a bookmark for every page of a PDF document with java

posted Aug 23, 2011, 12:52 PM by Leila Holmann

This Java program adds a bookmark for every page in a PDF document using Qoppa's library jPDFProcess.

Click here to view the java code.


Code Sample: Flatten PDF Interactive Form Fields in Java

posted Aug 23, 2011, 12:44 PM by Leila Holmann   [ updated Aug 23, 2011, 12:52 PM ]

This Java program "flattens" field data into the PDF content layer using Qoppa's PDF library jPDFProcess.

This means that the field contents will become part of the PDF content and so the document will not be editable anymore.

Click here to view the Java code.

jPDFProcess supports many PDF functions to work with PDF documents within Java. If you're only looking to work with interactive PDF Forms, please look at our library jPDFFields.

