Here we will take a brief look at the PDF Parser node. It is very useful when we want to convert PDF files into KNIME documents.
Downstream, you can treat the parsed PDFs like any other KNIME documents.
Using the node is straightforward: you point it to the location where your PDF files reside and execute it.
Below are the workflow and some example document output.
In the workflow below, I used the Tika Parser node to parse a file in EPUB format. The String to Document node is needed because the Tika Parser returns the content as a String.
On the next page, we will take a closer look at how to clean and transform our text data.