Working with PDFs Using Command-Line Tools in Linux
PDF data extraction on Linux
How to extract all text from PDFs, including text inside images, using Ghostscript together with the command-line OCR tool tesseract-ocr
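The combination works in two stages: Ghostscript rasterizes each PDF page into an image, and tesseract then runs OCR on each image. Because every page goes through OCR, text that exists only as pixels (scans, screenshots, embedded pictures) is captured too. The Python sketch below assumes that the `gs` and `tesseract` binaries are installed and on PATH and that the English language data for tesseract is present. The helper names (`ghostscript_cmd`, `tesseract_cmd`, `pdf_to_text`) are hypothetical and chosen for illustration. The 300 dpi resolution is a common starting point for OCR, not a requirement.

```python
import glob
import os
import subprocess
import sys
import tempfile


def ghostscript_cmd(pdf_path, out_pattern, dpi=300):
    """Build a Ghostscript command that renders every PDF page to a PNG file.

    out_pattern must contain a printf-style page counter such as %04d,
    which Ghostscript replaces with the page number (starting at 1).
    """
    return [
        "gs",
        "-dSAFER",          # restrict what the PDF itself may do to the file system
        "-dBATCH",          # exit when done instead of waiting at the gs prompt
        "-dNOPAUSE",        # do not pause between pages
        "-sDEVICE=png16m",  # 24-bit colour PNG output device
        f"-r{dpi}",         # render resolution in dpi; higher usually helps OCR
        f"-sOutputFile={out_pattern}",
        pdf_path,
    ]


def tesseract_cmd(image_path, lang="eng"):
    """Build a tesseract command that writes recognised text to stdout."""
    # Passing "stdout" as the output base makes tesseract print the text
    # instead of writing a .txt file.
    return ["tesseract", image_path, "stdout", "-l", lang]


def pdf_to_text(pdf_path, dpi=300, lang="eng"):
    """Rasterise each page with Ghostscript, then OCR the images in page order."""
    with tempfile.TemporaryDirectory() as tmp:
        # Zero-padded page numbers keep sorted() in page order (up to 9999 pages).
        pattern = os.path.join(tmp, "page-%04d.png")
        subprocess.run(ghostscript_cmd(pdf_path, pattern, dpi), check=True)

        pages = []
        for image in sorted(glob.glob(os.path.join(tmp, "page-*.png"))):
            result = subprocess.run(
                tesseract_cmd(image, lang),
                check=True,
                capture_output=True,  # progress messages go to stderr, text to stdout
                text=True,
            )
            pages.append(result.stdout)
        return "\n".join(pages)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(pdf_to_text(sys.argv[1]))
```

Saved under a hypothetical name such as `pdf_ocr.py`, it could be run as `python3 pdf_ocr.py scanned.pdf > scanned.txt`. OCR-ing every page is slower than reading a PDF's embedded text layer directly, but it is what makes text inside images recoverable. Recognition quality depends mainly on the render resolution and on the language data passed with `-l`.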