Dr.-Ing. Syed Saqib Bukhari
Senior Researcher
Email: saqib.bukhari@dfki.de


Multimedia Analysis and Data Mining (MADM) Research Group

Computer Science Department
German Research Center for Artificial Intelligence - DFKI
Room 3.05
Trippstadter Straße 122, 67663 Kaiserslautern, Germany. Phone: +49 631 20575 3760

Last updated: July 3rd, 2015

Short Curriculum Vitae:

Commercial and Academic Experience:   
  • [Internship: August 2011 - January 2012]: Social Innovation Group of Computer and Communications Innovation Research Labs (CCIL), NEC, Japan

Research Interests:
  • General research interests: Digital Image Processing, Pattern Recognition and Machine Learning, Document Image Analysis, and Geographic Information System (GIS) Processing.
  • Current Industrial research areas: business forms classification, text/email classification, mobile document image processing.
  • Current Academic research areas: generic layout analysis of a diverse collection of document images, optical character recognition (OCR) of historical manuscripts, hand-drawn architectural drawing analysis, cadastral map processing.
  • Published more than 30 scientific papers (Journals, Conferences, Workshops, Book Chapters) in the area of Document Image Processing. Won two best paper awards!

Best Paper Awards:
  • DAS 2014 The Best Paper Award: "Business Forms Classification using Earth Mover's Distance" [reference]
  • ICFHR 2012 The Best Student Paper Award: "Layout Analysis for Arabic Historical Document Images Using Machine Learning"
  • Interview in IAPR (International Association for Pattern Recognition) newsletter: "IAPR...The Next Generation" [link]
  • Article on my project "anyOCR" at DFKI in "DIE RHEINPFALZ" newspaper [Link]


Journal Papers:
Book Chapters:
Conference Papers:
  • [31] Syed Saqib Bukhari, Andreas Dengel, "Visual Appearance based Document Classification Methods: Performance Evaluation and Benchmarking", 13th International Conference on Document Analysis and Recognition, ICDAR'15. Tunisia, August 2015. [accepted for oral presentation]
  • [30] Johannes Bayer, Syed Saqib Bukhari, Andreas Dengel, Christoph Langenhan, Klaus-Dieter Althoff, Frank Petzold and Marcus Eichenberger-Liwicki, "Migrating the Classical Pen-and-Paper based Conceptual Sketching of Architecture Plans Towards Computer Tools - Prototype Design and Evaluation", 11th IAPR International Workshop on Graphics Recognition - GREC'15, Tunisia, August 2015.
  • [29] Seyyed Saleh Mozafari Chanijani, Syed Saqib Bukhari, Andreas Dengel, "Analysis of Text Layout Quality Using Wearable Eye Trackers", The First International Workshop on Wearable and Ego-vision Systems for Augmented Experience (WEsAX'15), in conjunction with the IEEE International Conference on Multimedia & Expo (ICME 2015), Turin, Italy, 2015.
  • [27] Syed Saqib Bukhari, Faisal Shafait, Thomas M. Breuel, "Towards Generic Text-Line Extraction", 12th International Conference on Document Analysis and Recognition, ICDAR'13. Washington, DC, USA, August 2013.

Conference/Invited Talks and Presentations:

  • Kallimachos: OCR tools for historical documents (Latin, Fraktur, French, and other scripts) [Project Description]
  • OCRopus OCR System: The OCRopus project is an ongoing effort to create a high-performance OCR system for both printed and handwritten text, and to develop novel and robust algorithms for document image preprocessing, page segmentation, text recognition, and statistical language modeling. http://code.google.com/p/ocropus/
  • DECAPOD: The Decapod project aims to provide an inexpensive, attaché-case-sized hardware/software solution that can be readily procured, assembled, and taken into the stacks or out into the field by local staff or volunteers to quickly and unobtrusively capture material and deliver it in a usable format. http://sites.google.com/site/decapodproject/

We have developed a new dataset (the IUPR dataset) of camera-captured document images. Compared to the previous (DFKI-I CBDAR 2007) dataset, the new dataset contains images from a wider variety of technical and non-technical books and poses more challenging problems, such as different types of layouts, a large variety of page curl, a wide range of perspective distortions, and resolutions ranging from high to low. The dataset contains ground-truth information for text lines, text zones, and zone types, as well as dewarped images (scanned documents) and ASCII text for all documents. The new dataset will help the research community develop robust camera-captured document processing algorithms that address these problems, and compare different methods on common ground. For more details, please refer to paper [14]. [download]

Extracurricular Activities:
  • President, Pakistan Student Association (PSA), Technical University of Kaiserslautern, Germany (January 2012 - December 2012) (http://www.uni-kl.de/psa/about.php)

  • Active member of the advisory board of the Islamisches Zentrum Kaiserslautern (IZK), Kaiserslautern, Germany (2010 - to date) (http://izkl.de/)
  • "Khateeb" in Muslim Students Group (MSG), Technical University of Kaiserslautern, Germany (2009 - to date) (http://www.uni-kl.de/msg/)
  • "Khateeb" in Muslim Students Group, Nara Institute of Science and Technology (NAIST), Japan (August 2011 - January 2012) (http://www.naist.jp/en/)
  • Active participation in Inter-Religious and Inter-Cultural activities
  • Traveling (so far explored Japan, Singapore, Indonesia, China, Saudi Arabia, Syria, Turkey, Germany, Spain, Italy, France, Switzerland, Holland, Ireland and Egypt... and wish to explore the rest of the world!)

