Xiaofan Lin's homepage

I am currently the Chief Scientist at Vobile Inc., a leading provider of video content identification and management services. Before that I was a Research Architect at Like.com, a visual shopping comparison company that was acquired by Google. Earlier, I was a Senior Research Scientist at HP Labs, working on high-volume document digitization and analysis, digital publishing, and speech signal processing. Before that I worked on the recognition engine of OmniPage, the most popular OCR software, at Caere/ScanSoft (now part of Nuance) after receiving my Ph.D. degree in Electronic Engineering from Tsinghua University in 1999. My graduate research became key parts of the TH-OCR software (including handwritten digits table recognition, business card recognition, Chinese OCR) of Wintone Inc., a leading document recognition company in China.

In 1989, I participated in the 20th International Physics Olympiad as one of the five high school students selected from China, and I won a Third Prize in that competition.

Research Interests

  • Content-based Image/Video Retrieval and Fingerprinting
  • Algorithm Combination, Information Fusion
  • Pattern Recognition, Optical Character Recognition, Speech Recognition
  • Document Analysis 
  • Image Processing
  • Natural Language Processing

Projects in Vobile

Industry-leading MediaDNA content-based multimedia fingerprinting engine. I'm responsible for driving core technology to the next level in terms of capability, speed, accuracy, and scalability. 

Projects in Like.com

Next-generation text recognition and content-based image retrieval for consumer digital images. I initiated and created a number of core technologies behind http://www.like.com. I designed and implemented the working prototype for both general image search and shopping image search. The feature extractor and search engine became essential pieces of Like.com.

Projects in HP Labs

Digital Publishing

Automatic layout algorithms and systems relieve the creation-side bottleneck in the end-to-end digital publishing pipeline. I have been leading the design and implementation of the Active Layout Engine (ALE), which supports non-rectangular text wrapping, simultaneous optimization of text block width and height, graphic element scaling, and image cropping. ALE enables the automatic adjustment and creation of high-quality graphic designs with variable text, image, and graphic data. I am also working on content adaptation techniques, such as automated image cropping. An interesting solution is the Active Document Versioning system, which combines ALE with the layout/constraint extraction algorithms from Hui Chao and the Web-based user interface by Elsa Durante. This system can intelligently adjust a layout (for example, a PDF) to accommodate new text and image content while keeping the look and feel of the original design.

Digital Content Re-Mastering (DCRM)

This is a large-scale, distributed document analysis and recognition system. My contribution is mainly in parameter-free document structure understanding and a robust multi-engine OCR solution. MIT's Cognet is hosting books and journals processed by our system. Note that the books/journals are split into chapters/articles. See our paper for the tricks. Chapter/article bookmarks and links are generated fully automatically, without parameter tuning! We call this technology "Book Mapping." Other features of the PDFs:

  • Embedded with text generated by the OCR engine.
  • Trim adjusted to the predominant size (the original scanned pages differ in size from page to page).
  • Linearized to improve browsing experience. The PDF is served incrementally so that you can view the first pages while downloading the remaining pages.

PDFs processed by "Book Mapping" are now also hosted by the Internet Archive at www.archive.org or www.openlibrary.org under the Open Content Alliance.

Robust Automatic Speech Recognition (ASR)

It's a lot of fun to play with leading-edge ASR engines and to evaluate each engine first-hand. It's even better to beat the best one through combination technology...

VoiceSmart

Whenever you speak, the VoiceSmart system can learn a lot about you. We call this process speech metadata extraction. It can infer things like gender and accent.

Patents (Google Patents Search)

17 US patents and 20+ pending applications in image and shopping search, document understanding, speech processing, document rendering and optimization

  • Xiaofan Lin, Tong Zhang, Brian Atkins, Gary Vondran, Mei Chen, Charles A. Untulis, Stephen Philip Cheatle, Dominic Lee, System and method for producing a page using frames of a video stream, US Patent 7760956
  • Xiaofan Lin, Automatically layout of document objects using an approximate convex function model, US Patent 7434159
  • Xiaofan Lin, Method for determining a logical structure of a document, US Patent 6907431
  • Xiaofan Lin, Hui Chao, Jian Fan, Non-rectangular image cropping methods and systems, US Patent 7151547
  • Xiaofan Lin, Igor Boyko, System and method for combining text summarizations, US Patent 7292972
  • Xiaofan Lin, Removal of extraneous text from electronic documents, US Patent 7310773
  • Xiaofan Lin, Selective sampling for sound signal classification, US Patent 7340398  
  • Yuhong Xiong, Xiaofan Lin, James A Rowson, Apparatus and method for estimating device availability, US Patent 7342900 
  • Sherif Yacoub, Xiaofan Lin,  Steve Simske, System and method for prioritizing contacts, US Patent 7013005
  • Sherif Yacoub, Steve Simske, Xiaofan Lin, Francois Vincent, System and method for extracting demographic information, US Patent 7349527
  • Hui Chao, Xiaofan Lin, Greg Nelson, Generating a text layout boundary from a text block in an electronic document, US Patent 7555711
  • Brian C. Atkins, Xiaofan Lin, Mihaela Irina Enachescu, Constraint-based albuming of graphic elements, US Patent 7644356
  • Salih Burak Gokturk,  Baris Sumengen, Diem Vu Sumengen, Navneet Dalal,  Danny Yang, Xiaofan Lin, Azhar Khan, Munjal Shah, Dragomir Anguelov, Vincent Vanhoucke, System and method for enabling image recognition and searching of images, US Patent 7657100
  • Salih Burak Gokturk,  Baris Sumengen, Diem Vu Sumengen, Navneet Dalal,  Danny Yang, Xiaofan Lin, Azhar Khan, Munjal Shah, Dragomir Anguelov, Lorenzo Torresani, Vincent Vanhoucke, System and method for search portions of objects in images and features thereof, US Patent 7657126
  • Salih Burak Gokturk,  Baris Sumengen, Diem Vu Sumengen, Navneet Dalal,  Danny Yang, Xiaofan Lin, Azhar Khan, Munjal Shah, Dragomir Anguelov, Lorenzo Torresani, Vincent Vanhoucke, System and method for enabling image searching using manual enrichment, classification, and/or segmentation, US Patent 7660468
  • Steven Simske, John R. Burns, Xiaofan Lin, Sherif Yacoub, Email application with user voice interface, US Patent 8055713
  • Hui Chao, Menaka Indrani, Gary Vondran, Xiaofan Lin, Parag M. Joshi, Dirk M. Beyer, Brian C. Atkins, Pere Obrador, Alex Xin Zhang, Producing marketing items for a marketing campaign, US Patent 8090612

Recent Publications (Google Scholar citations)

  • Xiaoyan Lin, Liangcai Gao, Zhi Tang, Xiaofan Lin, Xuan Hu, Performance Evaluation for Mathematical Formula Identification, accepted by DAS 2012
  • Xiaoyan Lin, Liangcai Gao, Zhi Tang, Xuan Hu, Xiaofan Lin, Identification of embedded mathematical formulas in PDF documents using SVM, SPIE Conference on Document Recognition and Retrieval, San Jose, Jan 2012.
  • Xiaoyan Lin, Liangcai Gao, Zhi Tang, Xiaofan Lin, Xuan Hu, Mathematical Formula Identification in PDF Documents, ICDAR 2011, September 2011, Beijing, China.
  • Liangcai Gao, Yuan Zhong, Yingmin Tang, Zhi Tang, Xiaofan Lin, Xuan Hu, Metadata Extraction System for Chinese Books, ICDAR 2011, September 2011, Beijing, China.
  • Liangcai Gao, Zhi Tang, Xiaofan Lin, Ying Liu, Ruiheng Qiu, Yongtao Wang, Structure Extraction from PDF-based Book Documents, pp. 11-20, ACM/IEEE Joint Conference on Digital Libraries, June 2011, Ottawa, Canada.
  • Xiaofan Lin, Mobile Multimedia Understanding Applications: An Overview,  Proceedings of SPIE Conference on Imaging and Printing in a Web 2.0 World II, Jan 2011 (Invited).
  • Liangcai Gao, Zhi Tang, Jing Fang, Xiaofan Lin, Multi-page Document Analysis Based on Format Consistency and Clustering, no 4, vol 38, pp. 306-315, International Journal of Computer Applications in Technology (IJCAT).
  • S. Bock, S. Newsome, Q. Wang, W. Zeng, X. Lin, J. Lu, An iPhone Image Based Information Retrieval Application, demo session of IEEE CCNC 2010. 
  • Xiaofan Lin, Comparative Study of Content-based Image Retrieval and Video Fingerprinting,  Proceedings of SPIE Conference on Multimedia Content Access: Algorithms and Systems IV, Jan 2010.
  • Liangcai Gao, Zhi Tang, Xiaofan Lin, Xin Tao, Yimin Chu, Analysis of Book Documents’ Table of Content Based on Clustering, pp. 911-915, ICDAR, 2009.
  • Liangcai Gao, Zhi Tang, Xiaofan Lin, CEBBIP: A Parser of Bibliographic Information in Chinese Electronic Books, pp. 73-76, ACM/IEEE Joint Conference on Digital Libraries, June 2009, Texas, USA.
  • Zhi Tang, Liangcai Gao, Aixia Jia, Xiaofan Lin, XEB: A Markup Language Document Container Format Suitable for Handheld Devices, demo session of Joint Conference on Digital Libraries, June 2009, Texas, USA.
  • Liangcai Gao, Zhi Tang, Xiaofan Lin, Ruiheng Qiu, Comprehensive Global Typography Information Extraction System for Book Documents, DAS 2008, Japan
  • Xiaofan Lin, Burak Gokturk, Baris Sumengen, Diem Vu, Visual Search Engine for Product Images,  CID No: 68200M, Proceedings of SPIE Conference on Multimedia Content Access: Algorithms and Systems II, Jan 2008.
  • Xiaofan Lin, Predictive Text Fitting, pp. 13-23, Proc. 6th International Symposium on Smart Graphics, 23-25 July 2006, Vancouver, Canada (PDF available on Springer website).
  • Xiaofan Lin, Quality Assurance in High Volume Document Digitization: A Survey, pp. 312-319, Proc. 2nd International Workshop on Document Image Analysis for Libraries, Lyon, France, April 2006.
  • Elisa H. Barney Smith, H. Baird, W. Barrett, F. Le Bourgeois, X. Lin, G. Nagy and S. Simske, DIAL 2004 Working Group Report on Acquisition Quality Control, pp. 373-376, Proc. 2nd International Workshop on Document Image Analysis for Libraries, Lyon, France, April 2006.
  • Xiaofan Lin, Active Layout Engine: Algorithms and Applications in Variable Data Printing, no 5, vol 38, pp. 444-456,  Computer-Aided Design.
  • Xiaofan Lin, Active Document Layout Synthesis, pp. 86-90, Proc. ICDAR 2005, Seoul, South Korea.
  • Jian Fan, Xiaofan Lin, Steven Simske, A Comprehensive Image Processing Suite for Book Re-mastering, pp. 447-451, Proc. ICDAR 2005, Seoul, South Korea
  • Hui Chao, Xiaofan Lin, Capture the Layout of Electronic Documents for Reuse in Variable Data Printing, pp. 940-944, Proc. ICDAR 2005, Seoul, South Korea.
  • Xiaofan Lin, Hui Chao, Greg Nelson, Elsa Durante, Active Document Versioning: From Layout Understanding to Adjustment, CID No: 60670E, Proceedings of SPIE Document Recognition and Retrieval Conference XIII, San Jose, Jan 2006
  • Xiaofan Lin, Intelligent Content Fitting for Digital Publishing, CID No: 60760J, Proceedings of SPIE Digital Publishing Conference, San Jose, Jan 2006
  • G. L. Vondran, H. Chao, X. Lin, P. Joshi, D. Beyer, C. B. Atkins, P. Obrador, Automated campaign system, CID No: 607605, Proceedings of SPIE Digital Publishing Conference, San Jose, Jan 2006
  • Xiaofan Lin, Yan Xiong, Detection and Analysis of table of contents based on content association, no 2-3, vol 8, pp. 132-143, International Journal on Document Analysis and Recognition, 2006
  • Hector J. Santos-Villalobos, Xiaofan Lin, Benchmarking of Automated Image Cropping Technology, HP Technical Report, 2005
  • Xiaofan Lin, DRR Research Beyond COTS OCR Software: A Survey, SPIE Conference on Document Recognition and Retrieval XII, pp 1-9, San Jose, 2005
  • Xiaofan Lin, Steven Simske, Phoneme-less Hierarchical Accent Classification, 38th Asilomar Conference on Signals, Systems and Computers, pp 1801-1804, Pacific Grove, CA, November 2004
  • Simon M. Lucas, Alex Panaretos, Luis Sosa, Anthony Tang, Shirley Wong and Robert Young, Kazuki Ashida, Hiroki Nagai, Masayuki Okamoto, Hiroaki Yamamoto, Hidetoshi Miyao, JunMin Zhu, WuWen Ou, Christian Wolf, Jean-Michel Jolion, Leon Todoran, Marcel Worring, Xiaofan Lin,  ICDAR 2003 Robust Reading Competitions: Entries, Results and Future Directions, International Journal on Document Analysis and Recognition (contributed the combination of multiple text locating algorithms), vol 7, pp 105-122, 2005
  • Xiaofan Lin, Sherif Yacoub, John Burns, Steven Simske, Performance Analysis of Pattern Classifier Combination by Plurality Voting, Pattern Recognition Letters 24(12), 1959-1969, 2003
  • Xiaofan Lin, Decision Combination in Speech Metadata Extraction, Proc. 37th Asilomar Conference on Signals, Systems and Computers, p 560-564, Pacific Grove, CA, November 2003
  • Xiaofan Lin, Sherif Yacoub, John Burns, Steven Simske, Evaluation and Combination of Automatic Speech Recognition Engines for Telephony-based Applications, HP Labs Technical Report
  • Xiaofan Lin, Text-mining Based Journal Splitting, Proc. ICDAR 2003,  pp. 1075-1079, Edinburgh, UK, August 2003
  • Xiaofan Lin, Impact of Imperfect OCR on Part-of-speech Tagging, Proc. ICDAR 2003, pp. 284-288, Edinburgh, UK, August 2003
  • Xiaofan Lin, Reliable OCR for Digital Content Re-mastering, Document Recognition and Retrieval IX, Proceedings of SPIE 4670,  p 223-231, San Jose, 2002
  • Xiaofan Lin, Header and Footer Extraction by Page-Association, Document Recognition and Retrieval X, Proceedings of SPIE 5010, p 164-171,  Santa Clara, 2003
  • Sherif Yacoub, Steven Simske, Xiaofan Lin, John Burns, Recognition of Emotions in Interactive Voice Response Systems, Proceedings of EuroSpeech, p 729-732, Geneva, September 2003
  • Sherif Yacoub, Xiaofan Lin, John Burns, Steven Simske, Automating the Analysis of Voting Systems, Proc. International Symposium on Software Reliability Engineering 2003, p 203-214, Denver, Colorado, November 2003 
  • Xiaofan Lin, Steven Simske, Automatic Document Navigation for Digital Content Remastering, Proc. SPIE Conference on Document Recognition and Retrieval XI, p 66-73, San Jose, 2004
  • Steven Simske, Xiaofan Lin, Creating Digital Libraries: Content Generation and Re-Mastering, Proc. International Workshop on Document Image Analysis for Libraries, p 33-45, Palo Alto, January 2004.
  • Yuhong Xiong, Xiaofan Lin, and James A. Rowson, Estimating Device Availability in Pervasive Peer-to-Peer Environment, 10th IEEE International Workshop on Future Trends in Distributed Computing Systems, p 254-260, Suzhou, China, May 2004

Earlier Publications

  • Adaptive confidence transform based classifier combination for Chinese character recognition, Pattern Recognition Letters 19(10), 975-988, 1998
  • Handwritten numeral recognition using MFNN based multiexpert combination strategy, Proceedings of ICDAR'97, Ulm, Germany
  • Automatic Input System for Chinese Business Cards, Proceedings of ICCPOL'97, Hongkong
  • Linear Regression Based Combination of Neural Classifiers, Proceedings of ICONIP'97, Dunedin, New Zealand
  • Evaluation and Application of Recognition Confidence in OCR, Proceedings of ACCV'98, Hongkong (Lecture Notes in Computer Science, 1351, Springer)
  • Consensus Network for Multi-expert Combination, Proceedings of ICNN&B'98, Beijing
  • Knowledge Extraction Based Multilayer Feedforward Neural Networks, Proceedings of ICONIP'98, Kitakyushu, Japan (coauthor)
  • Theoretical Analysis of the Confidence Metric for Nearest Neighbor Classifier, Chinese Science Bulletin 43(3), 1998
  • Combination of Independent Classifiers and Its Application in Character Recognition, Pattern Recognition and Artificial Intelligence 11(3), 1998 [in Chinese]
  • A New Multiple Neural Networks Combination Method, Proceedings of Chinese Congress on Neurocomputing Science '97 (CCNS'97), Oct. 1997, Nanjing, China [in Chinese]
  • Combination of Independent Classifiers: Model and Application, Proceedings of Annual Symposium on Information and Communication Theory, Beijing, Dec. 1997 [in Chinese]
  • Confidence Analysis in Character Recognition, Journal of Tsinghua University, Sept. 1998 [in Chinese]

Professional Activities

  • Senior Member of IEEE
  • Technical Program Committee of the 2nd International Workshop on Document Image Analysis for Libraries (DIAL 2006), ICPR 2010, Workshop on Visual Content Identification and Search at ICME 2010 (VCIDS'10) and 2011 (VCIDS’11), 2010 Conference on Document Information Processing (DIP 2010)
  • Program Committee of SPIE Conference on Document Recognition and Retrieval (DRR) since 2005, and Conference Co-chair of DRR 2006 and 2007.
  • Program Committee of SPIE Conference on Imaging and Printing in a Web 2.0 World III
  • Reviewer of "IEEE Transactions on Image Processing", "IEEE Transactions on Circuits and Systems for Video Technology", "IEEE Multimedia", "Pattern Recognition Letters", "International Journal on Document Analysis and Recognition", "Signal, Image and Video Processing", etc.

I can be reached at xiaofan dot lin at ieee dot org.

