Abstract: In this keynote, I explore a pivotal challenge in document knowledge extraction: the lack of structural information, such as paragraphs, tables, and logical structures, in untagged PDFs, which significantly hampers the efficacy of Retrieval-Augmented Generation (RAG) systems.
The presentation will focus on integrating accurate document structure recognition into the RAG process, showcasing how this method notably outperforms traditional rule-based PDF segmentation techniques. Empirical studies will demonstrate the enhanced quality of knowledge question-answering when RAG systems accurately identify and interpret the missing structural elements in PDFs.
My goal is to highlight the need for language models to evolve from one-dimensional sequential language processing to a more nuanced two-dimensional, multimodal language understanding. This advancement is essential for truly grasping the meanings and insights contained in richly formatted documents, paving the way for future research in document intelligence.
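To make the contrast with rule-based segmentation concrete, the sketch below (an illustrative assumption, not ChatDOC's actual implementation; the `Element` type and both functions are hypothetical) shows naive fixed-size PDF text splitting next to structure-aware chunking that keeps paragraphs and tables intact and carries the nearest heading along as context for retrieval.

```python
# Contrast rule-based fixed-window splitting with structure-aware
# chunking for RAG. Hypothetical sketch; not ChatDOC's real code.
from dataclasses import dataclass


@dataclass
class Element:
    kind: str   # "heading", "paragraph", or "table"
    text: str


def naive_chunks(text: str, size: int = 40) -> list[str]:
    """Rule-based splitting: fixed windows that may cut a table mid-row."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def structure_aware_chunks(elements: list[Element], max_len: int = 200) -> list[str]:
    """Pack whole elements into chunks: never split a paragraph or table,
    and prefix each chunk with the most recent heading for context."""
    chunks: list[str] = []
    current, heading = "", ""
    for el in elements:
        if el.kind == "heading":
            if current:
                chunks.append(current)
                current = ""
            heading = el.text
            continue
        piece = (heading + "\n" if heading and not current else "") + el.text
        if current and len(current) + len(piece) > max_len:
            chunks.append(current)
            current = (heading + "\n" if heading else "") + el.text
        else:
            current = current + "\n" + piece if current else piece
    if current:
        chunks.append(current)
    return chunks


doc = [
    Element("heading", "3. Revenue"),
    Element("paragraph", "Revenue grew 12% year over year."),
    Element("table", "Year | Revenue\n2022 | 10.1\n2023 | 11.3"),
]
structured = structure_aware_chunks(doc)   # table stays whole, heading attached
naive = naive_chunks("\n".join(e.text for e in doc))  # table rows torn apart
```

The retrieval gain comes from the structured chunk: a query about 2023 revenue retrieves the complete table together with its section heading, whereas a fixed-window split can return a fragment with the row cut off from its column headers.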
Biography: Yixuan Cao is an associate researcher and Master's supervisor at the Institute of Computing Technology, Chinese Academy of Sciences, where he also received his Ph.D. His main research areas are Natural Language Processing and Document Intelligence. He has published more than ten papers in conferences and journals such as KDD, NeurIPS, WWW, and AAAI. His findings are applied in products like ChatDOC, which serves over 500,000 users globally, and have been adopted by several financial institutions in China.
Abstract: In this keynote, the speaker proposes a way for large language models to contribute to data mining: judging the logical association between variables, which is crucial for correlations and lead-lag effects to remain reliable in the future.
The presentation first frames the input of a large language model as the "justification" in the "justified true belief" definition of knowledge; when paired with past correlation as a "reality check", this justification elevates the likelihood that the correlation will persist in the future. A case study then demonstrates this effect on a full set of PMI data, with the large language model acting as a judge of lead-lag effects from a common-sense perspective.
The objective is to offer a solution that uses a large language model to generate usable predictions at scale. It is potentially generalizable to data mining scenarios in which common sense or domain-specific knowledge is considered essential.
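The "justified true belief" framing above can be sketched as a simple gate: accept a lead-lag relation only when the historical correlation is strong (the "reality check") and a common-sense judge deems the causal direction plausible (the "justification"). In this hypothetical sketch the judge is a stub standing in for an actual LLM call, and the indicator names and toy series are invented for illustration; nothing here reflects E Fund's actual system.

```python
# Hypothetical sketch of LLM-as-judge for lead-lag effects.
# The judge is a stub; a real system would prompt a language model.
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sqrt(sum((x - mx) ** 2 for x in xs))
    vy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (vx * vy)


def lead_lag_corr(leader: list[float], follower: list[float], lag: int = 1) -> float:
    """Correlation between leader at time t and follower at time t + lag."""
    return pearson(leader[:-lag], follower[lag:])


def llm_judges_plausible(var_a: str, var_b: str) -> bool:
    """Stand-in for an LLM answering: 'Is it common sense that var_a
    leads var_b?' Here a tiny lookup table replaces the model call."""
    known_links = {("new orders PMI", "production PMI")}
    return (var_a, var_b) in known_links


def justified_lead_lag(var_a: str, var_b: str,
                       series_a: list[float], series_b: list[float],
                       lag: int = 1, threshold: float = 0.6) -> bool:
    """Accept the relation only if past correlation is strong (reality
    check) AND the judge finds the direction plausible (justification)."""
    corr = lead_lag_corr(series_a, series_b, lag)
    return abs(corr) >= threshold and llm_judges_plausible(var_a, var_b)


# Toy monthly readings where new orders lead production by one period.
orders = [50.0, 52.0, 51.0, 53.0, 55.0, 54.0]
production = [49.0, 50.0, 52.0, 51.0, 53.0, 55.0]
accepted = justified_lead_lag("new orders PMI", "production PMI", orders, production)
```

The point of the gate is that a spurious but numerically strong correlation is rejected when the judge finds the direction implausible, while a plausible-but-weak pairing is rejected by the correlation check, so only relations passing both filters are promoted to predictions.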
Biography: Zhengfei Li is currently a Senior Research Analyst at E Fund Intelligent Solutions. His current focus is the application of Generative AI in financial institutions. His team has delivered several customized large language models for internal use, along with a dozen applications. Previously, his general research direction was scientific investing and AI applications.