Schedule

Saturday, April 27, 2024 (in Japan Standard Time)

Friday, April 26 (in EST, PST)

Session 1: Recent Advances in Large Language Models and Conversational Agents

11:00-12:00, JST

10:00-11:00, UTC+8

11:00-11:30, JST

10:00-10:30, UTC+8

21:00-21:30, EST

18:00-18:30, PST

Leveraging LLMs for Automated Grading of Student Assignments: A Case Study
利用大型語言模型自動評分學生作業:案例研究

Hung-Yi Lee (李宏毅)

Abstract: This semester, I offered a course with over 1,000 registered students, which necessitated innovative solutions to manage the grading workload effectively. To address this challenge, I employed large language models (LLMs) to automate the grading process. This case study shares insights gained from using LLMs for automated grading in a large-scale educational setting.


摘要:這個學期,我開設了一門課程,註冊的學生超過一千人,這促使我尋找新的解決方法來有效率地評分。為了面對這個挑戰,我用大型語言模型(LLMs)進行自動化評分。本次演講我將分享在註冊人數眾多的課程中,如何使用LLMs自動評分的寶貴經驗。
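The grading pipeline the abstract describes can be sketched as a prompt-and-parse loop. Everything below is an illustrative assumption rather than the speaker's actual setup: the rubric format, the "SCORE:" reply convention, and the 0-10 scale are all hypothetical.

```python
import re
from typing import Optional

def build_grading_prompt(rubric: str, question: str, answer: str) -> str:
    """Assemble a grading prompt that asks the model for a 0-10 score
    followed by a one-line justification."""
    return (
        "You are a teaching assistant. Grade the student answer below.\n"
        f"Rubric:\n{rubric}\n\n"
        f"Question: {question}\n"
        f"Student answer: {answer}\n\n"
        "Reply with a line 'SCORE: <0-10>' and a one-line justification."
    )

def parse_score(reply: str) -> Optional[int]:
    """Extract the integer score from the model's reply; None if absent
    or out of range, so malformed replies can be routed to a human."""
    m = re.search(r"SCORE:\s*(\d+)", reply)
    if m is None:
        return None
    score = int(m.group(1))
    return score if 0 <= score <= 10 else None
```

In practice the prompt would be sent to an LLM API and the reply fed to `parse_score`; a strict output convention like this makes automated parsing and spot-checking tractable at the scale of a thousand students.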

11:30-12:00, JST

10:30-11:00, UTC+8

21:30-22:00, EST

18:30-19:00, PST

From Bots to Buddies: Making Conversational Agents More Human-Like
從機器人到閨蜜:使聊天機器人的會話更具人性

Yun-Nung (Vivian) Chen (陳縕儂)

Abstract: While today's conversational agents are equipped with impressive capabilities, there remains a clear distinction between the intuitive prowess of humans and the operational limits of machines. An example of this disparity is evident in the human ability to infer implicit intents from users' utterances, subsequently guiding conversations toward specific topics or recommending appropriate tasks or products. This talk aims to elevate conversational agents to a more human-like realm, enhancing user experience and practicality. By exploring innovative strategies and frameworks that leverage commonsense knowledge, we delve into the potential ways conversational agents can evolve to offer more seamless, contextually aware, and user-centric interactions. The goal is to not only close the gap between human and machine interactions but also to unlock new possibilities in how conversational agents can be utilized in our daily lives. 


摘要:雖然當今聊天機器人的能力已經讓人印象深刻,但與人類的直覺相比,機器人在運作上仍有許多限制。這種差異的一個例子是,人類能從使用者的言語中推斷出隱含意圖,進而引導對話朝特定主題發展或推薦適當的任務或產品。本次演講介紹如何將會話模型提升到更接近人類的程度,提升使用者體驗和實用性。藉由利用常識和知識的創新策略與框架,我們將深入探討如何改良聊天機器人,以提供更順暢、更具情境感知和以用戶為中心的互動。目標不僅是縮小人與機器互動的差距,還要讓聊天機器人在我們日常生活中有更多的可能性。

Session 2: Recent Advances in Machine Learning from Quantum and Theoretical Perspectives

12:00-13:00, JST

11:00-12:00, UTC+8

12:00-12:30, JST

11:00-11:30, UTC+8

22:00-22:30, EST

19:00-19:30, PST

Advances in Hybrid Quantum-classical Machine Learning
結合量子與傳統機器學習的進展

Samuel Yen-Chi Chen (陳彥奇)

Abstract: The evolution of machine learning (ML) and quantum computing (QC) hardware has sparked significant interest in the development of quantum machine learning (QML) applications. This presentation aims to offer a comprehensive insight into the hybrid quantum-classical machine learning approach, encompassing pivotal aspects such as quantum gradient calculation. Furthermore, it will delve into recent strides made in QML across diverse domains, including distributed or federated learning, natural language processing, reinforcement learning, sequential learning, and classification. The presentation will also address the potential advantages, scalability, and application scenarios of QML in the NISQ era. 

摘要:機器學習(ML)和量子運算(QC)硬體的演進造成量子機器學習(QML)應用開發受到許多關注。本次演講旨在全面探討混合量子-經典機器學習方法,涵蓋量子梯度計算等關鍵方面。此外,演講還將深入探討QML在分佈式或聯邦式(distributed or federated learning)學習、自然語言處理、強化學習、時序學習(sequential learning)和分類等不同領域取得的最新進展。演講還將討論QML在NISQ時代的潛在優勢、可擴展性和應用情境。
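The "quantum gradient calculation" mentioned in the abstract commonly refers to the parameter-shift rule. As a toy illustration only (a single RY rotation measured in Z, not anything specific to the talk), the rule recovers the exact gradient of the expectation value from two shifted circuit evaluations:

```python
import math

def expectation(theta: float) -> float:
    """<Z> after applying RY(theta) to |0>; analytically cos(theta).
    On real hardware this would be estimated by running the circuit."""
    return math.cos(theta)

def param_shift_grad(f, theta: float, shift: float = math.pi / 2) -> float:
    """Parameter-shift rule for a Pauli-rotation gate:
    df/dtheta = (f(theta + s) - f(theta - s)) / (2 sin s).
    Unlike finite differences, this is exact, not an approximation."""
    return (f(theta + shift) - f(theta - shift)) / (2 * math.sin(shift))
```

Because only expectation values (not wavefunctions) are needed, the same two-evaluation recipe works on quantum hardware, which is what makes gradient-based training of hybrid quantum-classical models feasible.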

12:30-13:00, JST

11:30-12:00, UTC+8

22:30-23:00, EST

19:30-20:00, PST

Exploring the Theoretical Foundations of Deep Learning
探索深度學習的理論基礎 

Mark Chang

Abstract: Recent breakthroughs in deep learning have propelled a myriad of groundbreaking applications that were once inconceivable. However, the development of these applications often relies heavily on trial and error, lacking robust theoretical foundations. Despite ongoing efforts to advance the theoretical understanding of these applications, progress on the theoretical side still lags behind practical applications. This presentation elucidates the theoretical underpinnings of deep learning along three key dimensions: how loss functions are optimized, how models generalize from the training dataset, and how to find the optimal policy in bandits and reinforcement learning. Additionally, this talk will highlight publications by MediaTek Research in these areas. By bridging theory and practice, this presentation seeks to foster a deeper appreciation for the essential role of theoretical research in understanding deep learning methodologies.

摘要:近年來在深度學習領域的突破,推動了許多曾經難以想像的創新應用。然而,這些應用的開發往往依賴大量實驗嘗試,缺乏理論基礎。儘管許多研究嘗試提出針對這些應用的理論解釋,但在理論方面的進展仍然落後於實際應用。本次演講主要講解深度學習理論基礎的三個方向:如何優化損失函數、如何以訓練數據集進行泛化、以及如何在多臂老虎機(bandits)和強化學習中找到最優策略(optimal policy)。此外,本次演講還將介紹聯發創新基地(MediaTek Research)在這些領域的研究發表。通過結合理論與實踐,本次演講將讓聽眾了解對理論研究在理解深度學習方法中,扮演著不可或缺的角色。
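As a small worked example of "finding the optimal policy in bandits" (purely illustrative; the talk's own results are not reproduced here), the classic UCB1 algorithm balances exploration and exploitation by adding a confidence bonus to each arm's empirical mean:

```python
import math
import random

def ucb1(pull, n_arms: int, horizon: int, seed: int = 0):
    """UCB1: pull each arm once, then always pick the arm maximizing
    empirical mean + sqrt(2 ln t / n_pulls). Returns pull counts."""
    random.seed(seed)
    counts = [0] * n_arms        # times each arm was pulled
    sums = [0.0] * n_arms        # total reward from each arm
    for t in range(horizon):
        if t < n_arms:
            arm = t              # initialization: try every arm once
        else:
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t + 1) / counts[a]),
            )
        counts[arm] += 1
        sums[arm] += pull(arm)
    return counts
```

The confidence term shrinks as an arm is sampled more, so under-explored arms keep getting revisited; this is the mechanism behind UCB1's logarithmic regret guarantee.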

Session 3: Recent Advances in Large Language Models and Multimodal Models

13:00-15:00, JST

12:00-14:00, UTC+8

13:00-13:30, JST

12:00-12:30, UTC+8

23:00-23:30, EST

20:00-20:30, PST

Multimodal Document Understanding
多模態文本理解

Leo Hsiu-Wei Yang (楊修維)

Abstract: Documents are not merely text. Business documents, such as legal contracts, invoices, and financial statements, also convey a great deal of useful information through their visual design. We can classify documents to a certain extent without reading the text, by identifying the layout patterns of the content. The choice of fonts and colors can also help readers better grasp the key points of the content. Therefore, in recent years, document understanding models have tended to adopt multimodal architectures to improve task performance. Mainstream models usually incorporate text, layout, and images as three input modalities. In this talk, I will briefly introduce the field of document understanding and its main tasks, such as key information extraction and document layout analysis, and share our papers and practical applications at Thomson Reuters Labs.

摘要:文件不僅僅是文字。商業文件,如法律合約、請款單和財務報表,其內容的視覺設計也包含了許多有用的資訊。我們可以透過識別內容的排版模式,在不需閱讀文字的情況下進行一定程度的文件分類。根據字體和顏色的選擇,讀者也能更好地理解內容的重點。因此,近年來文件理解模型傾向於引入多模態的架構以提升任務表現。主流模型通常會引入文字、排版和圖像三個模態作為輸入。在這次演講中,我將簡單介紹文件理解這個領域及其主要的任務,如資訊擷取和版面分析,並分享我們實驗室發表的論文和實際應用。
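The text-plus-layout input described above can be sketched as follows. The 0-1000 coordinate normalization mirrors the convention used by LayoutLM-style models; the helper names themselves are hypothetical, and a full multimodal model would add page-image pixels as a third input stream.

```python
def normalize_box(box, page_width, page_height, scale=1000):
    """Map a pixel bounding box (x0, y0, x1, y1) onto a 0..scale grid,
    so layout coordinates are comparable across page sizes before
    being fed to a learned layout embedding."""
    x0, y0, x1, y1 = box
    return (
        round(scale * x0 / page_width),
        round(scale * y0 / page_height),
        round(scale * x1 / page_width),
        round(scale * y1 / page_height),
    )

def build_inputs(tokens, boxes, page_width, page_height):
    """Pair each OCR token with its normalized layout box, yielding the
    (text, layout) part of a multimodal document-understanding input."""
    return [
        (tok, normalize_box(b, page_width, page_height))
        for tok, b in zip(tokens, boxes)
    ]
```

Keeping the box alongside each token is what lets the model exploit visual structure, such as key-value pairs aligned in columns on an invoice, without reading the surrounding text.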

13:30-14:00, JST

12:30-13:00, UTC+8

23:30-00:00, EST

20:30-21:00, PST

Building a Large Language Model for Chinese
打造中文化的大型語言模型

Yi-Chang Chen (陳宜昌)

Abstract: This presentation will delve into the process of building a Chinese language model. We will explore the mathematical foundations of Transformer, revealing how it has become a core structure for language processing and its application in the Chinese context. Further, we will discuss the processes of pre-training and fine-tuning, gradually constructing practical chatbots. Additionally, we will explore the utilization of RLHF (Reinforcement Learning from Human Feedback) to align with human conversations. Finally, we will introduce the model's scaling law, explaining the challenges and advantages brought by scaling and discussing potential unique scenarios encountered in the Chinese context. This presentation aims to provide the audience with a deeper understanding of the process of constructing Chinese language models, their applications in the Chinese context, and the mathematical foundations behind them.


摘要:本次演講將深入探討如何打造中文化的大型語言模型。我們將深入研究Transformer模型的數學基礎,揭示其如何成為語言處理的核心結構,並將其應用於中文語境。進一步探討預訓練和微調的過程,逐步打造實用的聊天機器人。此外,我們將討論如何利用RLHF(Reinforcement Learning from Human Feedback)來對齊人類對話。最後,我們將介紹模型的scaling law,解釋模型隨著規模擴大所帶來的挑戰和優勢,並討論在中文語境中可能遇到的特殊情況。這次演講將使聽眾對於建構中文化大型語言模型的過程、中文語境下的應用以及其背後的數學基礎有更深入的了解。
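The scaling law mentioned in the abstract is commonly written as a power law in parameter count. Below is a minimal sketch with toy constants (the irreducible-loss term is omitted, and nothing here reflects the speaker's actual measurements):

```python
import math

def power_law_loss(n_params: float, a: float, alpha: float) -> float:
    """Toy scaling law L(N) = a * N**(-alpha): loss falls as a power
    of parameter count N, for fixed data and compute regimes."""
    return a * n_params ** (-alpha)

def fit_alpha(n1: float, l1: float, n2: float, l2: float) -> float:
    """Recover the exponent alpha from two (params, loss) observations,
    since log L is linear in log N under a pure power law."""
    return math.log(l1 / l2) / math.log(n2 / n1)
```

Fitting the exponent from small-scale runs is what lets practitioners extrapolate the loss of a much larger model before committing the compute to train it.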

14:00-14:30, JST

13:00-13:30, UTC+8

00:00-00:30, EST

21:00-21:30, PST

Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4

具有猜疑心的代理人:利用具備心智理論(Theory of Mind)的GPT-4玩部分資訊遊戲

Jiaxian Guo

Abstract: Unlike perfect information games, where all elements are known to every player, imperfect information games emulate the real-world complexities of decision-making under uncertain or incomplete information. GPT-4, the recent breakthrough in large language models (LLMs) trained on massive passive data, is notable for its knowledge retrieval and reasoning abilities. This paper delves into the applicability of GPT-4's learned knowledge for imperfect information games. To achieve this, we introduce Suspicion-Agent, an innovative agent that leverages GPT-4's capabilities for performing in imperfect information games. With proper prompt engineering to achieve different functions, Suspicion-Agent based on GPT-4 demonstrates remarkable adaptability across a range of imperfect information card games. Importantly, GPT-4 displays a strong high-order theory of mind (ToM) capacity, meaning it can understand others and intentionally impact others' behavior. Leveraging this, we design a planning strategy that enables GPT-4 to competently play against different opponents, adapting its gameplay style as needed, while requiring only the game rules and descriptions of observations as input. In the experiments, we qualitatively showcase the capabilities of Suspicion-Agent across three different imperfect information games and then quantitatively evaluate it in Leduc Hold'em. The results show that Suspicion-Agent can potentially outperform traditional algorithms designed for imperfect information games, without any specialized training or examples. In order to encourage and foster deeper insights within the community, we make our game-related data publicly available.

摘要:相較於每位玩家都已知所有資訊的完全資訊遊戲,部分資訊遊戲模擬了在不確定或不完整資訊下做出決策的現實世界複雜性。GPT-4是最近在大規模被動資料上訓練的大型語言模型(LLMs)的重大突破,以其知識檢索和推理能力著稱。本文深入探討了GPT-4學到的知識在部分資訊遊戲中的適用性。為此,我們引入了具有猜疑心的代理人(Suspicion-Agent),一個具有創新性的代理,利用GPT-4的能力在部分資訊遊戲中運作。通過適當的提示工程(prompt engineering)實現不同功能,基於GPT-4的猜疑心代理人在一系列部分資訊卡牌遊戲中表現出顯著的適應性。重要的是,GPT-4展示了強大的高階心智理論(ToM)能力,這意味著它可以理解他人並有意影響他人的行為。利用這一點,我們設計了一種規劃策略,使GPT-4能夠與不同的對手競爭,根據需要調整其遊戲風格,同時只需要遊戲規則和對觀察的描述作為輸入。在實驗中,我們定性展示了猜疑心代理人在三種不同的部分資訊遊戲中的能力,然後在Leduc Hold'em中對其進行定量評估。結果表明,猜疑心代理人有潛力超越專為部分資訊遊戲設計的傳統演算法,而無需任何專門的訓練或示範。為了促進研究社群獲得更深入的洞見,我們公開了遊戲相關資料。
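The agent loop the abstract describes, prompting first for an opponent model (theory of mind) and only then for a plan, can be sketched as below. The prompt wording and the `llm` callable are assumptions for illustration, not the paper's actual prompts; the only inputs are the game rules and an observation description, matching the setup the abstract states.

```python
def suspicion_agent_step(llm, rules: str, observation: str, history: list) -> str:
    """One decision step in the spirit of Suspicion-Agent: first ask an
    LLM to infer the opponent's likely holding and intent, then ask it
    to choose an action conditioned on that belief.
    `llm` is any text -> text callable (e.g. a GPT-4 API wrapper)."""
    belief = llm(
        f"Game rules:\n{rules}\n\n"
        f"History so far: {history}\n"
        f"Current observation: {observation}\n"
        "Infer what the opponent likely holds and intends."
    )
    action = llm(
        f"Game rules:\n{rules}\n\n"
        f"Current observation: {observation}\n"
        f"Belief about opponent: {belief}\n"
        "Choose one legal action and reply with it only."
    )
    history.append((observation, belief, action))  # memory for later turns
    return action
```

Splitting belief formation from action selection is the key design choice: the explicit opponent model lets the second prompt reason about bluffing and counter-play rather than reacting to the observation alone.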

14:30-15:00, JST

13:30-14:00, UTC+8

00:30-01:00, EST

21:30-22:00, PST


TBD

TBD

Abstract: TBD