
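With the rapid growth of the World Wide Web, information content and services of every kind keep expanding, and much of daily life and human interaction has moved onto online platforms. Alongside fast progress in wireless networking and multimedia technology, traditional information retrieval techniques are continually being combined with these new media and platforms, producing many innovative lines of research. These innovations and their key applications remain the subject of active attention and discussion in both academia and industry. This workshop therefore invites scholars and experts from Taiwan and abroad to exchange ideas and techniques. It continues an annual series that includes the 2002 Workshop on Automatic Information Classification Techniques, the 2003 Workshop on Information Retrieval and Computer-Assisted Language Learning, the 2004 Workshop on Text Mining Techniques, the 2005 Workshop on Web Information Retrieval Techniques and Trends, the 2006 Workshop on Web Mining Techniques and Trends, the 2007 Workshop on Web 2.0 Technologies and Applications, the 2008 Workshop on Online Community Service Computing and Mining Techniques, the 2009 Workshop on Mobile Information Retrieval and Mobile Location-Based Service Techniques, the 2010 Workshop on Innovative Information Retrieval Techniques, the 2011 Workshop on Music Information Retrieval and Social Service Techniques, the 2013 and 2014 Workshops on Top Information Retrieval Papers, the 2015 workshop "New Trends in Cross-Disciplinary Natural Language Processing and Information Retrieval", the 2016 workshop "The Future of Information Retrieval", and the 2018 workshop "Information Retrieval and Artificial Intelligence". Each year's theme has been widely well received.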



Prof. Yung-Chun Chang (張詠淳) (Graduate Institute of Data Science, Taipei Medical University)

Lun-Wei Ku (古倫維), Associate Research Fellow (Institute of Information Science, Academia Sinica)










IR Workshop 2021 Agenda


March 10, 2021 (Wednesday)





🔗 Video🎞


Keynote Speech 1 - Artificial intelligence for data retrieval in medical applications, now and the future

Ethan Tu (杜奕瑾), CEO (台灣人工智慧實驗室 Taiwan AI Labs)

🔗 Video🎞


There is growing interest in using information retrieval and deep learning to exploit the vast and rapidly growing body of health-related content. Retrieving health-related real-world evidence with AI in medical applications is particularly challenging. Language characteristics differ implicitly by content type, because the content comes in different formats, such as healthcare documentation and clinical records, professional or scientific publications, and clinical trial documentation. It is also critical to provide search for non-English content, as well as cross-language or multilingual IR, to handle text that mixes Chinese, Taiwanese, and English. This talk will briefly introduce how we are currently applying AI-based information retrieval to diverse applications in the medical area.


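Ethan Tu is better known in Taiwan as 杜奕瑾, the "father of PTT". He built PTT 22 years ago on a 486 computer in his dormitory, while he was a sophomore in the Department of Computer Science and Information Engineering at National Taiwan University.

As a result, countless "netizens" (鄉民) who have used, or still eagerly use, this anonymous discussion platform affectionately call him the "father of PTT", "Daddy Tu", and even the "ancient divine beast" and the "creator god". (Editor's note: PTT is the largest BBS community in the Chinese-speaking world. At the height of the BBS era, some 15 years ago, Taiwan had more than 400 BBS sites; today only PTT remains standing. Its "netizen" norms, together with the community's habits and capacity for mobilization, have deeply influenced Taiwanese society and the development of BBS communities in mainland China. Even the habit of liking and replying to posts, now familiar from Facebook, appeared on PTT many years earlier.)

The "creator god" has always avoided the spotlight, and in recent years he has rarely appeared in public in Taiwan. Only some industry insiders in Taiwan know that after graduating from NTU's Department of Computer Science and Information Engineering and helping to found Yam (蕃薯藤), one of Taiwan's first-generation Internet companies, Tu moved to the United States. There he first worked on gene sequencing and automated cancer detection research at the National Institutes of Health (NIH), a gathering place for elite American researchers. Eleven years ago he joined Microsoft, then a global technology giant. At Microsoft headquarters in Seattle he worked on the development of the Bing search engine and spent more than 11 years on AI research at Microsoft, rising to Principal R&D Director for Asia Pacific of Microsoft's AI team (AI.R.).

"Taiwan nurtures world-class professors and top software talent. I plan to bring people together in Taiwan AI Labs to run concrete AI experiments in partnership with Taiwan's leading companies. Working with international tech giants and talent, and drawing on Taiwan's local experience and creativity, we want to build up Taiwan's software capability and take it to the world."

"Taiwan's first year of AI starts now."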

(source: https://crossing.cw.com.tw/article/7805)


Coffee break


Invited Talk 1 - Neural Structured Learning: Theory, Framework and Applications

Dr. Da-Cheng Juan (阮大成), Technical Lead Manager (Google Research)


Neural Structured Learning (NSL) is a new learning paradigm that trains neural networks by leveraging structured signals in addition to feature inputs. Structure can be explicit, as represented by a graph, or implicit, as induced by adversarial perturbation. Structured signals are commonly used to represent relations or similarity among samples, which may be labeled or unlabeled. Leveraging these signals during neural network training therefore harnesses both labeled and unlabeled data, which can improve model accuracy, particularly when the amount of labeled data is relatively small. Additionally, models trained with samples generated by adding adversarial perturbation have been shown to be robust against malicious attacks. NSL has been open-sourced as part of the TensorFlow ecosystem, and we will also introduce several industrial applications enabled by NSL, such as learning state-of-the-art image semantic embeddings and learning knowledge graph embeddings.
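The explicit-graph case can be illustrated with a toy graph-regularized objective. The idea is that the training loss adds a penalty that pulls together the embeddings of samples linked in the graph. The sketch below is a minimal pure-Python illustration under stated assumptions and is not the NSL library's API. The function names, the squared-L2 distance, the weighted edge list, and the `multiplier` weight are all hypothetical choices made for this example. The adversarial variant, which generates perturbed neighbors instead of reading them from a graph, is not shown.

```python
from typing import Dict, List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors (hypothetical helper)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def graph_regularized_loss(
    supervised_loss: float,
    embeddings: Dict[str, List[float]],
    edges: List[Tuple[str, str, float]],
    multiplier: float = 0.1,
) -> float:
    """Hypothetical sketch: supervised loss plus a weighted neighbor-distance penalty.

    Each edge (u, v, w) links two samples, labeled or unlabeled. A larger
    distance between their embeddings raises the loss, so minimizing the
    total encourages neighboring samples to receive similar representations.
    """
    neighbor_term = sum(
        weight * squared_l2(embeddings[u], embeddings[v]) for u, v, weight in edges
    )
    return supervised_loss + multiplier * neighbor_term


if __name__ == "__main__":
    emb = {"a": [0.0, 0.0], "b": [1.0, 1.0]}
    # The distance between a and b is 2.0, so the penalty adds 0.1 * 2.0 to 0.5.
    print(graph_regularized_loss(0.5, emb, [("a", "b", 1.0)], multiplier=0.1))
```

Because the penalty depends only on embeddings, unlabeled samples can still contribute through their edges. This is the intuition behind the abstract's point that structured signals help when labeled data is scarce.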


Machine learner, software developer, and researcher: Da-Cheng Juan is a tech lead and engineering manager at Google Research, leading a research group working on graph learning, adversarial learning, and their real-world applications. Da-Cheng also holds an adjunct faculty position in the Department of Computer Science, National Tsing Hua University. He received his Ph.D. from the Department of Electrical and Computer Engineering and his Master's from the Machine Learning Department, both at Carnegie Mellon University. Da-Cheng has published more than 50 research papers and has repeatedly served on the program committees of top conferences and workshops in machine learning, computer vision, natural language processing, and related fields. Beyond research, he enjoys algorithmic programming and has won several awards in major programming contests. Da-Cheng was the recipient of the 2012 Intel PhD Fellowship. His current research interests span machine learning, convex optimization, and energy-efficient computing.


Invited Talk 2 - Towards Conversational AI

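Yun-Nung (Vivian) Chen (陳縕儂), Associate Professor (Department of Computer Science and Information Engineering, National Taiwan University)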

🔗 Slide
🔗 Video🎞


Although conversational systems have attracted a lot of attention recently, current systems sometimes fail because of errors in their individual components. This talk presents two potential directions for improvement: 1) learning language embeddings specifically for practical scenarios, for better robustness; and 2) a novel learning framework for natural language understanding and generation built on their duality, for better scalability. Both directions enhance the robustness and scalability of conversational systems and show potential for guiding future research.


Yun-Nung (Vivian) Chen is currently an associate professor in the Department of Computer Science & Information Engineering at National Taiwan University. She earned her Ph.D. from Carnegie Mellon University. Her research interests focus on spoken dialogue systems, language understanding, natural language processing, and multimodality. She has received Google Faculty Research Awards, Amazon AWS Machine Learning Research Awards, the MOST Young Scholar Fellowship, and the FAOS Young Scholar Innovation Award. Before joining National Taiwan University, she worked in the Deep Learning Technology Center at Microsoft Research Redmond. (http://vivianchen.idv.tw/)




Keynote Speech 2 - A New Natural Language Model: the Principle-based Approach

Wen-Lian Hsu (許聞廉), Distinguished Research Fellow (Institute of Information Science, Academia Sinica)


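Statistical machine learning has the following fatal weaknesses in language understanding. 1. The learned "knowledge" (a huge number of parameters) is hard to present in a form people can understand, and errors are hard to correct. 2. Statistical machine learning centers on "classification" and recognition, and it has difficulty incorporating "rules". 3. Learning from the surface form of "plain text" alone cannot solve the problem. A great deal of external knowledge must be brought in at the right moment for a computer to keep operating sensibly (end-to-end does not work). We propose a new model, the Principle-based Approach (PBA), which combines the strengths of statistical and rule-based methods while following the machine learning principle of N-fold training and testing. PBA has several key elements. 1. Statistics on the "modifiers" of each word X are collected in advance; these are called the FB of X, and the phrase formed by combining a modifier with X is called a "concept of X". 2. Using FB-based reduction, sentences or phrases are automatically represented as N-grams of concepts and stored in a dictionary of patterns (also called principles). 3. PBA uses pattern matching as the basis for measuring similarity. Pattern inference can resolve many difficult problems in natural language, especially those that current machine learning struggles with. We will explain this in detail in the talk.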


Wen-Lian Hsu (F'06) is a Distinguished Research Fellow of the Institute of Information Science, Academia Sinica, Taiwan. He received his Ph.D. in operations research from Cornell University in 1980. Dr. Hsu's early contributions were in graph algorithms, and he has applied similar techniques to computational problems in biology and natural language. In 1993, he developed the Chinese input software GOING, which has since revolutionized Chinese input on computers. He later applied similar semantic analysis techniques to question answering systems and biological literature mining. Dr. Hsu has received numerous awards from both academia and industry. Recently, he developed an interpretable machine learning technique based on reduction, which takes advantage of the idea of context representation from word embeddings and performs better than BERT in several applications.


Invited Talk 3 - All the Wiser: Fake News Intervention towards Effective Clarification

Lun-Wei Ku (古倫維), Associate Research Fellow (Institute of Information Science, Academia Sinica)


Fake news has been shown to have a significant impact on people's daily lives. Governments and research communities have proposed many approaches to stop its dissemination. However, their effectiveness is limited, and some side effects have been observed. This talk will introduce a news reading platform on which we propose an implicit approach to reducing people's belief in fake news. Specifically, it covers how we use reinforcement learning to train an intervention module on top of a recommender system (RS). Once a user reads a fake news item, the module is activated in place of the RS and recommends news that steers the user toward verification. The effectiveness of the proposed approach is shown by automatic evaluation and a user study, and comparisons with other commonly adopted methods will be discussed. The talk concludes with the deployment, related applications, and future goals of the proposed concept.


Prof. Lun-Wei Ku, IIS, Academia Sinica

Lun-Wei Ku is an associate research fellow at the Institute of Information Science, Academia Sinica, an adjunct associate professor at National Chiao Tung University (NCTU), and the secretary-general of the Association for Computational Linguistics and Chinese Language Processing (ACLCLP). She received her M.S. and Ph.D. degrees from the Department of Computer Science and Information Engineering, National Taiwan University. Her research interests include natural language processing, information retrieval, and computational linguistics. She has worked on sentiment analysis since 2005 and was a co-organizer of the NTCIR MOAT Task (Multilingual Opinion Analysis Task, traditional Chinese side) from 2006 to 2010. Her international recognition includes Good Design Award Selected (2012), CyberLink Technical Elite Fellowship (2007), IBM Ph.D. Fellowship (2008), and ROCLING Doctoral Dissertation Distinction Award (2009). Her other international professional activities include: General Chair, StarSem 2021; Program Chair, StarSem 2019 and ARIS 2019; Best Paper Committee, ACL 2019; Student Workshop Chair, AACL-IJCNLP; Area Chair, ACL 2021, NAACL 2021, ACL 2020, COLING 2020, EMNLP 2019, ACL 2017, CCL 2016, NLPCC 2016, ACL-IJCNLP 2015, and EMNLP 2015; Financial Chair, IJCNLP 2017; Publication Co-Chair, IJCNLP 2013; Publicity Chair, AIRS 2010. She is also active in industrial collaborations and is currently working with companies such as E-Sun Bank and WinGene.


Coffee break


Invited Talk 4 - Artist Interpersonal Relation Enhanced Graph Model for Recommendation

Hen-Hsen Huang (黃瀚萱), Assistant Professor (Department of Computer Science, National Chengchi University)


Music recommendation is a hot research topic in both academia and industry. Existing approaches to music recommendation are mostly based on structural information, such as collaborative filtering and graph modeling. In this work, we propose a multi-modal heterogeneous graph (MMHG) model that leverages both content-based and structure-based information for music recommendation. We train MMHG to capture the relations among different kinds of vertices, including users, music items, genres, moods, and artists' social network. We also enrich the features with the acoustic and textual information of the music content and with the social network of artists. Experiments confirm the effectiveness of incorporating these rich relations among the different concepts together with the enriched features.


Dr. Hen-Hsen Huang is an assistant professor in the Department of Computer Science at National Chengchi University. His research interests include natural language processing and information retrieval. His work has been published in ACL, SIGIR, WWW, IJCAI, CIKM, COLING, and other venues. Dr. Huang's awards and honors include the Honorable Mention of the ACLCLP Doctoral Dissertation Award in 2014 and the Honorable Mention of the ACLCLP Master Thesis Award in 2008. He served as the registration chair of TAAI 2017 and the publication chair of ROCLING 2020, and as a PC member of major conferences in computational linguistics, including ACL, COLING, EMNLP, and NAACL. He was one of the organizers of the FinNum task at NTCIR-14 and the FinNLP Workshop at IJCAI 2019.


Invited Talk 5 - Information extraction from unstructured text data in the electronic medical records-status quo and challenges

Min-Huei (Marc) Hsu (許明暉), Chief Data Officer (Data Office, Taipei Medical University)


Over the past ten years, with the continuous advancement and adoption of health information technology, medical institutions around the world have accumulated large amounts of electronic medical record (EMR) data through long-term collection. For clinical research, EMR data has the advantages of low cost and timeliness compared with data obtained in clinical trials. A growing number of studies now use EMR data in clinical research, for example in efficacy analysis and outcome analysis.

Health data includes personal health data from mobile devices, hospital clinical data, genetic data, and public data for disease prevention and control. Integrating data from these multiple sources can provide the foundation for health promotion, disease prevention, and national health strategies.

EMR data is diverse, incomplete, and redundant, which makes it difficult to analyze directly, so the source data must be preprocessed to improve its quality. Different types of data require different processing techniques. Structured data generally needs classic preprocessing, including data cleansing, data integration, data transformation, and data reduction. Semi-structured or unstructured data, such as medical text, requires more complex and challenging processing methods. This presentation will focus on information extraction from text data in EMRs. Text in EMRs is accessible, especially with open-source information extraction algorithms, and when combined with codes it significantly improves case detection. However, reporting within EMR studies needs more harmonization.


Min-Huei Marc Hsu is a Professor in the Graduate Institute of Data Science at Taipei Medical University. Dr. Hsu has dedicated himself to the adoption of health information technology and has been deeply involved in digital health projects in Taiwan. He is one of the key promoters of Taiwan's National EMR exchange program. Dr. Hsu was appointed Director of the Medical Informatics Center at the Ministry of Health and Welfare (MOHW) of Taiwan in March 2011. Before the MOHW appointment, Dr. Hsu served as CIO at Taipei Medical University and as a Consultant Neurosurgeon at Wanfang Hospital (a 746-bed hospital affiliated with Taipei Medical University). He has also authored or co-authored more than 80 papers and articles in international conferences and scientific journals, focusing on health data, health information technology, e-health, electronic medical record systems, hospital information management, and patient safety.





General: members NT$700, non-members NT$900

Students: members NT$500, non-members NT$700

Payment deadline: from now until March 3, 2021 (on-site registration incurs an additional NT$200).




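(In the message field of the postal transfer form, please write "IR Workshop" and your Registration ID. Multiple registrants from the same organization may combine their payments into a single transfer.)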












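By MRT: Take the Wenhu Line to Liuzhangli Station. From the station's single exit, walk along Keelung Road toward Taipei City Hall for about 300 meters (about 5 minutes). The Taipei Medical University Da'an Campus is across from the 7-ELEVEN George store.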


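(National Freeway No. 3) Leave the Xinyi Expressway using the two left lanes at the exit. Enter Section 5, Xinyi Road and continue straight toward Keelung Road / the civic center for about 1.1 km. Turn left onto Section 2, Keelung Road and continue straight for 1 km; the Taipei Medical University Da'an Campus is on the right.

(Huandong Boulevard) Follow the signs for Keelung Road and keep left to continue into the Keelung Road underpass. Continue straight along Section 1, Keelung Road and then Section 2, Keelung Road for 1 km; the Taipei Medical University Da'an Campus is on the right.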



B2 Conference Hall



Phone: (02) 6638-2736 ext. 1105

Email: chenyu@tmu.edu.tw

Office: 11F, No. 172-1, Sec. 2, Keelung Rd., Da'an Dist., Taipei 106 (Taipei Medical University Da'an Campus)