CAI + CAI = CAI^2

When creative AI meets conversational AI (CAI + CAI = CAI^2)

Welcome! ようこそ！

CAI+CAI first workshop will be co-held at Natural Language Processing 2021 Conference (言語処理学会年次大会2021), in Japanese and English.

本ワークショップは 言語処理学会第27回年次大会 の一部として開催されます．ワークショップのみ参加される場合でも年次大会への参加登録が必要です．早期登録(2/15までに入金)と，直前登録(3/12まで)では価格が異なりますので，早期の登録をおすすめいたします．

For paper submission and any workshop related issues, please connect us: cainlp2021 at gmail.com

Creative AI, training generative deep neural networks for NLP (such as poems, Haiku, stories), image (such as painting, animation), and speech (such as classic and popular music generation, singing), has achieved impressive milestones during recent years, thanks to deep networks such as attentive encoder-decoder architectures (Transformers), generative-discriminative frameworks (GANs) and self-supervised encoders (VAEs). On the other hand, conversational AI products, text and speech based multi-modal communication between chatbots and human beings, have obtained million-level users in Japan and globally. In order to construct the strong persona of conversational AI products, chatbots are enhanced to be able to interactively write poems, create songs, sing, and even tell stories, through multi-turn communications. Furthermore, QA-style and IR-oriented chatbots of general domains and vertical domains such as finance, healthcare and even emotional pure-chatting are also requiring generative, creative and explainable AI models to support the multi-modal and multi-turn interactions with human beings. In this workshop, we are aiming at collecting, sharing, and discussing state-of-the-art research on creative AI and conversational AI, empowered by large-scale open datasets, open-source architectures, and distributed GPU platforms.

Most importantly, with creative AI combined with conversational AI, we are aiming at bringing AI to help assisting under-represented groups’ learning and communicating with the world, such as interactive music therapy, children’s painting guiding, emotional caring for social phobia and elderly cognition guarding.

[The same content in Japanese]

クリエイティブAIと呼ばれる自然言語処理（詩，俳句，物語文など），映像（絵画，アニメーションなど），音声（古典か現代的な音楽の生成，歌など）のニューラルネットベースの自動生成技術は，Attentionベースのエンコーダ・デコーダモデル（Transformer）や敵対的生成ネットワーク（GAN），変分オートエンコーダ（VAE）等の新しいディープラーニング手法により近年目覚ましい進化を遂げている．その一方，人とチャットボット間のマルチモーダルなコミュニケーションをテキストまたは音声ベースで提供する会話型AI製品は，日本だけでなく世界的に数百万レベルのユーザを獲得している．会話型AI製品の強力なペルソナを構築する為，チャットボットはマルチターンの会話を通じ対話的に詩の生成，歌の生成，歌唱，物語生成が可能なように強化・拡張をされてきている．さらにQAスタイルおよびIR（情報検索）指向のチャットボットの，一般的ドメインおよび特定ドメイン（金融，ヘルスケア，感情的な純粋なチャット等）も，マルチモーダルかつマルチターンの相互作業をサポートする為に生成的かつ創造的で説明可能なAIモデルを必要としている．本ワークショップでは，大規模データセット，オープンソースアーキテクチャ，分散型GPUプラットフォームを活用したクリエイティブAIや会話型AIに関係する最先端の研究を収集，共有，議論する事を目的とし，10〜15件のプレゼンテーションを招待することを目標とする．

クリエイティブAIと会話型AIを組み合わせることで，対話的な音楽療法，子供の描画補助，社交不安障害や高齢者の認知に対する感情的なケア等，マイノリティグループの学習や世界とのコミュニケーションを支援するためにAI技術の提供を目指す．

Note that the images come from the following references:

Music Generation

References:

Deep Learning Techniques for Music Generation--A Survey. https://arxiv.org/pdf/1709.01620.pdf
Music Transformer: Generating Music with Long-Term Structure. https://magenta.tensorflow.org/music-transformer
Transformer-XL Based Music Generation with Multiple Sequences of Time-valued Notes. https://arxiv.org/pdf/2007.07244.pdf

AI Painting

References:

CAN: Creative Adversarial Networks, Generating "Art" by Learning About Styles and Deviating from Style Norms. https://arxiv.org/pdf/1706.07068.pdf
Progressive Growing of GANs for Improved Quality, Stability, and Variation. https://arxiv.org/pdf/1710.10196.pdf
Self-Attention Generative Adversarial Networks. https://arxiv.org/pdf/1805.08318.pdf
https://www.fastcompany.com/90376689/what-you-look-like-as-an-renaissance-painting-according-to-ai
https://topten.ai/ai-painting-generators/

Conversational AI

References:

Schedule [ZOOM]

JST (GMT+9 timezone) 9:00AM - 18:00PM

[Yi Zhao, NII, Invited Talk] 30m Talk + 5m QA. Chair: Xianchao Wu【语音/Speech・音声】
- [JST 9:00AM - 9:35AM].
- https://www.google.com/url?q=https%3A%2F%2Fresearchmap.jp%2Fzhaoyi&sa=D
- Modeling and evaluation methods in current voice conversion tasks
- Voice conversion aims at changing the speaker identity from one to another, while keeping the linguistic content unchanged. In 2020, we organized the third edition of the voice conversion challenge and observed that VC methods have progressed rapidly thanks to advanced deep learning methods. With the recent advances in theory and practice, we are now able to produce human-like voice quality with high speaker similarity. However, the cross-lingual conversion task is, as expected, a more difficult task, and the overall naturalness and similarity scores were lower than those for the intra-lingual conversion task. In this talk, we will introduce the state-of-the-art of voice conversion techniques, performance and evaluation methods. We will also provide a summary of the available resources for voice conversion research.
[Yifan Jiang, University of Texas at Austin, Texas, Invited Talk] 45m Talk + 10m QA. [图像生成、GAN、画像生成] Chair=Xianchao Wu
- [JST 9:35AM - 10:30AM. Texas Local Time: 18th March 18:35 - 19:30 PM.]
- TransGAN: Two Transformers Can Make One Strong GAN https://arxiv.org/pdf/2102.07074.pdf
- https://github.com/VITA-Group/TransGAN
- The recent explosive interest on Transformers has suggested their potential to become powerful universal models for computer vision tasks, such as classification, detection, and segmentation. However, how further transformers can go - are they ready to take some more notoriously difficult vision tasks, e.g., generative adversarial networks (GANs)? In this talk, I will introduce TransGAN, a new GAN architecture completely free of convolution, using pure transformer-based networks.
- The talk will include some interesting questions such as how transformer-based GANs perform on various datasets; How fast TransGAN can be compared to ConvNets-based GAN; How it can be scaled-up to higher-resolution image generation tasks.

[break 10 minutes] 10:30AM - 10:40 AM

[Gamar Azuaje, NAIST, research paper presentation] 30m + 5m QA 【图像生成+NLP, Text-to-Image Generation】Chair: Haitao Yu, Xianchao Wu
- 10:40AM - 11:15AM
- Birdscribe: A Semantic Writing Assistant Employing Text-based Image Generation and Modification
- Paper URL: https://drive.google.com/file/d/18UaCLZNRCf9-17z1q7q_Gxd_6fYo8J4F/view?usp=sharing
- Authors= Gamar Azuaje, Kongmeng Liew, Shuntaro Yada, Shoko Wakamiya, Eiji Aramaki (NAIST), Dawar Khan (University of Haripur).
[森友亮/上原康平, The University of Tokyo, research paper presentation] 30m + 5m QA 【图像->文本的生成/Image-to-Text Generation】Chair: Haitao Yu, Xianchao Wu
- 11:15Am - 11:50AM
- 視覚・言語融合 Transformer モデルによる画像からの物語文生成
- [English Title:] VisualNT-BART: Image to Narrative Generation with Vision and Language Transformer
- Paper URL: https://drive.google.com/file/d/1WipLdMwbNs12Tj5A6_G7X3zzHLdCSo_n/view?usp=sharing
- PPT URL: https://drive.google.com/file/d/1LNA_aoIAlXuTnZJqZ-3MlB4qj4vTpmvm/view?usp=sharing
- Authors=森友亮 1, 上原康平 1, 原田達也 2,3
- 1 東京大学大学院情報理工学系研究科, 2 東京大学先端科学技術研究センター, 3 RIKEN
- Authors: Yusuke Mori*, Kohei Uehara*, Tatsuya Harada (* indicates Equal Contribution)

Lunch Break 11:50AM - 13:00PM

[Haitao Yu, The University of Tsukuba, Invited Talk], 30m Talk + 5m QA. Chair = Xianchao Wu TBD 【NLP，信息检索、情報検索】Chair=Xianchao Wu, Peiying Ruan
- 13:00 - 13:35
- https://www.google.com/url?q=https%3A%2F%2Fii-research-yu.github.io%2F&sa=D
- Neural Learning-to-Rank
- Learning-to-rank has been intensively studied and has shown significantly increasing values in a wide range of domains, such as web search, recommender systems, dialogue systems, machine translation, and even computational biology, to name a few. In light of recent advances in neural networks, there has been strong and continuing interest in exploring how to deploy neural networks based techniques, such as reinforcement learning, adversarial learning and BERT, to solve ranking problems. This talk will provide an overview of the recent neural learning-to-rank models. Based on the open-source project PT-Ranking which is maintained by my group, a comprehensive comparison of representative neural learning-to-rank models will be conducted over widely used benchmark datasets.
[Noa Garcia,http://noagarciad.com/, Osaka University, Invited Talk] 45m + 10m. Chair=Dr. Peiying Ruan (Colleen), Xianchao Wu [艺术，视觉，语言、美術、画像、言語]
- 13:35 - 14:30
- Title: Understanding Fine-Art Paintings through Visual and Language Representations
- Abstract: In computer vision, visual arts are often studied from a purely aesthetics perspective, mostly by analyzing the visual appearance of an artistic reproduction to infer its attributes, its representative elements, or to transfer the style across different images. However, understanding an artistic representation involves mastering complex comprehension processes, such as identifying the socio-political context of the artwork or recognizing the artist main influences. In this talk, we will explore fine-art paintings from both a visual and a language perspective. The aim is to bridge the gap between the visual appearance of an artwork and its underlying meaning, by jointly analyzing its aesthetics and its semantics. We will explore the use of multi-modal techniques in the field of automatic art understanding as well as potential tasks in which these techniques are applied, including information retrieval, automatic description generation, or visual question answering.

[Break 10 minutes] 14:30 - 14:40

[Jiaxian Guo, The University of Sydney, Invited Talk] 30m Talk + 5m QA; Afternoon preferred. [图像General、画像翻訳] Chair=Lin Gu/Peiying Ruan
- 14:40 - 15:15
- https://scholar.google.com/citations?user=wQgPocEAAAAJ&hl=en
- https://openreview.net/pdf?id=R5M7Mxl1xZ
- Minimal Geometry Constraint for Unsupervised Image to Image Translation
- Unsupervised image-to-image (I2I) translation, which aims to learn a domain mapping function without paired data, is very challenging because the function is highly under-constrained. Despite the significant progress in constraining the mapping function, current methods suffer from the geometry distortion problem: the geometry structure of the translated image is inconsistent with the input source image, which may cause the undesired distortions in the translated images. To remedy this issue, we propose a novel I2I translation constraint, called Minimal Geometry-Distortion Constraint (MGC), which promotes the consistency of geometry structures and reduces the unwanted distortions in translation by reducing the randomness of color transformation in the translation process. To facilitate estimation and maximization of MGC, we propose an approximate representation of mutual information called relative Squared-loss Mutual Information (rSMI) that can be efficiently estimated analytically. We demonstrate the effectiveness of our MGC by providing quantitative and qualitative comparisons with the state-of-the-art methods on several benchmark datasets.
[Xianchao Wu, NVIDIA, Invited Talk] 45mTalk + 10mQA 【Creative AI, Conversational AI】. Chair=Lin Gu/Peiying Ruan
- 15:15 - 16:10
- https://developer.nvidia.com/nvidia-jarvis
- Painting, Singing, Poeming and open-source Jarvis for Conversational AI
- Video is here at NVIDIA's GTC2021 (April) as well.
- PPT/PDF is here.
- We cover creative AI in the following directions, AI painting that takes textual messages interactively from chatbot's users and then draws a paint inspired by users' inputs; pure music generation with steps of melody, lyrics, rhythm, pitch, musical instruments, singing voice synthesis; textual content generation that includes poem, story generation with or without the hint from multi-modal inputs (such as images, voices and so on). Example outputs and neural network architectures are illustrated in the talk. In addition, an open-source conversational AI platform for developers, NVIDIA Jarvis, will be briefly introduced as well.
- [apologize that the original speaker could not give a talk due to some issues, we switched to this topic and express our thankfulness for your understanding]

[Break 10m] 16:10 - 16:20

[Lin Gu, RIKEN, Invited Talk] 45m + 10m QA. 【医学图像・医学画像処理、Machine Learning, Deep Learning, General Models, Computer Vision】Chair=Lin Gu/Peiying Ruan
- 16:20PM - 17:15PM
- https://scholar.google.com/citations?user=gIEZe5IAAAAJ&hl=zh-CN
- Limited Data and Interpretability of Medical Image Analysis: The Challenge of Chance
- Though deep learning has shown successful performance in the medical image analysis in the tasks of classifying the label and severity stage of certain disease. However, the CNN based methods suffer the bottleneck of lacking training label and interpretability. For example, most of them give few evidence on how to make prediction. To make it worse, the ubiquitous adversarial attack has posed even more serious challenge on its real application. This talk would introduce the recent progress for these challenges on various medical image domains.
- Though deep learning has shown successful performance in classifying the label and severity stage of certain disease, most of them give few evidence on how to make prediction. Here, we propose to exploit the interpretability of deep learning application in medical diagnosis. Inspired by Koch’s Postulates, a well-known strategy in medical research to identify the property of pathogen, we define a pathological descriptor that can be extracted from the activated neurons of a diabetic retinopathy detector. To visualize the symptom and feature encoded in this descriptor, we propose a GAN based method to synthesize pathological retinal image given the descriptor and a binary vessel segmentation. Besides, with this descriptor, we can arbitrarily manipulate the position and quantity of lesions. As verified by a panel of 5 licensed ophthalmologists, our synthesized images carry the symptoms that are directly related to diabetic retinopathy diagnosis. The panel survey also shows that our generated images is both qualitatively and quantitatively superior to existing methods.
[Peiying Ruan, NVIDIA, invited talk] 30m Talk + 5m QA. 【医学+AI、医学画像処理】Chair=Lin Gu/Peiying Ruan
- 17:15 - 17:50
- https://www.google.com/url?q=https%3A%2F%2Fwww.researchgate.net%2Fprofile%2FPeiying_Ruan&sa=D
- Title: COVID-19 Multi-Modal Learning and Federated Learning Powered by NVIDIA Clara
- Abstract: Since the first report in December 2019, the COVID-19 has been spreading and become a global pandemic across the world. Due to the limitation of medical resources, how to predict disease progress has become a challenging problem. Multi-modal models that use multiple data have been shown to significantly improve the performance compared to using one type of data. In this talk, we will introduce our work on COVID-19 using multi-modal techniques. Besides, as we know, AI requires massive amounts of data. This is particularly true for industries such as healthcare, financial, or fields such as ASR. In order to build robust AI algorithms, hospitals and medical institutions often need to collaboratively share and combine their local knowledge. However, this is challenging because patient data is private by nature. Federated learning enables different sites to securely collaborate, train, and contribute to a global model without sharing data. We will introduce federated learning powered by NVIDIA Clara, which can be applied in many fields including conversational AI.

All presentations will be given in ZOOM.

Connect: cai2021@gmail.com for obtaining ZOOM URL.

Submission Guidance

注意：本ワークショップは 言語処理学会第27回年次大会 の一部として開催されます．ワークショップのみ参加される場合でも年次大会への参加登録が必要です．早期登録(2/15までに入金)と，直前登録(3/12まで)では価格が異なりますので，早期の登録をおすすめいたします．

Both Japanese and English are WELCOME!

You can submit your PDF paper following ACL format, NeurIPS format or anyother formats you prefer. Also, you can refer the 2020 paper examples from https://www.anlp.jp/proceedings/annual_meeting/2020/.

We prefer the paper is 2 to 4-page (include references). However, you can make it arbitrarily long (or short) as far as you think the content is complete enough.

For paper submission and any workshop related issues, please connect us:

cainlp2021 at gmail.com

Schedule:

2020.Nov.24 First Call for Papers

~~2021.Feb.15~~ (Extended!!! 2021.Feb.28) Deadline for papers

2021.Mar.01 Schedule Open

2021.Mar.19 [JST timezone, GMT+9] Online (Zoom) workshop [9:00AM - 18:00 PM]

一般発表募集中（下記のテーマにも関わらず！）

We welcome your submissions to this workshop of and not limited to the following fields:

Conversational AI, chatbots, dialog systems [対話AI、チャットボート、対話システム]
- single-turn/multi-turn 「シングルターン・マルチターン対話」
- question-answering　「質問応答」
- dialog management　「対話管理」
- task-oriented conversational AI　「タスク向け対話AI」
- multi-modal conversational AI　「マルチモーダル対話AI、音声、テキストやビデオ」
Creative AI of NLP, image, speech and related fields　[クリエイティブAI、自然言語処理、画像、音声と関連する分野]
- music generation　「音楽生成」
- AI painting　「AI絵画」
- video parsing/understanding　「ビデオ解析・理解」
- multi-modal creative AI　「マルチモーダルクリエイティブAI」
- text generation　「テキスト生成、俳句、小説、物語文など」
All types of scenarios of creative AI + conversational AI　「クリエイティブAI＋対話AIのあたゆるシナリオ」
- interactive AI creation　「インタラクティブAIものつくり」
- deep learning algorithms/frameworks of CAI+CAI　「深層学習アルゴリズム・フレームワーク」
- information retrieval (example HERE) with conversations　「対話による情報検索」
- information retrieval of AI creative contents　「AIが作ったコンテンツの情報検索」
All other topics related to AI that you prefer to share [他のAIに関するテーマ、分野は自由に選べるのです]

予稿の掲載

本ワークショップでは予稿の提出は任意としますが，提出された予稿については本Webサイトに掲載します．

予稿については本会議の様式に準拠して作成してください（ただしページ数制限なし）．参考: https://www.anlp.jp/nlp2021/#application_notice
掲載にあたり問題が生じる懸念がある場合は掲載をお断りしたり，修正をお願いすることがあります．ご了承ください．
予稿の著作権は著者に帰属しますが，Creative Commons Attribution 4.0 International License で公開することに同意していただきます．