Instructor: Asst. Prof. Mike Shou
Format: Hybrid & LumiNUS platform
Description:
This course briefly recaps deep learning basics, covers robotic perception systems, including audio, language, and vision, and then dives into learning across modalities, e.g. vision-language grounding and vision-language navigation.
No specific textbook is required.
Assessment
20% open-book take-home assignments
20% project
60% closed-book final exam
TA
Eric Zhongcong Xu (zhongcongxu@u.nus.edu)
Stan Weixian Lei
David Junhao Zhang
Q&A
For questions, please post in this Google Doc:
Online office hours: biweekly on Fri 2-3pm (even weeks); check LumiNUS for the Zoom link
Topics we will cover:
Review of deep learning, NumPy, PyTorch
Robotic auditory system
Robotic vision system
Spoken dialogue system
Range sensors & touch sensors
Video: actions and events, video grounding, etc.
Vision-language and audio-visual applications: grounding, captioning, retrieval, QA, etc.