Past Seminars: 2023

To find AI Seminar Recordings, go to the Amii Presents: AI Seminar 2023 YouTube playlist. 

December 22
NO SEMINAR - U OF A WINTER CLOSURE

December 15
Recurrent Linear Transformers
Subhojeet Pramanik, University of Alberta

YouTube link


Abstract:
The self-attention mechanism in the transformer architecture is capable of capturing long-range dependencies, and it is the main reason behind its effectiveness in processing sequential data. Nevertheless, despite their success, transformers have two significant drawbacks that still limit their broader applicability: (1) in order to remember past information, the self-attention mechanism requires access to the whole history to be provided as context; (2) the inference cost in transformers is expensive. In this paper, we introduce recurrent alternatives to the transformer self-attention mechanism that offer a context-independent inference cost, leverage long-range dependencies effectively, and perform well in practice. We evaluate our approaches in reinforcement learning problems where the aforementioned computational limitations make the application of transformers nearly infeasible. We quantify the impact of the different components of our architecture in a diagnostic environment and assess performance gains in 2D and 3D pixel-based partially observable environments. When compared to a state-of-the-art architecture, GTrXL, inference in our approach is at least 40% cheaper while reducing memory use by more than 50%. Our approach either performs similarly to or better than GTrXL, improving upon GTrXL's performance by more than 37% on harder tasks.
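
As background for why a recurrent alternative can offer a context-independent inference cost, here is a minimal sketch of the standard linear-attention recurrence (illustrative background only, not the architecture from the talk; the feature map and all names are assumptions):

    import numpy as np

    def linear_attention_step(S, z, q, k, v,
                              phi=lambda x: np.maximum(x, 0.0) + 1.0):
        """One recurrent step of normalized linear attention.
        S: (d_k, d_v) running sum of phi(k) v^T outer products.
        z: (d_k,) running sum of phi(k), used for normalization.
        The state (S, z) has fixed size, so the per-step inference cost
        does not grow with the length of the history."""
        S = S + np.outer(phi(k), v)
        z = z + phi(k)
        out = (phi(q) @ S) / (phi(q) @ z + 1e-8)
        return S, z, out

Each step costs O(d_k * d_v) no matter how many tokens came before, in contrast to softmax self-attention, whose per-step cost grows with the context length.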


Presenter Bio:
Subhojeet Pramanik works in deep learning and reinforcement learning. His expertise lies in building scalable algorithms, specifically in designing efficient transformer architectures. He completed his MSc at the University of Alberta under the supervision of Adam White and Marlos C. Machado. During his master's, he worked on representation learning in the partially observable reinforcement learning setting. He was also an intern at the Huawei Research Lab in Edmonton, working on applying reinforcement learning techniques to deep learning operator fusion. Previously, Subhojeet worked at IBM Cloud for two years as an ML Engineer on topics such as deep learning and NLP.

December 8
Exploring 3D Human Motions with Deep Learning: Generation, Animation and Understanding
Chuan Guo, Department of Electrical and Computer Engineering, University of Alberta

YouTube link


Abstract:
Our research focuses on leveraging expressive deep learning tools to model and comprehend human actions, involving diverse areas such as motion synthesis from action categories (action2motion) or textual descriptions (text2motion), image-based character animation (motion2video), generative motion stylization, and reciprocal generation of motions and texts (motion2text-2-motion). To approach those goals, we introduce two extensive multimodal human action datasets: HumanAct12, with 1,191 motion clips and 90,099 frames annotated for 12 coarse-grained and 34 fine-grained action classes, and HumanML3D, comprising 44,970 textual descriptions and 14,616 motions. Technically, our investigation incorporates advanced deep generative models, such as variational autoencoders, GPT, and masked generative models, resulting in state-of-the-art performance in motion analysis.


Presenter Bio:
Chuan Guo is a final-year Ph.D. student in the ECE department at the University of Alberta. His research interests mainly revolve around 3D human motion synthesis, stylization, understanding, and character animation. He has a series of works published at top-tier computer vision and machine learning venues, including CVPR, ECCV, ICCV, IJCV, and ICLR. Webpage: https://ericguo5513.github.io/

December 1
Presenting Multiagent Challenges in Team Sports Analytics
Dr. David Radke, Senior Research Scientist, Chicago Blackhawks

YouTube link


Abstract:
This talk will present several challenges and opportunities within the area of team sports analytics and key research areas within multiagent systems (MAS). We specifically consider invasion games, where players invade the opposing team’s territory and can interact anywhere on a playing surface (e.g., ice hockey or soccer). We discuss how MAS is well-equipped to study invasion games and will benefit both the MAS and sports analytics fields. We highlight topics along two axes: short-term strategy (coaching) and long-term team planning (management).


Presenter Bio:
Dr. David Radke is a Senior Research Scientist with the Chicago Blackhawks of the National Hockey League (NHL). He holds a PhD from the University of Waterloo, where he was advised by Dr. Kate Larson and Dr. Tim Brecht. His research areas include artificial intelligence (AI) and ice hockey analytics, specifically the areas of multiagent systems and reinforcement learning. He has published several papers at top AI conferences, including IJCAI and AAMAS. He has also published papers at LINHAC, a hockey analytics conference, where he received a best paper award in 2022 for his work with NHL tracking data. Dr. Radke's graduate work was supported by Vector Institute computing resources, NSERC, an Ontario Graduate Scholarship, and Type 1 Cheriton awards.

Special Monday Seminar

November 27
Language Models as Lego Blocks of Reasoning
Dr. Hongyuan Mei, Toyota Technological Institute at Chicago (TTIC)

YouTube link


Abstract:
In recent years, language models (LMs) have emerged as transformative tools in the realm of artificial intelligence. They show strong language understanding and reasoning capabilities, presenting a wealth of opportunities for solving challenging problems. However, deploying them as independent problem solvers---even with sophisticated prompting techniques---often ends up with unsatisfactory results.  


In this talk, I will introduce an alternative approach, which incorporates LMs within a larger framework for complex reasoning. Here, LMs propose solutions or logical pathways, which are then analyzed and utilized by the framework. We will showcase two challenging problems effectively addressed using this paradigm. The first is text-based logical reasoning, in which one has to determine the truth value of a statement given a set of rules and facts, expressed in human natural language. The second is event prediction, the task of reasoning about future events given the past. For both problems, our LM-in-the-loop frameworks learn to provide high-quality output beyond what an LM can offer as a standalone problem solver.


I will sketch a few future research directions for improving the fundamental reasoning capabilities of LMs, including embedding an LM within a reinforcement learning framework to develop foundation world models.


Presenter Bio:
Dr. Hongyuan Mei is currently a Research Assistant Professor at Toyota Technological Institute at Chicago (TTIC). He obtained his PhD from the Department of Computer Science at Johns Hopkins University (JHU), where he was advised by Jason Eisner. Hongyuan's research spans machine learning and natural language processing. Currently, he is most interested in harnessing and improving the reasoning capabilities of large language models to solve challenging problems such as event prediction. His research has been supported by a Bloomberg Data Science PhD Fellowship, the 2020 JHU Jelinek Memorial Award, and research gifts from Adobe and Ant Group. His technical innovations have been integrated into real-world products such as Alipay, the world’s largest mobile digital payment platform, which serves more than one billion users. His research has been covered by Fortune Magazine and Tech At Bloomberg.

November 24
The Benefits of Model-Based Generalization in Reinforcement Learning
Kenny Young, University of Alberta

YouTube link


Abstract:
It is widely believed that model-based reinforcement learning can improve sample efficiency by synthesizing imagined experience which generalizes beyond the data. However, learned value functions also generalize. Why should we expect model generalization to be inherently better? In this talk, I address this question with a simple theoretical result. I also present extensive empirical results which demonstrate that the intuition behind the theory leads to significant practical benefits in environments with underlying structure that allows learned models to generalize.


Presenter Bio:
Kenny Young is a PhD student at the University of Alberta, supervised by Richard Sutton. He has worked on a variety of topics in and around reinforcement learning, including developing the popular MinAtar testbed. Most recently he is interested in model-based reinforcement learning as a means to build agents that are capable of learning, planning and thinking about their world.

November 17
NO SEMINAR - U OF A READING WEEK

November 10
Ranking and Representing Hockey Players with Deep Reinforcement Learning
Dr. Oliver Schulte, Simon Fraser University

YouTube link


Abstract:
This talk develops the idea that reinforcement learning can solve the problems of sports analytics. I focus on ranking player performance. We measure the value of a player’s actions by how much they increase their team’s chance of success (i.e., scoring the next goal). I  describe deep reinforcement learning techniques for building success probability models from sports data.  The resulting action values and player rankings are illustrated with data from the National Hockey League comprising over 3M events.


Presenter Bio:
Dr. Oliver Schulte is a Professor in the School of Computing Science at Simon Fraser University, Vancouver, Canada. He received his Ph.D. from Carnegie Mellon University in 1997. His current research focuses on machine learning for structured data, such as sports events, networks, and relational databases. He has given several invited talks on sports analytics and spent two years at Sportlogiq, a sports analytics company. He has published papers in leading AI and machine learning venues on a variety of topics, including sports analytics, learning Bayesian networks, game theory, and scientific discovery. While he has won some nice awards, his biggest claim to fame may be a draw against chess world champion Garry Kasparov.

November 3
Visual Human Motion Analysis

Dr. Li Cheng, Electrical and Computer Engineering, University of Alberta

YouTube link


Abstract:
Recent advances in imaging sensors and deep learning techniques have opened the door to many interesting applications for the visual analysis of human motion. In this talk, I will discuss our research efforts toward addressing the related tasks of 3D human motion synthesis, pose and shape estimation from images and videos, and visual action quality assessment. Looking forward, our results could be applied to everyday life scenarios such as natural user interfaces, AR/VR, robotics, and gaming, among others.


Presenter Bio:
Dr. Li Cheng is a professor in the Department of Electrical and Computer Engineering, University of Alberta. He is an associate editor of IEEE Transactions on Multimedia. Prior to joining the University of Alberta, he worked at A*STAR, Singapore; TTI-Chicago, USA; and NICTA, Australia. His current research interests are mainly in human motion analysis, mobile and robot vision, and multimedia data analytics. More details can be found at http://www.ece.ualberta.ca/~lcheng5/.

October 27
Learning Sparse-Reward Tasks on Real Robots From Scratch
Gautham Vasan, University of Alberta

YouTube link


Abstract:
Learning from experience and continual adaptation to changing environments is crucial for intelligent robots to solve an open-ended sequence of increasingly complex tasks. In this talk, I’ll outline some oft-ignored, practical challenges of continual learning on real-world robots and address two such issues: (i) how to specify reinforcement learning tasks, and (ii) how to set up a real-time learning agent. Our findings helped us produce the first demonstration of pixel-based control by real-time model-free learning on four different kinds of real robots from scratch in just a few hours.


Presenter Bio:
Gautham Vasan is a PhD student at the University of Alberta, advised by Dr. Rupam Mahmood. He is interested in building machines with human-like intelligence. His research focuses on policy gradient methods, real-time learning architectures and temporal abstraction in reinforcement learning. At Kindred Systems Inc, he worked on developing deep reinforcement learning techniques for an automated put wall robot, known as SORT, which efficiently identifies apparel items, picks them, places them, and sorts them into complete end-customer orders. His website: https://gauthamvasan.github.io/

October 20
Balancing Bias and Variance in Emphatic Off-policy Reinforcement Learning
Eric Graves, University of Alberta

YouTube link


Abstract:
Emphatic algorithms are an under-explored approach to off-policy reinforcement learning that emphasize or de-emphasize learning updates in a way that preserves many of the benefits of on-policy algorithms. However, existing methods for estimating how much to emphasize each update suffer from either extreme variance or extreme bias that can both lead to failure of the learning process in different situations. In this talk I discuss how these disparate algorithms can be combined to balance their strengths and weaknesses and successfully learn in situations where one or both of the constituent algorithms fails.


Presenter Bio:
Eric Graves is a PhD candidate at the University of Alberta, advised by Martha White and Richard Sutton. His research has focused on off-policy learning and policy gradient methods with the goal of improving the reliability, practicality, and range of applications of reinforcement learning algorithms. Prior to graduate school, Eric earned a BSc in Computing Science from the University of Alberta and worked in industry for several years as a software and video game developer.

Special Tuesday Seminar

October 17
AI-derived annotations for lung cancer collections using NCI Imaging Data Commons
Dr. Deepa Krishnaswamy, Brigham and Women's Hospital, Boston, MA

YouTube link


Abstract:
The development of tools for cancer imaging is dependent on the availability of public imaging datasets, many of which do not include annotations or derived features. We have generated AI-derived annotations for lung cancer imaging collections within the National Cancer Institute Imaging Data Commons (IDC), where the largest collection contains over 26,000 patients. Our work demonstrates how one can harmonize their data and make it publicly available along with code, to promote reproducibility and transparency of AI workflows.


Presenter Bio:
Dr. Deepa Krishnaswamy is a postdoctoral research fellow at Brigham and Women’s Hospital, working with Dr. Andrey Fedorov in the Surgical Planning Laboratory. She completed her PhD at the University of Alberta under the supervision of Dr. Kumaradevan Punithakumar and Dr. Michelle Noga, in the field of registration and segmentation of the left ventricle in ultrasound volumes. Over the past two years of her postdoc, she has focused on developing publicly available and reproducible methods for enhancing lung cancer imaging collections in NCI’s Imaging Data Commons. Additionally, she has developed and contributed to a 3D Slicer plugin for multi-parametric MRI annotation of the prostate. Her other interests include prostate data curation, deep learning approaches for MRI segmentation, and open-source tools for visualization.

October 13
Elephant Neural Networks: Born to Be a Continual Learner
Qingfeng Lan, University of Alberta

YouTube link


Abstract:
The input data in continual learning (CL) is non-i.i.d., making it fundamentally different from that of traditional supervised learning. New optimizers, loss functions, and training strategies have been developed for CL, yet the interactions between CL and neural network architectures remain under-explored. We investigate the impact of activation functions in CL, revealing that both sparse gradients and sparse representations are pivotal. We then introduce elephant activation functions, which produce both sparse values and sparse gradients, significantly reducing catastrophic forgetting in neural networks.
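
As a rough illustration of the idea (the functional form and parameter names below are assumptions for illustration, not necessarily the paper's exact definition), an activation shaped like a narrow bump is near zero, in both value and gradient, for most inputs, so an update driven by one input barely disturbs the network's output elsewhere:

    import numpy as np

    def bump_activation(x, a=1.0, d=4.0):
        # Close to 1 near x = 0 and decays rapidly toward 0 away from it,
        # so both the values and the gradients are sparse across inputs,
        # which limits interference between updates.
        return 1.0 / (1.0 + np.abs(x / a) ** d)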


Presenter Bio:
Qingfeng Lan is a PhD student at the University of Alberta, supervised by Rupam Mahmood. He is interested in designing simple and efficient algorithms, supported by sound theories and verified by rigorous experiments. In particular, his research focuses on continual reinforcement learning.

October 6
Empowering Investment Decisions: The AI Revolution at Lawtiq.com
Ali Salman - Founder of AI Powered Immigration & Settlement Platform Lawtiq.com

Co-hosted by Technology Alberta

YouTube link


Abstract:
In this seminar presented by the Alberta Machine Intelligence Institute (Amii) and Technology Alberta, Ali Salman, CMO of Lawtiq.com, shows that in the traditional investment landscape, investors face challenges in finding the right opportunities because of the role of intermediaries and the limitations of human judgment. This talk details the AI Advantage at Lawtiq.com, including an introduction to Lawtiq.com and its mission, how AI is integrated into Lawtiq.com's platform, and the benefits of AI-driven investment matching. These benefits include efficiency, accuracy, and personalization.


Presenter Bio:
Ali Salman is a problem solver, growth strategist, and fractional CMO who has worked with Remax, WFG, Subway, Manpower, YMCA, The Brick, Coca-Cola, Columbia University, Maxwell, Mucho Burrito, KIA, Honda, Samsung, and other Fortune 500 and INC 5000 companies. Currently, he heads Lawtiq.com.

September 29
NO SEMINAR - SPEAKER CANCELLATION

September 22
Coordination of Robotic Swarms
Dr. Luiz Chaimowicz, Universidade Federal de Minas Gerais, Brazil

YouTube link coming soon!


Abstract:
Robotic swarms are groups consisting of a large number of simple robots which, individually, have limited capabilities but together can perform various types of tasks. Oftentimes inspired by their biological counterparts, robotic swarms normally rely on distributed, simple actions which, combined, result in complex, emergent behaviors. In this talk, we will present some of the work we have been developing at VeRLab - UFMG in the areas of navigation, segregation, transportation, localization, and mapping with robotic swarms.


Presenter Bio:
Luiz Chaimowicz is a Full Professor in the Computer Science Department at the Universidade Federal de Minas Gerais (UFMG) - Brazil. He received his Ph.D. degree in Computer Science from UFMG in 2002, and from 2003 to 2004, he held a Postdoctoral Research appointment in Robotics with the GRASP Laboratory – University of Pennsylvania. Professor Chaimowicz co-directs UFMG’s Computer Vision and Robotics Laboratory (VeRLab) and the Multidisciplinary Research Laboratory in Digital Games. His research encompasses various aspects of artificial intelligence applied to mobile robotics and digital games. In addition to his research and teaching, he has held different administrative roles at UFMG, including serving as the Chair of the Department of Computer Science from 2021 to 2023, Computer Science Graduate Program Coordinator from 2014 to 2016, and Undergraduate Program Coordinator from 2009 to 2011.

September 15
Learning Models that Predict Objective, Actionable Labels
Dr. Russ Greiner, University of Alberta & Amii

YouTube link


Abstract:
Many medical researchers want a tool that “does what a top medical clinician does, but does it better”.  This presentation explores this goal. This requires first defining what “better” means, leading to the idea of outcomes that are “objective” and then to ones that are actionable, with a meaningful evaluation measure. We will discuss some of the subtle issues in this exploration – what does “objective” mean, the role of the (perhaps personalized) evaluation function, multi-step actions, counterfactual issues, distributional evaluations, etc.  Collectively, this analysis argues we should learn models whose outcome labels are objective and actionable, as that will lead to tools that are useful and cost-effective.


Presenter Bio:
Russ Greiner worked in both academic and industrial research before settling at the University of Alberta, where he is now a Professor in Computing Science and the founding Scientific Director of the Alberta Machine Intelligence Institute. He has been Program/Conference Chair for various major conferences and has served on the editorial boards of a number of journals. He was elected a Fellow of the AAAI and has been awarded a McCalla Professorship and a Killam Annual Professorship; in 2021, he received the CAIAC Lifetime Achievement Award and became a CIFAR AI Chair. In 2022, the Telus World of Science museum honored him with a panel and he received the (UofA) Precision Health Innovator Award; in 2023, he received the CS-Can | Info-Can Lifetime Achievement Award. For his mentoring, he received a 2020 FGSR Great Supervisor Award and, in 2023, the Killam Award for Excellence in Mentoring. He has published over 300 refereed papers, most in the areas of machine learning and, recently, medical informatics, including 5 that have been awarded Best Paper prizes. The main foci of his current work are (1) bio- and medical informatics; (2) survival prediction; and (3) formal foundations of learnability.

Special Monday Seminar

September 11
Experiential Learning with Partition-Tree Weighting and MAML
Anna Koop, University of Alberta & Google DeepMind, Montreal

YouTube link


Abstract:
Learning from experience means remaining adaptive and responsive to errors over time. However, gradient-based deep learning can fail dramatically in the continual, online setting. In this work, we address this shortcoming by combining two meta-learning methods: the purely online Partition Tree Weighting (PTW) mixture-of-experts algorithm, and a novel variant of the Model-Agnostic Meta-Learning (MAML) initialization-learning procedure. We demonstrate our approach, Replay-MAML PTW, in a piecewise stationary classification task. In this continual, online setting, Replay-MAML PTW matches and even outperforms an augmented learner that is allowed to pre-train offline and is given explicit notification when the task changes. Replay-MAML PTW thus provides a base learner with the benefits of offline training, explicit task sampling, and boundary notification, all for an O(log2(t)) increase in computation and memory. This makes deep learning more viable for fully online, task-agnostic continual learning, which is at the heart of general intelligence.


Presenter Bio:
Anna Koop recently started as a Research Engineer at Google DeepMind in Montreal, having just about completed her PhD on experiential learning for AI, supervised by Michael Bowling at the University of Alberta. She's been a teacher, programmer, consultant, manager and student in almost every permutation, including Managing Director of the Applied Machine Learning Team at Amii. She is fascinated by learning, can't stay away from research, and is passionately curious about nearly everything.

September 8
Toward a Concept-Based Theory of Lexical Semantics
Bradley Hauer, University of Alberta

YouTube link


Abstract:
In this presentation, I argue for a novel theory of lexical semantics. I begin by presenting a theoretical model for wordnets, a class of resources commonly used in lexical semantics. This is followed by a novel analysis of semantic tasks themselves, wherein I propose a first-of-its-kind taxonomy of semantic tasks. For both parts of the presentation, I demonstrate, via experimental evidence, that such theoretical developments are an important contribution to contemporary semantics research.


Presenter Bio:
Bradley Hauer is a PhD candidate at the University of Alberta. He has published more than 30 papers in refereed venues, earning the NAACL 2013 Best Student Paper Award, and a nomination for the IEEE ICSC 2019 Best Paper Award. His work covers a variety of subjects in natural language processing, including multilingual semantics, classical decipherment, and cognate identification. His present research is motivated by a strong interest in theoretical models and explainable methods.

September 1
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Montes Casper, MIT

YouTube link


Abstract:
Reinforcement Learning from Human Feedback (RLHF) has emerged as the central alignment technique used to fine-tune state-of-the-art AI systems such as GPT-4, Claude, Bard, and Llama-2. Given RLHF's status as the default industry alignment technique, there is a need to carefully study how we got here and what challenges persist in today's state of the art. We review open challenges and fundamental limitations of RLHF, with a focus on applications in large language models. Technical progress in some respects is tractable, and this should be seen as a cause for concerted work and optimism. However, other problems with RLHF cannot be fully solved and instead must be avoided or compensated for with non-RLHF approaches.


Presenter Bio:
Stephen "Casp" Casper is a Ph.D student at MIT in Computer Science (EECS) in the Algorithmic Alignment Group advised by Dylan Hadfield-Menell. His research focuses include interpreting, diagnosing, debugging, and auditing deep learning systems.

August 25
Efficient Focus for Autonomous Agents
Bram Grooten, Eindhoven University of Technology (TU/e), Netherlands

YouTube link


Abstract:
The world provides an abundance of information, but often the majority of possible inputs to an RL agent are irrelevant or even distracting to its current task. We cover two approaches aimed at enhancing performance in challenging environments.

First, the Automatic Noise Filtering (ANF) method is introduced, which learns to focus its efficient sparse connectivity on task-relevant features, outperforming dense algorithms with significantly fewer weights. Second, we highlight MaDi, a method that masks distracting visuals in image-based reinforcement learning, improving focus and generalization. We will conclude with possible extensions of these works, providing an opportunity for discussion and potential collaborations.
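
To make the second idea concrete, here is a toy masking module in the spirit of MaDi (an illustrative sketch, not the paper's architecture; all names are assumptions): a small network produces a per-pixel mask in [0, 1] that is applied to the observation before the policy sees it.

    import torch
    import torch.nn as nn

    class MaskedObservation(nn.Module):
        """Learned soft mask over image observations; distracting pixels can
        be driven toward zero while task-relevant ones pass through."""
        def __init__(self, channels=3):
            super().__init__()
            self.mask_net = nn.Sequential(
                nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())

        def forward(self, obs):  # obs: (batch, C, H, W)
            return obs * self.mask_net(obs)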


Presenter Bio:
Bram Grooten is a PhD student at the TU Eindhoven in the Netherlands, supervised by Decebal Mocanu and Mykola Pechenizkiy. Currently he is visiting the IRL Lab of Matthew Taylor at the University of Alberta until December. His research focuses on Sparse Training, Reinforcement Learning, Generalization, and Autonomous Driving.

August 18
Adventures of AI Directors Early in the Development of Nightingale
Kristen Yu, University of Alberta

YouTube link


Abstract:
When players engage with parts of a video game they do not enjoy, one proposed solution is to adapt the game to the player’s preference. In industry, this adaptation is handled by “AI directors”, which are uncommon, domain-specific, and rule-based. In this talk, I will present a reinforcement learning-based AI director developed for the industry game Nightingale. My team evaluated the effectiveness of this AI director but found inconclusive evidence. I will present the results and their implications for future AI directors.


Presenter Bio:
Kristen Yu is a PhD Candidate at the University of Alberta supervised by Nathan Sturtevant and co-supervised by Matthew Guzdial. Her research focuses on applying AI techniques to video game design to create a better player experience. 

Special Tuesday Seminar

August 15
What Forms of Collusion Can Be Avoided in Sample-Efficient Machine Teaching?
Sandra Zilles, University of Regina & Amii

YouTube link


Abstract:
Machine teaching deals with the problem of encoding a hypothesis in a small number of labeled examples, called a teaching set. Thus a machine learner can infer the target hypothesis from a small amount of high-quality data, as opposed to a large amount of random data. However, the interaction between teacher and learner is constrained by formal rules, so as to prevent unfair collusion. One central goal is to establish that the same structural properties that allow for learning from a certain number of random examples also allow for (proportionally smaller) collusion-free teaching sets. This presentation introduces the first teaching model achieving this goal, using a new approach to modelling collusion-freeness. (Joint work with Farnam Mansouri, Hans Simon, Adish Singla.)


Presenter Bio:
Dr. Sandra Zilles is a Professor of Computer Science at the University of Regina, where she holds a Tier 1 Canada Research Chair in Computational Learning Theory. She is also a Canada CIFAR AI Chair at the Alberta Machine Intelligence Institute (Amii). Her research concerns mostly theoretical foundations of AI and machine learning, and has been recognized with multiple awards. In 2017, she was inducted into the College of New Scholars, Artists and Scientists of the Royal Society of Canada. Sandra has served on the editorial boards for the Journal of Computer and System Sciences and for the IEEE Transactions on Pattern Analysis and Machine Intelligence. On the Board of Directors for the Pacific Institute for the Mathematical Sciences (PIMS) and on the Board of Directors for Innovation Saskatchewan, she provided guidance on policies for research, innovation, and education in computer science and mathematics.

August 11
Grounding Concepts to Vision with Visual Descriptions
Mike Ogezi, University of Alberta

YouTube link


Abstract:
This talk presents an approach to grounding concepts to vision using text descriptions that focus on visual attributes. The research includes two studies. The first explores visual word sense disambiguation, a task that involves selecting the image that best represents the contextual meaning of a word. The second focuses on producing visual descriptions for arbitrary, concrete concepts for use in downstream tasks such as zero-shot image classification and generation. Both studies show that conditioning a pre-trained large language model (LLM) on lexico-semantic knowledge empirically improves visual descriptions, confirming the utility of LLM-produced visual descriptions in grounding lexical concepts to the visual domain.
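
A minimal sketch of how such LLM-produced visual descriptions could drive zero-shot image classification (the frozen CLIP-style encoders and all names below are assumptions, not the talk's actual implementation): each class is scored by the similarity between the image embedding and the mean embedding of that class's descriptions.

    import numpy as np

    def zero_shot_classify(image_emb, class_descriptions, encode_text):
        """class_descriptions: {class_name: [LLM-generated visual descriptions]}.
        encode_text: an assumed frozen text encoder returning 1-D embeddings."""
        image_emb = image_emb / np.linalg.norm(image_emb)
        scores = {}
        for cls, descriptions in class_descriptions.items():
            embs = np.stack([encode_text(d) for d in descriptions])
            embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)
            prototype = embs.mean(axis=0)
            scores[cls] = float(image_emb @ prototype / np.linalg.norm(prototype))
        return max(scores, key=scores.get)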


Presenter Bio:
Mike Ogezi is completing his MSc at the University of Alberta, focusing on the intersection of natural language processing and computer vision (supervisor: Dr Greg Kondrak). He is particularly interested in enhancing large language models with visual knowledge. Before his MSc, he was a software engineer working on optimal order matching for cryptocurrency trading. In his downtime, he enjoys playing board and video games.

August 4
AI and autonomy-based multitasking robots for Industry 4.0
Faheem Khan, CEO, ARO Robotics

Co-hosted by Technology Alberta

YouTube link


Abstract:
Industrial environments, such as manufacturing and warehousing, are quickly becoming more complex. They involve various manual and repetitive tasks which, if not automated, drastically impact a company's operational efficiency. At Aro, we have developed a novel technology: AI-based multitasking robots. These robots are autonomous and, unlike competitors, can perform more than one task. They rely heavily on various aspects of AI to make real-time decisions.


Presenter Bio:
Dr. Faheem Khan has a PhD in Electrical Engineering and an MBA in strategy. He has been involved with technology start-ups for more than ten years. Before starting Aro, he founded Fourien, which develops microsensors and analytical instruments for the diagnostics industry. Currently, he is the CEO of both Aro and Fourien.

July 28
Explaining Autonomous Driving Actions with Visual Question Answering
Shahin Atakishiyev, University of Alberta

YouTube link


Abstract:
The end-to-end learning ability of self-driving vehicles has achieved significant milestones over the last decade, owing to rapid advances in deep learning and computer vision algorithms. However, as autonomous driving technology is a safety-critical application of artificial intelligence (AI), road accidents and established regulatory principles necessitate explainability of the intelligent action choices made by self-driving vehicles. To facilitate interpretability of decision-making in autonomous driving, we present a Visual Question Answering (VQA) framework, which explains self-driving actions with question-answering-based causal reasoning. To do so, we first collect driving videos in a simulation environment using reinforcement learning (RL) and uniformly extract consecutive frames from this log data for five selected action categories. Further, we manually annotate the extracted frames with question-answer pairs as justifications for the action chosen in each scenario. Finally, we evaluate the correctness of the VQA-predicted answers for actions on unseen driving scenes. The empirical results suggest that the VQA mechanism can support interpreting the real-time decisions of autonomous vehicles and help enhance overall driving safety.


Presenter Bio:
Shahin Atakishiyev received a BSc Computer Engineering degree from Qafqaz University, Azerbaijan, in June 2015 and an MSc Computer Engineering degree with a specialization in Software Engineering and Intelligent Systems from the University of Alberta, Canada, in January 2018. Currently, he is a PhD student in the Department of Computing Science at the University of Alberta under the supervision of Prof. Randy Goebel. Shahin's research interests include Safe, Ethical, and Explainable Artificial Intelligence and its applications to real-world problems.

July 21
RoboCLIP: Watching One Video to Learn Robot Policies
Sumedh A Sontakke, University of Southern California, Microsoft Research

YouTube link



Abstract:
Reward specification is a notoriously difficult problem in reinforcement learning, requiring extensive expert supervision to design robust reward functions. Imitation learning (IL) methods attempt to circumvent these problems by utilizing expert demonstrations instead of using an extrinsic reward function but typically require a large number of in-domain expert demonstrations.


Inspired by advances in the field of Video-and-Language Models (VLMs), we present RoboCLIP, an online imitation learning method that uses a single demonstration (overcoming the large data requirement) in the form of a video demonstration or a textual description of the task to generate rewards without manual reward function design.


Additionally, RoboCLIP can also utilize out-of-domain demonstrations, like videos of humans solving the task for reward generation, circumventing the need to have the same demonstration and deployment domains.


RoboCLIP utilizes pretrained VLMs without any finetuning for reward generation. Reinforcement learning agents trained with RoboCLIP rewards demonstrate 2-3 times higher zero-shot performance than competing imitation learning methods on downstream robot manipulation tasks, doing so using only one video/text demonstration.
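
A sketch of the core reward computation, assuming a frozen pretrained video-and-language encoder with a shared embedding space (the interface below is illustrative, not RoboCLIP's actual API): the agent's rollout is embedded and scored against the single demonstration, yielding one sparse reward per episode.

    import numpy as np

    def vlm_similarity_reward(episode_frames, demo_embedding, encode_video):
        """Cosine similarity between the embedding of the agent's episode
        and the embedding of the demo video (or task description).
        encode_video: an assumed frozen VLM video encoder."""
        ep = encode_video(episode_frames)
        return float(ep @ demo_embedding /
                     (np.linalg.norm(ep) * np.linalg.norm(demo_embedding) + 1e-8))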


Presenter Bio:
Sumedh A Sontakke is a PhD student at the University of Southern California (working with Prof. Laurent Itti and Prof. Erdem Biyik) and a research intern at Microsoft Research, working on robotics from human feedback. He works on building RL agents that can utilize human demonstrations for efficient policy learning. He also works on formulations of curiosity-based reward functions to enable the self-supervised discovery of complex behaviors. In the past, he has spent time at Google DeepMind, Bell Labs, Adobe Research, and the Max Planck Institute for Intelligent Systems.

July 14
Confidently Incorrect - and other challenges in experiment design for RL
Andrew Patterson, University of Alberta


Abstract:
This talk laments some challenges and pitfalls in the design of reinforcement learning experiments. We will draw parallels between animal learning research and RL and reflect on how animal learning research can influence our own empirical strategies. We will dive into the difference between an agent and an algorithm, and discuss how this pedantic nuance changes how we might analyze our data. Finally, we will explore a common empirical methodology where increasing the number of seeds makes us increasingly confidently incorrect.


Presenter Bio:
Andy Patterson is a PhD student at the University of Alberta working with Martha White on reliable reinforcement learning methods. His focus on reliability has led to studying robust statistics for RL, using gradient-TD strategies in deep RL learning systems, and most recently revisiting commonplace empirical methodologies for evaluating RL methods.

July 7
Combining Cameras, Lidar and Artificial Intelligence Software to Mitigate Operational Risks while Improving Efficiency on Industrial Worksites
Siamak Akhlaghi, CEO of Correct-AI

Co-hosted by Technology Alberta


Abstract:
Correct-AI is a rapidly growing industrial robotics and artificial intelligence company which develops products that support the safe and efficient operation of heavy-duty equipment. I will talk about Correct-AI's PROX-EYE™ Vision Guidance System, which combines cameras, Lidar, and proprietary artificial intelligence software into a tool for managing industrial vehicles in challenging worksite situations. Our technology improves operator visibility, eliminates false alarms with intelligent threat detection, eliminates blind spots, and reduces insurance risk with onboard storage and analysis of equipment activity.


Presenter Bio:
Dr. Siamak Akhlaghi, with a Ph.D. in Materials Engineering, has more than 25 years of experience in microelectronics, navigation, computer vision, and artificial intelligence including developing autonomous vehicles for security applications.

June 30
Solving Complex Reasoning Problems with Adaptive Subgoal Search
Michał Zawalski and Michał Tyrolski, University of Warsaw, Poland

YouTube link coming soon!


Abstract:
Complex reasoning problems contain states that vary in the computational cost required to determine a good action plan. Taking advantage of this property, we present Adaptive Subgoal Search, a trainable search method that plans using high-level steps adjusted to the local complexity of the environment. Adaptive Subgoal Search benefits from the efficiency of planning with longer subgoals and the fine control of shorter ones, and thus scales well to difficult planning problems.


Presenter Bio:
Michał Zawalski is a PhD student at the University of Warsaw. His primary research interests revolve around reinforcement learning, planning algorithms, goal-conditioned problems, and multi-agent RL. He's especially interested in finding simple yet effective algorithms.


Michał Tyrolski is currently a student researcher at the University of Warsaw and a Machine Learning Researcher at DeepFlare. He holds a degree from the same university and is currently engaged in work on planning in reinforcement learning in Piotr Miłoś's research lab. Michał's areas of interest encompass both meta and model-based reinforcement learning. In the past, he has worked at Nvidia, Microsoft, and Samsung. Since 2020, he has been one of the organizers of the ML in PL Conference, the biggest research-oriented machine learning conference in Poland.

Special Monday Seminar

June 26
A Model-Based Reinforcement Learning Wishlist
Erin Talvitie, Harvey Mudd College, Claremont, California

Abstract:
Model-based reinforcement learning (MBRL), where an agent learns to make predictions about its environment and then uses those predictions to make decisions, has the potential to dramatically improve the sample complexity of reinforcement learning agents. In practice, MBRL agents tend to perform poorly. This talk will brainstorm, at a high level, what model properties and capabilities would be desirable or even necessary in order for a reinforcement learning agent to leverage a learned model to its fullest potential.


Presenter Bio:
Erin J. Talvitie is an associate professor of Computer Science at Harvey Mudd College. She graduated from Oberlin College in 2004 with majors in Computer Science and Mathematics and received her Ph.D. from the University of Michigan in 2010. She was a founding member of the Department of Computer Science at Franklin & Marshall College before moving on to Harvey Mudd College in 2019. Her research interests focus on reinforcement learning, specifically with the aim of understanding how autonomous agents can learn to act flexibly and competently in complicated, unknown environments. Her NSF CAREER grant "Using Imperfect Predictions to Make Good Decisions" has funded recent work studying model-based reinforcement learning in the case where the agent's model class is insufficient to capture the true environment dynamics.

June 23
An Exploration of Dialog Act Classification in Open-domain Conversational Agents and the Applicability of Text Data Augmentation 
Maliha Sultana, University of Alberta

YouTube link


Abstract:
Recognizing the dialog acts of users is an essential component of building successful conversational agents. In this work, we propose a dialog act (DA) classifier for two of our open-domain conversational agents. For this, we curated a high-quality, multi-domain dataset with ∼24k user utterances labelled with 8 suitable DAs. Our fine-tuned BERT-based model outperforms the baseline SVM classifier, achieving state-of-the-art accuracy on the proposed dataset. Moreover, it generalizes well to unseen data. To address the issue of data scarcity when training DA classifiers, we implemented different data augmentation techniques and compared their performance. Our extensive experiments show that, in a simulated low-data regime with only 10 examples per label, methods as simple as synonym replacement can double the size of the existing training data and boost the accuracy of our DA classifier by ∼8%. Lastly, we demonstrate how our proposed classifier and augmentation techniques can be adapted to effectively detect dialog acts in languages other than English.
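
For concreteness, here is a minimal version of synonym replacement, one of the augmentation techniques compared (a generic WordNet-based sketch, not necessarily the exact variant used in the work):

    import random
    from nltk.corpus import wordnet  # assumes NLTK's 'wordnet' data is installed

    def synonym_replacement(sentence, n=1):
        """Replace up to n words that have WordNet synonyms with a random
        synonym, yielding a new utterance with the same dialog-act label."""
        words = sentence.split()
        candidates = [i for i, w in enumerate(words) if wordnet.synsets(w)]
        random.shuffle(candidates)
        for i in candidates[:n]:
            synonyms = {lemma.name().replace('_', ' ')
                        for synset in wordnet.synsets(words[i])
                        for lemma in synset.lemmas()}
            synonyms.discard(words[i])
            if synonyms:
                words[i] = random.choice(sorted(synonyms))
        return ' '.join(words)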


Presenter Bio:
Maliha is a recent Master's graduate from the University of Alberta in the field of Natural Language Processing. She has worked extensively with Large Language Models for detecting user dialog acts in chatbots. Moreover, to tackle the scarcity of labelled training data, she has successfully looked into data augmentation techniques that artificially generate new data to improve model performance. When she is not busy scrolling through LinkedIn looking for a new full-time position as an ML Developer, you can find her singing at a karaoke bar or playing at a board games café. Currently, she is building data pipelines and doing prompt engineering as an Associate ML Developer at AltaML.


June 16
Replay Memory as An Empirical MDP: Combining Conservative Estimation with Experience Replay
Hongming Zhang, University of Alberta

YouTube link


Abstract:
Experience replay, which stores transitions in a replay memory for repeated use, plays an important role in improving sample efficiency in reinforcement learning. Existing techniques such as reweighted sampling, episodic learning, and reverse sweep updates process the information in the replay memory to make experience replay more efficient. In this work, we further exploit the information in the replay memory by treating it as an empirical Replay Memory MDP (RM-MDP). By solving it with dynamic programming, we learn a conservative value estimate that only considers transitions observed in the replay memory. Both value and policy regularizers based on this conservative estimate are developed and integrated with model-free learning algorithms. We design the memory density metric to measure the quality of the RM-MDP. Our empirical studies quantitatively find a strong correlation between performance improvement and memory density. Our method combines Conservative Estimation with Experience Replay (CEER), improving sample efficiency by a large margin, especially when the memory density is high. Even when the memory density is low, such a conservative estimate can still help to avoid suicidal actions and thereby improve performance.
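
A toy sketch of the core construction for hashable states (an illustration of treating the buffer as an empirical MDP, not the authors' implementation): the empirical MDP is built from the stored transitions only and solved by value iteration, and the resulting conservative estimate can then regularize the main learner.

    from collections import defaultdict

    def conservative_q_from_replay(replay, gamma=0.99, iterations=100):
        """replay: list of (s, a, r, s_next, done) tuples with hashable states.
        Only transitions actually present in the buffer are used, which is
        what makes the resulting value estimate conservative."""
        outcomes = defaultdict(list)  # (s, a) -> [(r, s_next, done), ...]
        for s, a, r, s2, done in replay:
            outcomes[(s, a)].append((r, s2, done))
        V = defaultdict(float)  # states never seen as a source default to 0
        Q = {}
        for _ in range(iterations):
            for sa, outs in outcomes.items():
                Q[sa] = sum(r + (0.0 if done else gamma * V[s2])
                            for r, s2, done in outs) / len(outs)
            new_V = {}
            for (s, a), q in Q.items():
                new_V[s] = max(new_V.get(s, float('-inf')), q)
            V = defaultdict(float, new_V)
        return Q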


Presenter Bio:
Hongming Zhang is a Ph.D. student at the University of Alberta working with Martin Müller. He is interested in deep reinforcement learning (DRL) and planning/search techniques. Currently he is particularly focused on enhancing the sample efficiency and learning performance of DRL algorithms through the integration of planning and search.

June 9
Towards Managing Temporal Resolution in Continuous-Time Value Estimation 
Vincent (Zichen) Zhang, University of Alberta

YouTube link


Abstract:
A default assumption in reinforcement learning (RL) and optimal control is that observations arrive at discrete time points on a fixed clock cycle. Yet, many applications involve continuous-time systems where the time discretization, in principle, can be managed. The impact of time discretization on RL methods has not been fully characterized in existing theory, but a more detailed analysis of its effect could reveal opportunities for improving data-efficiency. 


In this talk, I will discuss our work, which provides a key initial step to bridge this gap. We analyze Monte-Carlo policy evaluation for Linear Quadratic Regulator (LQR) systems and uncover a fundamental trade-off between approximation and statistical error in value estimation. Importantly, these two errors respond differently to time discretization, leading to an optimal choice of temporal resolution for a given data budget. These findings show that managing the temporal resolution can provably improve policy evaluation efficiency in LQR systems with finite data. Empirically, we demonstrate the trade-off in numerical simulations of LQR instances and standard RL benchmarks for non-linear continuous control.


Presenter Bio:
Zichen (Vincent) Zhang is a Ph.D. student in Computing Science at the University of Alberta, working with Prof. Dale Schuurmans and Prof. Martin Jagersand. He is interested in machine learning and its applications to robotics perception and control. Currently, he’s working toward developing efficient algorithms for continuous control.

June 2
NO SEMINAR (no speaker)

May 26
NO SEMINAR (no speaker)

May 19
A Simple Decentralized Cross-Entropy Method
Vincent (Zichen) Zhang, University of Alberta

YouTube link

Abstract:
In this talk, I will present a simple extension to the Cross-Entropy Method (CEM), a gradient-free optimization method frequently used for planning in model-based reinforcement learning (MBRL).

The classical CEM employs a centralized approach to update the sampling distribution based on the results of a global top-k operation on samples. However, we demonstrate that this approach can make CEM prone to local optima, thus impairing its sample efficiency. To address this issue, we propose Decentralized CEM (DecentCEM), a simple yet effective improvement over classical CEM, which uses an ensemble of CEM instances running independently from one another, each performing a local improvement of its own sampling distribution. We show in an optimization task that DecentCEM finds the global optimum more consistently than CEM using either a single Gaussian distribution or a mixture of Gaussians. Notably, this improvement does not compromise CEM’s convergence guarantee. When applied to MBRL planning problems in continuous control environments, DecentCEM shows improved sample efficiency, with only a reasonable increase in computational cost.
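
To make the construction concrete, here is a toy sketch of the ensemble idea (illustrative only; the paper and code linked below contain the actual method): several CEM instances run independently, each refining its own Gaussian, and the best local solution across the ensemble is used.

    import numpy as np

    def cem(f, mean, std, iterations=20, population=64, n_elite=8, rng=None):
        """One classical CEM instance: repeatedly refit a Gaussian to the
        top-k (elite) samples of the objective f."""
        rng = rng or np.random.default_rng()
        for _ in range(iterations):
            xs = rng.normal(mean, std, size=(population, mean.size))
            elite = xs[np.argsort([f(x) for x in xs])[-n_elite:]]
            mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
        return mean, f(mean)

    def decentralized_cem(f, dim, n_instances=5, seed=0):
        """Run independent CEM instances from different starting points and
        return the best local optimum found across the ensemble."""
        rng = np.random.default_rng(seed)
        runs = [cem(f, rng.uniform(-1.0, 1.0, dim), np.ones(dim), rng=rng)
                for _ in range(n_instances)]
        return max(runs, key=lambda r: r[1])[0]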

For those interested in exploring our work further, please check out our paper at: https://arxiv.org/abs/2212.08235

And the code is available at https://github.com/vincentzhang/decentCEM 


Presenter Bio:
Zichen (Vincent) Zhang is a Ph.D. student in Computing Science at the University of Alberta, working with Prof. Dale Schuurmans and Prof. Martin Jagersand. He is interested in machine learning and its applications to robotics perception and control.

Currently he’s working toward developing efficient algorithms for continuous control. 

May 12
Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning
Brett Daley, University of Alberta

YouTube link

Abstract:
Off-policy learning from multistep returns is crucial for sample-efficient reinforcement learning, but counteracting off-policy bias without exacerbating variance is challenging. Classically, off-policy bias is corrected in a per-decision manner: past temporal-difference errors are re-weighted by the instantaneous Importance Sampling (IS) ratio after each action via eligibility traces. Many off-policy algorithms rely on this mechanism, along with differing protocols for cutting the IS ratios to combat the variance of the IS estimator. Unfortunately, once a trace has been fully cut, the effect cannot be reversed. This has led to the development of credit-assignment strategies that account for multiple past experiences at a time. These trajectory-aware methods have not been extensively analyzed, and their theoretical justification remains uncertain.
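
For reference, here is the per-decision mechanism in schematic form (a simplified sketch with illustrative names, not a specific algorithm from the talk): the trace is rescaled by the clipped instantaneous IS ratio at every step, and once a near-zero ratio cuts the trace, no later step can restore it.

    def per_decision_trace(e, grad_v, rho, gamma=0.99, lam=0.9, clip=1.0):
        """Accumulating off-policy eligibility trace with a cut (clipped)
        IS ratio. If min(rho, clip) is near zero, the entire past trace is
        erased and the cut cannot be undone later, which is the limitation
        that motivates trajectory-aware methods such as RBIS."""
        return min(rho, clip) * gamma * lam * e + grad_v

Here e and grad_v can be scalars or NumPy feature/gradient vectors for the linear function approximation case.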

In this talk, I propose a unifying framework for per-decision and trajectory-aware methods, and establish the first general convergence conditions for trajectory awareness in the tabular setting. I also introduce a new algorithm called Recency-Bounded Importance Sampling (RBIS), which leverages trajectory awareness to perform robustly across hyperparameters in several off-policy control tasks. 


Presenter Bio:
Brett Daley is a Ph.D. student at the University of Alberta working with Marlos C. Machado and Martha White. His research interest is temporal credit assignment for off-policy prediction and control with function approximation. Brett previously earned his M.Sc. in Computer Science from Northeastern University, M.Sc. in Management and Global Affairs from Tsinghua University through the Schwarzman Scholars program, and B.Sc. in Electrical and Computer Engineering from Northeastern University. 

May 5
NO SEMINAR (no speaker)

April 28, 2023
Consistent Emphatic Temporal-Difference Learning 

Jiamin He, University of Alberta

https://www.youtube.com/watch?v=DwQWbDvrRAQ&list=PLKlhhkvvU8-aKj4cWVY4SsOiz4dfKmiUl&index=10

Abstract:
Off-policy policy evaluation has been a critical and challenging problem in reinforcement learning, and Temporal-Difference (TD) learning is one of the most important approaches for addressing it. There has been significant interest in searching for consistent off-policy TD algorithms that are guaranteed to find the on-policy TD fixed point. Notably, Full Importance-Sampling TD is the only existing consistent off-policy TD method under general linear function approximation but, unfortunately, has a high variance and is scarcely practical. This notorious high-variance issue motivated the introduction of Emphatic TD, which tames the variance but has a biased fixed point. Inspired by these two methods, we propose a new consistent algorithm called Average Emphatic TD (AETD) with a transient bias, which strikes a balance between bias and variance. Further, we unify AETD with several existing algorithms and obtain a new family of consistent algorithms called Consistent Emphatic TD (CETD), which can control a smooth bias-variance trade-off by varying the speed at which the transient bias fades. Through theoretical analysis and experiments on a didactic example, we validate the consistency of CETD. Moreover, we show that CETD converges faster to the lowest error in a complex task with high variance.
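
For background, here is one step of standard linear Emphatic TD(0), the low-variance but biased end of the spectrum that CETD interpolates over (a sketch of the classical algorithm from the literature, not of CETD itself; names are illustrative):

    import numpy as np

    def etd0_step(w, F, rho_prev, rho, x, x_next, r,
                  gamma=0.99, alpha=0.01, interest=1.0):
        """One linear Emphatic TD(0) update. F is the followon trace, which
        emphasizes updates in states the target policy tends to reach."""
        F = gamma * rho_prev * F + interest
        delta = r + gamma * (w @ x_next) - (w @ x)
        w = w + alpha * F * rho * delta * x
        return w, F

    # Example: one update with random features
    rng = np.random.default_rng(0)
    w, F = np.zeros(4), 0.0
    w, F = etd0_step(w, F, rho_prev=1.0, rho=1.0,
                     x=rng.random(4), x_next=rng.random(4), r=1.0)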


Presenter Bio:
Jiamin is an M.Sc. student at the University of Alberta working with Rupam Mahmood. His research interests lie in artificial intelligence, especially reinforcement learning. Currently, he is focusing on off-policy policy evaluation. 

April 14, 2023
IoT + AI enabled predictive maintenance & operational excellence for asset intensive industries 
Sunil Vedula, Nanoprecise Sci Corp

Co-hosted by Technology Alberta

https://www.youtube.com/watch?v=48Q-CmRlSJM&list=PLKlhhkvvU8-aKj4cWVY4SsOiz4dfKmiUl&index=11

Abstract:
At Nanoprecise, we are on a mission to save 2-3% of global carbon emissions by providing the right maintenance insight at the right time with fully automated IoT- and AI-powered condition monitoring of motor-driven equipment such as pumps and compressors. In this session, we will talk about how we are achieving this for asset-intensive industries such as oil & gas, manufacturing, pharma, cement, and mining. We will also talk about how Nanoprecise's physics-plus-AI approach gives us an edge over traditional black-box AI models, such as neural networks applied to raw data.


Presenter Bio:
Hailing from a family of engineers and scientists, Sunil is a first-generation entrepreneur. A mechanical engineer by background, he has industrial experience with the causes of machinery failure and its impact on a manufacturer’s bottom line, energy efficiency, and climate change. With a mechanical engineering degree and an MBA from the U of A, and a passion for predicting failures and improving energy efficiency, he set out to solve the problem of unplanned downtime and energy efficiency for the 300 million electric-motor-driven machines that consume as much as 50% of the world's electricity.


Starting in April 2017 with almost no money, he bootstrapped until December 2017 and ran some quick pilots to gain interest from end users. After he raised his first $1 million from Brian Craig (a Canadian serial entrepreneur), under his leadership Nanoprecise grew from a mere $50,000 CAD ARR in 2018 to $5 million CAD in April 2023, a CAGR of 250% per annum, and hopes to cross $100 million in revenue by 2026-2027.

Under his leadership, Nanoprecise raised over $13 million in growth capital in January 2023 from strategic investors such as Export Development Canada (affiliated with the Government of Canada), Honeywell Technologies, NSK Bearings, Adventure Capital, and EC M&A, and secured a $5 million line of credit from banks to maintain an inventory of components and avoid issues with the electronic chip shortage.

Under his leadership, Nanoprecise has been awarded two critical patents and two trademarks, with one patent and one trademark pending.

March 31, 2023
Living in Procedural Worlds: Creature Movement and Spawning in ‘Nightingale’
Nathan Sturtevant, University of Alberta
Arta Seify, Nightingale

https://www.youtube.com/watch?v=Dy6K6sM3teE&list=PLKlhhkvvU8-aKj4cWVY4SsOiz4dfKmiUl&index=8

Abstract:
This session explores the unique AI challenges faced during the development of Inflexion Games' debut title, Nightingale, a co-op survival-crafting experience in which players venture through portals where adventure and mystery await across a myriad of beautiful and increasingly dangerous procedurally generated realms.

Arta Seify and Nathan Sturtevant will describe the range of approaches crafted during development to solve challenges in pathfinding and populating the realms with creatures. Learn about the systems devised to support efficient pathfinding across large realms with a changing path network, creature terrain-type preferences, pathfinding for creatures of vastly different sizes, and procedurally filling the realms with wildlife such that they “feel alive”.


Presenter Bio:
Arta Seify has been working as an AI developer on Nightingale since 2021. Previously, he worked as an engine programmer on Minecraft: Bedrock, and before that, was a graduate student at the University of Alberta.

Nathan Sturtevant is a professor at the University of Alberta working on research in AI and games, is the director of the Alberta Machine Intelligence Institute, and a consultant with Inflexion Games. He previously wrote the pathfinding engine in Dragon Age: Origins, in addition to his own game-related research and development projects.

March 24, 2023
Memory-efficient Reinforcement Learning with Knowledge Consolidation 
Qingfeng Lan, University of Alberta

https://www.youtube.com/watch?v=c3fS5xUkeMs&list=PLKlhhkvvU8-aKj4cWVY4SsOiz4dfKmiUl&index=9

Abstract:
Artificial neural networks are promising for general function approximation but challenging to train on non-independent or non-identically distributed data due to catastrophic forgetting. The experience replay buffer, a standard component in deep reinforcement learning, is often used to reduce forgetting and improve sample efficiency by storing experiences in a large buffer and using them for training later. However, a large replay buffer results in a heavy memory burden, especially for onboard and edge devices with limited memory capacities. We propose memory-efficient reinforcement learning algorithms based on the deep Q-network algorithm to alleviate this problem. Our algorithms reduce forgetting and maintain high sample efficiency by consolidating knowledge from the target Q-network to the current Q-network. Compared to baseline methods, our algorithms achieve comparable or better performance in both feature-based and image-based tasks while easing the burden of large experience replay buffers. 
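
A rough sketch of the consolidation idea (an illustration consistent with the abstract, not the authors' exact algorithm; the extra states and the weight beta are assumed names): the usual DQN TD loss is combined with a term that keeps the current Q-network close to the target Q-network's outputs, so previously learned knowledge is distilled forward rather than re-learned from a large buffer.

    import torch
    import torch.nn.functional as F

    def dqn_loss_with_consolidation(q_net, target_net, batch, consol_states,
                                    gamma=0.99, beta=1.0):
        """batch: (s, a, r, s_next, done) tensors, with done in {0.0, 1.0}.
        consol_states: extra states on which the current network is pulled
        toward the target network (the consolidation term)."""
        s, a, r, s2, done = batch
        with torch.no_grad():
            target = r + gamma * (1.0 - done) * target_net(s2).max(dim=1).values
        q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        td_loss = F.mse_loss(q, target)
        consolidation = F.mse_loss(q_net(consol_states),
                                   target_net(consol_states).detach())
        return td_loss + beta * consolidation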


Presenter Bio:
Qingfeng Lan is a PhD student at the University of Alberta, supervised by Rupam Mahmood. He is interested in designing simple and efficient algorithms supported by sound theories and verified by rigorous experiments. In particular, his research focuses on continual reinforcement learning.

March 17, 2023
f-Divergence Minimization for Sequential Knowledge Distillation
Yuqiao Wen, University of Alberta

https://www.youtube.com/watch?v=5lyY2eBhcZA&list=PLKlhhkvvU8-aKj4cWVY4SsOiz4dfKmiUl&index=5

Abstract:
Knowledge distillation (KD) is the process of transferring knowledge from a large model to a small one. It has gained increasing attention in the natural language processing community, driven by the demands of compressing ever-growing language models. In this work, we propose an f-distill framework, which formulates sequential knowledge distillation as minimizing a generalized f-divergence function. We propose four distilling variants under our framework and show that existing SeqKD and ENGINE approaches are approximations of our f-distill methods. We further derive step-wise decomposition for our f-distill, reducing intractable sequence-level divergence to word-level losses that can be computed in a tractable manner. Experiments across four datasets show that our methods outperform existing KD approaches, and that our symmetric distilling losses can better force the student to learn from the teacher distribution. 
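As a rough illustration of the step-wise decomposition, the sketch below computes a per-token divergence between the teacher's and student's next-word distributions, which is tractable even when the sequence-level divergence is not. The forward/reverse KL pair shown here is only one possible variant; the exact losses and weighting in f-distill may differ.

```python
import torch
import torch.nn.functional as F

def word_level_kd_loss(student_logits, teacher_logits, mask, reverse=False):
    """Sketch of a word-level distillation loss: a sequence-level
    f-divergence decomposed into per-token KL terms between teacher
    and student next-word distributions. reverse=True swaps the two
    distributions (reverse KL). Shapes: (batch, seq, vocab) for logits,
    (batch, seq) for the padding mask."""
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    if reverse:
        s_logp, t_logp = t_logp, s_logp
    # KL(t || s) per token, averaged over non-padding positions.
    kl = (t_logp.exp() * (t_logp - s_logp)).sum(dim=-1)
    return (kl * mask).sum() / mask.sum()
```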


Presenter Bio:
Yuqiao is a first-year Ph.D. student in the Department of Computing Science, University of Alberta, working with Dr. Lili Mou. In his Master's program, he explored dialogue systems for diverse response generation. He focused on the one-to-many mapping phenomenon in the dialogue task, where a dialogue context may have many plausible responses. Yuqiao addressed the one-to-many phenomenon using mixture models and proposed a novel variant of the expectation-maximization algorithm for latent variable training. Currently, Yuqiao is interested in fundamental issues in natural language generation, such as the one-to-many phenomenon, label bias, and exposure bias. Yuqiao has papers published at LREC'22 and ICLR'23.

March 10, 2023
The big world hypothesis and its ramifications on reinforcement learning 
Khurram Javed, University of Alberta

https://www.youtube.com/watch?v=Fwkcc9tupCI&list=PLKlhhkvvU8-aKj4cWVY4SsOiz4dfKmiUl&index=6

Abstract:
The big world hypothesis is that for many problems, the world is multiple orders of magnitude larger than the agent, and the agent cannot represent the optimal value function and policy even in the limit of infinite data. In this talk, I will argue why the big world hypothesis is reasonable for many real-world problems. I will show, using existing work and my own, that algorithms that work well in the over-parameterized setting can fail when the big world hypothesis holds. I will then share an experimental protocol to benchmark algorithms under big-world conditions. Finally, I will talk about promising solution methods for RL in big worlds. The main conclusion is that computationally cheap algorithms that learn continually are promising solution methods when the big world hypothesis is true.


Presenter Bio:
Khurram Javed is a Ph.D. student working with Prof. Rich Sutton. His main research interest is developing robust and general algorithms that enable agents to learn to achieve goals by interacting with the world. Currently, he is working on developing computationally efficient algorithms for learning continually under strong function approximation and partial observability. In the past, he explored gradient-based meta-learning for continual learning with Prof. Martha White, and worked on learning causal models with Prof. Yoshua Bengio.

March 3, 2023
Effective Real-time Reinforcement Learning for Vision-Based Robotic Tasks
Yan Wang, University of Alberta

https://www.youtube.com/watch?v=x65q1BrYwmY&list=PLKlhhkvvU8-aKj4cWVY4SsOiz4dfKmiUl&index=7

Abstract:
Recently we have seen many successful applications of Reinforcement Learning (RL). It is natural to extend the scope of RL to vision-based real-time learning of robotic control tasks. However, a vision-based real-time robotic RL agent faces some practical issues that are often ignored in conventional RL research. The first issue we investigated is that robots deployed in the real world are usually tethered to a resource-limited computer, while vision-based RL algorithms are computationally expensive. The second issue we investigated is how to design an effective reward function that is independent of domain knowledge.


Presenter Bio:
Yan Wang is a Master's student in the Department of Computing Science at the University of Alberta, supervised by Rupam Mahmood. He is interested in applying RL methods to practical problems, especially robotic control tasks. Outside of research, he enjoys playing video games and board games, and reading stories about UFOs.

February 24, 2023
Prototyping Properties of Specific Curiosity in a Reinforcement Learner  
Nadia Ady, University of Alberta

YouTube Video Coming Soon!

Abstract:
Curiosity appears to motivate and guide effective learning in humans, so the machine-learning community has high hopes for machine analogues of curiosity. Studying human and animal curiosity has unearthed several properties that would offer important benefits for machine learners but have not yet been well-explored in machine intelligence. In this talk, I'm going to discuss three of these properties—directedness towards inostensible referents, cessation when satisfied, and voluntary exposure—and show how they may be implemented together in a proof-of-concept reinforcement learning agent. This work presents a novel view into how specific curiosity operates and in the future might be integrated into the behaviour of goal-seeking, decision-making machine agents in complex environments.


Note: This seminar is designed for the public and general AI/ML practitioners, not reinforcement learning experts. If an intuition-based primer on basic reinforcement learning concepts alongside the results would bore you, feel free to skip this talk and arrange a one-on-one chat with me.


Presenter Bio:
Nadia is a Ph.D. candidate in the Department of Computing Science at the University of Alberta supervised by Patrick Pilarski. Her research overall focuses on developing machine learning systems that are recognizably curious. When Nadia isn't learning about machines' and animals' motivation to learn, she enjoys a good murder mystery or nearly any story that piques her curiosity.

February 17, 2023
Representation Alignment in Neural Networks 
Ehsan Imani, University of Alberta

https://www.youtube.com/watch?v=7BExMjBM8ts&list=PLKlhhkvvU8-aKj4cWVY4SsOiz4dfKmiUl&index=4

Abstract:
It is now standard for neural network representations to be trained on large, publicly available datasets and reused for new problems. The reasons why neural network representations have been so successful for transfer, however, are still not fully understood. We show that, after training, neural network representations align their top singular vectors to the targets. We investigate this representation alignment phenomenon in a variety of neural network architectures and find that (a) alignment emerges across a variety of architectures and optimizers, with more alignment arising from depth; (b) alignment increases for layers closer to the output; and (c) existing high-performance deep CNNs exhibit high levels of alignment. We then highlight why alignment between the top singular vectors and the targets can speed up learning, and show in a classic synthetic transfer problem that representation alignment correlates with positive and negative transfer to similar and dissimilar tasks.
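One simple way to probe the phenomenon is sketched below: measure how much of the (normalized) target vector lies in the span of the top-k left singular vectors of the feature matrix. This proxy is an assumption for illustration; the paper's precise alignment measure may differ.

```python
import numpy as np

def alignment_with_targets(features, targets, k=10):
    """Sketch: fraction of the target vector's energy captured by the
    top-k left singular vectors of the representation matrix.
    features: (n_samples, n_features); targets: (n_samples,).
    Returns a value in [0, 1]; higher means more aligned."""
    u, _, _ = np.linalg.svd(features, full_matrices=False)
    y = targets / np.linalg.norm(targets)
    coeffs = u[:, :k].T @ y  # coordinates of y in the top-k directions
    return float(np.sum(coeffs ** 2))
```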


Presenter Bio:
Ehsan Imani is a PhD student at the University of Alberta. He received an MSc degree in computer science from the University of Alberta in 2019. Afterwards, he worked as a research assistant at the University of Alberta and later at Huawei Noah's Ark Lab. His research interests are representation learning and reinforcement learning.

February 10, 2023
Games for Crowdsourcing and Crowdsourcing for Games 
Seth Cooper, Northeastern University

YouTube Video Coming Soon!

Abstract:
Where is there potential for crowdsourcing and video games to benefit each other? In this talk, I will discuss examples of how techniques from games might be applied to crowdsourcing and how crowdsourcing might be applied to games.  Approaches to dynamic difficulty adjustment from games can be used in crowdsourcing to improve task assignment to members of the crowd.  Crowdsourced player recruitment can be used to enable rapid design iteration by streamlining playtesting for games. Finally, we can close the loop and use crowdsourced playtesting to improve crowdsourcing games themselves. 
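As a toy illustration of the first idea, difficulty-matched task assignment can be as simple as pairing each worker with the open task closest to their estimated skill and updating that estimate Elo-style after each attempt. The scales and update rule below are assumptions for illustration, not a description of Cooper's systems.

```python
def assign_task(worker_skill, open_tasks):
    """Pick the open task whose estimated difficulty is closest to the
    worker's current skill estimate (difficulty-matched assignment).
    open_tasks: list of (task_id, difficulty) pairs."""
    return min(open_tasks, key=lambda t: abs(t[1] - worker_skill))

def update_skill(skill, difficulty, success, lr=0.1):
    """Elo-style update: shift the skill estimate by the difference
    between the observed outcome and the expected success probability."""
    expected = 1.0 / (1.0 + 10 ** ((difficulty - skill) / 4.0))
    return skill + lr * ((1.0 if success else 0.0) - expected)
```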


Presenter Bio:
Seth Cooper is an associate professor at the Khoury College of Computer Sciences at Northeastern University. His work combines scientific discovery games (particularly in computational structural biochemistry), serious games, and crowdsourcing games.

February 3, 2023
Reaching Scale: Challenges, Solutions, and Opportunities 
Wayne Malkin, Drivewyze

*co-hosted w/ Technology Alberta*

https://www.youtube.com/watch?v=shsevcJ3zPI&list=PLKlhhkvvU8-aKj4cWVY4SsOiz4dfKmiUl&index=1

Abstract:
In ten years, Drivewyze has grown from a startup to managing the largest software deployment in trucking history. This presentation will introduce Drivewyze and cover business and technical challenges unique to the industry, the solutions developed, and experiences with scaling. Finally, some opportunities in AI/ML will be introduced.


Presenter Bio:
Mr. Malkin is a technical leader with more than 30 years of experience building teams and delivering top-quality software projects, including past experience at IBM taking solutions to scale (globalization, enterprise scalability, and security). He is the chief designer and architect for Drivewyze, including its back-office services and client applications, and has a dozen US software patents awarded or pending.

January 27, 2023
Search-based Decoding Methods to Control Perceived Emotion in Neural Symbolic Music Generation
Lucas N. Ferreira, University of Alberta

YouTube Video Coming Soon!

Abstract:
Neural language models are currently the leading generative models for algorithmic music composition. However, a major problem with these models is the lack of control over specific musical features in the decoded pieces. For example, one cannot directly control a language model trained on classical piano pieces to compose a tense piece for a scene in a thriller movie. It is hard to control these models because they typically have a large number of parameters, and it is not clear which parameters affect which musical features. Controlling the perceived emotion of generated music is a central problem in affective music composition, with applications in films, games, literature, and music therapy. In this talk, I will present my latest research on search-based decoding methods guided by an emotion classifier to solve this problem. I will also discuss our results from applying these methods to generate music for tabletop role-playing games.
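To make the approach concrete, here is a minimal sketch of classifier-guided beam search, where each candidate continuation is scored by its language-model log-probability plus a bonus from an emotion classifier's probability of the target emotion. `lm_step` and `emotion_prob` are assumed interfaces standing in for the generative model and classifier; this is not the talk's actual implementation.

```python
import math

def emotion_guided_beam_search(lm_step, emotion_prob, prompt,
                               beam_width=8, steps=64, alpha=1.0):
    """Sketch: beam search over token sequences, re-ranking candidates
    by LM log-probability plus alpha * log P(target emotion | sequence).
    lm_step(seq, top_k) -> [(token, logp), ...] and emotion_prob(seq)
    -> float are assumed callables."""
    beams = [(list(prompt), 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for token, logp in lm_step(seq, top_k=beam_width):
                new_seq = seq + [token]
                bonus = alpha * math.log(max(emotion_prob(new_seq), 1e-9))
                candidates.append((new_seq, score + logp + bonus))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]  # highest-scoring sequence
```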


Presenter Bio:
Lucas N. Ferreira is currently a Postdoctoral Fellow at the University of Alberta, working with Prof. Levi Lelis on controlling generative models to compose music with a given emotion. He received a Ph.D. in Computer Science from the University of California, Santa Cruz, where he started investigating this problem. Lucas is broadly interested in the intersections between Artificial Intelligence and Creativity, specifically in algorithmic music composition and procedural content generation for games. 

January 20, 2023
Maintaining Plasticity in Deep Continual Learning   
Shibhansh Dohare, University of Alberta

https://www.youtube.com/watch?v=oA_XLqh4Das&list=PLKlhhkvvU8-aKj4cWVY4SsOiz4dfKmiUl&index=3

Abstract:
Modern deep-learning systems are specialized to problem settings in which training occurs once and then never again, as opposed to continual-learning settings in which training occurs continually. If deep-learning systems are applied in a continual learning setting, then it is well known that they may fail catastrophically to remember earlier examples. More fundamental, but less well known, is that they may also lose their ability to adapt to new data, a phenomenon called "loss of plasticity". We show loss of plasticity using the MNIST and ImageNet datasets repurposed for continual learning as sequences of tasks. In ImageNet, binary classification performance dropped from 89% correct on an early task to 77%, or to about the level of a linear network, on the 2000th task. Such loss of plasticity occurred with a wide range of deep network architectures, optimizers, and activation functions, and was not eased by batch normalization or dropout. In our experiments, loss of plasticity was correlated with the proliferation of dead units, units with very large weights, and more generally with a loss of unit diversity. Loss of plasticity was substantially eased by L2-regularization, particularly when combined with weight perturbation (Shrink and Perturb). We show that plasticity can be fully maintained by a new algorithm, called "continual backpropagation", which is just like conventional backpropagation except that a small fraction of less-used units are re-initialized after each example. This continual injection of diversity appears to maintain plasticity indefinitely in deep networks.
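For intuition, the sketch below shows the core re-initialization step of continual backpropagation: after each example, the few hidden units with the lowest running utility have their incoming weights re-initialized. The utility measure (a decayed average of absolute activations) and the initialization scale are simplifying assumptions; the paper's exact definitions may differ.

```python
import torch

def continual_backprop_step(layer, activations, utilities,
                            reinit_fraction=0.001, decay=0.99):
    """Sketch: update per-unit utilities and re-initialize the incoming
    weights of the least-used units in a torch.nn.Linear layer.
    activations: (batch, n_units) outputs of this layer;
    utilities: running utility per unit, shape (n_units,)."""
    with torch.no_grad():
        # Decayed average of absolute activation as a simple utility proxy.
        utilities.mul_(decay).add_((1 - decay) * activations.abs().mean(dim=0))

        n_units = layer.weight.shape[0]
        n_reinit = max(1, int(reinit_fraction * n_units))
        _, idx = torch.topk(utilities, n_reinit, largest=False)

        # Re-initialize incoming weights of the least-used units and reset
        # their utility, continually injecting fresh diversity.
        layer.weight[idx] = torch.randn(n_reinit, layer.weight.shape[1]) * 0.01
        if layer.bias is not None:
            layer.bias[idx] = 0.0
        utilities[idx] = utilities.median()
    return utilities
```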


Presenter Bio:
Shibhansh is a Ph.D. student at the University of Alberta, working with Dr. Richard Sutton and Dr. Rupam Mahmood. He received his B.Tech. in Computer Science and Engineering from the Indian Institute of Technology Kanpur. He aims to understand the computational principles that give rise to intelligence. To this end, his research focuses on various aspects of continual learning, deep learning, and reinforcement learning.

January 13, 2023
Deep Reinforcement Learning for Multi-Agent Interaction   
Stefano Albrecht, University of Edinburgh 

https://www.youtube.com/watch?v=r0MVzVUJWJQ&list=PLKlhhkvvU8-aKj4cWVY4SsOiz4dfKmiUl&index=2

Abstract:
Our group specialises in developing machine learning algorithms for autonomous systems control, with a particular focus on deep reinforcement learning and multi-agent reinforcement learning. We focus on problems of optimal decision making, prediction, and coordination in multi-agent systems. Questions we tackle include: How can a single agent learn to collaborate effectively in a team in which other agents may have diverse types and may enter or leave at any time? How can multiple autonomous agents learn to solve a given task in a scalable and robust way? In this talk, I will give an overview of our research agenda along with some recently published papers addressing the above questions. I will also present some of my work done at the UK-based self-driving company Five AI (recently acquired by Bosch) on robust and interpretable motion planning and prediction for autonomous driving.


Presenter Bio:
Dr. Stefano V. Albrecht is an Assistant Professor in Artificial Intelligence in the School of Informatics at the University of Edinburgh. He leads the Autonomous Agents Research Group (https://agents.inf.ed.ac.uk), which currently consists of 15 members who conduct research into developing machine learning algorithms for autonomous systems control. Dr. Albrecht is a Royal Society Industry Fellow working with a team at the UK-based company Five AI (https://www.five.ai) to develop AI technologies for autonomous driving. He is also a Royal Academy of Engineering Industrial Fellow working with the multinational company Dematic/KION to bring RL to multi-robot warehouse management. His research has been published in leading AI/ML/robotics conferences and journals, including NeurIPS, ICML, IJCAI, AAAI, UAI, AAMAS, AIJ, JAIR, ICRA, IROS, and T-RO. Previously, Dr. Albrecht was a postdoctoral fellow at the University of Texas at Austin, working with Prof. Peter Stone. He obtained PhD and MSc degrees in Artificial Intelligence from the University of Edinburgh and a BSc degree in Computer Science from the Technical University of Darmstadt.