Friday, November 29, 2019

Alain Dagher (McGill)

Recording: https://bluejeans.com/s/r0qiI/




Looking for Computational Signals in the Brain

Is the brain a computer?

Over 20 years ago Dayan and Montague noticed that electrical signals recorded from dopamine neurons by Schultz looked like a computational signal from their machine learning algorithms. Thus was born one of the most fruitful theories in cognitive neuroscience: that dopamine encodes a reward prediction error used for reinforcement learning.
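
The prediction-error signal Dayan and Montague recognized can be sketched as a temporal-difference (TD) error. The toy sequence below (a cue followed by a reward, with illustrative learning rate and discount) shows how, with training, the error-driven value estimates shift from the reward itself to the predictive cue, mirroring Schultz's recordings:

```python
# Minimal TD(0) sketch: delta is the reward prediction error that
# dopamine neurons are proposed to encode. States, rewards, and
# hyperparameters here are illustrative, not from the talk.

def td_update(V, s, s_next, reward, alpha=0.1, gamma=0.9):
    """One TD(0) update; returns the prediction error delta."""
    delta = reward + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
    V[s] = V.get(s, 0.0) + alpha * delta
    return delta

V = {}
# A cue ("light") reliably followed by reward: with training, value
# propagates back from the rewarded state to the predictive cue.
for _ in range(100):
    td_update(V, "light", "reward_state", 0.0)
    td_update(V, "reward_state", "end", 1.0)

print(V["light"], V["reward_state"])  # close to 0.9 and 1.0 after training
```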

Inspired by this, our goal is to use computational frameworks to better design and understand the results of functional MRI experiments. By doing this, one can relate brain activity to information processing in the brain. Eventually this will provide a way to understand neurological and psychiatric illness at a computational rather than a descriptive anatomical level.

I will give examples of fMRI experiments where we detect: prediction error and value signals in basal ganglia and prefrontal cortex; evidence accumulation signals in the visual system; self-control signals in lateral prefrontal cortex. I will attempt to relate these to brain disease.


Friday, November 22, 2019

Zakia Hammal (CMU)

Not recorded.


Behavioral AI: Computational Modeling of Human Behavior

Nonverbal behavior is multimodal and interpersonal. In several projects, I have addressed the dynamics of facial expression, head, and body movement for emotion communication, social interaction, and clinical applications. Leveraging recent advances in AI and machine learning, my work focuses on developing computational models to automatically analyze, recognize, and interpret multimodal social and communicative behavior. By modeling multimodal and interpersonal communication, my work seeks to inform affective computing, social interaction, and behavioral health informatics. In this talk, I will share some of our recent work on computational methods for the assessment of treatment outcomes in psychiatric disorders, pain intensity measurement, and the automatic assessment of facial nerve dysfunction in children with facial abnormalities.


Friday, November 15, 2019

Xavier Bouthillier (Mila)

Recording: https://bluejeans.com/s/DlKN6/


Unreproducible Research is Reproducible

The apparent contradiction in the title is a wordplay on the different meanings attributed to the word "reproducible" across scientific fields. What the title implicitly states is that unreproducible findings can be built upon reproducible methods. Instead of reducing reproducibility to a single concept, representing it as a spectrum enables a categorization into sub-concepts (methods, results, findings) that helps us better understand the role of each part, and therefore better evaluate our current practices and how they could be modified to improve reproducibility. In fact, this categorization makes explicit the limitations of the solutions currently proposed by major machine learning conferences to address reproducibility. In this talk, I will describe the spectrum of reproducibility and outline related issues that have recently been exposed in the machine learning literature. I will then explain why current solutions are insufficient and present how we could improve on them, using as an example statistical tests that provide reproducibility guarantees for benchmarks.


Friday, November 8, 2019

Danny Tarlow (Google Brain Montreal)

Recording: https://bluejeans.com/s/2oxAz/


Learning to Fix Programming Errors with Graph2Diff Neural Networks

Deep learning has made great advances in the last several years, and it excels in situations where we have big models and lots of data. Professional programmers generate lots of data in the course of their day-to-day work. Can we train big deep learning models on this data and create tools that are useful to professional developers?

I'll talk about our recent efforts in this direction, focusing on the problem of learning to repair build errors encountered by Google software engineers. We represent source code, build configuration files, and compiler diagnostic messages as a graph, and then use a Graph Neural Network model to predict a diff. The model is an instance of a more general abstraction that we call Graph2Tocopo, which we argue is superior to Sequence2Sequence for problems that involve predicting how to change source code. We evaluate the model on a dataset of over 500k real build errors and their resolutions from professional developers. Compared to a recently published Sequence2Sequence-based baseline, we achieve over double the accuracy while tackling a more difficult task.


Friday, November 1, 2019

Anirudh Goyal (Mila)

Recording: https://bluejeans.com/s/D_iHl/


Recurrent Independent Mechanisms

Physical processes in the world often have a modular structure. Despite this, most machine learning models employ the opposite inductive bias: that all processes interact. This can lead to poor generalization (if data is limited) and a lack of robustness to changing task distributions. In this talk, I'll discuss how we can learn recurrent mechanisms that operate independently by default and interact only sparingly, which can lead to better generalization to out-of-distribution samples.
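
The "independent by default, sparse interaction" bias can be caricatured in a few lines. This is only a toy sketch, not the model from the talk: module sizes, the relevance scoring, and the update rule are all invented for illustration. At each step, only the top-k most relevant modules read from the input; the rest keep their state unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sketch: several small recurrent states update independently, and
# at each step only the top-k modules judged most relevant to the
# current input are allowed to read from it and update.
n_modules, dim, k = 4, 3, 2
states = rng.normal(size=(n_modules, dim))
W_in = rng.normal(size=(n_modules, dim, dim))  # per-module input weights

def step(states, x):
    scores = states @ x                   # relevance of each module to x
    active = np.argsort(scores)[-k:]      # only the top-k modules attend
    new = states.copy()
    for m in active:
        new[m] = np.tanh(W_in[m] @ x + states[m])  # active modules update
    return new, set(active.tolist())

x = rng.normal(size=dim)
new_states, active = step(states, x)
changed = {m for m in range(n_modules)
           if not np.allclose(new_states[m], states[m])}
print(changed == active)  # only the attending modules changed state
```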


Friday, October 25, 2019

Cătălina Cangea (University of Cambridge)

Recording:

https://bluejeans.com/s/uBl3L


Question Answering in Realistic Visual Environments: Challenges and Approaches

The Embodied Question Answering (EQA) and Interactive Question Answering (IQA) tasks were recently introduced as a means to study the capabilities of agents in rich, realistic 3D environments, requiring both navigation and reasoning to achieve success. Each of these skills typically needs a different approach, which should nevertheless be smoothly integrated with the rest of the system leveraged by the agent. However, initial approaches either suffer from potentially weaker performance than when using a language-only model or are preceded by additional hand-engineered steps. This talk will provide an overview of the existing work on this thread and describe in more detail our recent study published at BMVC 2019 and accepted at ViGIL 2019, VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering. Here, we investigate the feasibility of EQA-type tasks by building a novel benchmark, which contains pairs of questions and videos generated in the House3D environment. While removing the navigation and action selection requirements from EQA, we increase the difficulty of the visual reasoning component via a much larger question space, tackling the sort of complex reasoning questions that make QA tasks challenging. By designing and evaluating several VQA-style models on the dataset, we establish a novel way of evaluating EQA feasibility given existing methods, while highlighting the difficulty of the problem even in the most ideal setting.


Friday, October 18, 2019

Joseph Jay Williams (University of Toronto)

Recording:

https://bluejeans.com/s/jp2wn/


Combining Reinforcement Learning & Human Computation for A/B Experimentation: Perpetually Enhancing and Personalizing User Interfaces

How can we transform the everyday technology people use into intelligent, self-improving systems? I consider how to dynamically enhance user interfaces by using randomized A/B experiments to integrate Active Learning algorithms with Human Computation. Multiple components of a user interface (e.g. explanations, messages) can be crowdsourced from users and then compared in real-world A/B experiments, bringing human intelligence into the loop of system improvement. Active Learning algorithms (e.g. multi-armed bandits) can then analyze data from A/B experiments in order to dynamically provide more effective A or B conditions to future users. Active Learning can also lead to personalization, by facing the more substantive exploration-exploitation tradeoff of discovering whether some conditions work better for certain subgroups of user profiles (in addition to discovering what works well on average). I present an example system, which crowdsourced explanations for how to solve math problems from students and teachers, while simultaneously conducting an A/B experiment to identify which explanations other students rated as helpful. Modeling this as a multi-armed bandit whose set of arms was constantly growing (every time a new explanation was crowdsourced), we used Thompson Sampling to do real-time analysis of data from the experiment, providing higher-rated explanations to future students (LAS 2016, CHI 2018). This generated explanations that helped learning as much as those of a real instructor. Future work aims to discover how to personalize explanations in real time, by discovering which conditions work for different subgroups of user profiles (such as whether simple vs. complex explanations are better for students with different levels of prior knowledge or verbal fluency).
Future collaborative work with statistics and machine learning researchers provides a testbed for a wide range of active learning algorithms to do real-time adaptation of A/B experiments, and integrate with different crowdsourcing workflows. Dynamic A/B experiments can be used to enhance and personalize a broad range of user-facing systems. Examples include modifying websites, tailoring email campaigns, enhancing lessons in online courses, getting people to exercise by personalizing motivational messages in mobile apps, and discovering which interventions reduce stress and improve mental health. www.josephjaywilliams.com has links to slides and a recording of a related talk.
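
The Thompson Sampling loop described above can be sketched with Bernoulli rewards (helpful / not-helpful ratings) and an arm set that grows as new explanations are crowdsourced. The class name, priors, and reward rates below are illustrative assumptions, not the system's actual implementation:

```python
import random

class GrowingBandit:
    """Bernoulli Thompson Sampling where arms (explanations) can be
    added at any time, each starting from a Beta(1, 1) prior."""

    def __init__(self):
        self.successes = []  # per-arm "helpful" counts (plus prior)
        self.failures = []   # per-arm "not helpful" counts (plus prior)

    def add_arm(self):
        self.successes.append(1)  # Beta(1, 1) uniform prior
        self.failures.append(1)
        return len(self.successes) - 1

    def choose(self):
        # Sample a plausible helpfulness rate per arm; pick the best.
        samples = [random.betavariate(s, f)
                   for s, f in zip(self.successes, self.failures)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, helpful):
        if helpful:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1

random.seed(0)
bandit = GrowingBandit()
good = bandit.add_arm()  # hypothetical explanation rated helpful 80% of the time
bad = bandit.add_arm()   # hypothetical explanation rated helpful 20% of the time
picks = []
for _ in range(2000):
    arm = bandit.choose()
    picks.append(arm)
    bandit.update(arm, random.random() < (0.8 if arm == good else 0.2))

# The sampler concentrates on the better-rated explanation over time.
print(picks.count(good) > picks.count(bad))
```

New arms enter with a wide posterior, so they are explored for a while before the sampler settles on whichever explanations students actually rate highly.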


Friday, October 4, 2019

Blake Richards (Mila)

Recording:

https://bluejeans.com/s/T7_aD/


Spike-based causal inference for weight alignment

In today's artificial neural networks, the weights used for processing stimuli are also used during backward passes to calculate gradients. In current proposals for how the real brain could approximate gradients, these two processes are separated: one set of synaptic weights is used for processing and another set is used for backward passes. This produces the so-called "weight transport problem", where the backward weights used to calculate gradients need to mirror the forward weights used to process stimuli. This problem has been considered so hard that popular proposals for biological learning assume that the backward weights are simply random, as in the feedback alignment algorithm. However, such random weights do not appear to work well for large networks. Here we show how the discontinuity introduced in a spiking system can lead to a solution to this problem. The resulting algorithm is a special case of an estimator used for causal inference in econometrics: regression discontinuity design. We show empirically that this algorithm rapidly makes the backward weights approximate the forward weights. As the backward weights become correct, this improves learning performance over feedback alignment on tasks such as Fashion-MNIST and CIFAR-10. Our results demonstrate that a simple learning rule in a spiking network can allow neurons to produce the right backward connections and thus solve the weight transport problem.
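
For contrast with the spike-based approach, the feedback alignment baseline mentioned above can be sketched in a few lines: the backward pass propagates the error through a fixed random matrix B instead of the transpose of the forward weights, so no weight transport is needed. The network size, data, and hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny one-hidden-layer regression network trained with feedback
# alignment: the error is sent backward through fixed random weights B
# rather than W2.T, avoiding weight transport.
n_in, n_hid, n_out = 5, 16, 1
W1 = rng.normal(0, 0.5, (n_in, n_hid))
W2 = rng.normal(0, 0.5, (n_hid, n_out))
B = rng.normal(0, 0.5, (n_out, n_hid))   # fixed random feedback weights

X = rng.normal(size=(200, n_in))
y = X @ rng.normal(size=(n_in, n_out))   # linear target, illustrative

def mse():
    return float(np.mean((np.tanh(X @ W1) @ W2 - y) ** 2))

loss_before = mse()
lr = 0.01
for _ in range(500):
    h = np.tanh(X @ W1)
    err = h @ W2 - y                     # output error
    dh = (err @ B) * (1 - h ** 2)        # error routed through B, not W2.T
    W2 -= lr * h.T @ err / len(X)
    W1 -= lr * X.T @ dh / len(X)
loss_after = mse()

print(loss_after < loss_before)  # training still reduces the loss
```

Training works because the forward weights tend to align with B over time; the talk's point is that this breaks down at scale, motivating learned backward weights.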


Friday, September 27, 2019

Tegan Maharaj, Victor Schmidt & Sasha Lucioni (Mila)

Recording:

https://bluejeans.com/s/4uxo2/


How AI Can Save the World, or AI and the Climate Crisis

The team working on the VICC (Visualizing the Impacts of Climate Change) project is multidisciplinary, with researchers from fields such as climate science, behavioral psychology, and machine learning working together to raise awareness and conceptual understanding of climate change. Led by Yoshua Bengio, the team is developing an interactive website to depict future impacts of climate change that are both accurate and personalized. Initial results of the project can be found on their website.

Note: No audio from 04:12-13:20


Friday, September 20, 2019

Gaël Varoquaux (Inria + McGill)

Recording:

https://bluejeans.com/s/HCBml/


DirtyData: statistical learning on non-curated databases

"Dirty data" is reported as the worst roadblock to data science in practice, requiring human curation before statistical analysis [1]. One challenge is that in many data-science applications, for instance in healthcare or business, the data are not measurements with a naturally homogeneous structure, but rather heterogeneous entries and columns of different natures. Analysts must invest significant manual effort to cast the data into a representation amenable to statistical learning. Our goal in the DirtyData research axis is to unite statistical learning and database techniques to work directly on non-curated databases. I will present three recent contributions to building a statistical-learning framework on non-curated databases. First, we tackle the problem of non-normalized categorical columns, e.g. with typos or nomenclature variations. We introduce two approaches to inject the data into a vector space, based either on a character-level Gamma-Poisson factorization to recover latent categories, or on exploiting unstudied properties of min-hash vectors that lead to very fast stateless transformations of string inclusions into simple vector inequalities [2]. Second, we study the consistency of supervised learning in the presence of missing data [3]. We show that in missing-at-random settings, simple imputation by the mean is consistent for powerful supervised models. We also stress that in missing-not-at-random settings, imputing may render supervised learning impossible, and we study simple practical solutions. Finally, we consider two-sample tests: testing whether two sets of observations are drawn from the same distribution, useful for instance when assembling multiple databases. Methods such as kernel MMD are versatile for non-vector data, as they only require a kernel between observations.
We contribute to this rich literature by showing that building these kernel two-sample tests on an l1 geometry improves statistical power over the classical Euclidean geometry implied by the RKHS associated with the kernel [4].

[1] https://www.kaggle.com/ash316/novice-to-grandmaster
[2] Encoding high-cardinality string categorical variables, P. Cerda, G. Varoquaux, https://arxiv.org/abs/1907.01860
[3] On the consistency of supervised learning with missing values, J. Josse, N. Prost, E. Scornet, G. Varoquaux, https://arxiv.org/abs/1902.06931
[4] Comparing distributions: l1 geometry improves kernel two-sample testing, M. Scetbon, G. Varoquaux, NeurIPS 2019
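
The mean-imputation point in [3] can be illustrated with a toy sketch: impute missing entries with the per-column mean computed on observed values, and keep the missingness mask as extra features for a flexible supervised learner. The data, missingness rate, and feature layout are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy missing-at-random setting: 30% of entries are masked out, then
# imputed with the column mean; the missingness indicators are kept as
# additional features, which a powerful supervised model can exploit.
X = rng.normal(size=(100, 3))
mask = rng.random(X.shape) < 0.3          # 30% missing at random
X_missing = np.where(mask, np.nan, X)

col_means = np.nanmean(X_missing, axis=0)  # means over observed entries
X_imputed = np.where(np.isnan(X_missing), col_means, X_missing)
features = np.hstack([X_imputed, np.isnan(X_missing).astype(float)])

print(features.shape)  # (100, 6): imputed values plus missingness mask
```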

Friday, September 13, 2019

Matthew Amodio (Yale)

Recording:

https://bluejeans.com/s/j4QI@

Constraining GANs in Unsupervised Domain Translation

Generative Adversarial Networks (GANs) have recently come to dominate the field of unsupervised domain translation models. The domain translation task involves learning two generators that map between two domains X and Y. In the unsupervised version of the task, no pairing information is present to associate specific observations in X to specific observations in Y. Since exponentially many mappings between X and Y exist, just training generators as GANs with the distributional loss of the discriminator is underspecified and leads to poor results. Moreover, a distributional loss with no pointwise preferences rarely reflects the goals of the application of these models. Nor does the popular cycle-consistency assumption that the two generators be each other's inverses satisfy these concerns. For most problems, other constraints are necessary. In this talk, I will discuss some of these problems, such as sensitivity to density, data symmetry, and several fundamental problems with using cycle-consistent GANs for image-to-image translation. Topics will include the application of unsupervised domain translation to computational biology for integrating data from disparate measurement technologies and some of the open challenges to image-to-image translation.