UMN Visual Computing & AI Seminar

To subscribe to VCAI, please send an email to hspark at

Upcoming Talk

Dec 11 Wednesday 2:30-3:30pm @ Shepherd Drone Lab (Room 164)

Speaker: Praneet Bala

Title: Automated Markerless Pose Estimation in Freely Moving Rhesus Macaques


The rhesus macaque is an important model species in several branches of science, including neuroscience, psychology, ethology, and several fields of medicine. The utility of the macaque model would be greatly enhanced by the ability to precisely measure its behavior, specifically, its pose (position of multiple major body landmarks) in freely moving conditions. Existing marker-based motion capture approaches cannot provide enough tracking due to incompatibility and occlusion. Here, we propose OpenMonkeyStudio, a novel deep learning-based markerless motion capture system for estimating 3D pose in freely moving rhesus macaques in large unconstrained environments. Our approach makes use of 62 precisely calibrated and synchronized machine vision cameras that encircle an open enclosure. The resulting multiview image streams allow for novel data augmentation via 3D reconstruction of hand-annotated images that in turn are necessary for training a robust view-invariant deep neural network model. We show that OpenMonkeyStudio can perform more precisely than trained human annotators and its precision matches that of marker-based systems.
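
As a rough illustration of how a landmark annotated in a few calibrated views can be lifted to 3D, the sketch below finds the least-squares point closest to a set of camera rays. It is a generic textbook construction, not the OpenMonkeyStudio pipeline; the function names are hypothetical, and it assumes each 2D annotation has already been converted into a ray (camera center plus direction) using the camera calibration.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 linear system a x = b with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("rays are (nearly) parallel; the point is not determined")
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(det3(m) / d)
    return solution


def closest_point_to_rays(origins, directions):
    """Least-squares 3D point closest to a set of rays (one ray per view).

    Each ray contributes the projector (I - u u^T), which measures the
    component of (x - origin) perpendicular to the unit direction u.
    Summing these gives the normal equations A x = b.
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for c, d in zip(origins, directions):
        norm = sum(x * x for x in d) ** 0.5
        u = [x / norm for x in d]
        for r in range(3):
            for k in range(3):
                m = (1.0 if r == k else 0.0) - u[r] * u[k]
                a[r][k] += m
                b[r] += m * c[k]
    return solve3(a, b)
```

Once a landmark has a 3D position, it can in principle be reprojected into the remaining views to produce additional labeled images, which is one plausible reading of the augmentation step described in the abstract.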


Praneet is a first-year PhD student studying under Prof. Hyun Soo Park, with a Bachelor's degree in Electronics Engineering from the University of Mumbai, India. Current work focuses on non-human subject pose estimation and tracking; research interests include computer vision, 3D vision, and machine learning.

Past Talks

Dec 4 Wednesday 2:30-3:30pm @ Shepherd Drone Lab (Room 164)

Speaker: Shi Chen

Title: Leveraging Human Attention for Image Captioning


Visual attention has proven useful in image captioning, with the goal of enabling a caption model to selectively focus on regions of interest. Existing models typically rely on top-down language information and learn attention implicitly by optimizing the captioning objectives. While somewhat effective, the learned top-down attention can fail to focus on correct regions of interest without direct supervision of attention. Inspired by the human visual system, which is driven not only by task-specific top-down signals but also by visual stimuli, we propose to use both types of attention for image captioning. In particular, we highlight the complementary nature of the two types of attention and develop a model (Boosted Attention) to integrate them for image captioning.

Bio: Shi Chen is a first-year PhD student studying under Dr. Catherine Qi Zhao. His research interests lie in multi-modal fusion, human vision and efficient computing.

Nov 20 Wednesday 2:30-3:30pm @ Shepherd Drone Lab (Room 164)

Speaker: Md Jahidul Islam

Title: Enhancement and super-resolution of underwater scenes for improved robotic perception


A major challenge for visually-guided underwater robots is to ensure robust perception in unfavorable sensing conditions. In particular, artifacts such as poor lighting, scattering, and chromatic distortions make it very difficult for an underwater robot to visually perceive and interpret its surroundings. In this talk, I will delineate our attempts to address these challenges by designing novel and improved vision-based solutions. Specifically, I will present robust deep generative models for automatic enhancement and super-resolution of distorted low-resolution underwater images to facilitate improved perception. I will describe their algorithmic details and discuss relevant design choices to meet the real-time operating constraints on single-board embedded platforms.


Jahidul is a Ph.D. candidate at the Computer Science and Engineering (CSE) Department of the University of Minnesota. He works at the Interactive Robotics and Vision Laboratory (IRVLab) under the supervision of Professor Junaed Sattar. His research focuses on the design and development of vision-based solutions for improved human-robot cooperation in adverse underwater conditions.

Nov 13 Wednesday 2:30-3:30pm @ Shepherd Drone Lab (Room 164)

Speaker: Prof. Yasin Yazıcıoğlu

Title: Distributed Learning of Optimal Coordination in Multi-Agent Systems: A Game Theoretic Approach


Teams of autonomous systems have proven to have great potential for serving as robust and efficient solutions in various applications such as environmental and infrastructure monitoring, search and rescue, precision agriculture, manufacturing, and logistics. Realizing this potential mainly hinges on achieving proper coordination (an optimal joint plan) among the team, which usually leads to intractable large-scale combinatorial optimization problems. This intractability necessitates the design of scalable distributed learning or optimization algorithms that can work under the partial information available to the agents and achieve performance guarantees (e.g., convergence to optimal plans).

In this talk, I will present a game-theoretic approach to distributed learning of optimal coordination among a team of autonomous agents. In a nutshell, this generic methodology decomposes the team (global) objective function into properly designed individual (local) utility functions that agents use to optimize their own actions based on their local/partial observations. Once the problem is mapped to a game with such structure, a so-called potential game, the agents can then use a noisy best-response algorithm to update their actions and reach optimal joint configurations in a repeated play of the game. As I talk about this game-theoretic framework and the main technical results, I will also illustrate the application of this methodology to multi-agent tasks such as high-level planning of robot trajectories to serve cooperative tasks with time windows and coverage control for surveillance and monitoring applications.
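
To make the idea concrete, here is a minimal sketch of a noisy best-response rule in a toy coverage game. Everything in it is an illustrative assumption rather than the speaker's formulation: the team objective is the total value of distinct covered cells, each agent's local utility is its marginal contribution to that objective (which makes the team objective a potential function of the game), and the update rule is log-linear learning, one well-known noisy best-response scheme.

```python
import math
import random


def global_value(actions, values):
    """Team objective: total value of the distinct cells that are covered."""
    return sum(values[c] for c in set(actions))


def marginal_utility(i, actions, values):
    """Local utility of agent i: its marginal contribution to the team objective."""
    others = actions[:i] + actions[i + 1:]
    return global_value(actions, values) - global_value(others, values)


def log_linear_learning(values, n_agents, steps=2000, temperature=0.05, seed=0):
    """Repeated play: one random agent at a time resamples its action.

    An action is chosen with probability proportional to
    exp(utility / temperature), so better responses are strongly
    preferred but worse ones are still occasionally explored.
    """
    rng = random.Random(seed)
    cells = list(range(len(values)))
    actions = [0] * n_agents  # every agent starts on cell 0
    for _ in range(steps):
        i = rng.randrange(n_agents)
        weights = []
        for c in cells:
            trial = actions[:i] + [c] + actions[i + 1:]
            weights.append(math.exp(marginal_utility(i, trial, values) / temperature))
        actions[i] = rng.choices(cells, weights=weights)[0]
    return actions
```

With a small temperature the agents spend almost all of their time in joint actions that maximize the potential, here the team objective; a larger temperature trades that concentration for more exploration.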


Yasin Yazıcıoğlu is a research assistant professor in the Department of Electrical and Computer Engineering at the University of Minnesota. Prior to joining the University of Minnesota, he was a postdoctoral research associate in the Laboratory for Information and Decision Systems (LIDS) at MIT from 2014 to 2017. He received the Ph.D. degree in Electrical and Computer Engineering from the Georgia Institute of Technology in 2014, and the B.S. and M.S. degrees in Mechatronics Engineering from Sabancı University, Turkey, in 2007 and 2009, respectively. His research is primarily focused on distributed decision making, control, and learning with applications to robotics and cyber-physical and societal networks.

Nov 6 Wednesday 2:30-3:30pm @ Shepherd Drone Lab (Room 164)

Speaker: Travis Henderson and Amalia Schwartzwald

Title: Unmanned Aerial Platform Design for Large-Scale Aerial Sensing Tasks


This talk describes a novel Unmanned Aerial Vehicle (UAV) design for performing advanced aerial sensing tasks. Aerial sensing tasks are becoming more complex due to the size of the survey area, the required resolution of imaging, the distribution of regions of interest, and the contexts in which these tasks are located. Although UAVs are the platform of choice for aerial sensing tasks, designing a simple hardware platform to meet more than one of these concerns is challenging. As such, UAV designs that hybridize the traditional classes of fixed-wing and multi-rotor aircraft have become increasingly popular in recent years. This talk will discuss the MIST-UAV, a specific solution to this problem, as well as the specific motivations and goals behind the design and the salient results to date.


Travis Henderson has five years of experience designing and building Unmanned Aerial Vehicles (UAV). He is currently a Master's student in Mechanical Engineering at the University of Minnesota and researches Aerial Sensing Platform Design in the Center for Distributed Robotics under Professor Nikos Papanikolopoulos. His interests span airframe and propulsion system hardware design for small-scale UAVs, power-efficient robot control, and real-time model parameter estimation.

Amalia Schwartzwald is a senior undergraduate student in Aerospace Engineering and Mechanics at the University of Minnesota. She is currently researching the application of reinforcement learning on robotic platforms in the Center for Distributed Robotics under Professor Nikos Papanikolopoulos. Her interests include machine learning, computer vision, UAV design, and power-efficient robot control.

Oct 30 Wednesday 2:30-3:30pm @ Shepherd Drone Lab (Room 164)

Speaker: Prof. Evan Suma Rosenberg

Title: Making Small Spaces Feel Large: Practical Illusions in Virtual Reality


Over the next decade, immersive technologies have the potential to revolutionize how people communicate over distance, how they learn, train, and operate in challenging physical environments, and how they visualize, understand, and make decisions based on an ever-growing landscape of complex data. However, despite rapid technical advances over the past few years and no small amount of media hype, there are numerous theoretical and practical problems yet to be solved before virtual reality can catch up with our imaginations and make good on these promises. Locomotion is one of the most significant interaction challenges because body movement is constrained by the real world. When walking in VR, users may collide with walls or physical obstacles if they attempt to travel outside the boundaries of a "room-scale" space. In this talk, I will present a series of illusory techniques that can overcome these movement limitations by imperceptibly manipulating the laws of physics. This approach, known as redirected walking, has stunning potential to fool the senses. Through a series of formal studies, users have been convinced that they were walking along a straight path while actually traveling in a circle, or that they were exploring impossibly large virtual environments within the footprint of a single real-world room. Additionally, I will discuss technical challenges for redirected walking systems and present novel algorithms that can automatically redirect users in complex physical spaces with obstacles.

Bio: Evan Suma Rosenberg is an Assistant Professor in the Department of Computer Science and Engineering at the University of Minnesota. Previously, he was the Associate Director of the MxR Lab at the Institute for Creative Technologies and a Research Assistant Professor in the Department of Computer Science at the University of Southern California. His research interests are situated at the intersection of virtual/augmented reality and HCI, encompassing immersive technologies, 3D user interfaces, and spatial interaction techniques. He received his Ph.D. from the Department of Computer Science at the University of North Carolina at Charlotte in 2010. Dr. Suma Rosenberg's research has been recognized with multiple best paper awards and has been funded by NSF, ARL, ONR, and DARPA. Over the past decade, he has also directed the development of multiple publicly released free software projects and contributed to an open-source technology initiative that has had a major disruptive impact on the VR industry. Dr. Suma Rosenberg has served as General Chair and Program Chair for IEEE VR, the leading academic conference in the virtual reality field, and currently chairs the steering committee for ACM SUI. His team received first place at the 2015 SIGGRAPH AR/VR Contest, and he received a Google VR Research Award in 2017.

Oct 23 Wednesday 2:30-3:30pm @ Shepherd Drone Lab (Room 164)

Speaker: Prof. Derya Aksaray (Aerospace Engineering and Mechanics)

Title: Q-Learning for Robust Satisfaction of Signal Temporal Logic Specifications


In this talk, I will address the problem of learning optimal policies for satisfying signal temporal logic (STL) specifications by agents with unknown stochastic dynamics. The system is modeled as a Markov decision process, in which the states represent partitions of a continuous space and the transition probabilities are unknown. I will introduce two synthesis problems where the desired STL specification is enforced by maximizing the probability of satisfaction, and the expected robustness degree, that is, a measure quantifying the quality of satisfaction. I will show that Q-learning is not directly applicable to these problems because the probability of satisfaction and expected robustness degree are not in the standard objective form of Q-learning. To resolve this issue, I will present an approximation of the STL synthesis problems that can be solved via Q-learning under performance guarantees.
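
The mismatch mentioned in the abstract can be seen in a small sketch. Under the usual quantitative semantics of STL, the robustness degree of "eventually" and "always" formulas is a max or min of margins over a trajectory, whereas the standard Q-learning update targets a discounted sum of per-step rewards. The code below only illustrates that contrast; the function names, the single-threshold predicates, and the dictionary-based Q-table are hypothetical simplifications, and the approximation presented in the talk is not reproduced here.

```python
def robustness_eventually(signal, threshold):
    """Robustness of 'eventually signal > threshold' over the trace: the best margin."""
    return max(x - threshold for x in signal)


def robustness_always(signal, threshold):
    """Robustness of 'always signal > threshold' over the trace: the worst margin."""
    return min(x - threshold for x in signal)


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One standard tabular Q-learning step.

    The target is reward + gamma * max_a Q(next_state, a), i.e. the
    objective is a discounted sum of rewards, not a max or min over
    the whole trajectory as in the robustness functions above.
    Unseen (state, action) pairs default to a Q-value of 0.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]
```

A positive robustness value means the trace satisfies the formula with that much margin, and a negative value means it violates it by that much.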

Bio: Derya Aksaray is an assistant professor in the Department of Aerospace Engineering and Mechanics at the University of Minnesota. Before joining UMN, she was a post-doctoral associate in the Computer Science and Artificial Intelligence Laboratory at MIT from 2016 to 2017, and a post-doctoral researcher at Boston University from 2014 to 2016. She received her Ph.D. degree in Aerospace Engineering from the Georgia Institute of Technology in 2014. The theoretical foundation of her research lies in the areas of control theory, formal methods, and machine learning. Recently, she has been working on developing verifiable algorithms for safe, resilient, and efficient operation of autonomous robotic systems.

Oct 16 Wednesday 2:30-3:30pm @ Shepherd Drone Lab (Room 164)

Speaker: Dr. Orazio Gallo (NVIDIA)

Title: Computational Imaging with Deep Learning


Neural networks have surpassed the performance of virtually any traditional computer vision algorithm thanks to their ability to learn priors directly from the data. The common and relatively simple encoder/decoder architecture, for instance, has pushed the state-of-the-art of a number of tasks, from optical flow estimation, to image deblurring, image denoising, and even higher level tasks, such as image-to-image translation. To improve the results further, one must leverage the constraints of the specific problem at hand. In this talk I will use a few of my recent works to show an example of how traditional computational imaging concepts can be combined with deep learning architectures to advance the state-of-the-art.

Bio: Dr. Orazio Gallo is a Principal Research Scientist at NVIDIA Research, which he joined in 2011 after obtaining his Ph.D. from UC Santa Cruz. He is interested in computational imaging, computer vision, deep learning and, in particular, in the intersection of the three. Alongside topics such as view synthesis and 3D vision, his recent interests also include integrating traditional computer vision and computational imaging knowledge into deep learning architectures. Previously, Orazio’s research focus revolved around tinkering with the way pictures are captured, processed, and consumed by the photographer or the viewer.

Orazio is an associate editor of the IEEE Transactions on Computational Imaging and was an associate editor of Signal Processing: Image Communication from 2015 to 2017. Since 2015 he has also been a member of the IEEE Computational Imaging Technical Committee.

Oct 9 Wednesday 2:30-3:30pm @ Shepherd Drone Lab (Room 164)

Speaker: Prof. Ju Sun

Title: A couple of curious questions around deep learning


The autoencoder is a classic neural network model for unsupervised learning, and the breakthrough in training autoencoders made by Hinton around 2006 triggered the current resurgence of deep neural networks (DNN). People have since believed that neural networks can do most things, if not everything. For example, computer vision folks have trusted DNN for image denoising, super-resolution, and deblurring, and for revamping the whole 3D reconstruction pipeline, all of which involve inverse problems. Can we use DNN to solve general inverse problems? I'm here to ask two curious questions at the top of my mind regarding autoencoders and solving inverse problems with DNN.

Bio: Ju Sun is an assistant professor in the Computer Science & Engineering department, University of Minnesota at Twin Cities. Prior to this, he was a postdoctoral scholar at Stanford University, working with Professor Emmanuel Candès. He received his Ph.D. degree in Electrical Engineering from Columbia University in 2016 (2011--2016) and his B.Eng. degree in Computer Engineering (with a minor in Mathematics) from the National University of Singapore in 2008 (2004--2008). His research interests span computer vision, machine learning, numerical optimization, signal/image processing, and high-dimensional data analysis. Recently, he has been particularly fascinated by why simple numerical methods often solve nonconvex problems surprisingly well (on which he maintains a bibliographic webpage) and the implications for representation learning. He won the best student paper award from SPARS'15 and an honorable mention for his doctoral thesis in the New World Mathematics Awards (NWMA) 2017.