JRMOT: A Real-Time 3D Multi-Object Tracker
and a New Large-Scale Dataset

JRMOT is a 3D multi-object tracking system that:

  • Is real-time

  • Is online

  • Fuses 2D and 3D information

  • Achieves state-of-the-art performance on KITTI

We also release JRDB:

  • A dataset with over 2 million annotated boxes and 3,500 time-consistent trajectories in 2D and 3D

  • Captured in social, human-centric settings

  • Captured by our social mobile-manipulator JackRabbot

  • Contains 360-degree cylindrical images, stereo camera images, 3D point clouds, and more sensing modalities

Tracking System

  • Our system is built on top of state-of-the-art 2D and 3D detectors (Mask R-CNN and F-PointNet, respectively). These detections are associated with predicted track locations at every time step.

  • Association is performed via a novel feature fusion and cost-selection procedure, followed by Kalman state gating and JPDA (Joint Probabilistic Data Association).

  • Given the JPDA output, we use both 2D and 3D detections in a novel multi-modal Kalman filter to update the track locations.
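The steps above form a predict-gate-associate-update loop. The minimal sketch below illustrates that structure only: where JRMOT fuses 2D/3D appearance features and uses JPDA, this sketch substitutes a constant-velocity Kalman filter on 3D positions with Mahalanobis gating and greedy nearest-neighbour assignment. All class and function names are illustrative, not from the released code.

```python
import numpy as np

CHI2_GATE = 7.815  # 95% chi-squared gate for 3 degrees of freedom


class KalmanTrack:
    def __init__(self, position):
        # State: [x, y, z, vx, vy, vz]
        self.x = np.hstack([position, np.zeros(3)])
        self.P = np.eye(6)
        self.F = np.eye(6)
        self.F[:3, 3:] = np.eye(3)  # constant-velocity motion model
        self.H = np.eye(3, 6)       # we observe position only
        self.Q = 0.01 * np.eye(6)   # process noise
        self.R = 0.1 * np.eye(3)    # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q

    def gating_distance(self, z):
        # Squared Mahalanobis distance of detection z to the predicted state.
        y = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        return float(y @ np.linalg.solve(S, y))

    def update(self, z):
        y = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P


def associate(tracks, detections):
    """Greedy gated assignment; returns {track_index: detection_index}."""
    pairs = sorted(
        ((t.gating_distance(z), i, j)
         for i, t in enumerate(tracks)
         for j, z in enumerate(detections)),
        key=lambda p: p[0],
    )
    matches, used_t, used_d = {}, set(), set()
    for d, i, j in pairs:
        if d < CHI2_GATE and i not in used_t and j not in used_d:
            matches[i] = j
            used_t.add(i)
            used_d.add(j)
    return matches


# One tracking step: predict every track, then update matched tracks.
tracks = [KalmanTrack(np.array([0.0, 0.0, 0.0]))]
detections = [np.array([0.1, 0.0, 0.05])]
for t in tracks:
    t.predict()
for i, j in associate(tracks, detections).items():
    tracks[i].update(detections[j])
```

In the full system, the gated cost matrix would feed JPDA (soft, probabilistic assignment over all detections within the gate) rather than this hard greedy matching, and the update step would fuse both 2D and 3D measurements.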

Benchmark Evaluation

We evaluate our system on the KITTI 2D tracking benchmark to compare against other trackers, and on JRDB to establish a competitive baseline. We achieve a state-of-the-art MOTA of 85.7% on the KITTI car tracking benchmark and 46.33% on the KITTI pedestrian benchmark. Further, we achieve 20.2% MOTA running at 25 fps on the JRDB dataset.
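MOTA, the metric reported above, combines missed targets, false positives, and identity switches into a single score relative to the number of ground-truth objects. A minimal sketch of the standard formula (the example counts are illustrative, not from our evaluation):

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT; can be negative."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


# e.g. 100 misses, 50 false positives, 4 ID switches over 1000 GT boxes:
print(round(mota(100, 50, 4, 1000), 3))  # 0.846
```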

On-Robot Evaluation

We evaluate our tracker on JackRabbot on 110 seconds of data captured on our university campus. We test in situations as close to the real world as possible, with 1-7 people per scene, 14 unique identities across all sequences, and in both indoor and outdoor settings. We record only 4 ID switches and 1 lost track across all tested sequences, at an average of 9.5 fps.

Code and Paper

We release our ROS implementation of the tracker, written in Python. Code can be found here. More details can be found here.
If you find our work useful, please consider citing the following:

@inproceedings{shenoi2020jrmot,
  title={JRMOT: A Real-Time 3D Multi-Object Tracker and a New Large-Scale Dataset},
  author={Shenoi, Abhijeet and Patel, Mihir and Gwak, JunYoung and Goebel, Patrick and Sadeghian, Amir and Rezatofighi, Hamid and Martin-Martin, Roberto and Savarese, Silvio},
  booktitle={2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2020},
  pages={10335-10342},
  doi={10.1109/IROS45743.2020.9341635}
}

@article{martin2019jrdb,
  title={JRDB: A dataset and benchmark of egocentric visual perception for navigation in human environments},
  author={Mart{\'\i}n-Mart{\'\i}n, Roberto and Patel, Mihir and Rezatofighi, Hamid and Shenoi, Abhijeet and Gwak, JunYoung and Frankel, Eric and Sadeghian, Amir and Savarese, Silvio},
  journal={arXiv preprint arXiv:1910.11792},
  year={2019}
}



Abhijeet Shenoi

Abhijeet Shenoi is an algorithm engineer at AiBee US Corporation. He earned his Master's degree in the Computer Science department at Stanford University in 2019, while working under the supervision of Prof. Silvio Savarese in the Stanford Vision and Learning Lab. His research interests include semantic understanding, 3D tracking and 3D computer vision.

Mihir Patel

Mihir Patel is pursuing a Bachelor's degree in the Mathematics department and a simultaneous Master's degree in the Computer Science department at Stanford University. He currently works at the Stanford Vision and Learning Lab with Prof. Silvio Savarese and Prof. Fei-Fei Li and is part of the JackRabbot team. His interests are in vision and generative modeling.

JunYoung Gwak

JunYoung Gwak is a PhD student at the Stanford Vision and Learning Lab with Prof. Silvio Savarese. His current research focus is on 3D vision and scene understanding.

Patrick Goebel

Patrick Goebel obtained his Ph.D. in Cognitive Psychology at the University of Toronto while working on some of the earliest artificial neural networks. Later he wrote the first two books on ROS, the Robot Operating System. For the past three years, Patrick has worked as a Research Scientist in the Stanford Vision and Learning Lab, primarily on the JackRabbot social navigation project as well as the TRI HSR Challenge.

Amir Sadeghian

Amir Sadeghian is an Algorithm Scientist and founding member at Aibee US Corporation, where he leads the shopping mall team. His team builds systems to digitize shopping malls, enabling better management and opening new ways to monetize offline traffic. He received his PhD from Stanford University in January 2019, where he worked in the Stanford Vision and Learning Lab with Prof. Silvio Savarese. His research interests primarily focus on computer vision and perception for robotics. During his PhD, he led the JackRabbot team, which has been featured in several major news outlets including CBS, ABC, and BBC.

Hamid Rezatofighi

Hamid Rezatofighi received his PhD from the Australian National University in 2015. In 2018, he was awarded a prestigious Endeavour Research Fellowship, which he used for a placement at the Stanford Vision Lab (SVL), Stanford University. Currently, he is also a senior research fellow with the Australian Institute for Machine Learning (AIML) at the University of Adelaide. His main research interest focuses on computer vision and vision-based perception for robotics, including object detection, multi-object tracking, trajectory forecasting, and human collective activity recognition. He also has research expertise in Bayesian filtering, estimation, and learning using point processes and finite set statistics.
Roberto Martín-Martín

Roberto Martín-Martín is a postdoctoral scholar at the Stanford Vision and Learning Lab with Prof. Silvio Savarese and Prof. Fei-Fei Li. He coordinates research projects in two groups: the JackRabbot team, which works on mobile manipulation in human environments, and the People, AI & Robots (PAIR) team, which works on visuo-motor skill learning for manipulation and planning. He obtained his PhD in Robotics at the Technische Universität Berlin in the RBO group of Prof. Oliver Brock.

Silvio Savarese

Silvio Savarese is an Associate Professor of Computer Science at Stanford University and the inaugural Mindtree Faculty Scholar. He earned his Ph.D. in Electrical Engineering from the California Institute of Technology in 2005 and was a Beckman Institute Fellow at the University of Illinois at Urbana-Champaign from 2005–2008. He joined Stanford in 2013 after serving as Assistant and then Associate Professor of Electrical and Computer Engineering at the University of Michigan, Ann Arbor, from 2008 to 2013. His research interests include computer vision, robotic perception, and machine learning. He is the recipient of several awards, including a Best Student Paper Award at CVPR 2016, the James R. Croes Medal in 2013, a TRW Automotive Endowed Research Award in 2012, an NSF CAREER Award in 2011, and a Google Research Award in 2010. In 2002 he was awarded the Walker von Brimer Award for outstanding research initiative.