Getting Started in Computer Vision Research

Last update: May 21, 2015.

The content of this page was mainly collected from the web, especially from Quora. Some of the text was taken "as is". Searching Google for one of the statements will hopefully reveal its source. Sorry for not keeping the original sources.

The following notes contain too many links to check at once, and exploring all of them up front may be a waste of time. Move to the How to Start in Research section and focus on it. As you explore the field, open the other links from time to time to learn about them.

Top Conferences and Journals

    • Top tier Conferences: CVPR, ECCV, ICCV, NIPS, IJCAI
    • Highly Prestigious Second Tier Conferences: BMVC
    • Prestigious Second Tier Conferences: ICIP, ACCV, ICPR, SIGGRAPH
    • Top tier Journals: PAMI, IJCV
    • Less Prestigious Journals: CVIU, IVC
    • Microsoft Academic Research list of top conferences
    • Ranks from Core
    • Ranks from Arnetminer
    • Good source for recent papers in conferences
    • List of some journal impact factors
    • Journals scores from EigenFactor

Top Authors

Top Groups


    • Tomasz Malisiewicz blog
    • The Serious Computer Vision Blog
    • Research blog of Roman Shapovalov
    • Computer Vision Talks
    • Steves Computer Vision Blog
    • The Computer Vision
    • Computer Vision Blog
    • Andy's Computer Vision and Machine Learning Blog
    • Computer Vision Models
    • solem's vision blog
    • uncannyvision blog
    • Blogs on Computer Vision, Machine Vision and Image Processing
    • All About Computer Vision
    • Open Computer Vision

Industry Labs & Startups

    • Microsoft and Google
    • IBM Research
    • NEC Labs America
    • Acute3D (Sophia Antipolis, France) was founded in 2011.
    • Bubbli
    • ShoppTag
    • Oculusai
    • Videosurf (video search)
    • Willow Garage (robotics)
    • Sportvision (sports broadcast)
    • Intelli-vision (surveillance)
    • Gauss Surgical
    • Adobe's Advanced Technology Labs
    • Dolby

How to Start in Research

  • I like to divide vision topics into 2 types.
  • Topics that involve learning from data, e.g. image classification, OCR, video tracking, etc.
    • Most of this document is about this type.
    • Learning means we have a lot of data (e.g. 1M images and their labels) and we learn the pattern from it (e.g. classify the character in an image as a digit in the range 0-9).
    • For this type, you also have to learn Machine Learning. See the Machine Learning section.
  • Topics that involve algorithms that are not learning based, e.g. 3D reconstruction, optical flow, panoramas.
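To make "learning the pattern from labeled data" concrete, here is a minimal sketch of one of the simplest learning methods, a nearest-centroid classifier. It is only an illustration: the function names, the 4-pixel "images" and the two digit labels are all hypothetical, and real systems use far more data and stronger models.

```python
from collections import defaultdict

def train_centroids(samples, labels):
    """Average the feature vectors of each class to get one centroid per label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in zip(samples, labels):
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((c - f) ** 2 for c, f in zip(centroid, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Hypothetical toy data: each "image" is a 4-pixel strip, labels are digits 0 and 1.
train_images = [[0, 0, 1, 1], [0, 0, 0.9, 1], [1, 1, 0, 0], [0.9, 1, 0.1, 0]]
train_labels = [0, 0, 1, 1]

centroids = train_centroids(train_images, train_labels)   # the "learning" step
prediction = predict(centroids, [0.1, 0, 0.8, 1])         # classify an unseen image
```

The training step only summarizes the labeled examples; the prediction step applies that summary to an image the program has not seen before.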

Using a textbook/course

    • One direct way is to start from books.
    • Don't get stuck in books. Remember, you want to start research. Try to understand the basics and do some coding. Keep your eyes on recent work that interests you.
  • Try to identify the different vision research problems and see which are more exciting for you.
  • Then move to the next element: "From Papers".

From Papers

    • Start with papers from top-tier conferences and journals. Papers from low-rated conferences may contain fake results and waste your time.
    • CVPR maintains the list of the important conferences and many of their papers.
    • Using the papers to learn what is available will help too.
    • Use Google Scholar to find surveys on a specific problem. Surveys save you a lot of time.
    • Initially consider the last 3 years. Say we are in 2014: consider 2011, then 2012, then 2013. Don't start with 2014.
    • Collect papers whose titles seem related. Google them to find out whether source code is available. Try to start with papers that come with source code.
    • The start will be tough, as you will meet a lot of jargon and many tools that you don't know. Be patient. Google them, or ask on forums like Quora or Stackoverflow.
    • Try to find a specific point (e.g. 3D reconstruction, point clouds, scene understanding, object recognition, big image data, multiple target tracking, image descriptor theory, etc.). Check wikis or conference tracks to find what interests you.
    • Browse a conference's proceedings to find its papers, or use Google Scholar.
    • Target work by authoritative researchers, or work with a high number of citations.
    • Prefer to start with research work that has running software available.
    • To learn the engineering side, which is a good place to start, pick a simple, nice paper and implement it. Make your target reproducing the same results as the paper. While doing so, many questions will pop up, and you will often have to make assumptions, as typically not everything is mentioned. Many implementation details, such as how to do a given step efficiently, won't be listed either. You will come to understand issues like performance, experiments, etc. Example papers: Viola-Jones face detection, Christoph Lampert's Efficient Subwindow Search, or Brian Fulkerson's superpixel neighborhoods. It is a very good idea to implement a paper that has full code available, so that you can check it if you get stuck, or if you finish but with a poor implementation.
    • For your research work, try to build on existing code rather than implementing from scratch.
    • You may contact the authors for code if it is not publicly available.
    • If a paper is still hard after several attempts to understand it, move to another one. Or move to another field.
    • Subset of best award papers
    • A graduate seminar course that depends on papers.

From Code

Machine Learning

  • ML provides the core algorithms to LEARN from data.
  • For vision, especially as a beginner, you don't need to learn much ML. You can also use things as black boxes.
    • It is a hard field, by the way. Becoming a guru takes time.
  • The more you grow in the field, the more details you should learn.
  • Initially, you should learn some basics plus the recently used algorithms.
  • Every 4-5 years, a different set of algorithms becomes popular in the literature.
    • E.g. 3 years ago, SVM was a very popular one.
    • Nowadays (2014-15), Deep Learning gives the best performance.
  • To establish basics in the field:
  • To find out which recently used algorithms matter for you:
    • Either ask someone who works on this problem.
    • Or download top conference papers from the last 2-3 years on YOUR problem. Skim them and note which learning algorithms they used.
    • Overall, there should be a few algorithms that recur a lot. Focus on them.
    • Then
      • Try to read a little more about these algorithms.
      • Try to do some coding. Search for popular tools and use them.
        • E.g. libsvm for SVM, Caffe for CNNs.
  • Now you can go back to the papers/book and continue reading, and you will find that topics involving ML are easier.
  • To become more of a guru in ML:
    • To understand more about how learning happens
    • To understand more algorithmic topics and math behind them
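The "skim the papers and see which algorithms recur" step above can be roughly automated. The sketch below is only an assumption about how one might do it: the function name, the abstracts and the list of algorithm names are all hypothetical, and simple substring matching will miss synonyms and abbreviations.

```python
from collections import Counter

def count_algorithm_mentions(abstracts, algorithm_names):
    """Count in how many abstracts each algorithm name appears (case-insensitive)."""
    counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        for name in algorithm_names:
            if name.lower() in lowered:
                counts[name] += 1
    return counts

# Hypothetical abstracts collected for one problem over the last 2-3 years.
abstracts = [
    "We train a linear SVM on HOG features.",
    "A CNN is fine-tuned for object detection.",
    "Our CNN outperforms an SVM baseline.",
    "We propose a deeper CNN architecture.",
    "We use a random forest for pose estimation.",
]
names = ["SVM", "CNN", "random forest", "boosting"]

counts = count_algorithm_mentions(abstracts, names)
ranking = counts.most_common()   # most frequently mentioned algorithms first
```

The few names at the top of `ranking` are the ones worth reading about and coding first.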

Some papers

  • It is hard to say which papers are good to read. It is better to pick a problem and follow the references.
    • Top publications in vision
    • You don't have to read them all. Just what is related to your problem.
  • What are the must-read papers in the field of computer vision for a student in pursuing research in the field?
  • University courses, hopefully useful
    • CS395T: Visual Recognition, Fall 2012
    • CMPT888: Human Activity Recognition, Summer 2010
    • CMPT882: Recognition Problems in Computer Vision, Summer 2009

Acquiring Experience

    • You typically learn to deal with all of these issues while getting a PhD
    • How do you address all such issues in your research efficiently and reliably?
    • You basically have to be a member of a research group for several years in order to learn a little bit about all of these issues. If you're in a lab which focuses on object detection, there will be many students around you struggling with the same issues, and talking to fellow students in the middle of the night is the only way I know that you can gain the expertise you inquire about.
    • How do you debug the code and tune the parameters efficiently?
    • The best practices for debugging will be acquired as you look over the shoulders of more advanced students. You should be comfortable with debugging in general before you start debugging machine learning algorithms. Debugging machine learning algorithms is not like debugging quicksort. If you fix all bugs, your algorithm might still not work because of other issues such as lack of data, too low model complexity, etc. To be quite frank, debugging vision/learning algorithms is more like art than science.
    • Tuning parameters of an algorithm or software library you did not write is never easy. You should learn how to use validation data correctly, have a good sense of how long it takes to run the full training/evaluation pipeline, and be ready to use a cluster of computers for cross-validation.
    • How do you implement a large-scale problem on a personal PC? (For image/video analysis, there may be a large volume of data beyond your RAM. How do you deal with it?)
    • In general, you don't implement a large-scale problem on a single PC. One of the most valuable skills I learned in graduate school is how to parallelize computations across a cluster. This is unfortunately one of those make-it-or-break-it skills. While not impossible, it is hard for universities/labs without clusters to compete with universities with medium-to-large sized clusters. This is also one of the reasons why so many Professors are joining organizations like Google and Facebook -- they have the data AND the computational resources to let senior researchers work on more and more ambitious large-scale problems.
    • If you are unable to get access to a large cluster, then I would recommend applying for an internship at a place like Google. You will learn so much (at least I did) while you're there. While you won't be able to bring back home any code you write there, you will learn lots of scaling lessons that will impact your life as a student.
    • If you have to work on a single machine, you will have to cut up the dataset into smaller chunks and incrementally load the chunks into memory.
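The single-machine advice above (cut the dataset into chunks and load them incrementally) can be sketched as follows. This is a minimal illustration under assumptions, not a recommended pipeline: the function names are hypothetical, the "dataset" is a small temporary file standing in for a large raw 8-bit image dump, and the statistic computed is just a mean.

```python
import os
import tempfile

def iter_chunks(path, chunk_bytes):
    """Yield successive byte chunks of a file without loading it all at once."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_bytes)
            if not chunk:
                break
            yield chunk

def mean_pixel_value(path, chunk_bytes=1 << 20):
    """Compute the mean 8-bit pixel value using a running sum and count."""
    total = 0
    count = 0
    for chunk in iter_chunks(path, chunk_bytes):
        total += sum(chunk)    # iterating over bytes yields ints in 0..255
        count += len(chunk)
    return total / count if count else 0.0

# Hypothetical stand-in for a dataset that would not fit in RAM.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes([10, 20, 30, 40]) * 1000)
    data_path = tmp.name

mean_value = mean_pixel_value(data_path, chunk_bytes=512)
os.remove(data_path)
```

Only one chunk is held in memory at a time, so the peak memory use depends on `chunk_bytes` rather than on the size of the file. The same pattern applies to any statistic or training step that can be updated incrementally.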


Online Videos & Talks

    • Online Course: Discrete Inference and Learning in Artificial Vision
    • UCF Computer Vision Video Lectures: Videos
    • EGGN 512 - Computer Vision Videos
    • Video Lectures includes many lectures in computer vision.
    • Tech Talks: for some conferences, like ICML2011, they host videos for most (all?) of the talks from the event. Others, like CVPR2011, only have selected videos. This is a great way to learn about a lot of recent work without relying solely on reading papers.
    • CVPR2010, they host a lot of videos for the talks. They also have a lot of ML videos for summer schools.
    • Wired, IEEE Spectrum, TechCrunch, TED, BigThink, Sixty Symbols, GISCIA


Computer Vision

    • Computer Vision: Models, Learning, and Inference - This is a great (free!) preprint that leans heavily towards machine learning. Each section provides background on a set of models or machine learning tools involved, and methods of inference. The beginning is an in-depth overview of the necessary probability and machine learning concepts. I just started going through this book but it has been really useful for getting an overview of things like parts models and shape models.
    • Computer Vision: Algorithms and Applications - by Richard Szeliski. A survey book. This is a more traditionally laid out textbook that is referenced in a number of current Intro to CV classes such as Fei-Fei Li’s above and the current CV course at my school (JHU).
    • Multiple View Geometry in Computer Vision - Richard Hartley and Andrew Zisserman
    • Computer Vision: A Modern Approach - David Forsyth and Jean Ponce
    • Visual Object Recognition: Synthesis Lectures on Artificial Intelligence and Machine Learning - Kristen Grauman and Bastian Leibe
    • Introduction to 3D Computer Vision by Trucco and Verri
    • Digital Image Processing 3rd Edition by Gonzalez and Woods
    • Practical Algorithms for Image Analysis

Computer Vision & Image Processing Coding

    • Programming Computer Vision with Python - Jan Erik Solem
    • Learning OpenCV - Gary Bradski and Adrian Kaehler
    • Fundamentals of Digital Image Processing: A Practical Approach with Examples in Matlab - Chris Solomon and Toby Breckon

Human Vision

    • Vision: A Computational Investigation into the Human Representation and Processing of Visual Information - David Marr
    • Steps Towards a Theory of Visual Information: Active Perception, Signal-to-Symbol Conversion and the Interplay Between Sensing and Control - Stefano Soatto
    • Basic Vision: An Introduction to Visual Perception - Robert Snowden, Peter Thompson and Tom Troscianko


    • CV Papers is a collection of recent computer vision papers from the top/largest vision conferences.
    • Visual Recognition and Machine Learning Summer School, Grenoble, 2012
    • I would take a few machine learning courses and also take a few courses in signal processing/ time-frequency analysis/ wavelet analysis.

Exciting Applications

    • Never Ending Image Learner (NEIL)
      • It is a computer program that runs 24x7, browsing the internet to extract visual information from internet data. It is supported by Google and the Department of Defense's Office of Naval Research.
      • It currently identifies object-object relationships, object-attribute relationships, scene-object relationships and scene-attribute relationships.
    • Face detection
    • Tennis ball tracking
    • Body pose estimation with depth camera
    • Heads Turn as Microsoft Shows Off 3-D Scanning Techniques
    • Color changes reveal a person's blood flow
    • Reconstruct an entire city in 3D only by public Flickr photographs
    • Autonomous objects, e.g. self-driving cars
    • Predator Object Tracking
    • Kinect Fusion - realtime 3D model construction from a moving Kinect
    • Veebot, a robot that takes blood samples
    • Harp: Detecting the interruption of a laser to play a note (simple, powerful). Piano.
    • Google Photo Search
    • Physical security
    • PTAM is a great application of AR
    • Google Glass
    • Google Street View : Capturing the world at street level
    • Word Lens : An augmented reality camera-based language translation application. The mobile camera can identify text in one language and show the words translated in another language. The best thing I found about this application is that translation is performed in real-time without connection to the internet!
    • CarSafe : This application uses computer vision and machine learning algorithms to monitor and detect whether the driver is tired or distracted, and at the same time track road conditions using two separate cameras. Some details and results are provided in the paper here: CarSafe: A Driver Safety App that Detects Dangerous Driving Behavior using Dual-Cameras on Smartphones
    • iOnRoad : This is a mobile driver assistance system application using Qualcomm's FastCV mobile-optimized computer vision library. It uses the smartphone’s native camera and sensors to perform various functions. The application has advanced features such as forward collision warning, lane departure warning, headway monitoring and car locator.
    • Jumio : A real-time credit card scanning and validation application for online and mobile checkouts. They also provide ID verification of passports and licences in many countries.

Exciting Algorithm



  • CVPR Jobs Postings
    • Join LinkedIn and look at the Image Processing or Computer Vision interest groups.
    • Adobe's Advanced Technology Labs



  • My list


Helpful sites

    • Google Scholar
    • Microsoft Academic Research
      • You can get a sorted list of the top key figures in a field.
      • You can get the top conferences and journals in a field.
      • You can check a person's citation count to gauge the quality of their work. If someone has 100 papers and 100 citations, each paper is cited about once on average. On the other hand, if those 100 papers are cited 10000 times, each is cited by about 100 works on average. The second one is the stronger advisor.
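The citation comparison above is just an average of citations per paper. The sketch below works through it, assuming that the first "100" in the example refers to 100 papers; the function name is hypothetical, and the average is only a rough signal of quality.

```python
def citations_per_paper(total_citations, paper_count):
    """Average number of citations each paper received."""
    if paper_count <= 0:
        raise ValueError("paper_count must be positive")
    return total_citations / paper_count

# The two researchers from the example, each assumed to have 100 papers.
researcher_a = citations_per_paper(100, 100)      # about one citation per paper
researcher_b = citations_per_paper(10000, 100)    # about a hundred citations per paper
stronger = "B" if researcher_b > researcher_a else "A"
```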