Getting Started in Vision: Short Guide

Computer Vision: Intro

Computer vision is about understanding (extracting info from) images and videos. Think about the following problems.

  • Given an image, find all faces in it (e.g. as in cameras).
    • Or, harder still, identify who these people are (e.g. Facebook people tagging).
  • Given an image, detect the different objects inside it (e.g. people, dogs, cars, etc.).
  • Autonomous cars and robots: understand the 3D world around you (e.g. take a video stream, detect people and cars, and drive safely).
  • Given a football video, detect the players, track them, and recognize events such as scored points.

Computer vision has received much attention in recent years, especially after the breakthroughs achieved with deep learning. Top companies and start-ups are heavily involved in research and development (e.g. the Google Photos app). There are lots of opportunities and challenges!

Computer Vision: Success

  • The majority of problems in computer vision are hard. We still haven't figured out a good way to understand the visible world around us.
  • Always keep in mind that vision algorithms are still much worse than a 5-year-old child (or even a younger one)!
  • So far, the classical problems remain major difficulties in the field. Many giants (companies and universities) still work on these classical problems (e.g. object detection, image segmentation, etc.).
  • Since 2012, deep learning has been applied to many computer vision problems and has pushed performance up. As a result, some industry apps now include computer vision capabilities.
  • In a few computer vision problems, we do pretty well, with exciting performance.
    • Finding/recognizing faces in images
    • Optical Character Recognition (OCR).
  • It is funny that vision can beat humans in some problems :)
    • E.g. if there are 100 different species of a specific bird/animal, a human usually fails to tell them apart. A human can tell that it is a "bird", but not that it is a "kingfisher".
  • In some other problems, humans do much better than vision algorithms.
    • E.g. if an image has a very small car in its background, vision algorithms will probably miss such an object because of its small scale.

Computer Vision: Related fields

  • Vision is related to some other fields. However, you don't need to master them first.
  • A critical field to learn is machine learning, which is how we understand and extract information from images.
  • Another field is image processing. However, you can just proceed with vision and learn the things you need from image processing over time.
    • Most of the time some libraries will just provide you with what you need.
    • Sometimes it is tricky, and awareness of the image processing concepts related to your problem helps a lot.
    • With today's deep learning, in many vision problems we rarely manipulate the image ourselves. We just feed it to the network and let the network learn whatever it needs.
  • Other related fields: Image analysis, Machine vision, computer graphics.
  • One more time: the major field to learn is Machine Learning, and everything else can be learned on demand.
    • Definitely, if you are already good at such fields (e.g. image processing), it is pretty helpful!
  • There are too many concepts, tools, problems and concerns in vision. However, a typical project only requires you to know and master a few of them. So calm down :)

Programming Languages and Frameworks

  • I have used both C++ and Python. C++ will be painful, while Python is much easier and has much more support.
  • There are many frameworks. According to others, PyTorch is easy to use. I have used Caffe, Lasagne and TensorFlow. TensorFlow is hard in the beginning, but it has a lot of support from Google and the community.

Machine Learning: Intro

  • Machine learning (ML) is simply a set of algorithms that learn from data.
  • Supervised learning, one of the dominant learning techniques, learns the relationship between the given input feature vectors and their outputs.
    • The output can be discrete, such as whether an email is spam or not (called a classification problem).
    • Or it can be continuous, such as house price estimation (called a regression problem).
    • Feature vector? A representation of the input as a real-valued vector.
    • An email can be an array of size 100 that holds the frequencies of 100 special keywords (e.g. bank, money, transfer, etc.). You know, classical spammers use these keywords in a boring way!
    • A house can be represented as a simple array of some features.
      • E.g. number of rooms, overall size, floor, area index, etc. = [4, 70, 3, 152, ...]
  • If we can represent an image as a feature vector, we can apply ML. This is the challenging part :)
  • Keep the following critical note in mind!
    • From time to time, some new ML algorithms appear.
    • We apply these techniques and improve them as much as we can.
    • At some point, we can't do better with them.
    • Then? Some new powerful technique is introduced.
    • Old ones might be used in fewer applications or won't be used at all.
    • So what? Although there are many algorithms in ML, as a beginner, just learn the recent things that work pretty well ... and learn the others over time if interested.
  • In computer vision, each problem tends to use its own set of machine learning algorithms, so choosing a problem narrows down what to learn first.
    • E.g. people working on image segmentation formulate some parts of their solutions as discrete energy minimization. However, many other problems never do so.
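
To make the feature-vector idea above concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the three-keyword list stands in for the ~100 keywords mentioned above, the function names are hypothetical, and the intensity histogram is just one simple hand-made way to turn an image into a vector (deep networks learn their own features instead).

```python
import re
from collections import Counter

# Hypothetical keyword list; the guide suggests ~100 keywords, 3 keep it short.
SPAM_KEYWORDS = ["bank", "money", "transfer"]


def email_to_features(text, keywords=SPAM_KEYWORDS):
    """Return one number per keyword: how often it appears in the email."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    return [float(counts[k]) for k in keywords]


def house_to_features(rooms, size, floor, area_index):
    """Pack the house attributes into a fixed-order real-valued vector."""
    return [float(rooms), float(size), float(floor), float(area_index)]


def image_to_features(image, bins=4, max_value=255):
    """Summarize a grayscale image (list of rows of ints in 0..max_value)
    as a normalized intensity histogram with `bins` entries."""
    hist = [0] * bins
    pixels = [p for row in image for p in row]
    for p in pixels:
        # Map the intensity to a bin index; the top value lands in the last bin.
        idx = min(p * bins // (max_value + 1), bins - 1)
        hist[idx] += 1
    return [h / len(pixels) for h in hist]


email_vec = email_to_features("Transfer the money to my bank. Money is urgent!")
house_vec = house_to_features(4, 70, 3, 152)
image_vec = image_to_features([[0, 64], [128, 255]])

print(email_vec)  # counts for bank, money, transfer
print(house_vec)
print(image_vec)
```

Once every input is a fixed-length list of numbers like these, the same supervised learning algorithm can be applied regardless of whether the input started as an email, a house, or an image.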

Machine Learning: Study Guide

  • There are some major concepts to learn in machine learning.
    • One can do that in parallel to exploring more about vision and identifying vision problems.
  • Machine learning needs a strong background in some areas of mathematics. Trying to understand much of the algorithms' internals will be tough!
  • To make it simpler for a beginner, your first baby step is Andrew Ng's Coursera course.
    • It avoids much math and gives attention to some practical concerns.
    • The course is repeated from time to time. Old versions are probably on YouTube.
    • One can enroll in new versions.
    • It is very important to solve exercises and do assignments.
    • It is advised to finish all of it. If you are running out of time, then do the following weeks:
      • W1 to W6, W8-Module1, W10
  • Andrew Ng's course uses MATLAB :(
    • Someone implemented the course in Python.
    • Another choice is a Python-based course from Udacity (but Andrew's is better and stronger).
  • As mentioned before, deep learning is now behind the state-of-the-art algorithms.
    • Understanding neural networks (NN) is important.
    • Andrew's course has very little content about NN.
    • One of the great resources is Neural Networks for Machine Learning — Geoffrey Hinton.
      • It is available on YouTube and on Coursera.
      • It covers much more than a beginner needs.
      • Try to identify the minimum number of videos needed to cover only the basic NN concepts, at a level similar to Andrew's course.
  • Create an account on Kaggle and try some simple projects.

Computer Vision: Study Guide

  • In deep learning, one of the most important concepts is the Convolutional Neural Network (CNN).
    • There is a video that you may watch, or a paper to read (and a presentation summarizing it).
    • Try to get general ideas.
    • For more details, see my CNN article.
  • Finish the Udacity course.
  • As a parallel plan, finish all the videos of this channel and use them as a guide when you meet a new concept.
  • Again as a parallel plan, finish this great course: CS231N from Stanford
  • For videos/sequences, you will also need to learn LSTM. You don't need it for images, so delay it for now.
  • Later you may do this Udacity course (ud810). Don't do it early.
  • An Arabic course (not mandatory at all, just giving the link)
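
As a first general idea of what the "convolution" in a CNN does, here is a minimal pure-Python sketch. It is not taken from any of the courses above: the function name, the toy image and the hand-picked kernel are all assumptions for illustration. In a real CNN the kernel weights are learned from data, and frameworks implement this operation far more efficiently.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (both lists of rows) with no padding and
    stride 1, returning the map of weighted sums. As in most deep learning
    frameworks, the kernel is not flipped (strictly, cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


# Toy 4x4 image: dark left half, bright right half.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hand-picked kernel that responds to left-to-right brightness changes.
# In a CNN these weights would be learned rather than chosen by hand.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)
```

The output is non-zero only in the column where the image changes from dark to bright, which is the sense in which a kernel "detects" a local pattern.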

Computer Vision: Problem Guide

  • If you know the exact problem = great. E.g. I want to learn gesture recognition or detect sign language.
    • Use top papers to identify related ones and read some of them.
  • If you don't have an idea, try to think about a new useful problem or find one from papers.
  • Nothing specific to study. It is important to understand the problem and how people approach it.
  • Avoid papers that are more than 3-5 years old. Focus on the last 2 years.


Computer Vision: Hardware

  • A critical part of working in vision is the hardware (at least for deep learning projects, which are the majority!).
    • On one hand, you need large disk space (for a small project, maybe ~200 GB).
    • More critically, your machine needs to support GPU processing.
      • In computer vision, a card such as the GTX 1080 Ti matters because most frameworks consume a lot of GPU RAM (it has 11 GB).
    • A desktop machine with good GPUs is critical for the success of your project.
      • The good thing about a desktop is that you can start with one GPU and replace it later (provided the new one fits with the other components).
    • Another path might be paying Amazon and using one of their machines!
    • While learning basic ML, I would advise you to keep going with your current machine and use your CPU. It will probably be enough for many toy examples.
    • Later you could switch to a desktop. Buying a laptop is not practical at all for normal ML purposes.
    • Normally, when we work at companies, we use their resources (or they pay for some GPU cloud service).
  • Watch
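
To get a feeling for why GPU RAM fills up quickly, here is a rough back-of-the-envelope sketch in Python. The batch size, image size and number of feature maps are hypothetical, and the estimate counts only single float32 tensors. Real usage also includes model weights, gradients and framework overhead, so treat the numbers as an illustration, not a measurement.

```python
def tensor_mebibytes(shape, bytes_per_value=4):
    """Memory of one dense tensor in MiB (4 bytes per value = float32)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_value / (1024 ** 2)


# Hypothetical input batch: 64 RGB images of 224x224 pixels.
batch_mb = tensor_mebibytes((64, 3, 224, 224))

# Hypothetical output of one early layer: 64 feature maps per image
# at the same 224x224 resolution.
activation_mb = tensor_mebibytes((64, 64, 224, 224))

print(f"input batch:      {batch_mb:.2f} MiB")
print(f"one layer output: {activation_mb:.2f} MiB")
```

The input batch itself is small, but a single intermediate layer output of this size already takes hundreds of MiB, and training keeps many such outputs in memory at once.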


Computer Vision: Time

  • One can wonder: how much time do I need before getting some results? A tough question.
  • If you can find a supervisor to point out the exact steps to do = great.
  • Otherwise, try to do the minimum needed to get things running.
  • E.g. quickly get an idea of what a CNN is, then run Caffe and test it on some models.
  • As a beginner, you can do lots of things as a black box. You can use Caffe to train models easily (after some learning curve).
  • Generally, learning all of that and getting reasonable results can be done within a graduation project.
  • Some students can also get things done within a single course term.