About Computer Vision

With the advancement of modern technology, Artificial Intelligence (AI) has become a vital part of nearly every major industry. AI can process huge amounts of data in a short amount of time, an incredible feat that we could not have imagined decades ago.

By learning how we solve problems, programs and machines can do tasks without constant monitoring or adjustment. For example, we see the world through our eyes. Light enters our eyes, the optic nerve carries the signal to the brain, and the brain visualizes the world so that we know what we are looking at. But does a computer or machine view the world the same way we do?

Credit: https://www.commonlounge.com/discussion/c9975025c9ff473c8f9ed2c4b1c3ea6a

We can usually describe what is in a picture, and possibly the story behind it. But to a computer, every image is just a grid of pixels, each stored as a set of RGB values. Despite having this data, the computer does not know what the picture contains, because it cannot comprehend what the arrangement of these values means. If we manage to train computers to interpret images the way we do, they can help us with tasks where identifying features in images is vital. This is how Computer Vision was born.
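To make this concrete, here is a minimal sketch of what a picture looks like to a program. The tiny 2x2 `image` and its color values are made up for illustration; real images hold millions of such values and are normally loaded with an imaging library.

```python
# A hypothetical 2x2 image: each pixel is an (R, G, B) tuple with values 0-255.
image = [
    [(255, 0, 0), (0, 255, 0)],      # row 0: a red pixel, a green pixel
    [(0, 0, 255), (255, 255, 255)],  # row 1: a blue pixel, a white pixel
]

height = len(image)
width = len(image[0])

# Flatten the grid into the plain list of numbers the computer actually stores.
values = [channel for row in image for pixel in row for channel in pixel]

print(f"{width}x{height} pixels, {len(values)} numbers")
print(values)
```

Nothing in that list of numbers says "red" or "apple"; any meaning has to be learned from the arrangement of the values.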


What is Computer Vision?

Basically, Computer Vision deals with how computers understand images and videos and perform tasks based on them. It is often treated as a field in its own right rather than strictly a branch of AI. However, the two are closely related, since Computer Vision relies on many AI techniques.

We know that computers do not get tired or bored when doing tasks for weeks or months (if properly maintained), and that they can detect the slightest difference, such as the one between the colors #b4fac8 and #b4fac9. But for a computer to understand our goal, it has to learn how we interpret images.
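The two colors mentioned above look identical to most people, yet they are trivially different to a program. The short sketch below converts each hex code into its RGB values and compares them; the helper name `hex_to_rgb` is chosen only for this illustration.

```python
def hex_to_rgb(color):
    """Convert a '#rrggbb' string into an (R, G, B) tuple of integers."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))

a = hex_to_rgb("#b4fac8")
b = hex_to_rgb("#b4fac9")

# Per-channel absolute difference between the two colors.
difference = tuple(abs(x - y) for x, y in zip(a, b))

print(a, b, difference)  # (180, 250, 200) (180, 250, 201) (0, 0, 1)
```

The colors differ by a single step in the blue channel, which a computer detects instantly even though the eye does not.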

How would you describe a complex shape such as an elephant to a computer?

Source: https://medium.com/betterism/the-blind-men-and-the-elephant-596ec8a72a7d

Many things can be distinguished visually by describing their colors and shapes, such as telling an apple from a banana. But if someone were to tell a computer that a banana is “a fruit that is curved and is either yellow or green” and ask it to collect pictures of bananas on the internet, it might bring back some interesting results like the ones below:

The problem with teaching a computer to identify images is that it lacks a learning process like ours. From the time we are born, we observe our surroundings, learning and acquiring knowledge as we grow up. When we receive instructions, the complex schema, or knowledge, in our minds helps us understand and infer meaning from simple explanations and definitions. Now, imagine you can only use visual features to teach a computer, which in a sense knows nothing at all. How long would it take to come up with definitions and exceptions that pinpoint exactly the images you are looking for, when the object itself comes in different colors and shapes? How difficult would that be? This is why we need another tactic to help computers learn.
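The difficulty of writing definitions by hand can be sketched in a few lines. The rule below encodes the banana description quoted earlier, and the objects and their features are hypothetical, chosen only to show how quickly such a rule goes wrong.

```python
def looks_like_banana(obj):
    """Hand-written rule: curved, and either yellow or green."""
    return obj["curved"] and obj["color"] in ("yellow", "green")

# Hypothetical objects described only by the two features the rule uses.
objects = {
    "banana": {"curved": True, "color": "yellow"},
    "crescent moon": {"curved": True, "color": "yellow"},
    "green chili": {"curved": True, "color": "green"},
    "overripe banana": {"curved": True, "color": "brown"},
    "apple": {"curved": False, "color": "green"},
}

matches = [name for name, obj in objects.items() if looks_like_banana(obj)]
print(matches)  # ['banana', 'crescent moon', 'green chili']
```

The rule accepts two things that are not bananas and rejects a banana that has turned brown. Every such exception would need another hand-written rule.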

How do computers learn to see?

Instead of giving complicated definitions with caveats, we can teach a computer by simply telling it the answer. For instance, we can provide many (as in thousands or more) pictures of bananas and let the computer figure out the features they share. If something is not a banana but has very similar colors and shapes, we tell the computer that this is incorrect. In general, the more data the computer can learn from, the more accurate it will be. Although there are various ways to train a computer, which involve complex math and algorithms, this is a very simplified way to imagine the process. A better representation of this training process can be seen here.
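As a rough illustration of learning from labeled answers, the sketch below trains a very simple nearest-centroid model. This is a stand-in picked for brevity, not the method used by real vision systems, and the two features (`yellowness`, `curvature`) and all of the numbers are invented for the example.

```python
# Hypothetical labeled examples: (yellowness, curvature) scores between 0 and 1.
# In practice these would be thousands of real images, not hand-typed numbers.
examples = [
    ((0.90, 0.80), "banana"),
    ((0.80, 0.90), "banana"),
    ((0.85, 0.70), "banana"),
    ((0.90, 0.10), "not banana"),  # yellow but not curved
    ((0.20, 0.80), "not banana"),  # curved but not yellow
    ((0.10, 0.20), "not banana"),
]

def train(examples):
    """Average the feature vectors of each label (a nearest-centroid model)."""
    sums, counts = {}, {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}

def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(model, features):
    """Return the label whose average example is closest to the input."""
    return min(model, key=lambda label: squared_distance(model[label], features))

model = train(examples)
print(predict(model, (0.88, 0.75)))  # banana
print(predict(model, (0.15, 0.30)))  # not banana
```

No rule about bananas was written by hand here. The model only summarizes the examples it was shown, so adding more and better examples is what improves it.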

But there are other issues that seem trivial to us yet are challenging for a computer. Different lighting, combinations of colors, and obstruction can all affect how a computer processes images or videos. In reality, the computer has to analyze data that contains irrelevant elements, for example when trying to detect people who are wearing masks in a crowd. Researchers usually simplify these complex issues by formulating algorithms tailored to the kind of data the computer will receive, so that it can carry out specific tasks. In a sense, building a computer with the mind and vision of a human might be an impossible task, but when its purpose is limited to certain tasks, it can achieve great things.

Nowadays, Computer Vision is prevalent not only in industry, but in our daily lives as well. In the next section, we will discuss the common functions used in Computer Vision.

Applications of Computer Vision

The main application of Computer Vision is to assist in decision making based on what the computer "sees". Its common functions include detection, recognition, tracking, and counting. One or more of these functions are used together depending on the application, which is why most AI technologies do not rely solely on Computer Vision, but combine it with other functions to build a specific working system. To further illustrate the issues a computer might face, we will use Pedestrian Detection as an example.

Pedestrian Detection can be used for people counting and other useful applications. One of the main problems is occlusion, in which a target is blocked by other objects in a way that could affect the result. It is a challenge for applications such as urban autonomous driving and surveillance systems. One way to address this problem, for example, is to limit the detection area to the head-shoulder region of pedestrians, which is more stable and less likely to be occluded than other areas of the body.
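The idea of limiting the detection area can be sketched as keeping only the top part of a full-body bounding box. This is a simplification: real head-shoulder detectors learn that feature from data, and both the `fraction` value of 0.3 and the example box are assumptions made for this illustration.

```python
def head_shoulder_region(box, fraction=0.3):
    """Return the top part of a full-body box given as (x, y, width, height).

    `fraction` is an assumed share of body height covered by head and shoulders.
    The origin is the top-left corner of the image, with y increasing downward.
    """
    x, y, width, height = box
    return (x, y, width, round(height * fraction))

pedestrian = (120, 40, 60, 200)  # hypothetical full-body bounding box
print(head_shoulder_region(pedestrian))  # (120, 40, 60, 60)
```

Anything that blocks the lower part of the box, such as a parked car or another person, no longer affects the region being analyzed.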

Through Computer Vision, images can be edited to facilitate further analysis by reducing obstructive information and thereby increasing accuracy. Instead of relying on manual edits, such as adjusting brightness or removing noise, the computer can do these tasks automatically through specially designed algorithms. Below is an example of how a computer manages to turn a rainy image into a clear night sky.

Various weather conditions, such as rain, haze, or snow, can degrade the visual quality of images, which may significantly hurt the performance of related applications such as self-driving cars. Here, the rain has occluded many of the objects in the image, and it would take a long time to remove it manually, let alone to do so for more than one image. This algorithm is designed to remove heavy rain. Even the raindrops that appear in small, subtle areas can be removed.
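The rain-removal algorithm itself is beyond the scope of this article, but the simpler edits mentioned earlier, adjusting brightness and removing noise, can be sketched in plain Python. The example uses a 3x3 median filter, a common noise-removal technique, on a made-up grayscale image; the function names are chosen for this illustration only.

```python
from statistics import median

def adjust_brightness(image, offset):
    """Add `offset` to every pixel of a grayscale image, clamped to 0-255."""
    return [[max(0, min(255, p + offset)) for p in row] for row in image]

def median_filter(image):
    """Replace each pixel with the median of its 3x3 neighborhood.

    Neighbors outside the image are skipped, so edge pixels use fewer values.
    """
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            # int() truncates the .5 medians that even-sized edge windows can give.
            row.append(int(median(window)))
        result.append(row)
    return result

# Hypothetical 3x3 grayscale image with one bright "noise" pixel in the middle.
noisy = [
    [10, 12, 11],
    [13, 255, 12],
    [11, 10, 12],
]

print(median_filter(noisy)[1][1])       # 12
print(adjust_brightness(noisy, 50)[0])  # [60, 62, 61]
```

The outlier pixel is replaced by a value typical of its neighbors, and the same two functions can be applied to any number of images without manual work.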

Experience Computer Vision

The effort required to build and train a computer to see and accomplish tasks can be arduous, and the applications available for the public to try are limited. Below are some examples that anyone can visit to see what kind of results computers can generate.

What would a person look like to a computer? Just as artists can draw fictional characters, computers can generate realistic people who do not exist. This website generates an image of a realistic-looking person based on what it knows about human features. Some of the images might feel "strange" because of the data used for learning, which is why you could get something like this, this, or even a scary image like this.

An image is generated as soon as you visit the website. Simply refresh the page to generate another one. You can generate artworks, cats, horses, and even chemicals as well.

Similar to the aforementioned website, the person on the left does not exist either, but you can decide certain features for the computer to generate. Aside from sex, head pose, and age, you can select which emotion the person shows, as well as their skin tone, hair length, hair color, and so on.

Every person has their own facial features, which might change as time goes by. Anonymizer analyzes the facial features in an uploaded image and generates multiple people who do not exist but bear a certain resemblance to it. This is particularly useful when you need a picture for a website profile but want to keep your true face private. Note that the generated images are not meant to be close to the uploaded photo, but to combine and mix its features instead. The portraits on the left were generated based on this image.

Remini for Android and iOS

Most people think of Photoshop when it comes to editing photos. But if an image has a low resolution, it is difficult to polish. Remini can enhance low-resolution photos using AI and Computer Vision. Having learned from countless photos, the application enlarges the image and adds realistic details. Since the data used for training this application are photos of people, Remini will not be able to produce the same quality when given photos of animals as input. Here is an example. If you look closely at the second picture in the second row, the frame on the man's right has been slightly distorted. As the application keeps training on more photos, it might become more efficient in the future and produce even more stunning photos. So go ahead and have some fun with the app!

Conclusions and Further Reading

Computer Vision has come a long way, but there are many issues yet to be solved. It appears in many different fields of research, but the open problems mostly come down to accuracy and efficiency. Nowadays, Computer Vision is applied in surveillance systems, agricultural management, medicine, and other areas, which shows the flexibility of this technology and its ability to work together with various fields. There are many aspects of Computer Vision, such as the algorithms, that are not covered in this article, which is why the seemingly intuitive goal of Computer Vision should not be considered easy at all. To learn more about Computer Vision, feel free to visit the following links:


Introductions


Tutorials