Computer Vision is one of the most revolutionary and advanced technologies that deep learning has birthed. It is the computer’s ability to classify and recognize objects in pictures and even videos like the human eye does. There are five main techniques of computer vision that we ought to know about for their amazing technological prowess and ability to ‘see’ and perceive surroundings like we do. Let us see what they are.
The main concern around image classification is categorization of images based on viewpoint variation, image deformation and occlusion, illumination and background clutter. Measuring the accuracy of the description of an image becomes a difficult task because of these factors. Researchers have come up with a novel way to solve the problem.
They use a data driven approach to classify the image. Instead of classifying what each image looks like in code, they feed the computer system with many image classes and then develop algorithms that look at these classes and “learn” about the visual appearance of each class. The most popular system used for image classification is Convolutional Neural Networks (CNNs).
Object detection is, simply put, defining objects within images by outputting bounding boxes and labels or tags for individual objects. This differs from image classification in that it is applied to several objects all at once rather than identifying just one dominant object in an image. Now applying CNNs to this technique will be computationally expensive.
So the technique used for object detection is region-based CNNs of R-CNNs. In this technique, first an image is scanned for objects using an algorithm that generates hundreds of region proposals. Then a CNN is run on each region proposal and only then is each object in each region proposal classified. It is like surveying and labelling the items in a warehouse of a store.
Object tracking refers to the process of tracking or following a specific object like a car or a person in a given scene in videos. This technique is important for autonomous driving systems in self-driving cars. Object detection can be divided into two main categories – generative method and discriminative method.
The first method uses the generative model to describe the evident characteristics of objects. The second method is used to distinguish between object and background and foreground.
Crucial to computer vision is the process of segmentation wherein whole images are divided or segmented into pixelgroups that are subsequently labeled and classified.
The science tries to understand the role of each pixel in the image. So, for instance, besides recognizing and detecting a tree in an image, its boundaries are depicted as well. CNNs are best used for this technique.
This method builds on semantic segmentation in that instead of classifying just one single dominant object in an image, it labels multiple images with different colours.
When we see complicated images with multiple overlapping objects and different backgrounds, we apply instance segmentation to it. This is done to generate pixel studies of each object, their boundaries and backdrops.
Besides these techniques to study and analyse and interpret images or a series of images, there are many more complex techniques that we have not delved into in this blog. However, for more on computer vision, you can peruse the DexLab Analytics website. DexLab Analytics is a premiere Deep Learning training institute In Delhi.