Everything happening in Computer Vision that you should know

saurabh409 Last Updated : 06 Jul, 2021

10 min read

This article was published as a part of the Data Science Blogathon

Introduction

Since the onset of life, human vision is essential, beautiful, and complex. Until a decade ago, machines were not able to comprehend the visual world, as efficiently as we did.

Computer vision is an interdisciplinary scientific field that deals with how computers can gain comprehend the visual world in the form of images and videos. From an engineering perspective, it seeks to replicate and automate tasks that the human visual system can do.

Computer vision uses Artificial Intelligence (AI) to train computers to interpret and understand the visual world. Using digital images from cameras and videos and deep learning models, machines can accurately identify and classify objects — and then react to what they “see”.

Computer scientists have been trying to give vision to computer vision for half a century, which led to the field of computer vision. The goal is to give computers the ability to extract a high-level understanding of the visual world from digital images and videos. If you’ve used a digital camera or smartphone, you know that computers are already good at capturing images and videos in incredible fidelity and detail, better than humans.

Today, a lot of things have changed for the good of computer vision:

1. Mobile tech with HD cameras has made quite a huge collection of images and videos available to the world.

2.Computing power has increased and has become easily accessible and more affordable.

3. Specific hardware and tools designed for computer vision are more widely available. We have discussed some tools later in this article.

These advancements have been beneficial for computer vision. Accuracy rates for object identification and classification have gone from 50% to 99% in a decade, resulting in today’s computers being more accurate and quick than humans at detecting visual inputs.

Where is Computer Vision used?

1. Healthcare

Computer vision is heavily used in healthcare. Medical diagnostics relies heavily on the study of images, scans, and photographs. The analysis of ultrasound images, MRI, and CT scans are part of the standard repertoire of modern medicine, and computer vision technologies promise not only to simplify this process but also to prevent false diagnoses and reduce treatment costs. Computer vision isn’t intended to replace medical professionals but to facilitate their work and support them in making decisions. Image segmentation helps in diagnostics by identifying relevant areas on 2D or 3D scans and colorizing them to facilitate the study of black and white images.

Image Segmentation of Lung CT-Scans

This technology is used in the COVID-19 pandemic. Image segmentation can help physicians and scientists identify COVID-19 and analyze and quantify the infection and course of the disease. The trained image recognition algorithm identifies suspicious areas on CT scans of the lungs. It determines their size and volume so that the disease of affected patients can be clearly tracked. In monitoring a new disease, computer vision not only makes it easier for physicians to diagnose the condition and monitor it during therapy, but the technology also generates valuable data for studying the disease and its course. Researchers also benefit from the collected data and the generated images, allowing more time to be spent on experiments and tests than data collection.

COVID-19 Tracking via CT-Scans

2. Automotive Industry

Self-driving cars belong to the use cases in artificial intelligence and have received the most media attention in recent years. This can probably be explained more by the idea of autonomous driving being more futuristic than by the actual consequences of the technology. Several machine learning problems are packed into it, but computer vision is an important core element in their solution. For example, the algorithm (the so-called “agent”) by which the car is controlled must be aware of the car’s environment at all times.

The agent needs to know how the road goes, where other vehicles are in the vicinity, the distance to potential obstacles and objects, and how fast these objects are moving on the road to adapt to the changing environment continually. For this purpose, autonomous vehicles are equipped with extensive cameras that film their surroundings over a wide area. The resulting footage is then monitored in real-time by an image recognition algorithm, which requires that the algorithm can search for and classify relevant objects not only in static images but in a constant flow of images.

Object recognition and classification in road traffic

This technology exists and is used industrially. The problem in road traffic stems from its complexity, volatility, and the difficulty of training an algorithm so that even possible failure of the agent in complex exceptional situations can be excluded. This exposes Computer vision’s Achilles’ heel: the need for large amounts of training data, the generation of which is associated with high costs in road traffic.

3. Retail: Customer Behavior tracking

Online stores like Amazon use their digital platform’s analysis capabilities to analyze customer behavior in detail and to optimize the user experience. The retail industry is also trying to optimize the experience of its customers and make it ideal. Until now, the tools to automatically capture the interaction of people with displayed items have been missing. Computer vision is now able to close this gap for the retail industry.

In combination with existing security cameras, algorithms can automatically evaluate video material and study customer behavior. For example, the current number of people in the store can be counted at any time, which is a useful application during the COVID-19 pandemic with its restrictions on the maximum number of visitors allowed in stores. The ability to track the attention that individual shelves and products receive from customers is revolutionary. Specialized algorithms can detect the direction of people’s gaze and thus measure how long any given object is viewed by passers-by.

With the help of this technology, retailers now have the opportunity to catch up with online trading and to evaluate customer behavior within their stores in detail. This increases sales minimize the time spent in the store and optimize the distribution of customers within the store.

Customer Behavior Tracking

4. Agriculture

Modern technologies enable farmers to cultivate ever-larger fields efficiently. This means that these areas must be checked for pests and plant diseases because if overlooked, plant diseases can lead to painful harvest losses and crop failures.

Machine learning is a savior because large amounts of data can be generated using drones, satellite images, and remote sensors. Modern technology facilitates the collection of various measured values, parameters, and statistics, which can be monitored automatically. Therefore, farmers have an around-the-clock overview of soil conditions, irrigation levels, plant health, and local temperatures, despite the extensive planting of larger fields. Machine learning algorithms evaluate this data so that the farmer can use this information to react to potential problem areas at an early stage and distribute available resources efficiently.

The analysis of image material allows plant diseases to be detected at an early stage. Just a few years ago, plant diseases were often only noticed when they were already able to spread. The extensive spread can now be detected and stopped at an early stage using early warning systems based on computer vision, which means that farms lose less crop and save on counter-measures such as pesticides since comparatively smaller areas need to be treated.

Example: Wheat rust detection with computer vision

How does Computer Vision work?

Images on computers are often stored as big grids of pixels. Each pixel is defined as color, stored as a combination of 3 additive primary colors: RGB(Red Green Blue). These are combined in varying intensities to represent different colors. Colors are stored inside pixels.
Let’s consider a simple algorithm to track a bright orange football on a football field. For that, we’ll take the RGB value of the centermost pixel. With that value saved, we can give a computer program an image, and ask it to find the pixel with the closest color match. The algorithm will check each pixel at a time, calculating the difference from the target color. Having looked at every pixel, the best match is likely a pixel from the orange ball. We can run this algorithm for every frame of the video and track the ball over time. But, if one of the teams is wearing an orange jersey, the algorithm might get confused. So, this approach doesn’t work for features larger than a single pixel, such as edges of objects, which are made up of many pixels.
To identify these features in images, computer vision algorithms have to consider small regions of pixels, called patches. For example, an algorithm that finds vertical edges in a scene, to help a drone navigate safely through a field of obstacles. For this operation, a mathematical notation is used, which is called a kernel or filter. It contains the values for a pixel-wise multiplication, the sum of which is saved into the center pixel.
Earlier, an algorithm called Viola-Jones Face Detection was used, which combined multiple kernels to detect features of the faces. Today, the newest and trending algorithms on the block are Convolutional Neural Networks (CNN).

Convolutional Neural Networks (CNNs)

An artificial neuron is the building block of a neural network. It takes a series of inputs, and multiplies each by a specified weight, then sums those values altogether. The input weights are equivalent to kernel values, but unlike a predefined kernel, neural networks can learn their own useful kernels that are able to recognize interesting features in images.
CNNs use banks of these neurons to process image data, each outputting a new image, essentially digested by different learned kernels. These outputs are then processed by subsequent layers of neurons, allowing for repeated convolutions.
The first convolutional layer might find things like edges, the next layer might convolve on those edge features to recognize simple shapes, comprised of edges, like corners. The next layer might convolve on those corner features, and contain neurons that can recognize simple objects, like mouths and eyebrows. This keeps repeating, building up in complexity, until there’s a layer that does a convolution that recognizes all features: eyes, ears, mouths, nose, etc., and says “it’s a face”.
Once we’ve isolated a face, we can apply more specialized computer vision algorithms to pinpoint facial landmarks, like the tip of the nose and the corners of the mouth. This can be used to determine if the eyes are open, the position of the eyebrows, which is easy once you have the landmarks. The relative distance between eyes and eyebrows can be compared to reveal surprise or delight.
Similarly, all of this information can be interpreted by emotion recognition algorithms, giving computers the ability to infer basic ‘moods’. This can help computers become context-sensitive, i.e., aware of their surroundings.

CNNs aren’t required to be many layers deep, but they usually are, in order to recognize complex objects and scenes. This technique is considered deep learning.

Recent advancements have enabled computers to track and identify hand gestures as well. The exciting thing, however, is that scientists are just getting started, enabled by advancements in computing like superfast GPUs.

CNNs are applied to many image recognition tasks, beyond faces, like recognizing handwritten text, spotting tumors in CT scans, and monitoring traffic flow on roads.

Top Tools used for Computer Vision

Currently, there are various online tools that provide algorithms for Computer Vision and a platform to execute these algorithms or create new ones. These tools also provide an environment for connecting with various other software and technologies in conjugation with computer vision. So let’s check out some of the Computer Vision tools now!

1. OpenCV

OpenCV (Open Source Computer Vision Library) is an open-source computer vision library that contains many different functions for computer vision and machine learning.

Created by Intel and originally released in 2000, OpenCV has many different algorithms related to computer vision that can perform a variety of tasks including facial detection and recognition, object identification, monitoring moving objects, tracking camera movements, tracking eye movements, extracting 3D models of objects, creating an augmented reality overlay with a scenery, recognizing similar images in an image database, etc. OpenCV has interfaces for C++, Python, Java, MATLAB etc. and it supports various operating systems such as Windows, Android, Mac OS, Linux, etc.

2. TensorFlow

TensorFlow is a free open-source platform that has a wide variety of tools, libraries, and resources for Artificial Intelligence and Machine Learning which includes Computer Vision. It was created by the Google Brain team and initially released on November 9, 2015. TensorFlow can be used to build and train Machine Learning models related to computer vision that include facial recognition, object identification, etc. Google also released the Pixel Visual Core (PVC) in 2017 which is an image, vision, and Artificial Intelligence processor for mobile devices.

This Pixel Visual Core also supports TensorFlow for machine learning. TensorFlow supports languages such as Python, C, C++, Java, JavaScript, Go, Swift, etc. but without an API backward compatibility guarantee. There are also third-party packages for languages like MATLAB, C#, Julia, Scala, R, Rust, etc.

3. MATLAB

MATLAB is a numerical computing environment that was developed by MathWorks in 1984. It contains the Computer Vision Toolbox which provides various algorithms and functions for computer vision. These include object detection, object tracking, feature detection, feature matching, camera calibration in 3-D, 3D reconstruction, etc. You can also create and train custom object detectors in MATLAB using machine learning algorithms such as YOLO v2, ACF, Faster R-CNN, etc. These algorithms can also be run on multicore processors and GPUs to make them much faster. The MATLAB toolbox algorithms support code generation in C and C++.

4. CUDA

CUDA (Compute Unified Device Architecture)is a parallel computing platform that was created by Nvidia and released in 2007. It is used by software engineers for general purpose processing using the CUDA-enabled graphics processing unit or GPU. CUDA also has the Nvidia Performance Primitives library that contains various functions for image, signal, and video processing.

Some other libraries and collections include GPU4Vision, OpenVIDIA for popular computer vision algorithms on CUDA, MinGPU which is a minimum GPU library for Computer Vision, etc. Developers can program in various languages like C, C++, Fortran, MATLAB, Python, etc. while using CUDA.

5. SimpleCV

SimpleCV is an open-source computer vision framework that can be used for building various computer vision applications. SimpleCV is simple (as the name suggests!) and you can use various advanced computer vision libraries with it such as OpenCV without learning all the CV concepts in-depth such as file formats, buffer management, color spaces, eigenvalues, bit depths, matrix storage, bitmap storage, etc. SimpleCV allies you to experiment in computer vision using the images or video streams from webcams, FireWire, mobile phones, Kinects, etc.

It is the best framework if you need to perform some quick prototyping. You can use SimpleCV with Mac, Windows, and Ubuntu Linux operating systems.

6. GPUImage

GPUImage is a framework or rather, an iOS library that allows you to apply GPU-accelerated effects and filters to images, live motion video, and movies. It is built on OpenGL ES 2.0. Running custom filters on a GPU calls for a lot of code to set up and maintain. GPUImage cuts down on all of that boilerplate and gets the job done for you.

Computer Vision as a Service

1. Microsoft Azure

Microsoft API allows you to analyze images, read the text in them, and analyze video in near-real-time. You can also flag adult content, generate thumbnails of images and recognize handwriting.

2. Google Cloud and Mobile Vision APIs

Google Cloud Vision API enables developers to perform image processing by encapsulating powerful machine learning models in a simple REST API that can be called in an application. Also, its Optical Character Recognition (OCR) functionality enables you to detect text in your images.

The Mobile Vision API lets you detect objects in photos and video, using real-time on-device vision technology. It also lets you scan and recognize barcodes and text.

3. Amazon Rekognition

Amazon Rekognition is a deep learning-based image and video analysis service that makes adding an image and video analysis to your applications, a piece of cake. The service can identify objects, text, people, scenes, and activities, and it can also detect inappropriate content, apart from providing highly accurate facial analysis and facial recognition for sentiment analysis.

Author

Saurabh Mahajan, a tech enthusiast

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

saurabh409

Advanced Computer Vision

Free Courses

4.7

Generative AI - A Way of Life

Explore Generative AI for beginners: create text and images, use top AI tools, learn practical skills, and ethics.

4.5

Getting Started with Large Language Models

Master Large Language Models (LLMs) with this course, offering clear guidance in NLP and model training made simple.

4.6

Building LLM Applications using Prompt Engineering

This free course guides you on building LLM apps, mastering prompt engineering, and developing chatbots with enterprise data.

4.8

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Explore practical solutions, advanced retrieval strategies, and agentic RAG systems to improve context, relevance, and accuracy in AI-driven applications.

4.7

Microsoft Excel: Formulas & Functions

Master MS Excel for data analysis with key formulas, functions, and LookUp tools in this comprehensive course.

MUID

Used by Microsoft Clarity, to store and track visits across websites.

Expiry: 1 Year

Type: HTTP

_clck

Used by Microsoft Clarity, Persists the Clarity User ID and preferences, unique to that site, on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID.

Expiry: 1 Year

Type: HTTP

_clsk

Used by Microsoft Clarity, Connects multiple page views by a user into a single Clarity session recording.

Expiry: 1 Day

Type: HTTP

SRM_I

Collects user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Years

Type: HTTP

SM

Use to measure the use of the website for internal analytics

Expiry: 1 Years

Type: HTTP

CLID

The cookie is set by embedded Microsoft Clarity scripts. The purpose of this cookie is for heatmap and session recording.

Expiry: 1 Year

Type: HTTP

SRM_B

Collected user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Months

Type: HTTP

_gid

This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected includes the number of visitors, the source where they have come from, and the pages visited in an anonymous form.

Expiry: 399 Days

Type: HTTP

_ga_#

Used by Google Analytics, to store and count pageviews.

Expiry: 399 Days

Type: HTTP

_gat_#

Used by Google Analytics to collect data on the number of times a user has visited the website as well as dates for the first and most recent visit.

Expiry: 1 Day

Type: HTTP

collect

Used to send data to Google Analytics about the visitor's device and behavior. Tracks the visitor across devices and marketing channels.

Expiry: Session

Type: PIXEL

AEC

cookies ensure that requests within a browsing session are made by the user, and not by other sites.

Expiry: 6 Months

Type: HTTP

G_ENABLED_IDPS

use the cookie when customers want to make a referral from their gmail contacts; it helps auth the gmail account.

Expiry: 2 Years

Type: HTTP

test_cookie

This cookie is set by DoubleClick (which is owned by Google) to determine if the website visitor's browser supports cookies.

Expiry: 1 Year

Type: HTTP

_we_us

this is used to send push notification using webengage.

Expiry: 1 Year

Type: HTTP

WebKlipperAuth

used by webenage to track auth of webenagage.

Expiry: Session

Type: HTTP

ln_or

Linkedin sets this cookie to registers statistical data on users' behavior on the website for internal analytics.

Expiry: 1 Day

Type: HTTP

JSESSIONID

Use to maintain an anonymous user session by the server.

Expiry: 1 Year

Type: HTTP

li_rm

Used as part of the LinkedIn Remember Me feature and is set when a user clicks Remember Me on the device to make it easier for him or her to sign in to that device.

Expiry: 1 Year

Type: HTTP

AnalyticsSyncHistory

Used to store information about the time a sync with the lms_analytics cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

lms_analytics

Used to store information about the time a sync with the AnalyticsSyncHistory cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

liap

Cookie used for Sign-in with Linkedin and/or to allow for the Linkedin follow feature.

Expiry: 6 Months

Type: HTTP

visit

allow for the Linkedin follow feature.

Expiry: 1 Year

Type: HTTP

li_at

often used to identify you, including your name, interests, and previous activity.

Expiry: 2 Months

Type: HTTP

s_plt

Tracks the time that the previous page took to load

Expiry: Session

Type: HTTP

lang

Used to remember a user's language setting to ensure LinkedIn.com displays in the language selected by the user in their settings

Expiry: Session

Type: HTTP

s_tp

Tracks percent of page viewed

Expiry: Session

Type: HTTP

AMCV_14215E3D5995C57C0A495C55%40AdobeOrg

Indicates the start of a session for Adobe Experience Cloud

Expiry: Session

Type: HTTP

s_pltp

Provides page name value (URL) for use by Adobe Analytics

Expiry: Session

Type: HTTP

s_tslv

Used to retain and fetch time since last visit in Adobe Analytics

Expiry: 6 Months

Type: HTTP

li_theme

Remembers a user's display preference/theme setting

Expiry: 6 Months

Type: HTTP

li_theme_set

Remembers which users have updated their display / theme preferences

Expiry: 6 Months

Type: HTTP

Reading list

Introduction to Computer Vision

Getting Started with Image Data

Introduction to CNN and Implementation

Introduction to CNN and implementation

Introduction to Transfer Learning

CNN Visualization

Overview of Pretrained Models

Inception

ResNets

DenseNets

CSRNet

Introduction to Object Detection

Region Based Convolutional Neural Network

Single Stage Networks

Transformed Based Object Detection Models

Face Detection

Object Tracking

Pose Estimation

Introduction to Image Segmentation

Understanding Deep Learning Architectures for Image Segmentation

Video Classification

Introduction to Image Generation

Experiments with Generative Adversarial Networks

Zero and Few Shot Learning

Model Deployment

Everything happening in Computer Vision that you should know

Introduction

Where is Computer Vision used?

1. Healthcare

2. Automotive Industry

3. Retail: Customer Behavior tracking

4. Agriculture

How does Computer Vision work?

Convolutional Neural Networks (CNNs)

Top Tools used for Computer Vision

1. OpenCV

2. TensorFlow

3. MATLAB

4. CUDA

5. SimpleCV

6. GPUImage

Computer Vision as a Service

1. Microsoft Azure

2. Google Cloud and Mobile Vision APIs

3. Amazon Rekognition

Author

Login to continue reading and enjoy expert-curated content.

Free Courses

Generative AI - A Way of Life

Getting Started with Large Language Models

Building LLM Applications using Prompt Engineering

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Microsoft Excel: Formulas & Functions

Recommended Articles

Responses From Readers

Write for us

Analytics Vidhya (4)

brahmaid

csrftoken

Identityid

sessionid

Google (1)

g_state

Microsoft (7)

MUID

_clck

_clsk

SRM_I

SM

CLID

SRM_B

Google (7)

_gid

_ga_#

_gat_#

collect

AEC

G_ENABLED_IDPS

test_cookie