Evolution of Computer Vision

INTRODUCTION:

Before getting into computer vision, let's try to understand how important vision is. Humans rely on vision to walk around obstacles, read sign boards, read articles like this one, and perform many other tasks. Vision is our highest-bandwidth sense and provides a substantial amount of information about the surrounding environment.

For years, computer scientists have been trying to give this power to computers, giving birth to the subfield we now call computer vision.

The basic idea of computer vision is to give computers the ability to extract high-level information from images and videos.

The idea of giving computers the ability to see dates back to the 1950s, when scientists trying to mimic the human vision system asked computers one basic question: "Tell us what you see." Every living being perceives its surroundings in its own way; computers see the world by counting pixels, measuring shades of color, and gauging the spatial differences between objects.


EVOLUTION OF COMPUTER VISION:

Let me walk you through how this field has evolved over the past decades.

Figure: Experimental setup of Hubel and Wiesel


The first important contribution in this field came from neurophysiologists David Hubel and Torsten Wiesel in 1959. Their publication, "Receptive fields of single neurons in the cat's striate cortex", described the core response properties of visual cortical neurons as well as how a cat's visual experience shapes its cortical architecture. They discovered that visual processing starts from simple structures such as oriented edges, an insight that went on to become a core principle of deep learning.

Figure: Russell Kirsch and the image of his son, the first digitally scanned image


The next breakthrough in computer vision came with the invention of the digital image scanner. In 1959, Russell Kirsch and his colleagues developed an apparatus that transformed images into grids of numbers, a form machines could understand. It is because of their work that we can now process digital images in so many ways.
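To make the idea concrete, here is a minimal Python sketch (assuming Pillow and NumPy are installed; "photo.png" is a placeholder file name) showing that a digital image really is just a grid of numbers:

```python
# Kirsch's insight in miniature: an image is a grid of numbers.
from PIL import Image
import numpy as np

img = Image.open("photo.png").convert("L")  # load and convert to grayscale
grid = np.asarray(img)                      # 2D array of pixel intensities

print(grid.shape)    # (height, width)
print(grid[0, :10])  # first ten pixel values of the top row, each 0-255
```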



In 1963, Lawrence Roberts, in his paper "Machine perception of three-dimensional solids", described how to derive 3D information from 2D photographs. His work reduced the visual world to simple geometric shapes and is widely considered a precursor of modern computer vision.

In the 1960s, AI became an academic discipline, and a few optimistic researchers were under the impression that they could build a computer as intelligent as a human being within 30 years. During that time, MIT professor Seymour Papert and his students tried to engineer a platform that could automatically perform background/foreground segmentation and extract non-overlapping objects from real-world images. Their project wasn't a success; fifty years later, we are still nowhere near solving computer vision. But it is considered the official birth of computer vision as a scientific field.

The next important breakthrough came in 1982, when David Marr, a British neuroscientist, published "Vision: A computational investigation into the human representation and processing of visual information". In it, he argued that vision is hierarchical and that the main aim of a vision system is to create a 3D representation of the environment in order to interact with it.

Figure: Flowchart of David Marr's representational framework for vision

He developed a framework in which low-level algorithms detect edges, curves, corners, and so on. The framework starts with a primal sketch of the image, moves to a 2½D sketch in which surfaces, depth, and discontinuities are pieced together, and ends with a 3D model organised hierarchically in terms of surface and volumetric primitives.
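As a rough illustration of the kind of low-level edge extraction the primal sketch builds on, here is a short Python sketch using a Sobel filter (assuming NumPy, SciPy, and Pillow; "photo.png" is again a placeholder). This is a modern stand-in, not Marr's own algorithm:

```python
# Crude edge map in the spirit of Marr's primal sketch, via a Sobel filter.
import numpy as np
from PIL import Image
from scipy.ndimage import sobel

img = np.asarray(Image.open("photo.png").convert("L"), dtype=float)

gx = sobel(img, axis=1)   # horizontal intensity gradient
gy = sobel(img, axis=0)   # vertical intensity gradient
edges = np.hypot(gx, gy)  # gradient magnitude: large where edges are

# Keep only strong responses as a binary edge map
edge_map = edges > edges.mean() + 2 * edges.std()
```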

Though this work was groundbreaking at the time, it was very abstract and high-level. It said nothing about how to model these stages mathematically or what kind of learning process to use.

Figure: Neocognitron

Around the same time, Kunihiko Fukushima came up with the "neocognitron", a self-organizing artificial neural network of simple and complex cells that could recognize patterns and was unaffected by position shifts. It is considered the grandfather of today's convnets.

In 1989, Yann LeCun applied a backprop-style learning algorithm to Fukushima's convolutional neural network architecture. A few years later he released LeNet-5, the first modern convnet, which introduced some of the essential ingredients we still use in CNNs today.

The MNIST dataset of handwritten digits (perhaps the most famous benchmark dataset in machine learning) is a result of his work.
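Those essential ingredients, stacked convolutions, subsampling (pooling), and fully connected layers, are easy to see in a modern reimplementation. Here is a minimal LeNet-5-style network sketched in PyTorch (an assumption of convenience; LeCun's original implementation predates today's frameworks):

```python
# A minimal LeNet-5-style convnet for 32x32 grayscale digits.
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),  # 14x14 -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = LeNet5()
digits = torch.randn(1, 1, 32, 32)  # one fake 32x32 grayscale "digit"
print(model(digits).shape)          # torch.Size([1, 10]): one score per class
```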

In 1997, Jitendra Malik published a paper on his attempts to solve perceptual grouping, a problem CV researchers are still struggling with today.

In the 1990s, the focus of the field shifted from creating 3D models of objects to feature-based object recognition. David Lowe's "Object Recognition from Local Scale-Invariant Features", which introduced SIFT, was an example of this.
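As a sketch of what feature-based recognition looks like in practice, here is a short example using OpenCV's SIFT implementation (assuming opencv-python 4.4 or later, where SIFT is included; the file names are placeholders):

```python
# Feature-based matching in the spirit of Lowe's SIFT.
import cv2

img1 = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)  # keypoints + 128-D descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

# Match descriptors between the two images and sort by similarity
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(f"{len(matches)} matches; best distance {matches[0].distance:.1f}")
```

The key property is that these local features are invariant to scale and rotation, which is what let recognition work without a full 3D model of the object.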

In 2001, Paul Viola and Michael Jones developed a real-time face detection framework. Though not based on deep learning, the algorithm still had a deep learning flavor to it: while processing images, it learned which features (very simple, Haar-like features) could help localize faces. The Viola-Jones face detector is still widely used. It is a strong binary classifier built out of several weak classifiers; during the learning phase, which is quite time-consuming, the cascade of weak classifiers is trained using AdaBoost.
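OpenCV ships with pretrained Haar cascades in this style, so running a Viola-Jones-type detector takes only a few lines. A minimal sketch (assuming opencv-python; "group.png" is a placeholder file name):

```python
# Viola-Jones-style face detection with OpenCV's bundled Haar cascade.
import cv2

cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

gray = cv2.imread("group.png", cv2.IMREAD_GRAYSCALE)
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:  # one bounding box per detected face
    print(f"face at ({x}, {y}), size {w}x{h}")
```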

Five years after the paper was published, Fujitsu released a camera with a real-time face detection feature that relied on the Viola-Jones algorithm.

In 2009, another important feature-based model was developed by Pedro Felzenszwalb, David McAllester, and Deva Ramanan — the Deformable Part Model.

The ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which you've probably heard about, started in 2010. It runs annually and includes a post-competition workshop where participants discuss what they've learned from the most innovative entries. The ImageNet challenge has become a benchmark in object category classification and object detection across a huge number of object categories.

In 2012, a team from the University of Toronto entered a convolutional neural network model (AlexNet) into the competition, and that changed everything: it achieved an error rate of 16.4%, far ahead of the runner-up's 26.2%.

CONCLUSION:

Despite the progress of recent years, we are not even close to solving computer vision. However, multiple healthcare institutions and enterprises are already applying CV to real-world problems.

REFERENCES:

https://hackernoon.com/a-brief-history-of-computer-vision-and-convolutional-neural-networks-8fe8aacc79f3

https://www.researchgate.net/figure/Neocognitron-structure-with-three-stages-of-S-Cells-u-S-l-and-C-cellsu-C-l_fig1_221306526

http://www.doc.gold.ac.uk/~mas02fl/MSC101/Vision/Marr.html

https://www.google.co.in/search?q=first+scanner&tbm=isch

https://www.researchgate.net/figure/Left-Experimental-setup-from-Hubel-Wiesel-136-137-adapted-from-253-Chapter-11_fig4_337360115

https://www.scaruffi.com/mind/ai1960.html