Machine Vision Lecture Slides: Introduction & Concepts

Lecture 1
Machine Vision
Dr. Muhammad Usman
University of Engineering and Technology, Lahore
Faisalabad Campus
What is Machine Vision?
“The use of devices for optical, non-contact
sensing to automatically receive and interpret
an image of a real scene in order to obtain
information and/or control machines or
processes.”
Machine Vision (MV) is concerned with the engineering of
“Integrated mechanical-optical-electronic-software systems”
Natural Objects
Human Artifacts
Manufacturing Processes
Detect Defects
Improve Quality
Operating Efficiency
Safety of product & processes
It is also used to control machines used in manufacturing
Goal – Machine Vision
To create a model of the real world from images
– Recovering useful information about a scene from its two-dimensional projections
– This recovery requires the inversion of a many-to-one mapping
– Knowledge about the objects in the scene and projection
geometry is required
Goal – Steps
Image Acquisition
Feature Detection
• Students will learn the tools to acquire and subsequently process
the images using a problem-solving approach.
• This approach requires assessing the needs first and then selecting the
solution components from a list of available procedures
and algorithms.
• Although the course can be thought of as a mixture of algorithm
development and mathematical implementation, the
focus will be on algorithm understanding and development.
• The course material and associated lab-work, therefore, is aimed to
enable the students to have a multitude of image processing
techniques to be used later in the development of their semester
• The course also aims to motivate the students by introducing the
state of the art in the field of machine vision.
Course Contents
• Introduction: What is Machine Vision? Practical
Mechatronic applications
• Image Acquisition and Representation: Concepts
of representation of images, Digitization, binary,
gray and color (RGB, CMYK, HSI etc.) images,
elementary image processing functions
(enhancement and filtration of digital image in
spatial as well as in frequency domain), image
properties, adjacency conventions
• Fundamentals of Digital Image Processing: Point, Neighborhood,
and Geometric operations, Image restoration, Mathematical
• Segmentation: Thresholding, Edge-based Segmentation, Region-based Segmentation, Mean Shift Segmentation
• Image Analysis: Template Matching, Decision-theoretic approaches,
The Hough transform
• Object Recognition: Statistical Pattern Recognition, Neural Nets,
Syntactic Pattern recognition, Optimization techniques in
recognition, Fuzzy Systems
• Motion Analysis: Differential motion analysis methods, Optical
Flow, Analysis based on correspondence of interest points, Video
tracking, Motion models to aid tracking
• Applications to robotics and intelligent machine interaction will
also be included.
• Suggested Text:
1. Image Processing, Analysis and Machine Vision by
Milan Sonka, Vaclav Hlavac and Roger Boyle
2. David Vernon. Machine Vision
3. Computer and Machine Vision - Theory, Algorithms,
Practicalities By E R Davies
4. Digital Image Processing by Rafael C. Gonzalez
• Course Pre-requisites:
– MA-244: Probability and Statistics
– MA-234: Linear Algebra
CLOs – Machine Vision
CLO – 1
Understand fundamental
concepts of machine vision
systems and digital image
CLO – 2
Apply basic image processing
techniques including point,
neighbourhood, geometric and
morphological operations.
CLO – 3
Comprehend image processing
algorithms for segmentation,
image analysis, object
recognition and motion
Relationship to other fields
Techniques developed in many areas are used for recovering
information from images
Image Processing:
• Usually transform images into other images
• Task of information recovery is left to a human user
• e.g., image enhancement, compression, correcting blurred images
Machine Vision:
• Takes images as input but produces outputs such as a
representation of the object contours in an image
• Emphasis is on recovering information automatically
Image processing algorithms are useful in pre-processing
stage of MV
To enhance particular information and suppress noise
Computer Graphics:
• Generates images from geometric primitives such as lines,
circles, and free-form surfaces
• It plays a significant role in visualization and virtual reality
Machine Vision:
• Estimates the geometric primitives and other features from
images
Computer graphics (CG) is the synthesis of images and MV is the analysis
of images
MV uses curve and surface representations
from CG
And CG uses MV techniques for creating
realistic images
Pattern recognition:
• Classifies numerical and symbolic data
• Techniques of this field play an important role in MV for
recognizing objects
• Many industrial applications rely heavily on pattern recognition
Artificial Intelligence:
• It is concerned with designing intelligent systems and with
studying computational aspects of intelligence
• Techniques from this field are used in the later stages of MV
Perception translates signals from the world into
symbols, cognition manipulates symbols, and action
translates symbols into signals that effect changes in
the world
CV is considered a subfield of AI
“MV produces measurements
or abstractions from
geometrical properties”
Lecture 2
Image Geometry
• Vision is the most powerful sense
• Allows us to interact with the world without
making any direct physical contact
• Approx. 60% of your brain's processing is involved in the
process of visual perception
• Able to navigate seamlessly in this complex
Machine Vision
• Enterprise of building machines that can see or
emulate human vision. Why?
• Several reasons:
– Various routine works can be performed by machines
(e.g., tidying things up, driving home etc.) So that we
can have time to perform other tasks
– Human vision focuses on the qualitative, not the quantitative
• It is not capable of precise measurements of things in the
physical world
– Build a system that can surpass human vision and
extract information about the world that humans
cannot perceive.
Vision deals
with images
An image is an array of pixels
– A pixel has values
• Brightness
• Color
• Distance (Depth)
• Material (soon)
What we see in images
What machine sees
Vision is challenging when we
want to extract all the information
we observed in the previous image
– Vision is hard
– Multi-disciplinary (optics, mechanics,
electronics, and IT, sometimes
psychology and biology)
We have successful applications
Various Applications
• Factory Automation
– vision guided robotics
• Deals with uncertainty in the physical world (parts alignment etc.)
– Visual inspection
• Deals with defects
• Optical character recognition (OCR)
– Car number identification
• Security
– Object detection and tracking
• Biometrics
– Eye scan, face detection
• Many others:
– Optical mouse, Gaming (Kinect), Snapchat, Autonomous
navigation (exploration, driverless cars), Medical imaging
Image formation
There are two parts of the image formation process
• The geometry of image formation
– which determines where in the image plane the
projection of a point in the scene will be located
• The physics of light
– which determines the brightness of a point in the
image plane as a function of scene illumination and
surface properties
Image formation
The position (x’, y’) in the image plane of the point at position
(x, y, z) in the scene is found by intersecting the line of sight
through (x, y, z) with the image plane
Image formation
The distance of the point (x, y, z) from the z-axis is $r = \sqrt{x^2 + y^2}$
The distance of the projected point (x’, y’) from the origin of the
image plane is $r' = \sqrt{x'^2 + y'^2}$
The two triangles are similar, so the ratios of their corresponding
sides must be the same:
$$\frac{f}{z} = \frac{r'}{r}$$
Image formation
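Similarly, the triangles formed by the x and y coordinates are similar, so their ratios are equal:
$$\frac{x'}{x} = \frac{y'}{y} = \frac{r'}{r} = \frac{f}{z}$$
The position of the point in the image plane is given by
$$x' = \frac{f}{z}\,x, \qquad y' = \frac{f}{z}\,y$$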
• Inspection is necessary because consumers who purchase
unsatisfactory products are less likely to make a repeat
purchase.
• In the aerospace, automotive and food industries, failed
products can cause injury or even fatal accidents.
• Humans are engaged in inspection tasks
– but their performance is often less than satisfactory.
• In some cases, human inspection is not even possible
– Thus, automated inspection is required.
• Automated inspection is highly desirable in the inspection of
dangerous materials.
• These include flammable, explosive or radioactive substances.
Machine Vision Applications
• Vision systems are currently being used extensively in
manufacturing industry, where they perform a very wide
variety of inspection, monitoring and control functions.
Domestic Applications (furniture polish, tooth paste to refrigerators)
Food Industry
Application Classified by Tasks
Present and projected applications of Machine Vision to natural
products may be classified according to the function they perform
Analyzing the Content: Count how many Sweets of each kind are put in a box
Analyzing the Shape: Fruit, Vegetables, Animal Carcasses
Analyzing Texture: Bread, Cork Floor Tiles, Wood Panels
Assembling Food Products: Pizza, Kimchi, Meat Pies
Checking Aesthetic: Loaves, Cakes, Quiches, Trifles
Selective Washing of Cauliflower, Broccoli,
Chocolate Coating of Confectionery Bars, Icing
Counting Cherries on the surface of a Cake
Cakes, Chocolates, Trifles, Pies
Detecting Foreign
Seeds, Nut Shells, Twigs, Stones, Contact lenses
Detecting Surface
Mildew, Mud, Bird Excrement
Estimating the Size or
Fruit, Fish/Animal/Poultry Carcasses, Meat
Identifying Premier-Quality Fruit and Vegetables
There is a huge variety of tasks of this type
Fragile/Variable Products, Cream Cakes,
Sorting, Spraying,
Fruit from Leaves; Fish by Size/Species on a
Other Machine Vision Applications
Document Processing
Optical Character Recognition/Verification and
Document Authentication
Security and
Identifying Intruders in Secure Spaces
Medicine and Health
Cell Samples for Genetic Screening, Identifying
Cancer Cells
Target Identification and Fire Control
Both Pedestrian and Motor Vehicles
Forensic Science and
Finger-Print analysis,
Astronomy, Bio-Medical, Particle Physics,
Materials Engineering
As an absolute minimum, a machine vision system must contain:
• some means of presenting the object to be inspected to
the camera;
• lights;
• camera;
• an electronic circuit card to digitize the signal from the
camera;
• computer, or dedicated electronic image-processing
hardware;
• software: if a computer is used for image processing;
• actuator: this may be anything from a simple
accept/reject gate, to a multi-axis robot.
• Automated visual inspection (AVI) operates by employing a camera to acquire an image of
the object being inspected and then utilizing appropriate
image processing hardware and software routines to find
and classify areas of interest in the image.
• Generally, AVI involves the following processing stages:
• Image acquisition to obtain an image of the object to be
inspected;
• Image enhancement to improve the quality of the
acquired image, which facilitates later processing;
• Segmentation to divide the image into areas of interest
and background. The result of this stage is called the
segmented image, where objects represent the areas of
interest;
• Feature extraction to calculate the values of parameters
that describe each object;
• Classification to determine what is represented by each
object.
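The five stages listed above can be sketched as a chain of small functions. This is only an illustrative Python/NumPy sketch, not a production inspection system: the synthetic "camera" image, the threshold of 128, the single-object assumption, and the area-based accept/reject rule are all assumptions made for the example.

```python
import numpy as np


def acquire_image(size=32):
    """Stand-in for a camera: noisy dark background with one brighter square part."""
    rng = np.random.default_rng(0)
    image = rng.normal(60.0, 5.0, (size, size))   # background around grey level 60
    image[8:20, 10:22] += 100.0                   # a 12x12 object around level 160
    return np.clip(image, 0, 255)


def enhance(image):
    """Contrast stretching: map the darkest pixel to 0 and the brightest to 255."""
    lo, hi = image.min(), image.max()
    return (image - lo) / (hi - lo) * 255.0


def segment(image, threshold=128.0):
    """Global thresholding: True marks object pixels, False marks background."""
    return image > threshold


def extract_features(mask):
    """Area (pixel count) and centroid (row, col), assuming a single object."""
    rows, cols = np.nonzero(mask)
    return {"area": int(rows.size), "centroid": (float(rows.mean()), float(cols.mean()))}


def classify(features, min_area=100, max_area=200):
    """Hypothetical accept/reject rule based only on the object's area."""
    return "accept" if min_area <= features["area"] <= max_area else "reject"


features = extract_features(segment(enhance(acquire_image())))
decision = classify(features)
```

In a real system each stage would be replaced by something more robust (for example, labelling several objects separately before feature extraction), but the data flow from image to decision stays the same.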
Image Processing
Image processing involves changing the nature of an image in
order to either
• improve its pictorial information for human interpretation, or
• render it more suitable for autonomous machine perception
Humans like their images to be sharp, clear and detailed;
machines prefer their images to be simple and uncluttered.
• We can subdivide different image processing algorithms into
broad subclasses
• Image enhancement. This refers to processing an image so
that the result is more suitable for a particular application. Examples include:
• sharpening or de-blurring an out-of-focus image,
• highlighting edges,
• improving image contrast, or brightening an image,
• removing noise.
• Image restoration. This may be considered as reversing the
damage done to an image by a known cause, for example:
• removing blur caused by linear motion,
• removing optical distortions,
• removing periodic interference.
• Image segmentation. This involves
subdividing an image into constituent parts, or
isolating certain aspects of an image:
• finding lines, circles, or particular shapes in an image,
In an aerial photograph, identifying cars, trees, buildings, or
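Two of the enhancement operations named above, removing noise and sharpening, can be illustrated with a 3x3 averaging filter. The Python/NumPy sketch below is one simple choice among many: the mean filter and unsharp masking are swapped in as examples, and the function names are hypothetical.

```python
import numpy as np


def mean_filter_3x3(image):
    """Replace each pixel by the average of its 3x3 neighborhood (edges replicated)."""
    padded = np.pad(image.astype(float), 1, mode="edge")
    rows, cols = image.shape
    total = np.zeros((rows, cols))
    # Sum the nine shifted copies of the image, one per neighborhood position.
    for dr in range(3):
        for dc in range(3):
            total += padded[dr:dr + rows, dc:dc + cols]
    return total / 9.0


def sharpen(image, amount=1.0):
    """Unsharp masking: add back the detail that smoothing removes."""
    image = image.astype(float)
    return image + amount * (image - mean_filter_3x3(image))


# A flat image with a single bright noise pixel in the middle.
noisy = np.zeros((5, 5))
noisy[2, 2] = 9.0
smoothed = mean_filter_3x3(noisy)   # the spike is spread out and reduced
sharpened = sharpen(noisy)          # the spike is exaggerated instead
```

Smoothing suppresses isolated noise at the cost of blurring, while sharpening amplifies local differences, including noise, so the choice depends on the application.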
An Image Processing Task
• Acquiring the image. First we need to produce a digital image
from a paper envelope.
• Preprocessing. This is the step taken before the major image
processing task,
– to render the resulting image more suitable for the job.
– It may involve enhancing the contrast, removing noise, or
identifying regions likely to contain the postcode.
• Segmentation. Here is where we actually get the postcode; in
other words we extract from the image that part of it which
contains just the postcode.
• Representation and description. These terms refer to
extracting the particular features which allow us to differentiate
between objects. Here we will be looking for curves, holes and
corners which allow us to distinguish the different digits which
constitute a postcode.
• Recognition and interpretation. This means assigning labels
to objects based on their descriptors (from the previous step),
and assigning meanings to those labels. So we identify
particular digits, and we interpret a string of four digits at the
end of the address as the postcode.
Enhancing the edges of an image to make it appear sharper
Removing noise from an image
Removing motion blur from an image
Obtaining the edges of an image
Removing detail from an image
• Sampling refers to the process of digitizing a continuous
function. Suppose we take the function
y = sin(x) + sin(3x)
• A continuous function can be reconstructed from its samples
provided that the sampling frequency is at least twice the
maximum frequency in the function (the Nyquist criterion)
• We consider an image as a continuous function of two variables,
which is then sampled and quantized to produce a
digital image
• Sampling rate determines how many pixels the digital image
will have, and
• Quantization determines how many intensity levels will be used
to represent the intensity value at each sample point
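The effect of the sampling rate and of quantization can be seen in a few lines of Python/NumPy. This is a sketch under stated assumptions: the sample spacings pi/8 and pi/2 are chosen for illustration, and the helper names `sample`, `subsample` and `quantize` are hypothetical.

```python
import numpy as np


def signal(x):
    """The example function y = sin(x) + sin(3x)."""
    return np.sin(x) + np.sin(3 * x)


def sample(spacing, count):
    """Sample the signal at x = 0, spacing, 2*spacing, ..."""
    return signal(spacing * np.arange(count))


# The highest angular frequency is 3 rad per unit, i.e. 3 / (2*pi) cycles per unit.
# Sampling at twice that frequency corresponds to a sample spacing of pi / 3.
nyquist_spacing = np.pi / 3

fine = sample(np.pi / 8, 33)    # spacing below pi/3: the shape of the function survives
coarse = sample(np.pi / 2, 9)   # spacing above pi/3: every sample is (numerically) zero


def subsample(image, step):
    """Spatial sampling of an image array: keep every step-th pixel in each direction."""
    return image[::step, ::step]


def quantize(image, levels):
    """Quantize brightness values in [0.0, 1.0] to integers 0 .. levels - 1."""
    scaled = np.floor(np.asarray(image, dtype=float) * levels).astype(int)
    return np.clip(scaled, 0, levels - 1)


two_level = quantize([0.0, 0.49, 0.5, 1.0], levels=2)
small = subsample(np.zeros((256, 256)), step=4)
```

With the coarse spacing the samples cannot be told apart from those of the constant function y = 0, which is exactly the loss of information the Nyquist criterion warns about.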
• To view the scene, we record the energy reflected from it; we
may use visible light, or some other energy source
Using light
• It is the predominant energy source for images
• It is the energy source which human beings can observe directly.
• It has the advantage of being safe, cheap, easily detected and
readily processed with suitable hardware
• Two very popular methods of producing a digital image are with
– A Digital Camera
– A Flat-Bed Scanner
Digital Camera
• Such a camera has an array of photo-sites: silicon
electronic devices whose voltage output is proportional to
the intensity of light falling on them.
• For a camera attached to a computer, information from the
photo-sites is then output to a suitable storage medium.
• Generally this is done in hardware using a frame-grabbing card.
• This allows a large number of images to be captured in a very
short time, on the order of one ten-thousandth of a second each.
• The images can then be copied onto a permanent storage
Digital Camera
• The output will be an array of values; each representing a
sampled point from the original scene. The elements of this
array are called picture elements, or more simply pixels.
• Digital still cameras use a range of devices, from floppy discs
and CDs, to various specialized cards and memory sticks.
Flat Bed Scanner
• This works on a principle similar to the digital camera.
• Instead of the entire image being captured at once on a large
array, a single row of photo-sites is moved across the image,
capturing it row-by-row as it moves.
• This is a much slower process but it is quite reasonable to allow
all capture and storage to be processed by suitable software.
Other Energy Sources
• Visible light is part of the electromagnetic spectrum: radiation
in which the energy takes the form of waves of varying
wavelength
• X-rays
• x-ray tomography
• CAT (Computed Axial Tomography)
• As the beam moves around the object,
an image of the object can be
constructed; such an image is called
a tomogram.
• Consider an image as a two-dimensional function f(x, y), where the
function values give the brightness of the image at any given
point
• Image brightness values can be any real numbers in the range
0.0 (black) to 1.0 (white)
• The ranges of x and y will clearly depend on the image, but
they can take all real values between their minima and maxima
• A digital image differs from a continuous image in that the x, y and f(x, y)
values are all discrete.
• For example, x and y may each range from 1 to 256, and brightness may range
from 0 (black) to 255 (white)
• It can be considered as a large array of sampled points, each of
which has a particular quantized brightness
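• The pixels surrounding a given pixel constitute its neighborhood; neighborhoods usually have odd numbers of rows and columns, so that the given pixel sits at the center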
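A digital image as an array of quantized samples, and the neighborhood of a pixel, can be sketched in Python/NumPy as follows. The helper `neighborhood` is hypothetical, and the sketch uses 0-based array indices rather than the 1-based ranges quoted above.

```python
import numpy as np


def neighborhood(image, row, col, size=3):
    """Return the size x size block of pixels centered on (row, col)."""
    if size % 2 == 0:
        raise ValueError("size must be odd so that the pixel sits at the center")
    half = size // 2
    n_rows, n_cols = image.shape
    if not (half <= row < n_rows - half and half <= col < n_cols - half):
        raise IndexError("the neighborhood would extend past the image border")
    return image[row - half:row + half + 1, col - half:col + half + 1]


# A tiny 5x5 grayscale image with brightness values 0..24 stored as 8-bit integers.
image = np.arange(25, dtype=np.uint8).reshape(5, 5)
patch = neighborhood(image, 2, 2)   # the 3x3 block around the central pixel
```

The border check reflects a practical design choice: near the edge of the image a full neighborhood does not exist, so real code must either skip those pixels or pad the image first.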
• Four basic types of images
• Binary. Each pixel is just black or white.
• Since there are only two possible values for each pixel, we only need
one bit per pixel.
• Images for which a binary representation may be suitable
include text (printed or handwritten),
fingerprints, or
architectural plans
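The "one bit per pixel" point can be made concrete with NumPy. In this sketch the threshold of 128 and the 8x8 test image are assumptions for illustration; `np.packbits` is used to store eight binary pixels in each byte.

```python
import numpy as np

# A small 8x8 grayscale test image: each row runs 0, 32, 64, ..., 224.
gray = np.tile(np.arange(0, 256, 32, dtype=np.uint8), (8, 1))

# Thresholding turns it into a binary image (True = white, False = black).
binary = gray >= 128

# Pack eight pixels into each byte: 64 pixels need only 8 bytes.
packed = np.packbits(binary)

# Unpacking recovers the original binary image exactly.
restored = np.unpackbits(packed).reshape(binary.shape).astype(bool)
```

Stored as one byte per pixel the same image would take 64 bytes, so packing gives the eightfold saving that the one-bit representation promises.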
• Grayscale (Intensity). Each pixel is a shade of grey, normally
from 0 (black) to 255 (white)
• This means each pixel can be represented by eight bits, or exactly one byte
• This is a natural range for image file handling
• Such images arise in medicine (X-rays) and in images of printed
works; indeed, 256 different gray levels are sufficient for the
recognition of most natural objects
• True color, or RGB. Here each pixel has a particular color,
that color being described by the amount of red, green and
blue in it.
• If each of these components has a range 0-255, this gives a
total of $256^3 = 16{,}777{,}216$ different possible colors in the image
• Since the total number of bits required for each pixel is 24, such
images are also called 24-bit color images
• For every pixel there correspond three values.
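The color count and the "24-bit" name can be verified with a short Python sketch. The helper `pack_rgb` and the sample pixel values are hypothetical; the particular byte order (red in the highest byte) is one common convention, assumed here for illustration.

```python
import numpy as np

# Each of the three components takes one of 256 values (0..255).
possible_colors = 256 ** 3


def pack_rgb(red, green, blue):
    """Combine three 8-bit components into a single 24-bit integer."""
    return (red << 16) | (green << 8) | blue


# A 2x2 true-color image: three values (R, G, B) for every pixel.
rgb_image = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [255, 128, 0]]],
    dtype=np.uint8,
)

orange = pack_rgb(255, 128, 0)
white = pack_rgb(255, 255, 255)   # the largest 24-bit value
```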
• Indexed. Most color images only have a small subset of the
more than sixteen million possible colors.
• Image has an associated color map, which is simply a list of all
the colors used in that image.
• Each pixel has a value which does not give its color (as for an
RGB image), but an index to the color in the map.
• It is convenient if an image has 256 colors or fewer, for then the
index values will only require one byte each to store.
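The relationship between an indexed image and its color map can be sketched with NumPy indexing. The four-color map and the helper `to_indexed` below are assumptions made for the example; the helper also assumes the image uses at most 256 distinct colors.

```python
import numpy as np

# Color map: a list of the colors used in the image, one RGB triple per row.
color_map = np.array(
    [[0, 0, 0], [255, 0, 0], [0, 255, 0], [255, 255, 255]],
    dtype=np.uint8,
)

# Indexed image: each pixel stores an index into the color map, not a color.
indexed = np.array([[0, 1, 1], [2, 3, 0]], dtype=np.uint8)

# Looking up every index yields the equivalent RGB image, shape (2, 3, 3).
rgb = color_map[indexed]


def to_indexed(rgb_image):
    """Build a color map and an index image from an RGB image (at most 256 colors)."""
    height, width, _ = rgb_image.shape
    colors, inverse = np.unique(rgb_image.reshape(-1, 3), axis=0, return_inverse=True)
    return inverse.reshape(height, width).astype(np.uint8), colors


new_indexed, new_map = to_indexed(rgb)
```

Here the index image needs one byte per pixel plus a small map, instead of three bytes per pixel for the full RGB array.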
• Suppose we consider a 512x512 binary image
512x512x1 = 262,144 bits = 32,768 bytes = 32.768 KB ≈ 0.033 MB
• A grayscale image of the same size requires
512x512x1 = 262,144 bytes = 262.14 KB ≈ 0.262 MB
• For color images, in which each pixel is associated with 3 bytes
of color information
512x512x3 = 786,432 bytes = 786.43 KB ≈ 0.786 MB
• Satellite images may be of the order of several thousand pixels
in each direction
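The storage figures above follow from one small calculation, sketched here in Python. The function name `image_size_bytes` is hypothetical, and the sketch ignores any file-format headers or compression.

```python
def image_size_bytes(width, height, bits_per_pixel):
    """Uncompressed storage for an image, rounded up to whole bytes."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8


binary_bytes = image_size_bytes(512, 512, 1)     # 1 bit per pixel
gray_bytes = image_size_bytes(512, 512, 8)       # 1 byte per pixel
color_bytes = image_size_bytes(512, 512, 24)     # 3 bytes per pixel

# Sizes in decimal kilobytes and megabytes, as used on the slide.
gray_kb = gray_bytes / 1000
gray_mb = gray_bytes / 1_000_000
```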
We should be aware of the limitations of the human visual system.
Image perception consists of two basic steps:
• capturing the image with the eye,
• recognizing and interpreting the image with the visual cortex in
the brain
The combination and immense variability of these steps influence
the ways in which we perceive the world around us
There are a number of things to bear in mind:
1. Observed intensities vary with the background
• A simple block of grey will appear darker if it is placed on a white
background than if it is placed on a black background.
• i.e. we don’t perceive grey scales “as they are”, but rather as they
differ from their surroundings
2. We may observe non-existent intensities as bars in
continuously varying grey levels
• Our visual system tends to undershoot or overshoot
around the boundary of regions of different intensities
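Both effects can be tried out by generating the test images yourself. The Python/NumPy sketch below builds one image with the same grey block on a white and on a black background, and one staircase of uniform grey steps at whose boundaries the overshoot and undershoot tend to be perceived; the sizes and grey levels are arbitrary choices for illustration.

```python
import numpy as np


def simultaneous_contrast(size=64, block=16, grey=128):
    """Two panels side by side: the same grey block on white (left) and black (right)."""
    left = np.full((size, size), 255, dtype=np.uint8)
    right = np.zeros((size, size), dtype=np.uint8)
    start = (size - block) // 2
    for panel in (left, right):
        panel[start:start + block, start:start + block] = grey
    return np.hstack([left, right])


def grey_staircase(steps=8, step_width=16, height=32):
    """Vertical bands of constant grey level, increasing from black to white."""
    levels = np.linspace(0, 255, steps).astype(np.uint8)
    row = np.repeat(levels, step_width)
    return np.tile(row, (height, 1))


contrast_image = simultaneous_contrast()
staircase_image = grey_staircase()
```

Displaying the arrays with any image viewer shows the point of the slide: the two blocks hold identical pixel values, and each staircase band is perfectly uniform, yet neither tends to look that way to a human observer.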
Thank You