Computer Vision
CSA 401
Department of CSE
School of Engineering & Technology
Sharda University
Dr. Ali Imam Abidi
Vision of the University
To serve the society by being a global University of
higher learning in pursuit of academic excellence,
innovation and nurturing entrepreneurship.
Mission of the University
1. Transformative educational experience
2. Enrichment by educational initiatives that
encourage global outlook
3. Develop research, support disruptive
innovations and accelerate entrepreneurship
4. Seeking beyond boundaries
Outcome-Based Education (OBE)
An education theory that ties the learning of a subject through an educational system to a set of goals, asking whether those goals are being achieved and to what degree.
Course Outcomes (COs)
• Course outcomes are the goals against which learning in the course is assessed.
• They provide an estimate of the degree to which the outcomes, and the subject matter itself, have been achieved by the end of the course.
Course Outcomes (COs)
1. Define the fundamentals of Computer Vision and Computer Graphics and relate them to real-world applications.
2. Explain image formation models and the mathematical basis for various projection systems.
3. Apply image processing techniques such as segmentation and edge detection to real-time and real-world applications.
4. Analyze various feature extraction techniques for different problem domains.
5. Evaluate pattern recognition using clustering, classification, supervised learning and unsupervised learning techniques.
6. Build computer vision applications for real-world problems.
Computer Vision and Nearby Fields
• Computer Graphics: models to images
• Photography: images to images
• Image Processing: captured images to workable (refined) images
• Computer Vision: images to models
Computer Vision
• Make computers understand images and video.
  • What kind of scene?
  • Where are the cars?
  • How far is the building?
  • …
Vision is really hard
• Vision is an amazing feat of natural intelligence
  • Visual cortex occupies about 50% of the Macaque brain
  • More of the human brain is devoted to vision than to anything else
• Is that a queen or a bishop?
Why computer vision matters
• Safety
• Health
• Comfort
• Fun
• Security
• Access
Ridiculously brief history of computer vision
• 1966: Minsky assigns computer vision as an undergrad summer project
• 1960s: interpretation of synthetic worlds
• 1970s: some progress on interpreting selected images
• 1980s: ANNs come and go; shift toward geometry and increased mathematical rigor
• 1990s: face recognition; statistical analysis in vogue
• 2000s: broader recognition; large annotated datasets available; video processing starts
• Guzman ’68
• Ohta & Kanade ’78
• Turk and Pentland ’91
•Optical character recognition (OCR)
• Technology to convert scanned docs to text
• If you have a scanner, it probably came with OCR software
•Digit recognition, AT&T labs
•http://www.research.att.com/~yann/
•License plate readers
•http://en.wikipedia.org/wiki/Automatic_number_plate_recognition
•Face detection
• Many new digital cameras now detect faces
• Canon, Sony, Fuji, …
•Smile detection
•Sony Cyber-shot® T70 Digital Still Camera
•3D from thousands of images
•Building Rome in a Day: Agarwal et al. 2009
•Object recognition (in supermarkets)
•LaneHawk by EvolutionRobotics
•“A smart camera is flush-mounted in the checkout lane, continuously watching for items. When an item is detected and recognized, the cashier verifies the quantity of items that were found under the basket, and continues to close the transaction. The item can remain under the basket, and with LaneHawk, you are assured to get paid for it…“
•Vision-based biometrics
•“How the Afghan Girl was Identified by Her Iris Patterns” Read the story
•wikipedia
•Login without a password…
•Fingerprint scanners on
many new laptops,
other devices
•Face recognition systems now
beginning to appear more widely
http://www.sensiblevision.com/
•Object recognition (in mobile phones)
• Point & Find, Nokia
• Google Goggles
•Special effects: shape capture
•The Matrix movies, ESC Entertainment, XYZRGB, NRC
•Special effects: motion capture
•Pirates of the Caribbean, Industrial Light and Magic
•Sports
•Smart cars
• Slide content courtesy of Amnon Shashua
• Mobileye
• Vision systems currently in high-end BMW, GM, Volvo models
• By 2010: adopted by 70% of car manufacturers.
•Google cars
•http://www.nytimes.com/2010/10/10/science/10google.html?ref=artificialintelligence
•Interactive Games: Kinect
• Object Recognition:
http://www.youtube.com/watch?feature=iv&v=fQ59dXOo63o
• Mario: http://www.youtube.com/watch?v=8CTJL5lUjHg
• 3D: http://www.youtube.com/watch?v=7QrnwoO1-8A
• Robot: http://www.youtube.com/watch?v=w8BmgtMKFbY
Vision in space
•NASA'S Mars Exploration Rover Spirit captured this westward view from atop a low plateau where Spirit spent the closing months of 2007.
• Vision systems (JPL) used for several tasks:
  • Panorama stitching
  • 3D terrain modeling
  • Obstacle detection, position tracking
  • For more, read “Computer Vision on Mars” by Matthies et al.
•Industrial robots
•Vision-guided robots position nut runners on wheels
•Mobile robots
•NASA’s Mars Spirit Rover
•http://en.wikipedia.org/wiki/Spirit_rover
•http://www.robocup.org/
•Saxena et al. 2008
•STAIR at Stanford
•Medical imaging
•3D imaging
•MRI, CT
•Image guided surgery
•Grimson et al., MIT
The Computer Vision Hierarchy
• Low-level vision: process image for feature
extraction (edge, corner, or optical flow).
• Middle-level vision: object recognition, motion
analysis, and 3D reconstruction using features
obtained from the low-level vision.
• High-level vision: interpretation of the evolving
information provided by the middle level vision as
well as directing what middle and low level vision
tasks should be performed.
The Computer Vision Hierarchy
• low-level
Image → image
• mid-level
image → features
• high-level
features → analysis
Low Level Vision (≈ Digital Image Processing)
• Image Acquisition
  • Image captured by a sensor (digitized)
• Pre-processing
  • Denoising
  • Sharpening/Blurring
• Image enhancement
  • Contrast stretch
• Image Segmentation
  • Object separation from background
  • Region growing, edge linking (a minimal sketch of the whole pipeline follows)
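To make the pipeline concrete, here is a minimal low-level sketch in Python with OpenCV; the input file name and all parameter values are illustrative assumptions, not prescribed by the course.

```python
import cv2

# Image acquisition: read a digitized sensor image (placeholder file name)
img = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)

# Pre-processing: denoise with a Gaussian blur
denoised = cv2.GaussianBlur(img, (5, 5), 0)

# Sharpening via unsharp masking: weighted original minus a stronger blur
blurred = cv2.GaussianBlur(denoised, (9, 9), 0)
sharp = cv2.addWeighted(denoised, 1.5, blurred, -0.5, 0)

# Image enhancement: contrast stretch to the full 0-255 range
stretched = cv2.normalize(sharp, None, 0, 255, cv2.NORM_MINMAX)

# Segmentation: separate object from background with Otsu's threshold
_, mask = cv2.threshold(stretched, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
```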
Mid Level Vision
• Attributes/Features Selection
• Attributes/Feature Extraction
  • Features like edges, points (corners)
• Feature subset selection etc. (sketched below)
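A minimal mid-level sketch of the two feature types named above, edges and corner-like points; ORB and Canny are stand-in choices, and the input path is a placeholder.

```python
import cv2

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

# Edge features
edges = cv2.Canny(gray, 100, 200)

# Corner-like keypoint features with descriptors for later matching
orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(gray, None)
# 'descriptors' is the feature set handed on to the higher-level stages
```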
High Level Vision
• ‘High-level’ vision lacks a single, agreed upon
definition.
• It might usefully be defined as those stages of visual
processing that transition from analyzing local image
structure to analyzing structure of the external
world that produced those images.
High Level Vision
• The central computational challenge of vision arises from the fact that it is an ‘ill-posed’ problem:
  • the external world is three-dimensional and made up of surfaces with different reflectance properties, yet the visual system must make do with a pair of two-dimensional retinas containing only a handful of photoreceptor types.
  • depending on:
    • the object’s position relative to the viewer,
    • the configuration of light sources,
    • the presence of other objects,
  any given object can cast an effectively infinite number of different images onto the retina.
While much research has focused on object recognition as a central task in vision, many real-world scenes are only poorly described by object labels:
such labels provide an extremely impoverished description of the scene, and it is unclear whether some of the labels (e.g. ‘planter’) are valid for animals besides humans.
A segmentation-based description of the scene is better, but still represents a shadow of the total information content of the scene. We are easily able to extract a wealth of information from this scene, even for objects that are problematic to label.
This additional information includes 3D information, such as normal vector directions, and even more abstract task-driven information, such as whether a portion of the scene is in the open or under cover. Such tasks may represent a more natural framing for high-level vision, especially in non-human animals.
High Level Vision
• Object Recognition
  • Detection of classes of objects (faces, motorbikes, trees etc.) in images.
  • Recognition of specific objects such as George Bush/Bill Clinton or a specific machine part etc.
• Classification of images or parts of images for medical or scientific applications.
• Recognition of events in surveillance videos.
• Measurement of distances for robotics.
High Level Vision Tools
• Graph Matching: A*, Constraint Satisfaction, Branch and Bound Search, Simulated Annealing
• Learning Methodologies: Decision Trees, Neural Nets, SVMs, EM Classifier
• Probabilistic Reasoning, Belief Propagation, Graphical Models
Overview of Diverse
Computer Vision
Applications
Document Image Analysis
• Algorithms and techniques that are applied to images of documents to obtain a computer-readable description from pixel data.
[Diagram: document image (non-textual format, pixel data) → algorithms & techniques → textual format]
• Optical Character Recognition (OCR) software that
recognizes characters in a scanned document.
• The objective of document image analysis is to recognize the text and graphics components in images of documents, and to extract the intended information as a human would.
A hierarchy of document processing subareas listing the types of document
components dealt within each subarea.
(Reproduced with permission from O’Gorman & Kasturi 1997.)
Two components + Pictures
1. Textual processing: deals with the text components of a document image. Some tasks here are:
  • determining the skew (any tilt at which the document may have been scanned into the computer),
  • finding columns, paragraphs, text lines, and words,
  • and finally recognizing the text (and possibly its attributes such as size, font etc.) by optical character recognition (OCR) — see the sketch below.
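A hedged sketch of two of these tasks, skew estimation and OCR, using OpenCV plus pytesseract (assumed installed); the file name is a placeholder and this skew recipe is one common approach, not the only one.

```python
import cv2
import numpy as np
import pytesseract

gray = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)
binary = cv2.threshold(gray, 0, 255,
                       cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Skew estimate: angle of the minimum-area rectangle around the ink pixels.
# Note: minAreaRect's angle convention varies across OpenCV versions;
# recent builds report an angle in (0, 90].
coords = np.column_stack(np.where(binary > 0)).astype(np.float32)
angle = cv2.minAreaRect(coords)[-1]
if angle > 45:
    angle -= 90

# Deskew by rotating about the page center
h, w = gray.shape
M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
deskewed = cv2.warpAffine(gray, M, (w, h), flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)

# Character recognition on the straightened page
text = pytesseract.image_to_string(deskewed)
```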
Two components
2. Graphics processing: deals with the non-textual line and symbol components that make up:
  • line diagrams, delimiting straight lines between text sections, company logos etc.
Pictures
• are a third major component of documents,
• but except for recognizing their location on a page, further analysis of these is usually the task of other image processing and machine vision techniques.
Biometrics
• Associating an identity with an individual is called personal identification.
• Resolving the identity of a person can be categorized into two fundamentally distinct types of problems:
  • Verification
  • Recognition (Identification)
[Diagram: identity resolution → verification | recognition (identification)]
• Verification (authentication): refers to the problem of confirming or denying a person’s claimed identity.
• Identification: refers to the problem of establishing a subject's identity
  • either from a set of already known identities (closed identification problem) or
  • otherwise (open identification problem).
• The term positive personal identification typically refers (in both verification and identification contexts) to identification of a person with high certainty.
• An engineering approach to the (abstract) problem of authentication of a person's identity is to reduce it to the problem of authentication of a concrete entity related to the person.
• These entities include:
  • a person's possession ("something that you possess"), e.g., permit physical access to a building to all persons whose identity could be authenticated by possession of a key;
  • a person's knowledge of a piece of information ("something that you know"), e.g., permit login access to a system to a person who knows the user-id and a password associated with it.
• Some systems, e.g., ATMs, use a combination of "something that you have" (ATM card) and "something that you know" (PIN) to establish an identity.
[Diagram: ATM card + PIN → access]
• The problem with the traditional approaches of identification using possession as a means of identity is that the possessions could be lost, stolen, forgotten, or misplaced.
• Further, once in control of the identifying possession, by definition, any other "unauthorized" person could abuse the privileges of the authorized user. The problem with using knowledge as an identity authentication mechanism is that it is difficult to remember passwords/PINs, while easily recallable passwords/PINs are easy to guess.
• Another approach to positive identification has been to reduce the problem of identification to the problem of identifying physical characteristics of the person. The characteristics could be either a person's physiological traits, e.g., fingerprints, hand geometry, etc., or behavioral characteristics, e.g., voice and signature. This method of identification of a person based on his/her physiological/behavioral characteristics is called Biometrics.
Biological Measurements That Qualify To Be A Biometric
• Any human physiological or behavioral characteristic could be a biometric provided it has the following desirable properties:
  i. Universality: which means that every person should have the characteristic.
  ii. Uniqueness: which indicates that no two persons should be the same in terms of the characteristic.
  iii. Permanence: which means that the characteristic should be invariant with time, and
  iv. Collectability: which indicates that the characteristic can be measured quantitatively.
• Other important requirements:
  i. Performance: which refers to the achievable identification accuracy, the resource requirements to achieve an acceptable identification accuracy, and the working or environmental factors that affect the identification accuracy.
  ii. Acceptability: which indicates to what extent people are willing to accept the biometric system, and
  iii. Circumvention: which refers to how easy it is to fool the system by fraudulent techniques.
Biometrics Markers: Overview
• No single biometric is expected to effectively satisfy the needs of all identification (authentication) applications.
• Each biometric has its strengths and limitations; accordingly, each biometric appeals to a particular identification (authentication) application.
1. Voice
2. Infrared Facial and Hand Vein Thermograms
3. Fingerprints
4. Face
5. Iris
6. Ear
7. Gait
8. Keystroke Dynamics
9. DNA
10. Signature
11. Retinal Scan
12. Hand and Finger Geometry
Object Recognition
• An object recognition system finds objects in the real world from an image of the world, using object models which are known a priori.
• The object recognition problem can be defined as a labeling problem based on models of known objects.
• Formally, given an image containing one or more
objects of interest (and background) and a set of
labels corresponding to a set of models known to
the system, the system should assign correct labels
to regions, or a set of regions, in the image.
• The OR problem is closely tied to the segmentation
problem.
• Without at least a partial recognition of objects,
segmentation cannot be done, and without
segmentation, object recognition is not possible.
OR: System Components
• Model database (also called modelbase)
• Feature detector
• Hypothesizer
• Hypothesis verifier
1. Model database:
  • contains all the models known to the system.
  • The information in the model database depends on the approach used for the recognition:
    • a qualitative or functional description and/or precise geometric surface information.
  • The representation of an object should capture all relevant information without any redundancies and should organize this information in a form that allows easy access by different components of the object recognition system.
  • A feature is some attribute of the object that is considered important in describing and recognizing the object in relation to other objects.
  • Size, color, and shape are some commonly used features.
2. Feature Detector:
  • applies operators to images and identifies locations of features that help in forming object hypotheses.
  • The features used by a system depend on the types of objects to be recognized and the organization of the model database.
  • Using the detected features in the image, the hypothesizer assigns likelihoods to objects present in the scene.
  • This reduces the search space for the recognizer using certain features.
  • The modelbase is organized using some type of indexing scheme to facilitate elimination of unlikely object candidates from possible consideration.
2.a. Feature extraction
• Which features should be detected, and how can they be detected reliably?
• Most features can be computed in two-dimensional images, but they are related to three-dimensional characteristics of objects.
• Due to the nature of the image formation process, some features are easier to compute than others.
2.b. Feature-model matching
• How can features in images be matched to models
in the database?
• In most object recognition tasks, there are many
features and numerous objects.
• An exhaustive matching approach will solve the
recognition problem but may be too slow to be
useful.
• The effectiveness of features and the efficiency of a matching technique must be considered in developing a matching approach (a minimal matching sketch follows).
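One common concrete instance of feature-model matching, sketched under assumptions: placeholder image paths, with ORB descriptors and a ratio test standing in for whatever features a real modelbase would use.

```python
import cv2

orb = cv2.ORB_create()
model = cv2.imread("model.png", cv2.IMREAD_GRAYSCALE)   # stored object model
scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)   # image to recognize in

kp_m, des_m = orb.detectAndCompute(model, None)
kp_s, des_s = orb.detectAndCompute(scene, None)

# Brute-force Hamming matching: exhaustive, hence simple but potentially slow
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(des_m, des_s, k=2)

# Lowe-style ratio test: keep matches clearly better than their runner-up
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(f"{len(good)} feature-model correspondences")
```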
3. Hypotheses formation
• How can a set of likely objects based on the feature
matching be selected?
• How can probabilities be assigned to each possible
object?
• The hypothesis formation step is basically a heuristic
to reduce the size of the search space.
• uses knowledge of the application domain to assign
some kind of probability or confidence measure to
different objects in the domain.
• reflects the likelihood of the presence of objects
based on the detected features.
• https://www.mygreatlearning.com/blog/yolo-object-detection-using-opencv/
• https://www.kdnuggets.com/2020/08/metrics-evaluate-deep-learning-object-detectors.html
• https://apple.github.io/turicreate/docs/userguide/object_detection/
• https://www.fritz.ai/image-recognition/
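As a companion to the links above, a hedged sketch of running a Darknet-format YOLO model through OpenCV's DNN module; the .cfg/.weights file names, the 416×416 input size, and the 0.5 threshold are assumptions, not values taken from those tutorials.

```python
import cv2
import numpy as np

# Load a pre-trained Darknet-format network (placeholder file names)
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
layer_names = net.getUnconnectedOutLayersNames()

img = cv2.imread("street.jpg")
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416),
                             swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(layer_names)

# Each detection row holds: cx, cy, w, h, objectness, then class scores
for out in outputs:
    for det in out:
        scores = det[5:]
        class_id = int(np.argmax(scores))
        if scores[class_id] > 0.5:
            print("class", class_id, "confidence", float(scores[class_id]))
```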
Object Verification
• How can object models be used to select the most likely object from the set of probable objects in a given image?
• The presence of each likely object can be verified by using their models.
• One must examine each plausible hypothesis to verify the presence of the object or ignore it.
• If the models are geometric, it is easy to precisely verify objects using camera location and other scene parameters.
• In other cases, it may not be possible to verify a hypothesis.
Medical Image Analysis: An Overview
Medical Imaging Modalities
• Common modalities you will encounter: CT, MR, ultrasound, microscopy, optical coherence tomography.
• Another critical part you will be exposed to is the organ-appearance module:
• basically, how different organs appear in different modalities, whether in a healthy state or in a diseased state.
• E.g., bone appears brighter on X-rays and CT, and darker on T1 and T2 MR.
https://www.researchgate.net/figure/Typology-of-Medical-Imaging-Modalities_fig1_319535615
https://www.researchgate.net/figure/Overview-of-common-clinical-imaging-modalities-which-have-potential-for-multimodal_fig11_280117631
https://cancerimagingjournal.biomedcentral.com/articles/10.1186/s40644-020-00312-3
• Fatty regions appear brighter on MR and darker on X-rays;
• fatty regions also appear brighter on ultrasound,
• whereas a water-filled region that appears brighter on MR will appear darker on ultrasound.
• Different organs under different modalities are thus viewed in different ways:
  • water appears darker in ultrasound and brighter in MR,
  • whereas fat appears brighter in both ultrasound and MR.
Modules of Medical Image Analysis
https://link.springer.com/chapter/10.1007/978-3-540-74658-4_62
Medical Image Formats and Protocols
• Medical images can be efficiently processed, objectively evaluated and made available at many places at the same time by means of appropriate communication networks and protocols:
  • PACS: Picture Archiving and Communication Systems
  • DICOM: Digital Imaging and Communications in Medicine
Medical Image Analysis covers four major
areas:
1. Image formation includes all the steps from capturing the
image to forming a digital image matrix.
2. Image visualization refers to all types of manipulation of this
matrix, resulting in an optimized output of the image.
3. Image analysis includes all the steps of processing which are used for quantitative measurements as well as abstract interpretations of medical images. This requires prior knowledge of the context and content of the images.
Medical Image Analysis covers four major
areas:
4. Image management sums up all techniques that provide the efficient storage, communication, transmission, archiving, and access (retrieval) of images, since an uncompressed radiograph may require several megabytes of storage capacity. The methods of telemedicine are also a part of image management.
Low level & High level Medical Image Processing
Low-level processing:
• manual or automatic techniques, which can be implemented without a-priori knowledge of the specific content of images.
• This type of algorithm has similar effects regardless of the content of the images.
• Morphological techniques.
• E.g., histogram stretching of a radiograph improves the contrast as it does on any holiday photograph.
High-level processing:
• Primarily image analysis methods.
• Feature extraction, classification.
• Prior knowledge is consequential.
• Interpretations etc.
• E.g., SURF algorithm, SVM for images etc.
Degrees of abstraction of medical image data
• The raw data level records an image as a whole. Therefore, the totality of all pixels is regarded on this level.
• The pixel level refers to discrete individual pixels.
• The edge level represents the one-dimensional (1-D) structures, which are composed of at least two neighbored pixels.
• The texture level refers to two-dimensional (2-D) structures. On this level, however, the delineation of the area’s contour may be unknown.
• The region level describes 2-D structures with a well-defined boundary.
• The object level associates textures or regions with a certain meaning or name, i.e. semantics is given on this level.
• The scene level considers the ensemble of image objects in spatial and/or temporal terms.
Enhancement (Why?)
• Can’t distinguish between tissues
The nature of the physiological system under
investigation and the procedures used in imaging
may diminish the contrast and the visibility of
details.
• Data is too noisy for computer algorithm to
perform well
Medical images are often deteriorated by noise due
to various sources of interference and other
phenomena that affect the measurement processes
in imaging and data acquisition systems.
• Imaging artifacts interfere with visualization
or computer processing.
How?
• Increase contrast
• Remove noise
• Emphasize edges: edge boost, unsharp masking
• Modify shapes
Examples (sketched below): contrast enhancement by histogram equalization; enhancement by adaptive wavelet shrinkage denoising; enhancement by adaptive filtering (noise or speckle reduction).
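Two of these enhancement routes are one-liners in OpenCV; wavelet shrinkage would need an extra library (e.g. PyWavelets), so non-local means stands in here for the adaptive denoising. The input file name is a placeholder.

```python
import cv2

img = cv2.imread("mri.png", cv2.IMREAD_GRAYSCALE)

# Global contrast enhancement by histogram equalization
equalized = cv2.equalizeHist(img)

# Adaptive (tile-wise) contrast enhancement: CLAHE
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
local_eq = clahe.apply(img)

# Noise reduction: non-local means as a stand-in for adaptive filtering
denoised = cv2.fastNlMeansDenoising(img, None, h=10)
```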
Convolution
• A mathematical operation, i.e., convolution is performed using templates.
• A template is a mostly small, squared mask with odd lateral length. This template is mirrored along both axes (hence the name convolution is commonly used) and positioned in one corner of the input image. The image pixels under the mask are named the kernel.
• The sliding average (a) and the binomial low-pass filter (b) cause a smoothing of the image. The binomial high-pass filter (c), however, increases contrast and edges, but also the noise in the image. The templates (a) to (c) must be normalized to make sure that the range of values is not exceeded. The contrast filter (d) is based on integer pixel values. The convolution with (d) is, therefore, very easy to calculate. The anisotropic templates (e) and (f) belong to the family of Sobel operators.
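The same kinds of templates can be applied in a few lines with OpenCV; the 3×3 binomial low-pass and a Sobel template are written out so the normalization point (dividing by the coefficient sum) is explicit. The input file name is a placeholder.

```python
import cv2
import numpy as np

img = cv2.imread("radiograph.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Binomial low-pass template (b): normalized so values stay in range
binomial = np.array([[1, 2, 1],
                     [2, 4, 2],
                     [1, 2, 1]], np.float32) / 16.0

# Anisotropic Sobel-family template (e)/(f)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], np.float32)

# Note: filter2D correlates rather than convolves; for the symmetric
# smoothing masks this is identical, for anisotropic templates mirror
# the kernel first (cv2.flip(kernel, -1)) to get a true convolution.
smoothed = cv2.filter2D(img, -1, binomial)
edges_x = cv2.filter2D(img, -1, cv2.flip(sobel_x, -1))
```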
Registration
• Image registration is defined as the process of establishing correspondences between two images.
• It is the alignment/overlaying of two or more images so that the best structural superimposition can be achieved.
• Registration methods can be performed on two or more images, but in general they involve only two images at a time.
• One is usually referred to as the source or moving image, while the other is referred to as the target or fixed image. The source image is denoted by S, while the target is denoted by T.
[Figure: source (S) in green; target (T) in red; registered overlay]
• Combining data obtained from a variety of imaging modalities (e.g., combining a CT and an MRI view of the same patient) gives more information about the disease at once. A minimal registration sketch follows.
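A minimal intensity-based registration sketch using OpenCV's ECC algorithm, one of many registration approaches; the file names and the euclidean (rigid) motion model are assumptions. (Some OpenCV 4.1.0 builds additionally require a gaussFiltSize argument.)

```python
import cv2
import numpy as np

T = cv2.imread("fixed.png", cv2.IMREAD_GRAYSCALE)    # target / fixed image
S = cv2.imread("moving.png", cv2.IMREAD_GRAYSCALE)   # source / moving image

warp = np.eye(2, 3, dtype=np.float32)                # initial guess: identity
criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 200, 1e-6)

# Estimate the warp that maximizes the ECC similarity between S and T
cc, warp = cv2.findTransformECC(T, S, warp, cv2.MOTION_EUCLIDEAN, criteria)

# Resample S onto T's grid (inverse map, per the standard ECC recipe)
registered = cv2.warpAffine(S, warp, (T.shape[1], T.shape[0]),
                            flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
```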
Multimodal registration and fusion.
Row 1: T1-weighted MRI of a 66-year-old subject with right parietal glioblastoma.
Row 2: Corresponding PET layers after multimodal registration.
Row 3: Fusion of registered layers to support the treatment planning.
Row 4: The fusion of MRI with PET of the sensorimotor-activated cortex area proves that the relevant area is out of focus.
Feature Extraction
• Feature extraction is defined as the first stage of intelligent (high-level) image analysis.
• It is followed by segmentation and classification, which often do not operate on the image itself, i.e. the data or pixel level, but are performed on a higher level of abstraction.
• Therefore, the task of feature extraction is to emphasize image information on the particular level where the subsequent algorithms operate.
• Consequently, information provided on other levels must be suppressed. Thus, data reduction is executed to obtain the characteristic properties.
Feature Extraction
• Data-based features (raw data level)
• Pixel-based features (individual pixels)
• Edge-based features (local contrast, i.e., a strong difference of (gray scale or color) values of adjacent pixels)
• Textural features (e.g. the honeycomb-like lung): (i) structural approaches, which are based on texture primitives (so-called texels or textons) and their rules of combination, and (ii) statistical approaches (see the sketch below)
• Regional features (object classification and identification): (i) localization-descriptive (along the major axes), (ii) delineation-descriptive measures such as shape, convexity, and length of the border etc.
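As a small illustration of the statistical route to textural features, a gray-level co-occurrence sketch with scikit-image; the random patch is a stand-in for a real lung texture sample. (Recent releases spell the functions graycomatrix/graycoprops; older ones use greycomatrix/greycoprops.)

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

# Placeholder texture patch; replace with a real region of interest
patch = (np.random.rand(64, 64) * 255).astype(np.uint8)

# Co-occurrence of gray levels at distance 1, horizontally and vertically
glcm = graycomatrix(patch, distances=[1], angles=[0, np.pi / 2],
                    levels=256, symmetric=True, normed=True)

# A few Haralick-style statistics, averaged over the two directions
features = {prop: graycoprops(glcm, prop).mean()
            for prop in ("contrast", "homogeneity", "energy", "correlation")}
print(features)
```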
Medical Image Segmentation
• Segmentation, the separation of structures of interest from the background and from each other, is an essential analysis function for which numerous algorithms have been developed in the field of image processing.
• The principal goal of the segmentation process is to partition an image into regions that are homogeneous with respect to one or more characteristics or features.
• Segmentation is an important tool in medical image processing, and it has been useful in many applications.
[Figures: segmentation examples — a simple case and a complex case]
• In medical imaging, segmentation is important for feature extraction, image measurements, and image display.
• In some applications it may be useful to classify image pixels into anatomical regions, such as bones, muscles, and blood vessels, while in others into pathological regions, such as cancer, tissue deformities, and multiple sclerosis lesions.
• Segmentation can be thought of as the preprocessor for further analysis.
Segmentation techniques can be divided into classes in different ways, e.g., based on the classification scheme:
• Manual, semi-automatic, and automatic.
• Pixel-based (local methods) and region-based (global methods).
• Low-level segmentation (thresholding, region growing, etc.) and model-based segmentation (multispectral or feature-map techniques, Markov random fields, deformable models, etc.).
  • Model-based techniques are suitable for segmentation of images that have artifacts, noise, and weak boundaries between structures.
  • Deformable models: snake model and level sets.
• Classical (thresholding, edge-based, and region-based techniques), statistical, fuzzy, and neural network techniques.
A minimal low-level sketch follows.
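A minimal sketch of the low-level end of this taxonomy, assuming a placeholder CT-slice file: Otsu thresholding (pixel-based) followed by connected components to separate the resulting structures.

```python
import cv2

img = cv2.imread("ct_slice.png", cv2.IMREAD_GRAYSCALE)

# Pixel-based: automatic global threshold via Otsu's method
_, mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Region-based follow-up: label connected foreground structures
num_labels, labels = cv2.connectedComponents(mask)
print(num_labels - 1, "separated regions (label 0 is the background)")
```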
Classification
• Assigns all connected regions obtained from the segmentation to particularly specified classes of objects.
• Usually, region-based features that capture the characteristics of the objects sufficiently abstractly are used to guide the classification process.
• In this case, another feature extraction step is performed between segmentation and classification.
• These features must be sufficiently discriminative and suitably adapted to the application, since they fundamentally impact the resulting quality of the classifier.
Classification
• Non-parametric classifiers: nearest neighbor (NN), k-NN (a minimal sketch follows).
• Parametric procedures are normally based on the assumption of distribution functions for the feature specifications of objects; this is not always possible in medical image processing.
• Statistical classifiers: regard object identification as a problem of statistical decision theory.
• Syntactic classifiers: based on a grammar, which can possibly generate an infinite number of symbol chains with a finite symbol formalism.
  • Can be understood as a knowledge-based classification system (expert system), because the classification is based on a formal heuristic symbolic representation of expert knowledge, which is transferred into image processing systems by means of facts and rules.
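A minimal k-NN sketch for the non-parametric route; the region-based feature vectors, class labels, and numbers below are made-up placeholders, not clinical data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Each row: [region area, mean normalized intensity] for a segmented region
X = np.array([[120, 0.80], [130, 0.75], [40, 0.20], [35, 0.25]])
y = np.array(["vessel", "vessel", "lesion", "lesion"])

# Assign a new region to the majority class among its 3 nearest neighbors
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(clf.predict([[100, 0.70]]))   # -> ['vessel'] for these toy values
```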
Classification
• Computational Intelligence-Based Classifiers: ANNs, GA, fuzzy-logic based.
Face Detection/Recognition
Face Detection
• Face detection only involves the detection of a face within a digital image or video. It simply means that the face detection system can identify that there is a human face present in an image or video; it cannot identify that person.
• Face detection is a component of facial recognition systems: the first stage of facial recognition is detecting the presence of a human face in the first place.
• Face detection can also be used in cameras to help with auto-focus, as you will have noticed on some digital cameras and phones.
• Face detection does not identify people or give names to faces. The technology simply checks to see whether there is, in fact, a person in a certain photograph or video. It uses machine learning algorithms to scan digital images for human faces, typically by looking for the eyes first and then calculating the edges of each human face. This is how the system pinpoints exactly where human faces are and counts how many people are present in a photo or video (see the sketch below).
• Labelling an object/element in an input image/video as a ‘Face’.
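A minimal sketch of the "locate and count faces" behavior described above, using OpenCV's bundled Haar cascade; the cascade is one classical detector among many, and the input file name is a placeholder.

```python
import cv2

# Load the frontal-face Haar cascade shipped with OpenCV
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("group_photo.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect faces at multiple scales; parameters are common defaults
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
print(len(faces), "face(s) found")   # counts faces, does not identify them

for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```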
Challenges:
1. Pose variation: the ideal scenario (frontal images only) is unlikely in uncontrolled/dynamic situations, due to the subject’s movements or the camera’s angle.
2. Feature occlusion: elements like beards, glasses or hats introduce high variability; faces may be partially covered by objects or other faces.
3. Facial expression: facial features also vary greatly because of different facial gestures.
4. Imaging conditions: different cameras and ambient conditions.
Face Recognition
The Problem: Given a still image or video of a scene,
identify or verify one or more persons in this scene
using a stored database of facial images.
1. Who is this person?
2. Is he/she who he/she claims to be?
Face recognition in humans
• The human visual system starts with a preference for face-like patterns.
• The human visual system devotes special neural mechanisms for face perception.
• Facial features are processed holistically:
  • Among facial features, eyebrows are most important for recognition.
• Humans can recognize faces in very low-dimensional images.
• Tolerance to image degradation increases with familiarity.
• Color and texture are as important as shape.
• Illumination changes influence generalization.
• View-generalization is mediated by temporal association.
Challenges: Intrapersonal variations
• If people can do it so easily, why can’t computers?
• Intrapersonal (intra-class) variations are variations in the appearance of the same face caused by:
  • Illumination variations
  • Pose variations
  • Facial expressions
  • Use of cosmetics and accessories, hairstyle changes
  • Temporal variations (aging, etc.)
Challenges: Interclass similarity
• Interclass similarity: different persons may have very similar appearance:
  • Twins
  • Relatives
  • Strangers may look alike
Challenges: Illumination variations
• Illumination variations may significantly affect the appearance of a face in 2D images.
• Recognition performance may drop more than 40% for images taken outdoors!
• Humans have difficulties in recognizing familiar faces when the light direction changes (e.g. top-lit → bottom-lit).
Challenges: Pose variations
• The difference between two images of the same subject under different view angles is greater than the difference between images of two different subjects under the same view.
Challenges: Facial expressions
• Facial expressions caused by facial muscle movements may
significantly deform the face surface.
Challenges: Disguises
R. Singh, M. Vatsa and A. Noore, “Recognizing Face Images with Disguise Variations”,
Recent Advances in Face Recognition, I-Tech, Vienna, 2008.
Challenges: Information redundancy
• A 20×20 facial image has 400 pixels with 256 intensity values each:
  256^400 = 2^3200 possible combinations of intensity values
• Total world population as of Sept. 2021: 7.9 billion ≈ 2^33
• An extremely high-dimensional space.
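The exponent bookkeeping is easy to confirm with Python's exact integer arithmetic:

```python
# 20x20 pixels, 256 gray levels each: 256**400 distinct images,
# and 256 = 2**8, so the exponent is 8 * 400 = 3200
assert 256**400 == 2**3200

# World population (~7.9 billion) fits in about 33 bits
assert 7_900_000_000 < 2**33     # 2**33 = 8,589,934,592
```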