Uploaded by Racshanyaa Jagadish

Racshanyaa Computer Science BotPro Case Study Notes

Bundle Adjustment
• Technique used in computer vision and photogrammetry
• Refines the parameters of a 3D reconstruction system
• Used to align multiple views of a scene by jointly adjusting the scene points and camera poses
• “Bundle” - the bundle of light rays linking each feature observed in multiple images of the same scene to
the cameras that see it
• The bundles are then optimised while taking into account camera parameters such as distortion,
calibration and relative positioning so that the views fit together into a consistent scene.
• Iterative minimisation of reprojection error to provide a more precise representation of the scene
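The reprojection-error idea above can be sketched as follows. This is a minimal illustration, assuming a simplified pinhole camera with no rotation or lens distortion and cameras placed at z = 0. The function names, the example values and the coordinate-search refinement are assumptions made for this sketch; real systems typically use non-linear least squares solvers instead.

```python
def project(point, cam_pos, focal=1.0):
    """Project a 3D point into a camera at cam_pos looking along +Z.

    Assumption: no rotation and no lens distortion, to keep the sketch short.
    """
    x = point[0] - cam_pos[0]
    y = point[1] - cam_pos[1]
    z = point[2] - cam_pos[2]
    return (focal * x / z, focal * y / z)


def reprojection_error(point, cameras, observations):
    """Sum of squared differences between observed and predicted image positions."""
    total = 0.0
    for cam_pos, (u_obs, v_obs) in zip(cameras, observations):
        u, v = project(point, cam_pos)
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return total


def refine_point(point, cameras, observations, step=0.5, iterations=60):
    """Iteratively nudge the point along each axis, keeping moves that lower the error."""
    best_point = list(point)
    best_err = reprojection_error(best_point, cameras, observations)
    for _ in range(iterations):
        improved = False
        for axis in range(3):
            for delta in (step, -step):
                trial = list(best_point)
                trial[axis] += delta
                if trial[2] <= 0.0:
                    continue  # keep the point in front of the cameras (all at z = 0)
                err = reprojection_error(trial, cameras, observations)
                if err < best_err:
                    best_point, best_err = trial, err
                    improved = True
        if not improved:
            step /= 2.0  # shrink the search step once no move helps
    return tuple(best_point), best_err


# Hypothetical example: two cameras one unit apart observe the same scene point.
cameras = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
true_point = (0.2, -0.1, 5.0)
observations = [project(true_point, cam) for cam in cameras]

initial_guess = (0.5, -0.4, 4.0)
initial_err = reprojection_error(initial_guess, cameras, observations)
refined_point, refined_err = refine_point(initial_guess, cameras, observations)
print(initial_err, refined_err)
```

A full bundle adjustment would refine the camera parameters at the same time as the scene points; only one point is refined here to keep the example small.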
Computer Vision
• Field of study and research focusing on enabling computers to understand, interpret and analyse
information from images and videos
• Development of algorithms, techniques and models to extract meaningful information from visual data to
replicate human vision capabilities
• Steps in computer vision
• Accepts digital images and video frames as input
• Extract high-level information from visual data using complex mathematical and statistical
algorithms to analyse patterns, shapes, colours and textures to recognise objects.
Dead reckoning
• The technique of estimating the current position, orientation or velocity of an object or vehicle using a
previously determined position and measurements of kinematic properties
• The data gathered about the position using dead reckoning is called dead reckoning data
• Initial position: starting position, reference point
• Acceleration: in X, Y and Z axes, measured using an accelerometer, to estimate changes in velocity and position
• Rotation: provided by gyroscopes or inertial measurement units (IMU) to estimate changes in
orientation or angular velocity
• Time: the elapsed time over which the other measured parameters are integrated
• Prone to accumulating errors due to sensor drift, noise and imprecise measurements, leading to inaccurate
position estimates if not regularly corrected using GPS or visual tracking
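A minimal 2D sketch of dead reckoning, assuming the sensors report forward acceleration and yaw rate at a fixed time step. The function name, the sample values and the bias figure are hypothetical; the second run shows how a small constant accelerometer bias accumulates into a large position error.

```python
import math


def dead_reckon(x, y, heading, speed, samples, dt):
    """Integrate (forward_accel, yaw_rate) samples from a known starting state.

    samples: list of (acceleration in m/s^2, yaw rate in rad/s), one per time step.
    """
    for accel, yaw_rate in samples:
        speed += accel * dt          # accelerometer -> change in speed
        heading += yaw_rate * dt     # gyroscope -> change in orientation
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
    return x, y, heading, speed


# Ideal sensors: constant 1 m/s straight ahead for 100 s.
ideal = [(0.0, 0.0)] * 1000
x_true, y_true, _, _ = dead_reckon(0.0, 0.0, 0.0, 1.0, ideal, dt=0.1)

# Same motion, but the accelerometer has an assumed bias of 0.01 m/s^2.
biased = [(0.01, 0.0)] * 1000
x_est, y_est, _, _ = dead_reckon(0.0, 0.0, 0.0, 1.0, biased, dt=0.1)

drift = x_est - x_true
print(round(x_true, 2), round(x_est, 2), round(drift, 2))
```

Because the bias is integrated twice (into speed, then into position), the error grows roughly with the square of the elapsed time, which is why periodic correction from GPS or visual tracking matters.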
Edge Computing
• Distributed computing paradigm
• Brings computation and data storage to the edge of the network where data is generated or consumed
compared to sole reliance on centralised cloud computing infrastructure
• Processing algorithms are located near the edge devices or sensors, which reduces the need to transmit data
to remote cloud servers for computation
• Idea is to enable real-time processing by computing closer to the data source
• Advantages
• Low latency: shorter response times, good for time-sensitive applications such as autonomous vehicles
and real-time analytics
• Bandwidth Optimisation: Reduces amount of data needed to be transferred across the internet,
alleviates congestion and high bandwidth costs
• Privacy and security: data does not need to traverse external networks, allows localised data
storage and processing
• Offline operation: Allows operation during times of limited cloud/network connectivity
Global Map Optimisation
• Technique used to refine and improve accuracy of 3D map reconstruction
• Involves simultaneous optimisation of the 3D positions of landmarks (scene points) and their relevant camera poses
• Employed in SLAM algorithms to create a map of the environment while simultaneously determining
position of sensor within the map
• Minimises discrepancy between predicted positions of 3D points and actual observed points.
• Bundle adjustment or non-linear least squares optimisation is used
GPS Signal
• Refers to radio frequency signals transmitted by GPS satellites that provide information to receivers
• Allows receivers to calculate their precise location and the time, which they can use to synchronise their motion
• The system consists of satellites that continuously transmit signals about their orbital parameters and the exact time
• Components of the GPS Signal
• Navigation Message - information about satellite orbit, clock errors and other parameters at 50 bits
per second
• Carrier Wave - radio wave on the L1 or L2 frequency that carries the navigation message through modulation
• Spread Spectrum Signal - spreads the signal over a wider frequency band to enhance signal quality and
resistance to interference and/or disruptions.
• Receivers analyse the time delay between transmission and reception of the signal to calculate the distance
to each satellite
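The time-delay calculation in the last point can be sketched as below. This is a simplified illustration that assumes the signal travels at the speed of light in a vacuum and ignores atmospheric delay; the function name and the example timings are hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def signal_distance(t_sent, t_received):
    """Distance travelled by a signal, from its send and receive times in seconds."""
    return SPEED_OF_LIGHT * (t_received - t_sent)


# Hypothetical example: the signal arrives 0.07 s after the satellite sent it.
distance_m = signal_distance(0.0, 0.07)

# A receiver clock that is wrong by one microsecond shifts the result by about 300 m.
clock_error_m = signal_distance(0.0, 1e-6)

print(round(distance_m / 1000.0), "km;", round(clock_error_m), "m per microsecond of clock error")
```

The second value shows why the exact time in the navigation message matters: tiny timing errors translate into large distance errors.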
GPS-degraded environment
• The GPS signals are severely compromised or degraded, leading to challenges/limitations in accurate
positioning and navigation
• Causes
• Signal obstruction - physical obstructions along line-of-sight of GPS receiver and satellites
• Multipath interference - signals reflect off buildings and terrain before reaching the receiver and
interfere with the direct signals, causing errors/inaccuracies
• Signal jamming - intentional or unintentional interference that disrupts or blocks GPS using EM
waves, a concern in areas with high electronic activity
• Alternative positioning methods are required, such as an inertial navigation system (INS) using
accelerometers and gyroscopes, or other satellite systems.
GPS-denied environment
• Situation where the GPS signals are too weak/completely unavailable
• Indoor, underground, dense areas where signals may be weakened, distorted or blocked
• Again, alternative positioning methods or technologies are required
Human Pose Tracking
• Computer vision task that involves estimating position and orientation of human body joints or body parts
• Understand and analyse human movement and posture
• Algorithms operate on visual data and machine learning techniques to estimate 2D and 3D positions or
orientations of the body joints to represent human poses
• Can employ deep-learning, graphic or optimisation based methodologies
• Leverages convolutional neural networks (CNNs) or recurrent neural networks (RNNs) to learn features
and relationships between body key points
Inertial Measurement Unit (IMU)
• Electronic sensor device combining multiple sensors to measure the linear and angular motion of an object
• Integrates multiple devices including accelerometer, gyroscope and magnetometer into one single compact unit
• Provides complete insight into the object’s kinematic state
• Used in navigation systems to estimate changes in position to enable precise control and localisation
Keyframe Selection
• Video processing technique involving the identification and selection of key frames that represent the
entire scene from a video or image sequence
• Capture essential information from the content of the visual sequence
• Reduces amount of data to be processed or analysed while limiting the data to only relevant information
• Criteria for a frame to be considered keyframe
• Visual Saliency: only capture visually salient regions or objects in the video
• Content Diversity: represent different scenes, perspectives and/or actions to provide a
comprehensive overview of the content
• Content Diversity: represent different perspectives of the visually salient regions for
• Temporal Significance: select specific points in time of significance
• Motion Characteristics: based on motion analysis
• Redundancy: selecting only frames that offer unique information compared to neighbouring frames
• Computational efficiency: strike balance between accuracy and complexity
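A minimal sketch of the redundancy criterion above, assuming each frame is a flat list of pixel intensities and that a frame counts as "new" when its mean absolute difference from the last keyframe exceeds a threshold. The function names, the tiny frames and the threshold are illustrative assumptions; practical systems combine several of the criteria listed.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average absolute pixel difference between two equally sized frames."""
    return sum(abs(p - q) for p, q in zip(frame_a, frame_b)) / len(frame_a)


def select_keyframes(frames, threshold):
    """Return indices of frames that differ enough from the previous keyframe."""
    if not frames:
        return []
    keyframes = [0]  # the first frame is always kept as a reference
    for i in range(1, len(frames)):
        last_key = frames[keyframes[-1]]
        if mean_abs_diff(frames[i], last_key) > threshold:
            keyframes.append(i)
    return keyframes


# Hypothetical 4-pixel frames: frames 1 and 3 are near-duplicates of their neighbours.
frames = [
    [10, 10, 10, 10],
    [11, 10, 10, 10],
    [50, 50, 50, 50],
    [51, 50, 49, 50],
    [200, 200, 200, 200],
]
selected = select_keyframes(frames, threshold=20)
print(selected)  # -> [0, 2, 4]
```

Raising the threshold keeps fewer frames (cheaper to process), lowering it keeps more (more detail), which is the accuracy versus complexity balance noted above.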
Key points / pairs
• Distinctive or informative locations or regions in a set of images
• Identified based on unique visual characteristics including corners, edges and blobs
• Serve as landmarks or reference points for computer vision tasks
• Detected using feature extraction algorithms that analyse local properties such as intensity gradients,
texture or scale-space representation.
• After detection, they are described using feature descriptors that encode the local appearance around the key point
• Key pairs - corresponding key points detected in two or more images
• Matching key pairs allows movement to be tracked by observing changes in the environment
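Forming key pairs from descriptors can be sketched as nearest-neighbour matching. This is a minimal illustration, assuming descriptors are short numeric tuples compared by Euclidean distance and that ambiguous matches are rejected with a ratio test. The function name, the 0.75 ratio and the descriptor values are assumptions for this sketch, and `math.dist` requires Python 3.8 or later.

```python
import math


def match_keypoints(desc_a, desc_b, ratio=0.75):
    """Pair each descriptor in desc_a with its nearest neighbour in desc_b.

    A match is kept only if the best distance is clearly smaller than the
    second-best distance (ratio test), which filters out ambiguous pairs.
    Returns a list of (index_in_a, index_in_b) key pairs.
    """
    pairs = []
    for i, da in enumerate(desc_a):
        ranked = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(ranked) >= 2 and ranked[0][0] < ratio * ranked[1][0]:
            pairs.append((i, ranked[0][1]))
    return pairs


# Hypothetical 3-value descriptors from two images.
image_a = [(0.0, 0.0, 1.0), (5.0, 5.0, 5.0), (4.5, 0.0, 0.0)]
image_b = [(5.0, 5.0, 4.9), (0.0, 0.1, 1.0), (4.0, 0.0, 0.0), (5.0, 0.0, 0.0)]

key_pairs = match_keypoints(image_a, image_b)
print(key_pairs)  # -> [(0, 1), (1, 0)]
```

The third descriptor in `image_a` is equally close to two candidates in `image_b`, so it is discarded rather than matched at random.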
Light Detection and Ranging (LiDAR)
• Remote sensing technology using laser light to measure distances and create precise 3D representations of
surrounding environment
• Emits laser pulses and measures the time they take to bounce back; the delay is used to calculate the
distance to the reflecting surface
• Components
• Laser source: to emit pulsed bursts of light in rapid succession
• Scanner / Receiver: to steer the laser beam across a wide range of directions and to detect the reflected pulses
• Timing and Positioning: measures the time taken to detect each returning laser pulse to enable calculation of distance
• Applications
• Mapping and Surveying
• Autonomous Vehicles
• Environmental Monitoring
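The time-of-flight principle behind LiDAR can be sketched as below, assuming the pulse travels at the speed of light and the measured time covers the round trip (out and back). The function names and the example returns are hypothetical, and the scan is kept to 2D for brevity.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def tof_distance(round_trip_seconds):
    """Distance to the reflecting surface; halved because the pulse travels out and back."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0


def scan_to_points(returns):
    """Convert (beam_angle_rad, round_trip_seconds) returns into 2D (x, y) points."""
    points = []
    for angle, round_trip in returns:
        d = tof_distance(round_trip)
        points.append((d * math.cos(angle), d * math.sin(angle)))
    return points


# Hypothetical scan: two beams, straight ahead and 90 degrees to the left.
scan = [(0.0, 1e-6), (math.pi / 2, 2e-7)]
cloud = scan_to_points(scan)
print([(round(x, 2), round(y, 2)) for x, y in cloud])
```

Repeating this for many beam directions is what builds the point cloud used for the 3D representation.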
Object Occlusion
• Phenomenon in which an object positioned in front of another obstructs the visibility of the object behind it
from the viewpoint of the observer
• Affects tracking, segmentation and recognition
• Causes complexities due to partial visibility making it difficult to predict actual nature
• Disadvantages
• Full extent and boundary of object not visible
• Loss of tracking on an object - continuity requires complex algorithms
• Exhibit limited visual cues or fragmented appearance
• Depth relationships between occluding and occluded objects are not visible
Odometer Sensor
• Device used to measure movement and displacement of a mobile robot or vehicle
• Provides information about the vehicle’s change in position based on motion
• Uses rotational encoders or sensors on wheels / motor shafts to measure movement
• Combined with other information such as from IMU or GPS to improve accuracy and reliability of vehicle
pose estimation and localisation
• Provide real-time feedback to allow for precise control and monitoring
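Wheel-encoder odometry can be sketched as below, assuming a two-wheeled differential-drive robot. The function names, encoder resolution, wheel radius and wheel base are all hypothetical values chosen for the example.

```python
import math


def wheel_distance(ticks, ticks_per_rev, wheel_radius):
    """Distance rolled by one wheel, from its encoder tick count."""
    return 2.0 * math.pi * wheel_radius * ticks / ticks_per_rev


def update_pose(x, y, theta, left_ticks, right_ticks,
                ticks_per_rev=360, wheel_radius=0.05, wheel_base=0.30):
    """Update an (x, y, heading) pose from the ticks counted since the last update."""
    d_left = wheel_distance(left_ticks, ticks_per_rev, wheel_radius)
    d_right = wheel_distance(right_ticks, ticks_per_rev, wheel_radius)
    d_centre = (d_left + d_right) / 2.0        # distance moved by the robot centre
    d_theta = (d_right - d_left) / wheel_base  # change in heading (radians)
    # Use the mid-step heading as an approximation of the curved path.
    x += d_centre * math.cos(theta + d_theta / 2.0)
    y += d_centre * math.sin(theta + d_theta / 2.0)
    return x, y, theta + d_theta


# One full revolution of both wheels: the robot drives straight ahead.
straight = update_pose(0.0, 0.0, 0.0, left_ticks=360, right_ticks=360)

# The right wheel turns further than the left: the robot curves to the left.
curved = update_pose(0.0, 0.0, 0.0, left_ticks=300, right_ticks=360)
print(straight, curved)
```

Wheel slip and uneven ground are not visible to the encoders, which is one reason odometry is combined with IMU or GPS data.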
Optimisation
• Process of finding the best possible solution by maximising or minimising a specific objective function
within a given set of constraints
• Involves systematic exploration and performance improvement
• Process
• Defining the problem and its constraints
• Identification of the space of possible solutions through range and bound determination
• Objective function to measure quality of solution based on optimisation goal
• Reference to any constraints of the problem that the solution must satisfy
• Developing algorithms and techniques to solve the problem
• Assessing the optimised solution and evaluating its performance against defined objectives and
constraints
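The optimisation process above can be sketched with a small constrained example. This is an illustrative sketch only: the objective function, the bounds and the choice of projected gradient descent are assumptions made for the example, and many other algorithms could be used.

```python
def objective(x):
    """Objective function: lower is better, with its unconstrained minimum at x = 3."""
    return (x - 3.0) ** 2


def gradient(x):
    """Derivative of the objective with respect to x."""
    return 2.0 * (x - 3.0)


def clamp(x, lower, upper):
    """Constraint handling: keep the candidate solution inside its allowed range."""
    return max(lower, min(upper, x))


def minimise(x0, lower, upper, learning_rate=0.1, iterations=200):
    """Projected gradient descent: step downhill, then clamp back into the bounds."""
    x = clamp(x0, lower, upper)
    for _ in range(iterations):
        x = clamp(x - learning_rate * gradient(x), lower, upper)
    return x


# Wide bounds: the solver reaches the unconstrained optimum near x = 3.
free_solution = minimise(0.0, lower=-10.0, upper=10.0)

# Tight bounds 0 <= x <= 2: the best allowed solution sits on the boundary.
bounded_solution = minimise(0.0, lower=0.0, upper=2.0)

print(free_solution, objective(free_solution))
print(bounded_solution, objective(bounded_solution))
```

The final print statements correspond to the assessment step: the bounded solution has a worse objective value than the free one, which is the cost of satisfying the constraint.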
Re-localization
• Re-establishing the position when the camera or robot loses track or encounters environmental change due to
sensor drift, occlusion, movement or lighting
• Successful re-localization allows the system to accurately recover its pose estimate and continue operation
• Steps to re-localization
• Map of the environment is created with key reference points for pose estimation
• Extraction of visual features including key points and key frames
• Features matched against map
• Estimation of camera or robot pose calculated using matched features
• Refinement or verification of estimation to improve accuracy and reliability
• Most useful in SLAM situations where the robot/camera needs to constantly update its location
Rigid Pose Estimation (RPE)
• Process of determining the precise position and orientation of a rigid object in 3D space
• Estimates the six degrees of freedom (6DoF) transformation
• Rigid - object does not deform or change its shape during pose estimation
• Process steps
• Feature detection - distinctive key points and frames detected
• Feature matching - matching the detected features with their corresponding features
• Pose estimation - solving for 6DoF transformation and estimating the pose using key points in
reference frame
• Refinement - to improve accuracy
• Performed using various algorithmic techniques including PnP (Perspective-n-Point), ICP (Iterative
Closest Point) or RANSAC (Random Sample Consensus) algorithms
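The pose estimation step can be illustrated in a reduced form. The sketch below is a 2D simplification (one rotation angle plus a translation, so 3 degrees of freedom rather than 6), assuming the feature matching has already produced correct point pairs. It is a closed-form least-squares alignment of the kind used inside ICP-style methods, not a PnP or RANSAC implementation, and the function name and points are hypothetical.

```python
import math


def estimate_rigid_2d(source, target):
    """Find the rotation angle and translation that best map source points onto target points."""
    n = len(source)
    src_cx = sum(p[0] for p in source) / n
    src_cy = sum(p[1] for p in source) / n
    tgt_cx = sum(p[0] for p in target) / n
    tgt_cy = sum(p[1] for p in target) / n

    dot = 0.0
    cross = 0.0
    for (sx, sy), (tx, ty) in zip(source, target):
        ax, ay = sx - src_cx, sy - src_cy  # source point relative to its centroid
        bx, by = tx - tgt_cx, ty - tgt_cy  # target point relative to its centroid
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx

    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation moves the rotated source centroid onto the target centroid.
    shift_x = tgt_cx - (c * src_cx - s * src_cy)
    shift_y = tgt_cy - (s * src_cx + c * src_cy)
    return theta, (shift_x, shift_y)


# Hypothetical object rotated by 90 degrees and moved by (2, 3).
model_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
observed_points = [(2.0, 3.0), (2.0, 4.0), (1.0, 3.0)]

angle, translation = estimate_rigid_2d(model_points, observed_points)
print(round(math.degrees(angle), 3), translation)
```

Because the object is rigid, a single rotation and translation explains every matched point; in 3D the same idea yields three rotation and three translation parameters.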
Robot Drift
• Robot’s estimated position gradually deviates from actual position over time
• Factors that contribute
• Sensor noise: inaccuracies or anomalies
• Calibration errors: misalignment of components, incorrect calibration
• Environmental changes: terrain, lighting or magnetic field affect readings
• Accumulative integration: integrating sensor measurements over time causes error propagation and
accumulation of errors
• Uncertainty: complex or dynamic environments
• Methods for mitigation
• Sensor fusion: integrating data from multiple sensors that measure the same quantity
• Kalman Filtering: to mitigate noise and uncertainties
• Loop closure: correction mechanisms to correct accumulated error
• Environmental Constraints: correct drift by aligning estimated pose with actual environment
• Online calibration/recalibration: reduce systematic errors
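Kalman filtering, listed among the mitigation methods, can be sketched in one dimension. This is a minimal illustration assuming a single position value with Gaussian uncertainty; the function names and the numbers are hypothetical, and real robots use multi-dimensional versions.

```python
def kalman_predict(estimate, variance, motion, motion_variance):
    """Prediction step: apply the commanded motion; uncertainty grows."""
    return estimate + motion, variance + motion_variance


def kalman_update(estimate, variance, measurement, measurement_variance):
    """Update step: blend in a measurement; uncertainty shrinks."""
    gain = variance / (variance + measurement_variance)
    new_estimate = estimate + gain * (measurement - estimate)
    new_variance = (1.0 - gain) * variance
    return new_estimate, new_variance


# Start at position 0 with variance 1, then move 1 unit (motion variance 0.5).
predicted, predicted_var = kalman_predict(0.0, 1.0, motion=1.0, motion_variance=0.5)

# A sensor reports position 1.2 with variance 0.5.
corrected, corrected_var = kalman_update(predicted, predicted_var, 1.2, 0.5)

print(predicted, predicted_var)   # uncertainty has grown after moving
print(corrected, corrected_var)   # uncertainty has shrunk after measuring
```

Without the update step the variance only ever grows, which is the drift described above; each measurement pulls the estimate back towards reality.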
Simultaneous Localisation and Mapping (SLAM)
• Technique used in order to enable a robot or an autonomous system to build a map of an unknown
environment while estimating its own position
• Process
• Data Acquisition: images, range measurements, visual cues
• Extraction: key frames, key pairs and reference points
• Data association: identify common features across the different positions and viewpoints to create
consistent map
• Mapping: construct map using point clouds, occupancy grids and feature-based maps
• Localisation: use of IMU and GPS data to estimate robot position and orientation
• Loop Closure: revisit previously observed areas to correct accumulated error and improve accuracy
Sensor Fusion Model
• Process of combining information from multiple sensors to obtain more accurate and comprehensive
understanding of the environment/state
• Integration of data from different sensors to overcome the limitations of any single sensor
• Process
• Sensor selection: choosing appropriate sensor based on location
• Data acquisition: from selected sensors including measurements, images, point clouds
• Preprocessing: to remove noise, anomalies and align data spatially and temporally
• Data Fusion Algorithms: to combine the data using statistical methods such as Kalman filter or
Bayesian networks
• Fusion output: generation of data that provides a more accurate and comprehensive representation
• Benefits
• Improved accuracy and reliability
• More robust
• Enhance situational awareness
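The data fusion step can be sketched with a simple statistical rule. The example below assumes each sensor reports the same quantity with a known variance and combines them with inverse-variance weighting, so more reliable sensors count for more. The function name and readings are hypothetical, and this is a simpler stand-in for the Kalman filter or Bayesian approaches named above.

```python
def fuse(readings):
    """Combine (value, variance) readings into one estimate.

    Each reading is weighted by 1 / variance, so noisier sensors have less influence.
    Returns (fused_value, fused_variance).
    """
    weights = [1.0 / variance for _, variance in readings]
    total_weight = sum(weights)
    fused_value = sum(w * value for w, (value, _) in zip(weights, readings)) / total_weight
    return fused_value, 1.0 / total_weight


# Hypothetical distance readings in metres: a noisy sensor and a more precise one.
readings = [(10.2, 4.0), (9.8, 1.0)]
fused_value, fused_variance = fuse(readings)
print(round(fused_value, 3), round(fused_variance, 3))
```

The fused variance is lower than that of either sensor alone, which is the "improved accuracy and reliability" benefit in numeric form.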
Visual Simultaneous Localisation and Mapping (vSLAM)
• Technique of using visual information from cameras to simultaneously estimate pose and construct map
• Step 1: Initialisation
• Camera calibration: estimation of intrinsic parameters (distortion, focal length)
• Feature extraction: visual features or key points from initial frames to serve as reference points.
• Pose estimation: estimate initial position relative to initial reference frame using PnP algorithm
• Map initialisation: sparse map of surroundings as starting point
• Scale estimation: obtaining absolute size using LiDAR or depth cameras for accuracy
• Step 2: Local Mapping
• Feature extraction: visual features or key points from current camera frames that are distinctive
and unique using SIFT, SURF or ORB
• Feature tracking: track position of the key points across consecutive frames to maintain
consistency and measure time delay
• Triangulation: estimate 3D position of key points and calculate spatial coordinates
• Map representation: Triangulated key points placed on map
• Update: as per new processing of camera frames
• Step 3: Loop closure
• Feature matching: compares current frames with previous to find similarities or matches
• Similarity detection: determination of whether or not current frames contain similarities
• Hypotheses Generation: algorithm generates candidate loop closures
• Verification and Consistency: to determine true loop closure
• Update and Correction: based on changes to revisited area
• Step 4: Re-localization (if any)
• Image/frame matching: find match between current frames and existing map
• Hypothesis Generation: about camera pose or position in environment
• Hypothesis verification: using RANSAC or geometric verification
• Map Re-association: associate the camera’s current position with the map to continue the mapping process
• Step 5: Tracking
• Feature extraction: of visual key points or features from frames
• Feature matching: with corresponding features from previous frames
• Motion estimation: using IMU / kinematic data
• Pose Update: based on pose estimation using features of current frame
• Robustness and error handling: recovers from errors and gives real-time updates
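The triangulation step from Step 2 can be sketched in a simplified top-down 2D form. It assumes the same key point is seen from two known camera positions and that each camera gives the direction (bearing) to the point; the key point lies where the two rays cross. The function name and values are hypothetical, and a real vSLAM system triangulates in 3D from pixel coordinates.

```python
import math


def triangulate_2d(cam_a, bearing_a, cam_b, bearing_b):
    """Intersect two viewing rays; returns (x, y), or None if the rays are parallel."""
    dir_ax, dir_ay = math.cos(bearing_a), math.sin(bearing_a)
    dir_bx, dir_by = math.cos(bearing_b), math.sin(bearing_b)

    denom = dir_ax * dir_by - dir_ay * dir_bx
    if abs(denom) < 1e-12:
        return None  # no parallax: the two views cannot fix the depth

    # Solve cam_a + s * dir_a = cam_b + t * dir_b for the distance s along ray A.
    rel_x, rel_y = cam_b[0] - cam_a[0], cam_b[1] - cam_a[1]
    s = (rel_x * dir_by - rel_y * dir_bx) / denom
    return (cam_a[0] + s * dir_ax, cam_a[1] + s * dir_ay)


# Hypothetical example: cameras 2 m apart both see the same key point.
landmark = triangulate_2d((0.0, 0.0), math.radians(45), (2.0, 0.0), math.radians(135))
no_parallax = triangulate_2d((0.0, 0.0), 0.0, (2.0, 0.0), 0.0)
print(landmark, no_parallax)
```

The parallel-ray case shows why the camera must move between frames: without a baseline between the two views, the depth of the key point cannot be recovered.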