Computer Science - BotPro Case Study Notes

Bundle Adjustment
• Technique used in computer vision and photogrammetry
• Refines the parameters of a 3D reconstruction: the 3D scene points and the camera poses
• Used to align multiple views of the same scene
• "Bundle" - a collection of features observed in multiple images of the same scene; the bundles are optimised while taking into account camera parameters such as distortion, calibration and relative positioning, to recover a consistent scene
• Iterative minimisation of reprojection error to provide a more precise representation of the scene

Computer Vision
• Field of study and research focused on enabling computers to understand, interpret and analyse information from images and videos
• Development of algorithms, techniques and models to extract meaningful information from visual data, replicating human vision capabilities
• Steps in computer vision
  • Accept digital images and video frames as input
  • Extract high-level information from the visual data using mathematical and statistical algorithms that analyse patterns, shapes, colours and textures to recognise objects

Dead Reckoning
• Technique of estimating the current position, orientation or velocity of an object or vehicle from previously determined positions and measured kinematic properties
• Position data gathered this way is called dead reckoning data
• Inputs
  • Initial position: the starting position or reference point
  • Acceleration: measured along the X, Y and Z axes by an accelerometer to estimate changes in velocity and position
  • Rotation: provided by gyroscopes or inertial measurement units (IMUs) to estimate changes in orientation or angular velocity
  • Time: the interval over which the other measurements are integrated
• Prone to accumulating errors from sensor drift, noise and imprecise measurements, leading to inaccurate position estimates unless periodically corrected using GPS or visual tracking (see the sketch below)
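To make the drift point concrete, here is a minimal dead-reckoning sketch in Python. It is a sketch under simplifying assumptions: accelerations are taken to be already rotated into the world frame, and the function name and inputs are illustrative, not from the case study.

```python
import numpy as np

def dead_reckon(p0, v0, accels, dt):
    """Estimate positions by twice integrating accelerometer samples.

    p0, v0 -- initial position and velocity (3-vectors): the reference point
    accels -- (N, 3) array of world-frame acceleration samples
    dt     -- sampling interval in seconds
    """
    p, v = np.asarray(p0, dtype=float), np.asarray(v0, dtype=float)
    track = [p.copy()]
    for a in accels:
        v = v + a * dt   # integrate acceleration into velocity
        p = p + v * dt   # integrate velocity into position
        track.append(p.copy())
    return np.array(track)

# Even tiny sensor noise is integrated twice, so position error grows
# without bound -- the drift that GPS or visual tracking must correct.
```

Because the double integration turns a constant accelerometer bias into a quadratically growing position error, real systems periodically reset the estimate against GPS fixes or visual landmarks.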
Edge Computing
• Distributed computing paradigm
• Brings computation and data storage to the edge of the network, where data is generated or consumed, rather than relying solely on centralised cloud computing infrastructure
• Processing is located near the edge devices or sensors, which reduces the need to transmit data to remote cloud servers
• The idea is to enable real-time processing close to the data source
• Advantages
  • Low latency: shorter access times, good for time-sensitive applications such as autonomous vehicles and real-time analytics
  • Bandwidth optimisation: reduces the amount of data transferred across the internet, alleviating congestion and high bandwidth costs
  • Privacy and security: data does not need to traverse external networks, allowing localised data storage and processing
  • Offline operation: allows operation during times of limited cloud/network connectivity

Global Map Optimisation
• Technique used to refine and improve the accuracy of a 3D map reconstruction
• Involves simultaneous optimisation of the 3D positions of landmarks (scene points) and the corresponding camera poses
• Employed in SLAM algorithms to create a map of the environment while simultaneously determining the position of the sensor within the map
• Minimises the discrepancy between the predicted positions of 3D points and the actually observed points
• Typically solved with bundle adjustment, i.e. non-linear least-squares optimisation

GPS Signal
• Refers to the radio-frequency signals transmitted by GPS satellites that provide information to receivers on Earth
• Allows receivers to calculate their precise location and time, and to synchronise their clocks
• The GPS constellation consists of satellites that continuously broadcast their orbital parameters and precise time
• Components of the GPS signal
  • Navigation message - information about the satellite orbit, clock errors and other parameters, transmitted at 50 bits per second
  • Carrier wave - the radio wave carrying the navigation message on the L1 or L2 frequency in the form of modulated signals
  • Spread-spectrum signal - the signal is spread over a wider frequency band to enhance signal quality and resistance to interference and disruption
• Receivers measure the time delay between transmission and reception to calculate the distance to each satellite

GPS-Degraded Environment
• GPS signals are severely compromised or degraded, leading to challenges and limitations in accurate positioning and navigation
• Causes
  • Signal obstruction - physical obstructions along the line of sight between the GPS receiver and the satellites
  • Multipath interference - signals reflect off buildings and other terrain before reaching the receiver, interfering with the direct signals and causing errors and inaccuracies
  • Signal jamming - intentional or unintentional interference that disrupts or blocks GPS using EM waves; a concern in highly electronically active areas
• Calls for alternative positioning methods such as an inertial navigation system (INS) using accelerometers and gyroscopes, or other satellite systems

GPS-Denied Environment
• Situation where GPS signals are too weak or completely unavailable
• Indoor, underground or dense areas where signals may be weakened, distorted or blocked
• Again, alternative positioning methods or technologies are required

Human Pose Tracking
• Computer vision task that involves estimating the position and orientation of human body joints or body parts
• Used to understand and analyse human movement and posture
• Algorithms apply machine-learning techniques to visual data to estimate the 2D and 3D positions or orientations of the body joints that represent human poses
• Can employ deep-learning-, graphical-model- or optimisation-based methodologies
• Leverage convolutional neural networks (CNNs) or recurrent neural networks (RNNs) to learn features and relationships between body key points

Inertial Measurement Unit (IMU)
• Electronic sensor device combining multiple sensors to measure the linear and angular motion of an object
• Integrates multiple devices, including an accelerometer, gyroscope and magnetometer, into one compact unit
• Provides a complete picture of the object's kinematic state
• Used in navigation systems to estimate changes in position, enabling precise control and localisation

Keyframe Selection
• Video processing technique involving the identification and selection of key frames that represent the entire scene from a video or image sequence
• Key frames capture the essential information in the visual sequence
• Reduces the amount of data to be processed or analysed while limiting the data to only relevant information
• Criteria for a frame to be considered a keyframe
  • Visual saliency: capture visually salient regions or objects in the video
  • Content diversity: represent different scenes, perspectives and/or actions to provide a comprehensive overview of the content
  • Temporal significance: select frames at significant points in time
  • Motion characteristics: select frames based on motion analysis
  • Redundancy: select only frames that offer unique information compared to neighbouring frames
  • Computational efficiency: strike a balance between accuracy and complexity

Key Points / Key Pairs
• Distinctive or informative locations or regions in a set of images
• Identified based on unique visual characteristics, including corners, edges and blobs
• Serve as landmarks or reference points for computer vision tasks
• Detected using feature extraction algorithms that analyse local properties such as intensity gradients, texture or scale-space representation
• After detection, they are described using feature descriptors that encode the local appearance around the key point
• Key pairs - corresponding key points detected in two or more images
• Matching key pairs allows movement to be tracked by observing changes in the environment (see the sketch below)
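As a concrete sketch of key-point detection and key-pair matching, the snippet below uses OpenCV's ORB detector and a brute-force matcher; the image file names are placeholders, not files from the case study.

```python
import cv2

# Load two views of the same scene (placeholder file names).
img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

# Feature extraction: detect key points and compute their descriptors.
orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# ORB descriptors are binary, so Hamming distance is the right metric;
# cross-checking keeps only mutually best matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# Each match is a key pair: the same physical point seen in both images.
for m in matches[:10]:
    print(kp1[m.queryIdx].pt, "->", kp2[m.trainIdx].pt)
```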
Light Detection and Ranging (LiDAR)
• Remote sensing technology that uses laser light to measure distances and create precise 3D representations of the surrounding environment
• Emits laser pulses and measures the time taken for them to bounce back; the delay is used to calculate distance (distance = speed of light × round-trip time / 2)
• Components
  • Laser source: emits pulsed bursts of light in rapid succession
  • Scanner / receiver: steers the laser beam across the scene and detects the reflected pulses
  • Timing and positioning: measures the round-trip time of each pulse to enable the distance calculation
• Applications
  • Mapping and surveying
  • Autonomous vehicles
  • Environmental monitoring

Object Occlusion
• Phenomenon in which an object positioned in front of another obstructs the visibility of the object behind it from the viewpoint of the observer
• Affects tracking, segmentation and recognition
• Causes complexity: partial visibility makes it difficult to infer an object's true appearance
• Disadvantages
  • The full extent and boundary of the object are not visible
  • Loss of tracking on an object - maintaining continuity requires complex algorithms
  • Occluded objects exhibit limited visual cues or a fragmented appearance
  • Depth relationships between occluding and occluded objects are not visible

Odometer Sensor
• Device used to measure the movement and displacement of a mobile robot or vehicle
• Provides information about the vehicle's change in position based on its motion
• Uses rotational encoders or sensors on the wheels or motor shafts to measure movement
• Combined with other information, such as from an IMU or GPS, to improve the accuracy and reliability of vehicle pose estimation and localisation
• Provides real-time feedback to allow precise control and monitoring

Optimisation
• Process of finding the best possible solution by maximising (or minimising) a specific objective function within a given set of constraints
• Involves systematic exploration and performance improvement
• Process (see the sketch below)
  • Define the problem and its constraints
  • Identify the space of possible solutions through range and bound determination
  • Define an objective function to measure the quality of a solution based on the optimisation goal
  • Identify any constraints of the problem that the solution must satisfy
  • Develop algorithms and techniques to solve the problem
  • Assess the optimised solution and evaluate its performance against the defined objectives and constraints
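The sketch below connects this process back to bundle adjustment and global map optimisation: it refines a single camera pose by iteratively minimising reprojection error with non-linear least squares. Everything here is synthetic and illustrative (the focal length, noise level and toy pose); a real bundle adjustment would jointly optimise many camera poses and the 3D points themselves.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

points_3d = np.random.uniform(-1, 1, (20, 3)) + [0.0, 0.0, 5.0]  # scene points
f = 500.0                                                         # focal length (px)

def project(pose, pts):
    """Pinhole projection of 3D points under a 6DoF pose (rvec + t)."""
    rvec, t = pose[:3], pose[3:]
    cam = Rotation.from_rotvec(rvec).apply(pts) + t  # world -> camera frame
    return f * cam[:, :2] / cam[:, 2:3]              # perspective divide

# Synthetic "observations": project with a known pose, add pixel noise.
true_pose = np.array([0.1, -0.05, 0.02, 0.2, -0.1, 0.3])
observed = project(true_pose, points_3d) + np.random.normal(0, 0.5, (20, 2))

def reprojection_error(pose):
    # Residuals between predicted and observed image points.
    return (project(pose, points_3d) - observed).ravel()

# Iterative non-linear least squares from a rough initial guess.
result = least_squares(reprojection_error, x0=np.zeros(6))
print("refined pose:", result.x)
```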
Re-localization
• Re-estimating position when a camera or robot loses track or encounters environmental change due to sensor drift, occlusion, movement or lighting
• Successful re-localization allows the system to accurately recover its pose estimate and continue operation
• Steps to re-localization
  • A map of the environment is created with key reference points for pose estimation
  • Visual features, including key points and key frames, are extracted
  • Features are matched against the map
  • The camera or robot pose is estimated using the matched features
  • The estimate is refined or verified to improve accuracy and reliability
• Most useful in SLAM situations where the robot/camera needs to constantly update its location

Rigid Pose Estimation (RPE)
• Process of determining the precise position and orientation of a rigid object in 3D space
• Estimates a six-degrees-of-freedom (6DoF) transformation
• Rigid - the object does not deform or change its shape during pose estimation
• Process steps
  • Feature detection - distinctive key points and frames are detected
  • Feature matching - detected features are matched with corresponding features in a reference view
  • Pose estimation - the 6DoF transformation is solved for, estimating the pose from key points in the reference frame
  • Refinement - to improve accuracy
• Performed using various algorithmic techniques, including PnP (Perspective-n-Point), ICP (Iterative Closest Point) or RANSAC

Robot Drift
• The robot's estimated position gradually deviates from its actual position over time
• Contributing factors
  • Sensor noise: inaccuracies or anomalies
  • Calibration errors: misalignment of components, incorrect calibration
  • Environmental changes: terrain, lighting or magnetic fields affect readings
  • Accumulative integration: adding up sensor measurements causes error propagation and accumulation
  • Uncertainty: complex or dynamic environments
• Methods for mitigation
  • Sensor fusion: integrating data from multiple sensors measuring the same quantity
  • Kalman filtering: to mitigate noise and uncertainty
  • Loop closure: recognising previously visited places provides a correction mechanism for accumulated error
  • Environmental constraints: correct drift by aligning the estimated pose with the actual environment
  • Online calibration/recalibration: reduces systematic errors

Simultaneous Localisation and Mapping (SLAM)
• Technique that enables a robot or autonomous system to build a map of an unknown environment while estimating its own position within it
• Process
  • Data acquisition: images, range measurements, visual cues
  • Extraction: key frames, key pairs and reference points
  • Data association: identify common features across different positions and viewpoints to create a consistent map
  • Mapping: construct the map using point clouds, occupancy grids or feature-based maps
  • Localisation: use IMU and GPS data to estimate the robot's position and orientation
  • Loop closure: recognise previously visited areas to correct accumulated error and improve consistency

Sensor Fusion Model
• Process of combining information from multiple sensors to obtain a more accurate and comprehensive understanding of the environment or state
• Integration of data from different sensors to overcome the limitations of any single sensor
• Process
  • Sensor selection: choosing appropriate sensors for the task and environment
  • Data acquisition: from the selected sensors, including measurements, images and point clouds
  • Preprocessing: remove noise and anomalies and align data spatially and temporally
  • Data fusion algorithms: combine the data using statistical methods such as the Kalman filter or Bayesian networks
  • Fusion output: generation of data that provides a more accurate and comprehensive representation
• Benefits (see the Kalman-filter sketch below)
  • Improved accuracy and reliability
  • More robust to individual sensor failure
  • Enhanced situational awareness
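As a minimal illustration of the Kalman-filter fusion mentioned above, the sketch below fuses noisy 1-D position measurements (think GPS) with a constant-velocity motion prediction (think odometry). The variance constants are illustrative, not from the case study.

```python
import numpy as np

def kalman_1d(measurements, velocity, dt, meas_var, process_var):
    """Fuse a motion model with noisy position measurements in 1-D."""
    x, p = measurements[0], 1.0      # state estimate and its variance
    estimates = []
    for z in measurements[1:]:
        # Predict: propagate the state with the motion model, grow uncertainty.
        x = x + velocity * dt
        p = p + process_var
        # Update: blend prediction and measurement, weighted by their variances.
        k = p / (p + meas_var)       # Kalman gain
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates

# Noisy position readings of a robot moving at 1 m/s, sampled once a second.
true_pos = np.arange(10.0)
noisy = true_pos + np.random.normal(0, 0.8, 10)
print(kalman_1d(noisy, velocity=1.0, dt=1.0, meas_var=0.64, process_var=0.01))
```

The gain k automatically weights whichever source is currently more certain; real robots run the multivariate form of the filter over full state vectors (position, velocity, orientation).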
Visual Simultaneous Localisation and Mapping (vSLAM)
• Technique that uses visual information from cameras to simultaneously estimate pose and construct a map
• Step 1: Initialisation
  • Camera calibration: estimation of intrinsic parameters (distortion, focal length)
  • Feature extraction: visual features or key points from the initial frames serve as reference points
  • Pose estimation: estimate the initial position relative to the initial reference frame using a PnP algorithm
  • Map initialisation: a sparse map of the surroundings as a starting point
  • Scale estimation: obtain absolute scale using LiDAR or depth cameras for accuracy
• Step 2: Local Mapping
  • Feature extraction: distinctive and unique visual features or key points from the current camera frames, using SIFT, SURF or ORB
  • Feature tracking: track the positions of the key points across consecutive frames to maintain consistency
  • Triangulation: estimate the 3D positions of the key points and calculate their spatial coordinates
  • Map representation: the triangulated key points are placed on the map
  • Update: repeated as new camera frames are processed
• Step 3: Loop Closure
  • Feature matching: compare current frames with previous ones to find similarities or matches
  • Similarity detection: determine whether the current frames show a previously seen place
  • Hypothesis generation: the algorithm generates candidate loop closures
  • Verification and consistency: determine whether a candidate is a true loop closure
  • Update and correction: correct the map and trajectory based on the revisited area
• Step 4: Re-localization (if needed)
  • Image/frame matching: find a match between the current frames and the existing map
  • Hypothesis generation: about the camera's pose or position in the environment
  • Hypothesis verification: using RANSAC or geometric verification
  • Map re-association: re-associate the camera's current position with the map to continue the mapping process
• Step 5: Tracking
  • Feature extraction: visual key points or features from the incoming frames
  • Feature matching: match them with corresponding features from previous frames
  • Motion estimation: using IMU / kinematic data
  • Pose update: update the pose estimate using the features of the current frame (see the PnP sketch below)
  • Robustness and error handling: recover from errors and give real-time updates
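To illustrate the pose-update step, here is a sketch of PnP-based pose recovery using OpenCV's solvePnPRansac. The map points, intrinsics and "detections" are synthetic stand-ins for real triangulated landmarks, calibration output and matched features.

```python
import numpy as np
import cv2

# Synthetic 3D map points in front of the camera, and made-up intrinsics.
object_points = (np.random.uniform(-1, 1, (50, 3)) + [0, 0, 4]).astype(np.float32)
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Fabricate 2D detections by projecting the map with a known pose.
rvec_true = np.array([0.05, -0.02, 0.01])
tvec_true = np.array([0.10, 0.00, 0.20])
image_points, _ = cv2.projectPoints(object_points, rvec_true, tvec_true, K, None)
image_points = image_points.astype(np.float32)

# PnP + RANSAC: recover the 6DoF camera pose while rejecting outlier matches.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(object_points, image_points, K, None)
print("pose recovered:", ok, rvec.ravel(), tvec.ravel())
```

In a live vSLAM tracker the 2D points would come from the feature-matching step, and RANSAC's inlier mask would feed back into map maintenance.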