Efficient AIS Data Processing for Environmentally Safe Shipping

Marios Vodas (1), Nikos Pelekis (1), Yannis Theodoridis (1), Cyril Ray (2), Vangelis Karkaletsis (3), Sergios Petridis (3), Anastasia Miliou (4)
(1) University of Piraeus
(2) Naval Academy, France
(3) NCSR “Demokritos”
(4) Archipelagos – Inst. of Marine Conservation

Outline
1. Part I: Maritime Transportation
2. Part II: Automatic Identification System (AIS)
3. Part III: Objectives
4. Part IV: Methodology
5. Part V: Conclusion

I. MARITIME TRANSPORTATION

Safety (and Environmental) Issues
Ships, control centers and marine officers face many security and safety problems due to:
• Staff reduction, cognitive overload, human errors
• Traffic increase (ports, maritime routes), dangerous cargo
• Terrorism, piracy
• Technical faults (bad design, equipment breakdowns)
• Bad weather
• Etc.
(Figures: HELCOM AIS, IRENav (NATO), MarineTraffic.com)

The Most Prominent Cause of Accidents
About 75-96% of marine casualties are caused, at least in part, by some form of human error*:
• 88% of tanker accidents
• 79% of towing vessel groundings
• 96% of collisions
• 75% of fires and explosions
* Rothblum A.M. (2006) “Human Error and Marine Safety”, U.S. Coast Guard Research & Development Center
Solving such issues requires responses at different levels, taking into account:
• People (activities)
• Technology
• Environment
• Organisational factors

Ways to Minimize Accidents
• Level of education and practice for mariners
• Work safety regulations (behaviour guidelines, standardised onboard equipment)
• Navigation and decision support systems providing real-time information, predictions, alerts...
• Integrate and properly use multiple heterogeneous positioning systems: AIS, ARPA, Long-Range Identification and Tracking (LRIT), Global Maritime Distress and Safety System (GMDSS), synthetic aperture radar, airborne radar, satellite-based sensors
• Generalisation of vessel traffic monitoring, port control, search and rescue systems, automatic communications

Traffic Monitoring
• Air-based support: human and semi-automatic monitoring; on demand and on a regular basis
• Remote-sensing support: semi-automatic monitoring; every 2 to 6 hours
• Sensor-based support: almost automatic analysis and monitoring; real time

II. AUTOMATIC IDENTIFICATION SYSTEM (AIS)

AIS Device
• The Automatic Identification System identifies and locates vessels at a distance
• It includes an antenna, a transponder, a GPS receiver and additional sensors (e.g., speed log and gyrocompass)
• It is a broadcast system based on VHF communications
• It is able to operate in autonomous and continuous mode
• Ships fitted with AIS send navigation data to surrounding receivers (range is about 50 km)
• Ships or onshore maritime control centres fitted with AIS receive navigation data sent by surrounding ships
→ AIS is mandatory (IMO) for large ships and passenger ships

AIS Transmission Rate and Accuracy
• AIS accuracy is defined as the largest distance the ship can cover between two updates
• AIS broadcasts information at different update rates depending on the ship's current speed and manoeuvre
• The IMO assumes that the accuracy of the embedded GPS is 10 m

Vessel behaviour                              | Time between updates | Accuracy (m)
----------------------------------------------|----------------------|-------------------
Anchored                                      | 3 min                | = 10
Speed between 0-14 knots                      | 12 s                 | Between 10 and 95
Speed between 0-14 knots and changing course  | 4 s                  | Between 10 and 40
Speed between 14-23 knots                     | 6 s                  | Between 55 and 80
Speed between 14-23 knots and changing course | 2 s                  | Between 25 and 35
Speed over 23 knots                           | 3 s                  | > 45
Speed over 23 knots and changing course       | 2 s                  | > 35

General update rules have been compared to reality: it appears that actual update rates are lower.

AIS Data
• AIS provides location-based information on 2D routes, thus defining point-based 3D trajectories
• That is, an ordered series of locations (X, Y, T) of a given mobile object O, with T indicating the timestamp of the location (X, Y)
• Transmitted data include the ship's position and textual meta-information:
  • Static: ID number (MMSI), IMO code, ship name and type, dimensions
  • Dynamic: position (Long, Lat), speed, heading, course over ground (COG), rate of turn (ROT)
  • Route-based: destination, danger, estimated time of arrival (ETA) and draught
→ Time does not exist in AIS frames: it must be added by receivers
!AIVDM,1,1,,A,1Bwj:v0P1=1f75REQg>rPwv:0000,0*3B

III. OBJECTIVES

Big AIS Data Processing for Environmentally Safe Shipping
The objectives, based on requests from the Archipelagos Institute of Marine Conservation, were to:
• Investigate the factors which contribute most to the risk of a shipping accident
• Identify dangerous areas
How: processing a traffic database to address requirements and queries set by Archipelagos, towards a semi-quantitative risk analysis of shipping traffic
→ Data coming from AIS
→ Application to the Aegean Sea

Typical Questions From Domain Experts
• Calculate average and minimum distances from shore or between two ships
• Calculate the maximum number of ships in the vicinity of another ship
• Find whether (and how many times) a ship goes through specified areas (e.g. narrow passages, biodiversity boxes)
• Calculate the number of sharp changes in a ship's direction
• Find typical routes vs. outliers
• Etc.
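Two of the expert questions above (minimum distance between two ships, number of sharp changes in direction) can be sketched in a few lines of Python. This is only an illustrative sketch, not the queries used in the project: the function names, the 30-degree threshold and the choice to compare ships only at timestamps they share are assumptions made for the example. A real analysis would interpolate positions between AIS updates.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, a spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def course_change_deg(cog_a, cog_b):
    """Smallest absolute angle between two courses over ground, in [0, 180]."""
    diff = abs(cog_a - cog_b) % 360.0
    return min(diff, 360.0 - diff)


def count_sharp_turns(cogs, threshold_deg=30.0):
    """Count consecutive COG pairs whose change exceeds the (assumed) threshold."""
    return sum(
        1
        for prev, cur in zip(cogs, cogs[1:])
        if course_change_deg(prev, cur) > threshold_deg
    )


def min_distance_m(track_a, track_b):
    """Minimum distance between two ships over the timestamps they share.

    Each track maps timestamp -> (lon, lat). Returns None when the tracks
    have no timestamp in common (simplifying assumption: no interpolation).
    """
    common = track_a.keys() & track_b.keys()
    if not common:
        return None
    return min(haversine_m(*track_a[t], *track_b[t]) for t in common)
```

Taking the angular difference modulo 360 matters here: a change from 350° to 10° is a 20° turn, not a 340° one.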
Mediterranean Sea
• The European Maritime Safety Agency (EMSA) centralizes data from EU states and provides them through a Web service
→ Data volume is 100 million positions per month, that is, about 2300 positions per minute
• We worked on a dataset on the Mediterranean Sea provided by IMIS Hellas (a Greek IT company related to IMIS Global, collecting AIS data, mariweb.gr)
  • Focus on the Aegean Sea: 3 days, 3 million position records (933 distinct ships)
  • The full dataset is more than 2000 SQL tables, for a total of 2 TB covering 2.5 years of vessel activity
• Two datasets are available through the Chorochronos.org interface (IMIS 3 days and AIS Brest)

Vessel Statistics

Country                          | Number of ships | Flag of convenience
---------------------------------|-----------------|--------------------
Greece                           | 263             | No
Panama (Republic of)             | 112             | Yes
Turkey                           | 96              | No
Malta                            | 76              | Yes
Liberia (Republic of)            | 32              | Yes
Saint Vincent and the Grenadines | 29              | Yes

IV. METHODOLOGY

Populating a Database
• Relational database (PostgreSQL and PostGIS)
• Data model based on AIS messages: positions, ships and trips
• Parsing, integration, error checking, filtering
• Reconstructing trajectories from raw data and feeding a trajectory DB
• Applying “simple” queries to answer the experts' needs, e.g. “What is the (sub)trajectory of a ship during its presence in an area?”
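The trajectory reconstruction step and the sub-trajectory question above can be illustrated with a small in-memory sketch. It is not the PostgreSQL/PostGIS implementation used in the project: the `Position` and `Box` types, the function names and the rectangular area are assumptions introduced for the example, and the timestamp is taken to be the one added by the receiver.

```python
from collections import defaultdict
from typing import NamedTuple


class Position(NamedTuple):
    mmsi: int    # ship identifier
    t: int       # receiver-side timestamp in seconds (AIS frames carry no time)
    lon: float
    lat: float


class Box(NamedTuple):
    """Axis-aligned area of interest (hypothetical stand-in for a polygon)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, p):
        return (self.min_lon <= p.lon <= self.max_lon
                and self.min_lat <= p.lat <= self.max_lat)


def build_trajectories(records):
    """Group raw position records by MMSI, drop exact duplicates, sort by time."""
    by_ship = defaultdict(list)
    for rec in records:
        by_ship[rec.mmsi].append(rec)
    return {mmsi: sorted(set(pts), key=lambda p: p.t)
            for mmsi, pts in by_ship.items()}


def subtrajectories_in_area(trajectory, area):
    """Split a trajectory into maximal runs of consecutive positions inside the area."""
    runs, current = [], []
    for p in trajectory:
        if area.contains(p):
            current.append(p)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs
```

Under these assumptions, the number of runs returned by `subtrajectories_in_area` also gives how many times the ship went through the area.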
MOD Engine and Rule-Based Analysis
An integrated approach for maritime situation awareness based on an inference engine (Drools):
• The expert defines rules according to their needs and objectives
• The engine executes the rules using the AIS database
• Mixed top-down / bottom-up approach involving an expert monitoring real-time traffic on a touch table
Hermes is a MOD engine providing extensible DBMS support for trajectory data:
• Defines a trajectory data type
• SQL extensions at the logical level
• Efficient indexing techniques at the physical level
• Includes trajectory clustering support
http://infolab.cs.unipi.gr/hermes

Methodology Steps
• Cleaning (filter): wrong CRC, duplicates
• Decoding (AIS message type): 1/2/3 Position Report; 5 Static and Voyage Related Data
• Cleaning (filter): invalid MMSI, GPS error
• Querying:
  • Timeslice
  • Range: temporal only, spatial only, spatio-temporal
  • With respect to a reference static object (point / segment / box)
  • With respect to a reference trajectory
  • Nearest Neighbor (NN)
• Hermes Loader: degrees to meters, trajectory update
• Outputs: trajectories
• Advanced querying:
  • Pair-wise similarity queries
  • OD-matrix (origin/destination are spatial vs. spatio-temporal boxes)
  • Trajectory clustering

Take the Maritime Environment Into Account
• The maritime domain is peculiar: there is no underlying network, but maritime rules define predefined paths and anchorage areas (polylines and polygons) that may constrain a given trajectory
• S-57 ENC (Electronic Nautical Chart)
• We added the official vector chart and expert-defined areas of interest to the database:
  • Coastlines
  • Starting, ending, passing and restricted areas, waiting zones
  • Regulations and dangers (rocks, buoys, seabed)
  • …

Exploring the Data
• Calculating trajectory aggregations and feeding a trajectory data warehouse
• Performing OLAP analysis over aggregations (e.g. O/D analysis)
• Running KDD techniques: frequent pattern analysis, clustering, outlier detection, etc.
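The O/D analysis mentioned above can be sketched as a simple aggregation over reconstructed trajectories. This is a minimal illustration under stated assumptions, not the data warehouse or the Hermes OD-matrix operator: areas are hypothetical named bounding boxes, the origin and destination are taken to be the first and last positions of each trajectory, and the function names are invented for the example.

```python
from collections import Counter


def locate(point, areas):
    """Return the name of the first area containing (lon, lat), or None."""
    lon, lat = point
    for name, (min_lon, min_lat, max_lon, max_lat) in areas.items():
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
            return name
    return None


def od_matrix(trajectories, areas):
    """Count trajectories per (origin area, destination area) pair.

    Each trajectory is a time-ordered list of (lon, lat) points. Trajectories
    that start or end outside every area are skipped (an assumption of this
    sketch; they could instead be counted under an "other" label).
    """
    counts = Counter()
    for points in trajectories:
        if not points:
            continue
        origin = locate(points[0], areas)
        destination = locate(points[-1], areas)
        if origin is not None and destination is not None:
            counts[(origin, destination)] += 1
    return counts
```

The resulting counter is a sparse O/D matrix: only origin/destination pairs that actually occur get an entry, which keeps the aggregation compact when most area pairs carry no traffic.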
Figure panels (route identification):
1. Cloud of locations
2. Association of points coming from the same source-destination set
3. Definition of a route and qualification of positions at each time
4. Qualification of a new trajectory compared to the identified route

Visualizing Trajectories and Patterns
→ Web-based visualisation using Google Maps / Earth applications, OpenLayers (OSM)
Figure panels: frequent patterns; speed behaviour; space-time cube (trajectory too far on the right); space-time cube (ship is late)

V. CONCLUSION

Some Open Questions
Q1. What kind of storage is appropriate for BIG volumes of vessel traffic data?
• Serial vs. parallel/distributed processing (e.g. Hadoop) (batch vs. streaming)
• MOD engines? What about indexing BIG mobility data?
Q2. What kind of analysis on vessel traffic data makes sense?
• Analysis on current (location, speed, heading, …) vs. historical information (trajectories)
• Clusters (+ outliers), frequent patterns, next location prediction, etc.
• Exploit previous knowledge to improve real-time analysis
Figure: trajectory clustering
Q3. What kind of visualization is appropriate for vessel traffic data / patterns?
• Current location vs. trajectory-based visual analytics
Figure: frequent pattern mining

Research Challenges on Data – Just a Few Examples
• Trajectory compression / simplification: how to compress / simplify trajectories while keeping quality as high as possible?
• Semantic trajectory reconstruction: how to extract semantics from raw (GPS-based) trajectory data?
• Trajectory sampling: how to find a representative sample within a trajectory dataset?
• Generating trajectories by example: how to build large synthetic datasets that simulate the ‘behavior’ of a small real one?
• Etc.

Questions