The United States air transportation network analysis
Dorothy Cheung

Introduction
• The problem and its importance
• Missing pieces
• Related works in summary
• Methodology
  – Data set
  – Network generation
  – Network analysis
• Conclusion

The problem and its importance
• Problem
  – Analyze the air transportation network in the U.S.
    • The network is driven by profits and politics
    • Goal: better understand the network structure, not maximize utility
• Importance
  – Economy: transport of goods and services
  – Air traffic flow: convenience
  – Health studies: propagation of diseases

Missing pieces
• A sufficient amount of research exists on the network, with a focus on utility optimization.
• Commercial enterprises: OAG and Innovata
• But there is a lack of research analyzing the network features studied in class.

Related works
Air transportation network analyses
• WAN – World-wide Airport Network
• ANI – Airport Network of India
• ANC – Airport Network of China

Related works
Summary: Features of air transportation networks
• Small-world network (compared with random graphs)
  – Small average shortest path
  – High average clustering coefficient
  – Degree mixing differs

                           WAN           ANI              ANC
  Avg. shortest path       4.4           4                2.067
  Avg. clustering coef.    0.62          0.6574           0.733
  Degree mixing            Assortative   Disassortative   Disassortative

• Scale-free power-law degree distribution

                           WAN           ANI              ANC
  Power-law exponent       1.0           2.2 +/- 0.1      1.65

Methodology
• Data set
• Network generation
• Network analysis

Methodology – Data set
[Diagram: T100, OAI, RITA, BTS, database, my data]
Legend
  OAI : Office of Airline Information
  RITA : Research and Innovative Technology Administration
  BTS : Bureau of Transportation Statistics

Methodology – Data set
[Figure: Domestic air traffic hubs [1]]

Methodology – Data set
• Domestic scheduled flights
  – Passengers, cargo, and mail
  – Military excluded
• Market data vs. segment data
  – Market: used
    • Counts a passenger once on the same flight number
  – Segment: not used
    • Counts a passenger once per leg, so more than once on a multi-leg flight
• Month specific: July 2011

Methodology – Data set
• Relevant information
  • Number of passengers
  • Amount of cargo: freight and mail
  • Origin city
  • Destination city

Sample .csv from BTS

  PASSENGERS  FREIGHT  MAIL  ORIGIN_CITY_NAME  DEST_CITY_NAME
  59          700      17    Akhiok, AK        Kodiak, AK
  19          200      2     Akhiok, AK        Kodiak, AK
  24          0        0     Akhiok, AK        Kodiak, AK
  2           0        0     Akiachak, AK      Akiak, AK
  176         47748    2250  Adak Island, AK   Anchorage, AK
  20          0        0     Adak Island, AK   Anchorage, AK
  105         28       320   Akiachak, AK      Bethel, AK

Remaining columns (same row order)

  DEST_CITY_NUM  DEST_STATE_ABR  DEST_STATE_FIPS  DEST_STATE_NM  DEST_WAC  YEAR  QUARTER  MONTH  DISTANCE_GROUP  CLASS
  1017           AK              2                Alaska         1         2011  3        7      1               F
  1017           AK              2                Alaska         1         2011  3        7      1               L
  1017           AK              2                Alaska         1         2011  3        7      1               F
  1024           AK              2                Alaska         1         2011  3        7      1               F
  1029           AK              2                Alaska         1         2011  3        7      3               F
  1029           AK              2                Alaska         1         2011  3        7      3               L
  1055           AK              2                Alaska         1         2011  3        7      1               F

Methodology – Network generation
• Network
  – 850 nodes: airports
  – 21405 entries
    • Weighted edges: sum of passengers and cargo
  – Directed and undirected network input files for Pajek [2] and GUESS [5].
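
Methodology – Network generation (illustrative sketch)
The aggregation described above can be sketched in a few lines. This is a minimal Python illustration, not the C# tool used in this project: the helper names build_edges and to_pajek are hypothetical, the rows are the sample rows from the BTS excerpt, and the edge weight follows the slide's definition (passengers plus freight plus mail, summed per city pair). The output uses the standard Pajek .net layout (*Vertices, then *Arcs for directed or *Edges for undirected networks).

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the BTS excerpt on the data set slide.
SAMPLE_CSV = '''PASSENGERS,FREIGHT,MAIL,ORIGIN_CITY_NAME,DEST_CITY_NAME
59,700,17,"Akhiok, AK","Kodiak, AK"
19,200,2,"Akhiok, AK","Kodiak, AK"
24,0,0,"Akhiok, AK","Kodiak, AK"
2,0,0,"Akiachak, AK","Akiak, AK"
176,47748,2250,"Adak Island, AK","Anchorage, AK"
20,0,0,"Adak Island, AK","Anchorage, AK"
105,28,320,"Akiachak, AK","Bethel, AK"
'''


def build_edges(csv_text, directed=True):
    """Sum passengers + freight + mail per city pair (hypothetical helper)."""
    weights = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        src, dst = row["ORIGIN_CITY_NAME"], row["DEST_CITY_NAME"]
        load = float(row["PASSENGERS"]) + float(row["FREIGHT"]) + float(row["MAIL"])
        # For the undirected network, both directions share one sorted key.
        key = (src, dst) if directed else tuple(sorted((src, dst)))
        weights[key] += load
    return dict(weights)


def to_pajek(edges, directed=True):
    """Render edge weights as Pajek .net text (hypothetical helper)."""
    nodes = sorted({city for pair in edges for city in pair})
    index = {city: i for i, city in enumerate(nodes, start=1)}  # Pajek ids start at 1
    lines = [f"*Vertices {len(nodes)}"]
    lines += [f'{index[city]} "{city}"' for city in nodes]
    lines.append("*Arcs" if directed else "*Edges")
    for (src, dst), weight in sorted(edges.items()):
        lines.append(f"{index[src]} {index[dst]} {weight:.0f}")
    return "\n".join(lines)


edges = build_edges(SAMPLE_CSV)
print(to_pajek(edges))
```

Calling build_edges(SAMPLE_CSV, directed=False) followed by to_pajek(..., directed=False) gives the undirected variant, in which traffic in both directions between two cities is merged into a single weighted edge.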

Methodology – Network generation
[Diagram: .CSV input; GenerateNwk tool (ParseCSV via the Microsoft.Jet.OLEDB.4.0 provider, Data Table, LINQ); outputs PajekDirected.net, PajekUndirected.net, GUESSDirected.gdf, GUESSUndirected.gdf]
Network generation tool written in C# using LINQ (Language Integrated Query)

Methodology – Network generation
[Figure: The U.S. air transportation network drawn in Pajek]

Methodology – Network analysis
• Metrics
  – Degree distributions and correlations
    • Top 10 most connected cities
    • Top 10 most central cities
  – Small-world network?
    • Shortest path length
    • Clustering coefficient
    • Compare against WAN, ANI, and ANC
  – Cumulative degree distribution and the power law
  – Resilience
  – Assortativity: rich club?
  – Random graph
  – Z-score: TBD?

Methodology – Network analysis
– Degree distributions and correlations
  • Directed network
  • Pajek:
    In-degree : Net -> Partitions -> Degree -> Input
    Out-degree : Net -> Partitions -> Degree -> Output
    Both : Net -> Partitions -> Degree -> All
– Shortest path length
  • Directed network
  • Pajek: Net -> Paths between 2 vertices -> Diameter
– Clustering coefficient
  • Directed network
  • Pajek: Net -> Paths between 2 vertices -> Diameter

Methodology – Network analysis
– Cumulative degree distribution and the power law
  • Directed network
  Step 1 in Pajek:
  – Create a partition of all degrees
  – Export the partition to a tab-delimited file
    Tools -> Export to Tab Delimited File -> Current Partition
  Step 2 in MATLAB [6]:
  – Generate a power-law integer distribution
    X = GetInput.m : reads the partition from the tab-delimited file (X => X.name, X.label, X.degree)
  – Calculate the cumulative distribution with cumulativecounts.m [4]
    [xlincumulative,ylincumulative] = cumulativecounts(X.degree)

Methodology – Network analysis
– Resilience
  What percentage of nodes must be removed to reduce the size of the giant component by half?
• Consider:
  – Random attack
  – Targeted attack : remove the nodes with the highest degree and betweenness centrality measures
• Undirected network with 850 nodes
• GUESS toolbars: resiliencedegree.py and resiliencebetweenness.py, downloaded from CTools [4]
• Compare against a random network (random and targeted attacks)
  GUESS : makeSimpleRandom(numberOfNodes, numberOfEdges)
    numberOfNodes = 850
    numberOfEdges = 21405

Methodology – Network analysis
– Assortativity : rich club?
  • Draw conclusions from graphical analysis in GUESS
– Random graph
  • Difficulty in constructing a realistic random network that models the real network [3].
– Z-score?
  • To be determined.

Methodology – Network analysis
• Expectations/Predictions
  – Larger-degree nodes are more central (betweenness). Consider LAX, SFO, HOU, JFK, etc.
  – Small world, as compared to WAN, ANI, and ANC
  – Scale-free power-law degree distribution
  – Disassortative

Conclusion
The United States air transportation network analysis
• The problem and its importance
• Missing pieces
• Related works – WAN, ANI, ANC
• Methodology
  Data set : BTS (Bureau of Transportation Statistics)
  Network generation : directed and undirected network input files
  Network analysis :
    Degree distribution
    Small-world network as compared to WAN, ANI, and ANC
    Cumulative degree distribution and power law
    Resilience
    Assortativity
    Z-score – TBD?

References for this presentation
1. T-100 reporting guide, RITA, http://www.rita.dot.gov/, www.transtats.bts.gov, http://www.bts.gov/programs/airline_information/.
2. Pajek, program for large network analysis, http://vlado.fmf.unilj.si/pub/networks/pajek/.
3. Albert-Laszlo Barabasi and Reka Albert, "Emergence of Scaling in Random Networks", Department of Physics, University of Notre Dame, October 1999.
4. CTools, https://ctools.umich.edu/portal.
5. GUESS, graph exploration system, http://graphexploration.cond.org/.
6. MATLAB, the language of technical computing, http://www.mathworks.com/products/matlab/index.html