Air Transport Network

The United States air transportation network analysis
Dorothy Cheung
Introduction
• The problem and its importance
• Missing Pieces
• Related works in summary
• Methodology
– Data set
– Network Generation
– Network Analysis
• Conclusion
The problem and its importance
• Problem
– Analyze the air transportation network in the U.S.
• The network is driven by profits and politics
• Goal: better understand the network structure, not maximize utility
• Importance
– Economy: transport of goods and services
– Air traffic flow: convenience
– Health studies: propagation of diseases
Missing pieces
• Plenty of research exists on the network, focused on utility optimization.
• Commercial enterprises: OAG and Innovata
• But there is little research analyzing the network features studied in class.
Related works
Analyses of air transportation networks
• WAN – World-wide Airport Network
• ANI – Airport Network of India
• ANC – Airport Network of China
Related works
Summary:
Features of air transportation networks
• Small-world network (compared with random graphs)
– Small average shortest path length
– High average clustering coefficient
– Degree mixing differs between networks

                        WAN          ANI             ANC
Avg. shortest path      4.4          4               2.067
Avg. clustering coef.   0.62         0.6574          0.733
Degree mixing           Assortative  Disassortative  Disassortative

• Scale-free: power-law degree distribution

                        WAN          ANI             ANC
Power-law exponent      1.0          2.2 +/- 0.1     1.65
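For reading the small-world comparison, the usual baselines are those of an Erdős-Rényi random graph with the same number of nodes N and mean degree ⟨k⟩; these are standard approximations, not values taken from the WAN, ANI, or ANC studies. The second line relates a degree-distribution exponent γ to the slope of the cumulative distribution, which is worth keeping in mind when comparing exponents that may have been fitted to different curves.

```latex
\langle \ell \rangle_{\mathrm{rand}} \approx \frac{\ln N}{\ln \langle k \rangle},
\qquad
C_{\mathrm{rand}} \approx \frac{\langle k \rangle}{N}

P(k) \propto k^{-\gamma}
\;\Longrightarrow\;
P(K \ge k) = \sum_{k' \ge k} P(k') \propto k^{-(\gamma - 1)} \qquad (\gamma > 1)
```

A network is called small-world when its average shortest path is close to the random baseline while its clustering coefficient is far above ⟨k⟩/N.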
Methodology
• Data Set
• Network Generation
• Network Analysis
Methodology – Data Set
[Diagram: T100 DATABASE (OAI, RITA, BTS) → My data]
Legend
OAI : Office of Airline Information
RITA : Research and Innovative Technology Administration
BTS : Bureau of Transportation Statistics
Methodology – Data Set
[Figure: Domestic Air Traffic Hubs [1]]
Methodology – Data Set
• Domestic scheduled flights
– Passengers, cargo, and mail
– Military excluded
• Market data vs. segment data
– Market: used
• Counts each passenger once on the same flight number
– Segment: not used
• Counts a passenger on every leg, i.e. more than once on a multi-stop flight
• Month: July 2011
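The difference between the two counts can be shown with a toy itinerary. The sketch below is hypothetical (airports A, B, C and one passenger who boards at A and deplanes at C on a single flight number that stops at B); it only shows how many origin-destination records each view produces, not how BTS builds its tables.

```python
# Toy illustration (hypothetical): one passenger boards at A and deplanes at C
# on a single flight number that makes an intermediate stop at B.

def market_records(stops):
    """Market view: one origin-destination record for the whole flight number."""
    return [(stops[0], stops[-1])]


def segment_records(stops):
    """Segment view: one record per nonstop leg, so the passenger is counted on each leg."""
    return list(zip(stops, stops[1:]))


stops = ["A", "B", "C"]
print(market_records(stops))   # [('A', 'C')]
print(segment_records(stops))  # [('A', 'B'), ('B', 'C')]
```

Using market data therefore keeps one edge per passenger trip on a flight number instead of inflating traffic on intermediate legs.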
Methodology – Data Set
• Relevant information
– Number of passengers
– Cargo: freight and mail
– Origin city
– Destination city
Sample .csv from BTS
PASSENGERS,FREIGHT,MAIL,ORIGIN_CITY_NAME,DEST_CITY_NAME,DEST_CITY_NUM,DEST_STATE_ABR,DEST_STATE_FIPS,DEST_STATE_NM,DEST_WAC,YEAR,QUARTER,MONTH,DISTANCE_GROUP,CLASS
59,700,17,"Akhiok, AK","Kodiak, AK",1017,AK,2,Alaska,1,2011,3,7,1,F
19,200,2,"Akhiok, AK","Kodiak, AK",1017,AK,2,Alaska,1,2011,3,7,1,L
24,0,0,"Akhiok, AK","Kodiak, AK",1017,AK,2,Alaska,1,2011,3,7,1,F
2,0,0,"Akiachak, AK","Akiak, AK",1024,AK,2,Alaska,1,2011,3,7,1,F
176,47748,2250,"Adak Island, AK","Anchorage, AK",1029,AK,2,Alaska,1,2011,3,7,3,F
20,0,0,"Adak Island, AK","Anchorage, AK",1029,AK,2,Alaska,1,2011,3,7,3,L
105,28,320,"Akiachak, AK","Bethel, AK",1055,AK,2,Alaska,1,2011,3,7,1,F
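A minimal Python sketch of turning rows like these into weighted edges, assuming the weighting stated on the next slide (passengers, freight and mail summed per origin/destination pair). It keys nodes on the ORIGIN_CITY_NAME and DEST_CITY_NAME columns shown above; the author's actual tool is the C# program described later, and the name edge_weights is illustrative.

```python
import csv
import io
from collections import defaultdict

# The sample rows above, trimmed to the columns needed for edge weights.
SAMPLE = """PASSENGERS,FREIGHT,MAIL,ORIGIN_CITY_NAME,DEST_CITY_NAME
59,700,17,"Akhiok, AK","Kodiak, AK"
19,200,2,"Akhiok, AK","Kodiak, AK"
24,0,0,"Akhiok, AK","Kodiak, AK"
2,0,0,"Akiachak, AK","Akiak, AK"
176,47748,2250,"Adak Island, AK","Anchorage, AK"
20,0,0,"Adak Island, AK","Anchorage, AK"
105,28,320,"Akiachak, AK","Bethel, AK"
"""


def edge_weights(rows):
    """Sum PASSENGERS + FREIGHT + MAIL for every directed (origin, destination) pair."""
    weights = defaultdict(float)
    for row in csv.DictReader(rows):
        load = float(row["PASSENGERS"]) + float(row["FREIGHT"]) + float(row["MAIL"])
        weights[(row["ORIGIN_CITY_NAME"], row["DEST_CITY_NAME"])] += load
    return dict(weights)


weights = edge_weights(io.StringIO(SAMPLE))
print(weights[("Akhiok, AK", "Kodiak, AK")])  # 1021.0
```

For the full download, pass an open file instead (`with open(path, newline="") as f: edge_weights(f)`). Like the slide's weighting, this adds passenger counts and cargo quantities as plain numbers.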
Methodology – Network Generation
• Network
– 850 nodes: airports
– 21405 entries
• Weighted edges: sum of passengers and cargo (freight and mail)
– Directed and undirected network input files for Pajek [2] and GUESS [5]
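The two Pajek input files can be sketched as follows, assuming a dictionary of weighted (origin, destination) pairs. Pajek's .net format lists `*Vertices` with 1-based ids and quoted labels, followed by `*Arcs` for directed or `*Edges` for undirected links. Summing the two directions into one undirected edge is an assumption of this sketch; the slides do not say how the undirected weights were formed.

```python
def to_pajek(weights, directed=True):
    """Return Pajek .net text for a weighted network given {(src, dst): weight}."""
    # Pajek numbers vertices from 1; each vertex line carries a quoted label.
    names = sorted({name for edge in weights for name in edge})
    index = {name: i for i, name in enumerate(names, start=1)}
    lines = [f"*Vertices {len(names)}"]
    lines += [f'{index[name]} "{name}"' for name in names]

    if directed:
        links, header = weights, "*Arcs"
    else:
        # Assumption: A->B and B->A are merged into one edge carrying the summed weight.
        links, header = {}, "*Edges"
        for (src, dst), w in weights.items():
            key = tuple(sorted((src, dst)))
            links[key] = links.get(key, 0) + w

    lines.append(header)
    for (src, dst), w in sorted(links.items()):
        lines.append(f"{index[src]} {index[dst]} {w}")
    return "\n".join(lines) + "\n"


# Hypothetical three-airport example.
demo = {("A", "B"): 10, ("B", "A"): 5, ("A", "C"): 3}
print(to_pajek(demo, directed=True))
print(to_pajek(demo, directed=False))
```

Writing the result to PajekDirected.net and PajekUndirected.net is then a single `open(...).write(...)` each.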
Methodology – Network Generation
[Diagram: .CSV → GenerateNwk (ParseCSV via the Microsoft.Jet.OLEDB.4.0 provider → Data Table → LINQ) → PajekDirected.net, PajekUndirected.net, GUESSDirected.gdf, GUESSUndirected.gdf]
Network generation tool written in C# using LINQ (Language Integrated Query)
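For GUESS, a comparable Python sketch (not the C#/LINQ tool) writes a .gdf file: a `nodedef>` header with one line per node, then an `edgedef>` header with one line per edge. The column declarations (name and label VARCHAR, weight DOUBLE, directed BOOLEAN), the double-quoted labels, and the hypothetical n1, n2, ... node ids are assumptions to check against the GUESS documentation; the ids are used because city names such as "Akhiok, AK" contain spaces and commas.

```python
def to_gdf(weights, directed=True):
    """Return GUESS-style .gdf text for {(src, dst): weight}; column layout is assumed."""
    names = sorted({name for edge in weights for name in edge})
    # Hypothetical identifier-safe node ids (n1, n2, ...); the original name goes in `label`.
    ids = {name: f"n{i}" for i, name in enumerate(names, start=1)}

    if directed:
        links = weights
    else:
        # Assumption: merge both directions into one edge, summing the weights.
        links = {}
        for (src, dst), w in weights.items():
            key = tuple(sorted((src, dst)))
            links[key] = links.get(key, 0) + w

    lines = ["nodedef>name VARCHAR,label VARCHAR"]
    lines += [f'{ids[name]},"{name}"' for name in names]
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR,weight DOUBLE,directed BOOLEAN")
    flag = "true" if directed else "false"
    for (src, dst), w in sorted(links.items()):
        lines.append(f"{ids[src]},{ids[dst]},{w},{flag}")
    return "\n".join(lines) + "\n"


# Hypothetical two-airport example.
demo = {("A", "B"): 10, ("B", "A"): 5}
print(to_gdf(demo, directed=True))
print(to_gdf(demo, directed=False))
```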
Methodology – Network Generation
[Figure: The U.S. Air Transportation Network drawn in Pajek]
Methodology – Network Analysis
• Metrics
– Degree distributions and correlations
• Top 10 most connected cities
• Top 10 most central cities
– Small world network?
• Shortest path length
• Clustering coefficient
• Compare against WAN, ANI, and ANC
– Cumulative degree distribution and the power law
– Resilience
– Assortativity: rich club?
– Random graph
– Z-score: TBD
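Pajek and GUESS compute these rankings directly; to make "most connected" and "most central" concrete, here is a plain-Python sketch that ranks nodes by degree and by betweenness centrality using Brandes' algorithm. It assumes an unweighted, undirected adjacency dict (node → set of neighbours); the helper names and the toy graph are illustrative.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph {node: set(neighbours)}."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma) and predecessors.
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Undirected graph: every pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


def top_k(scores, k=10):
    """Nodes with the k highest scores; ties broken alphabetically."""
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]


# Hypothetical toy network: hub H linked to a, b, c, plus the edge c-d.
adj = {"H": {"a", "b", "c"}, "a": {"H"}, "b": {"H"}, "c": {"H", "d"}, "d": {"c"}}
degree = {v: len(nbrs) for v, nbrs in adj.items()}
print(top_k(degree, 3))             # ['H', 'c', 'a']
print(top_k(betweenness(adj), 3))   # ['H', 'c', 'a']
```

With the real network, `top_k(degree)` and `top_k(betweenness(adj))` give the two top-10 lists, which can then be compared for the expectation that high-degree airports are also the most central.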
Methodology – Network Analysis
– Degree distributions and correlations
• Directed network
• Pajek:
◦ In-degree: Net -> Partitions -> Degree -> Input
◦ Out-degree: Net -> Partitions -> Degree -> Output
◦ Both: Net -> Partitions -> Degree -> All
– Shortest path length
• Directed network
• Pajek:
◦ Net -> Paths between 2 vertices -> Diameter
– Clustering coefficient
• Directed network
• Pajek:
◦ Net -> Vector -> Clustering Coefficients
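The quantities Pajek reports here can be cross-checked with a short Python sketch. It assumes a list of directed (origin, destination) arcs for in- and out-degree, and an undirected adjacency dict for path length and clustering (ignoring edge direction, a simplification of the directed network). Unreachable pairs are skipped in the path average, and nodes with fewer than two neighbours contribute 0 to the average clustering, one common convention that Pajek's own variants may not share.

```python
from collections import defaultdict, deque


def in_out_degree(arcs):
    """In- and out-degree counts from directed (origin, destination) pairs."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for src, dst in arcs:
        outdeg[src] += 1
        indeg[dst] += 1
    return dict(indeg), dict(outdeg)


def avg_shortest_path(adj):
    """Mean BFS distance over all ordered pairs (s, t) where t is reachable from s."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def avg_clustering(adj):
    """Average local clustering coefficient; nodes with fewer than 2 neighbours count as 0."""
    total = 0.0
    for v, nbrs in adj.items():
        nbrs = set(nbrs) - {v}
        k = len(nbrs)
        if k < 2:
            continue
        # Count links among v's neighbours (each unordered pair once).
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


# Hypothetical toy network: triangle A-B-C plus a pendant node D attached to C.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(avg_shortest_path(adj))  # 1.3333333333333333
print(avg_clustering(adj))
```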
Methodology – Network Analysis
– Cumulative degree distribution and the power law
• Directed network
Step 1 in Pajek:
– Create a partition by degree (All)
– Export the partition to a tab-delimited file
◦ Tools -> Export to Tab Delimited File -> Current Partition
Step 2 in MATLAB [6]:
– Generating a power law integer distribution
X = GetInput.m : reads the partition from the tab delimited file
(X => X.name, X.label, X.degree)
– Calculating the cumulative distribution
cumulativecounts.m [4]
[xlincumulative,ylincumulative] = cumulativecounts(X.degree)
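The two MATLAB steps can be mirrored in Python as a sketch. `cumulative_counts` assumes that cumulativecounts.m returns, for each distinct degree k, the number of nodes with degree ≥ k (the slides do not show its output format). `powerlaw_alpha` is a swapped-in technique, the approximate discrete maximum-likelihood estimator of Clauset, Shalizi and Newman for P(k) ∝ k^(-α) over k ≥ kmin, where kmin (at least 1) must be chosen by the analyst.

```python
import math
from collections import Counter


def cumulative_counts(degrees):
    """Distinct degrees k (ascending) and, for each, the number of nodes with degree >= k."""
    counts = Counter(degrees)
    ks = sorted(counts)
    ys, running = [], 0
    for k in reversed(ks):
        running += counts[k]
        ys.append(running)
    return ks, ys[::-1]


def powerlaw_alpha(degrees, kmin=1):
    """Approximate discrete MLE of alpha in P(k) ~ k^-alpha, using only degrees k >= kmin >= 1."""
    tail = [k for k in degrees if k >= kmin]
    return 1 + len(tail) / sum(math.log(k / (kmin - 0.5)) for k in tail)


degrees = [1, 1, 2, 3, 3, 3]  # hypothetical degree sequence
print(cumulative_counts(degrees))  # ([1, 2, 3], [6, 4, 3])
print(powerlaw_alpha(degrees, kmin=1))
```

On log-log axes the cumulative counts of a power-law tail fall with slope of roughly -(α - 1), so the fitted α and the visual slope should be compared on that basis.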
Methodology – Network Analysis
– Resilience
What percentage of nodes must be removed to reduce the size of the giant component by half?
• Consider:
– Random attack
– Targeted attack: remove the nodes with the highest degree or the highest betweenness centrality
• Undirected network with 850 nodes
• GUESS toolbars resiliencedegree.py and resiliencebetweenness.py, downloaded from CTools [4]
• Compare against a random network (random and targeted attacks)
– GUESS: makeSimpleRandom(numberOfNodes, numberOfEdges) with numberOfNodes = 850 and numberOfEdges = 21405
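The attack experiment can be sketched in plain Python as a stand-in for the GUESS scripts, whose internals the slides do not show. Assumptions: an undirected adjacency dict; the targeted attack removes nodes in order of their initial degree (the GUESS scripts may recompute degree or betweenness after each removal); `random_graph` draws a simple G(n, m) graph as an analogue of makeSimpleRandom, not its actual implementation.

```python
import random
from collections import deque


def giant_size(adj, removed):
    """Size of the largest connected component once the nodes in `removed` are deleted."""
    seen, best = set(removed), 0
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        size, queue = 1, deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    size += 1
                    queue.append(w)
        best = max(best, size)
    return best


def fraction_to_halve(adj, targeted, seed=0):
    """Fraction of nodes removed until the giant component is at most half its initial size."""
    start = giant_size(adj, set())
    if targeted:
        # Assumption: attack order fixed by initial degree, highest first.
        order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    else:
        order = list(adj)
        random.Random(seed).shuffle(order)
    removed = set()
    for v in order:
        removed.add(v)
        if giant_size(adj, removed) <= start / 2:
            break
    return len(removed) / len(adj)


def random_graph(n, m, seed=0):
    """Simple G(n, m) random graph: m distinct undirected edges, no self-loops."""
    assert m <= n * (n - 1) // 2, "too many edges for a simple graph"
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    edges = 0
    while edges < m:
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b and b not in adj[a]:
            adj[a].add(b)
            adj[b].add(a)
            edges += 1
    return adj


# Hypothetical star: removing the hub alone halves the giant component.
star = {"H": {"a", "b", "c", "d"}, "a": {"H"}, "b": {"H"}, "c": {"H"}, "d": {"H"}}
print(fraction_to_halve(star, targeted=True))  # 0.2
```

The slide's baseline corresponds to `random_graph(850, 21405)`. Each removal step recomputes the giant component, so the cost is O(N(N + E)); that is slow in pure Python but still manageable at this size.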
Methodology – Network Analysis
– Assortativity: rich club?
• Draw conclusions from a graphical analysis in GUESS
– Random graph
• A random network that realistically models the real network is difficult to construct [3].
– Z-score?
• To be determined.
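The assortativity and rich-club questions are planned as a visual analysis in GUESS. As a numeric complement, this sketch computes Newman's degree assortativity coefficient r (r > 0 assortative, r < 0 disassortative) and the unnormalised rich-club coefficient φ(k), the edge density among nodes of degree greater than k. It assumes an undirected adjacency dict. Raw φ(k) tends to grow with k even in random networks, so it is usually compared with the value for a randomized network before a rich club is claimed.

```python
import math


def degree_assortativity(adj):
    """Newman's r: Pearson correlation of the degrees at the two ends of every edge."""
    xs, ys = [], []
    for v, nbrs in adj.items():
        for w in nbrs:
            # Each undirected edge is visited from both ends, which symmetrises the data.
            xs.append(len(adj[v]))
            ys.append(len(adj[w]))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / var  # ZeroDivisionError if every node has the same degree (r undefined)


def rich_club(adj, k):
    """Unnormalised rich-club coefficient: edge density among nodes of degree > k."""
    rich = {v for v in adj if len(adj[v]) > k}
    n = len(rich)
    if n < 2:
        return None  # fewer than two "rich" nodes: coefficient undefined
    links = sum(1 for v in rich for w in adj[v] if w in rich) // 2
    return 2 * links / (n * (n - 1))


# Hypothetical toy network: a hub H with three spokes (a star).
star = {"H": {"a", "b", "c"}, "a": {"H"}, "b": {"H"}, "c": {"H"}}
print(degree_assortativity(star))  # -1.0
print(rich_club(star, 0))          # 0.5
```

A hub-and-spoke pattern like the star is perfectly disassortative, which is the kind of mixing the WAN/ANI/ANC comparison is probing.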
Methodology – Network Analysis
• Expectations/Predictions
– Higher-degree nodes are more central (higher betweenness), e.g. LAX, SFO, HOU, JFK
– Small world, comparable to WAN, ANI, and ANC
– Scale-free power-law degree distribution
– Disassortative degree mixing
Conclusion
The United States air transportation network analysis
• The problem and its importance
• Missing pieces
• Related works – WAN, ANI, ANC
• Methodology
– Data set: BTS (Bureau of Transportation Statistics)
– Network generation: directed and undirected network input files
– Network analysis:
• Degree distribution
• Small world network as compared to WAN, ANI, and ANC
• Cumulative degree distribution and power law
• Resilience
• Assortativity
• Z-score: TBD
References for this presentation
1. T-100 reporting guide, RITA, http://www.rita.dot.gov/, www.transtats.bts.gov, http://www.bts.gov/programs/airline_information/.
2. Pajek, program for large network analysis, http://vlado.fmf.uni-lj.si/pub/networks/pajek/.
3. Albert-Laszlo Barabasi and Reka Albert, "Emergence of Scaling in Random Networks", Department of Physics, University of Notre Dame, October 1999.
4. CTools, https://ctools.umich.edu/portal.
5. GUESS, graph exploration system, http://graphexploration.cond.org/.
6. MATLAB, The language of technical computing, http://www.mathworks.com/products/matlab/index.html.