NDW - CBS

National Datawarehouse for Traffic Information – Big Data supplier
Els Rijnierse
Contents
• Introducing NDW
• Experiences with our big data
• Challenges, choices and changes
Posting
• The last slide will ask you to post your impression,
to share what struck you most with all conference
attendees
NDW is a collaborative venture
• 24 road authorities
- National
- 6 out of 12 provinces
- Cities, either independent or in an alliance
• Covering >6,000 km of road network
(the total Dutch road network is 130,000 km)
Introducing NDW
What is our aim?
• Develop and maintain a joint database for traffic data.
Up-to-date, complete and unambiguous with known
quality
• Create efficiency by working together and sharing
information
• Stimulate effective use of this data for:
- real time traffic management
- real time traffic information
- analyses, policy making and research
Traffic management
Central source for all road authorities
Objectives
• Fewer traffic jams
• Predictability
• Safer roads
• Lower emissions
• More collaboration
Data for traffic flow ("Data voor doorstroming")
Happy road users
[Diagram: NDW organisation, linking data supply and data demand. Labels: Supervisory Board; accountability; supervision; participating governments (IDP); commercial parties (EDP); road authorities; service providers; system provider (external); common data need; individual data need; individual data supply; infrastructure supply; selection from data.]
Data types - 1
Every minute, traffic data from more than 24,000 measuring sites is collected, processed and distributed to the users within 75 seconds
• Traffic flow per lane per vehicle class on 14,818 measuring sites
• Travel time (realised or estimated) per lane on 9,424 measuring sites
• Traffic speed per lane per vehicle class on 13,410 measuring sites
(a measuring site may produce more than one kind of data)
Data collection
Some figures on figures
• Over 24,000 measurement sites
• Giving approx. 460,000 figures on speed, flow and travel time each minute
• => >27 million per hour
• => >600 million per day
• => >240 billion per year
+ metadata on these figures
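The volumes above follow directly from the per-minute figure. As a quick check, a minimal sketch that multiplies the approximate 460,000 figures per minute out to hours, days and years (assuming a 365-day year; the variable names are illustrative only):

```python
# Approximate number of figures received per minute, as quoted on the slide.
FIGURES_PER_MINUTE = 460_000

per_hour = FIGURES_PER_MINUTE * 60   # 60 minutes in an hour
per_day = per_hour * 24              # 24 hours in a day
per_year = per_day * 365             # assumption: a non-leap year

print(f"per hour: {per_hour:,}")     # 27,600,000
print(f"per day:  {per_day:,}")      # 662,400,000
print(f"per year: {per_year:,}")     # 241,776,000,000
```

These results agree with the rounded lower bounds on the slide (>27 million, >600 million, >240 billion).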
Real-time traffic data (February 2012: 5 cm snow)
Data types - 2
Data on the availability of the road is collected whenever an event occurs
• Road works, planned and actual
• Reports of congestion and accidents
• Status (open/closed) of bridges
• Near future: status (open/closed) of peak lanes and regular lanes
Cooperation between CBS and NDW
• NDW collects and distributes raw data; we do not aim to do any statistical analysis.
• CBS started with small NDW datasets (1 day) and is now
working on a larger set (3 months) to determine new
methodology
• Conclusion:
Forget everything you learned about statistics
Experiences
When to start calculating (Experiences with big data – 1)
Traditional statistical methodology:
gather and store everything, then perform the statistical analyses at set times.
When using big data:
• This traditional way of working does not produce statistics any quicker.
• It requires huge data stores for raw data storage.
• It is strongly advised to start the statistical analyses the moment data streams in, and to store only aggregated intermediate results.
• Adapt your algorithms so that they correctly handle the unpredictable gaps that will occur in the raw data.
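The advice above can be sketched as follows: per-minute records are folded into an hourly intermediate result as they arrive, and only that aggregate is kept. This is a minimal illustration, not NDW's or CBS's method. The record layout `(site_id, minute_of_day, vehicle_count)`, the class `HourlyAggregate` and the scaling rule for gaps are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class HourlyAggregate:
    """Intermediate result kept per site and hour instead of the raw minutes."""
    vehicles: int = 0       # sum of the per-minute counts that arrived
    minutes_seen: int = 0   # number of minutes that actually delivered a count

    def estimated_total(self):
        """Scale the partial sum to a full hour (assumed, simple gap rule).

        Returns None when no minute was received at all.
        """
        if self.minutes_seen == 0:
            return None
        return self.vehicles * 60 / self.minutes_seen


def aggregate_stream(records):
    """Fold (site_id, minute_of_day, vehicle_count) records into hourly results.

    A count of None marks a failed measurement; minutes that never arrive
    simply do not appear in the stream. Both are treated as gaps.
    """
    hourly = defaultdict(HourlyAggregate)
    for site_id, minute_of_day, count in records:
        if count is None:
            continue  # gap: skip, so it does not count as an observed minute
        agg = hourly[(site_id, minute_of_day // 60)]
        agg.vehicles += count
        agg.minutes_seen += 1
    return dict(hourly)


# Hypothetical stream: minute 2 failed and minute 3 never arrived.
stream = [
    ("A", 0, 10),
    ("A", 1, 12),
    ("A", 2, None),
    ("A", 4, 14),
    ("A", 60, 20),
]
result = aggregate_stream(stream)
print(result[("A", 0)])                    # vehicles=36, minutes_seen=3
print(result[("A", 0)].estimated_total())  # 720.0
```

Keeping `minutes_seen` next to the sum is the design choice that matters here: it lets a later step decide how to treat hours with poor coverage without going back to the raw data.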
Technical issues (Experiences with big data - 2)
• Traditional relational databases, and also statistical tools (SPSS/SAS/R), are not fast enough, run far out of memory and do not perform well enough for quick retrieval of raw data.
• When using a data storage technique suited to fast retrieval of raw data, some coding and programming has to be done on the raw data.
• Recalculating because of wrong choices or methods takes more and more time, as the amount of raw data grows quickly every day.
Challenges, Choices, Changes
Devil's Triangle
Contents awareness
Statistical knowledge
IT knowledge
Challenge
Government policy is that public data are open data, which
means our raw data are on the WWW
(www.ndw.nu/datalevering)
Anybody can download them, produce surveys, statistics and tables, draw conclusions and publish these (long) before the statistical office does.
Be aware of the publicity this might cause, of discussions on 'the truth', and of the status of a response or statement from the statistical office.
Take on the challenge of producing real-time statistics
Choice
Traditionally, raw data used for statistics is stored at the statistical office.
Big data should be left at its origin and retrieved when needed.
Change
Look for appropriate IT infrastructure and develop a new way
of handling data
www.sendsteps.com
Prepare to react; keep your phone ready!
Internet
1. Go to sendc.com
2. Log in with Session
3. Type WS3 <space> your answer
TXT
1. Text to +316 4250 0030
2. Type Session <space> WS3 <space> your answer
Posting messages is anonymous
No additional charge per message
When using Big Data for our statistics, the biggest change in our way of working will be…
Internet: go to sendc.com, log in with Session, then type WS3 <space> your answer
TXT: send to 06 4250 0030: Session <space> WS3 <space> your answer
Challenges, Changes and Choices when using
these amounts for Statistics
Forget everything you learned about statistics:
• When to start calculating: immediately after data becomes available, and continuously
• Where to store what (intermediate or raw) data: raw data at the providers, intermediate results at the statistical office
• Tools and IT techniques: no more SPSS, R and SAS, but programming and working with new tools
• Algorithms: asymptotically, both the time complexity and the space complexity should be of a lower order, because of the large volumes of data
• When open data is government policy: be aware that others will also produce statistics, more quickly and with other conclusions
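To illustrate the point on algorithms: a mean and variance can be computed in a single pass with constant memory, rather than by storing all values first. The sketch below uses Welford's online algorithm, a standard technique chosen here for illustration; it is not named in the presentation, and the class name and speed values are hypothetical.

```python
import math


class RunningStats:
    """Single-pass mean and variance (Welford's online algorithm).

    Time is O(n) over the stream; memory is O(1), since only three
    numbers are kept regardless of how many values have been seen.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance; NaN when fewer than two values were seen."""
        if self.n < 2:
            return math.nan
        return self._m2 / (self.n - 1)


# Hypothetical speed measurements in km/h, arriving one at a time.
stats = RunningStats()
for speed in [100.0, 110.0, 90.0, 105.0, 95.0]:
    stats.update(speed)

print(stats.n, stats.mean, stats.variance())
```

Because each new value only updates three numbers, the raw values never have to be held in memory or re-read.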
How to produce 1 representative figure
on traffic intensity from this: