I2 DBDA Chairs call June 11 2015 v1

Internet2 innovation Working Group
Distributed Big Data & Analytics (DBDA)
Chairs call – June 11, 2015
Attendees:
 Chairs – Alex Feltus - Clemson, Sam Gustman - USC, Marc Hoit – NC State
 Internet2 - Florence Hudson, Rick McMullen, Bob Brammer, Khalil Yazdi,
Ann O’Beay, Giselle Trent
Opportunity for Internet2 to add value in Distributed Big Data &
Analytics:
• Internet2 is good for moving data; consider extending it to support data
analytics
• Internet2 is able to convene the community and communities-of-practice at
scale, in U.S. and globally
• Internet2 is able to bring services forward at scale and tuned to researcher
needs, working with both Internet2 internal and external service providers
• Internet2 can work with the community to identify good models of
collaboration across IT, research & libraries on campuses to handle the
"data problem" (responding to agency requirements, IP control, compliance,
etc.)
Challenges the Internet2 community and users face re DBDA:
• Hard to use the network – complicated sets of IT issues – need a cookbook
for researchers – how to use the network, get access to services, use
services
• Networking, nationally and globally – many layers of challenges (including –
the problem of fast networks, slower networks, slow end devices such as
spinning disk)
• Data analytics and proximity of data and analytical tools
• Distributed data, analytics and network speeds
• Inclusion of librarians and those responsible for curation, archiving and
preservation
• Researchers don't know what Internet2 is or how to access and leverage
the network, including for their big data needs
• Current research data sharing is at "Excel" scale vs. the needed exascale;
genomics needs gigascale
• Need to serve the "missing middle"
• End to End 100Gb connectivity internationally, or even domestically, is
"impossible" ... today
• Even within a country or region, there are different layers of connectivity ...
some @100Gb, some at 1Gb...some slower, etc.
• Standards can take ~24 months or longer to bake, then uptake starts,
based on input from InCommon report
• Storage bottlenecks can be a challenge, corollary to network bottlenecks
• User "pre-sales" and "sales" support needed, to help potential users
prepare for Internet2 network connectivity, then use it most effectively
• Measuring TCP window sizing
• Distributed big data repositories and how to get the data through
Internet2
• Need less dilution of case studies, to share core technical information
• Need cyber-infrastructure experts AND storage experts AND research
scientists on the phone to set up a system that will work when you turn it
on
• Need end to end performance monitoring and tuning and support
• Traffic shaping rules
• Identify network and storage and computing bottlenecks
• The network may be considered the speed problem, but the true bottleneck
can be writing to local spinning disk
• Especially when going over 3 or 4 international research networks (e.g.
USC to Europe to Prague)
• Network troubleshooting
• End-user infrastructure and support desk needed for network
troubleshooting
• How best to aggregate data and transfer
• Internet2 interacting with CIO organizations, but not researchers
• Digital preservation of data => make sure it's not physically rotting...
newer technology rots faster
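The TCP window sizing and bottleneck items above can be made concrete with a little arithmetic. The sketch below is illustrative only: the link rate, round-trip time, window size and disk write rate are assumed example values, not measurements reported on the call, and the function names are hypothetical. It computes the bandwidth-delay product (the window needed to keep a path full), the throughput ceiling a fixed window imposes on one stream, and the slowest stage of an end-to-end transfer.

```python
def bdp_bytes(link_bits_per_s: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return link_bits_per_s * rtt_s / 8


def window_limited_bits_per_s(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on single-stream TCP throughput for a given window size."""
    return window_bytes * 8 / rtt_s


def bottleneck(stage_rates_bits_per_s: dict) -> tuple:
    """Return (name, rate) of the slowest stage in an end-to-end transfer."""
    name = min(stage_rates_bits_per_s, key=stage_rates_bits_per_s.get)
    return name, stage_rates_bits_per_s[name]


if __name__ == "__main__":
    link = 100e9   # assumed 100 Gb/s path
    rtt = 0.150    # assumed 150 ms round trip across several networks
    window = 64 * 1024  # assumed untuned 64 KiB window

    print(f"window needed to fill the path: {bdp_bytes(link, rtt) / 1e9:.3f} GB")
    print(f"one stream with a 64 KiB window: "
          f"{window_limited_bits_per_s(window, rtt) / 1e6:.2f} Mb/s")

    stages = {
        "network": link,
        "tcp_window_64KiB": window_limited_bits_per_s(window, rtt),
        "spinning_disk_write": 150e6 * 8,  # assumed ~150 MB/s sustained write
    }
    print("slowest stage:", bottleneck(stages))
```

With these assumed numbers an untuned window, not the 100Gb link, limits the transfer. Once the window is tuned, the assumed disk write rate becomes the next limit, which matches the point that the network is not always the true bottleneck.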
Potential strategies to address DBDA challenges:
• Communicating what is available to researchers – they don’t know what
they may have access to
• Establishing relationships between and among researchers and service
providers (university and industry) around the particulars of use cases –
need to develop a Community of Practice
• Establish community protocols for data access and sharing of data
• Managing data repositories
• Engage NSF
• NSF big data hub
• Regional focus...e.g. GIS in southeastern US
• Identify a few potential use cases
• Power grid monitoring use case, with geographically isolated sensors:
petabytes of regional sensor data from power grid monitoring, near-real-time
data aggregation; Internet2 is in the proposal as a partner for the
southeastern US GIS regional project
• Economically underserved researchers in the U.S. or Africa: they are not
currently accessing large data sets and don't know they are there or how to
access them
• Agriculture
• Libraries
• Enable community sharing of core technical information and details of use
case studies
• Create a CORBA-style (Common Object Request Broker Architecture) API to
query across datasets
• Include libraries, scientists and engineers
• Create the "Library of Things" API