
Cyber-Infrastructure in Education
South Carolina State University
Cyberinfrastructure Day
March 3, 2011
Geoffrey Fox
gcf@indiana.edu
http://www.infomall.org http://www.futuregrid.org
Director, Digital Science Center, Pervasive Technology Institute
Associate Dean for Research and Graduate Studies, School of Informatics and Computing
Indiana University Bloomington
Types of Activities
• Cyberinfrastructure ranges from a web page to a petaflop supercomputer
• Research has substantial needs, such as
– A petaflop supercomputer
– The ability to analyze many (up to 100 now) terabytes of data (on a cloud)
• Education needs
– Access to results of cyberinfrastructure research
– Broad access to scholarly information (digital library)
– Teach students about e-Science (domain science) and
Cyberinfrastructure (Computer Science)
– Exploit electronic infrastructure to enhance learning
Access to results of cyberinfrastructure research
• Portals are the access points to electronic resources
– For example, Amazon.com is an access point to an online store
• e-Science projects have a portal interface for their scientists
– Some have education components
– There is some interest from outsiders in producing education-oriented interfaces, but no clear initiative yet?
Broad access to scholarly information (digital library)
• National Science Digital Library http://nsdl.org/
• There is an interesting discussion of the role of university libraries in preserving “data produced by faculty”
• Curriculum libraries such as those at MIT or HPCUniversity
• Collections of articles maintained by publishers and professional societies are hampered by access charges
– The roles of centralized and decentralized collections are still not agreed upon
• Google (for example) is keen to “own all data”, including digital books and even science data if it can be linked to Google Earth!
– But this raises the opposite problem of protecting intellectual property (seen clearly in music piracy)
• Note that MapReduce is well suited to analyzing such data
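As a minimal sketch of the MapReduce pattern (plain Python, not Hadoop's API), the following counts words across a few documents in a single process: a map step emits (word, 1) pairs, a shuffle step groups them by key, and a reduce step sums each group. The documents are made-up placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine the values for one key (here, by summing them)."""
    return key, sum(values)


def word_count(docs: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce over all documents."""
    pairs = (pair for doc in docs for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    docs = ["digital books and science data", "science data on a cloud"]
    print(word_count(docs))
```

In a real MapReduce system the map and reduce tasks run in parallel on many nodes and the shuffle moves intermediate pairs across the network, but the per-key logic is the same as above.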
Teach students about e-Science (domain science) and Cyberinfrastructure (Computer Science)
• This can be quite sophisticated, as with difficult parallel algorithms
• As with portals, one can leverage research investments
• Students do not need to run petaflop simulations
– It should be possible to capture the essence of the computational and science issues in smaller runs
– Appliances (see later) can be used
• FutureGrid is a possible site
• Note that clouds are very popular with students, as there are many commercial jobs in companies that develop and use them
– Clouds are also valuable for CS research and as a vehicle for domain science
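To illustrate how a small run can still expose the essential parallel issues (decomposing the work, spreading it over workers, and combining partial results), here is a minimal sketch using only the Python standard library. It estimates π by splitting a midpoint-rule integral of 4/(1+x²) on [0, 1] across worker processes; the problem size and worker count are arbitrary laptop-scale choices, not anything prescribed in the slides.

```python
from multiprocessing import Pool
from typing import List, Tuple

# A task is (first interval, one past the last interval, total number of intervals)
Task = Tuple[int, int, int]


def make_tasks(n: int, workers: int) -> List[Task]:
    """Decompose n integration intervals into contiguous blocks, one per worker."""
    chunk = (n + workers - 1) // workers  # ceiling division
    return [(w * chunk, min((w + 1) * chunk, n), n) for w in range(workers)]


def partial_sum(task: Task) -> float:
    """Midpoint-rule contribution of one block to the integral of 4/(1+x^2) on [0, 1]."""
    start, stop, n = task
    h = 1.0 / n
    total = 0.0
    for i in range(start, stop):
        x = (i + 0.5) * h
        total += 4.0 / (1.0 + x * x)
    return total * h


def parallel_pi(n: int = 1_000_000, workers: int = 4) -> float:
    """Estimate pi: each process integrates one block, then the parent sums the blocks."""
    with Pool(processes=workers) as pool:
        return sum(pool.map(partial_sum, make_tasks(n, workers)))
```

Call `parallel_pi()` from inside an `if __name__ == "__main__":` guard, which multiprocessing requires on platforms that start workers as fresh interpreters. Timing it with `workers=1` and `workers=4` lets students see both the speedup and the overhead of starting processes, the same trade-offs that appear at much larger scale.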
Exploit electronic infrastructure to enhance learning
• Several quite old approaches are critical and dominant
– “Just a bunch of web pages”, a.k.a. digital library
– Video conferencing
– Shared material, as in Webex and Adobe Connect
• Note that asynchronous interaction via Twitter, Blackboard, Google Docs, etc. is much easier (and more successful) than synchronous approaches (Polycom, Access Grid, Webex)
• Interactive web learning environments such as www.whyville.net
• Virtual worlds such as Second Life have not taken off, but some think this will change as the performance of clients and networks improves dramatically (VRML failed around 1999)
• We must move to an environment consistent with the world view of current students, a.k.a. the “Twitter University”
C4 = Continuous Collaborative Computational Cloud
C4 Emerging Vision
While the Internet has changed the way we communicate and are entertained, we need to empower the next generation of engineers and scientists with technology that enables interdisciplinary collaboration for lifelong learning.
Today, the cloud is a set of services that people have to access explicitly (from laptops, desktops, etc.). In 2020 the C4 will be part of our lives as a larger, pervasive, continuous experience. The measure of success will be how “invisible” it becomes.
C4 Education Vision
C4 Education will exploit advanced means of communication, for example “Tabatar” conference tables as clients, with real-time language translation, contextual awareness of speakers, and support for people with disabilities; and servers supporting collaboration between learners and teachers through “virtual worlds” generalizing Twitter, Clouds with MapReduce frontends, Second Life …
C4 Society Vision
We are not prophets and cannot anticipate exactly what will work, but we expect to have high bandwidth and ubiquitous connectivity for everyone everywhere, even in rural areas (using power-efficient micro data centers the size of shoe boxes). Here the cloud will enable business, fun, and the destruction and creation of regimes (societies).
Higher Education 2020 (diagram)
• Computational Thinking; Modeling & Simulation; C(DE)SE; Internet & Cyberinfrastructure
• C4 (Continuous Collaborative Computational Cloud) INTELLIGENCE: C4 Intelligent Society, C4 Intelligent Economy, C4 Intelligent People
• Motivating Issues: job/education mismatch; Higher Ed rigidity; interdisciplinary work; Engineering v. Science, Little v. Big science
• NSF: educate the “Net Generation”; re-educate the pre-“Net Generation” in Science and Engineering; exploiting and developing C4
• C4 curricula and programs; C4 Experiences (delivery mechanism); C4 REUs, internships, fellowships
• CDESE is Computational and Data-enabled Science and Engineering
Educational appliances
• One component of C4
• A flexible, extensible platform for hands-on, lab-oriented education (on FutureGrid)
• Need to support appliances representing clusters of
resources
• Virtual machines + social/virtual networking to
create sandboxed modules
– Virtual “Grid” appliances: self-contained, pre-packaged
execution environments
– Group VPNs: simple management of virtual clusters by
students and educators
Why use Virtualization?
• Traditional ways of delivering hands-on training and
education in parallel/distributed computing have
non-trivial dependences on the environment
• Difficult to replicate the same environment on different resources (e.g., HPC clusters, desktops)
• Difficult to cope with changes in the environment (e.g., software upgrades)
• Virtualization technologies remove key software
dependences through a layer of indirection
Appliance Infrastructure - guiding principles
• Fidelity: education/training modules should use full-fledged, executable software
– Learn using the proper tools
• Reproducibility: Content creators should be able to install, configure, and test their modules once, and be assured of the same functional behavior regardless of where the module is deployed
– Incentive to invest effort in developing, testing and
documenting new modules
• Deployability: Students and users should be able to deploy modules simply, and on a variety of resources
– Reduce barriers to entry; avoid dependences upon
a particular infrastructure
• Community-oriented: Modules should be
simple to share, discover, reuse, and expand
– Create conditions for “viral” growth
Towards this vision in FutureGrid
• Executable modules – virtual appliances
– Deployable on FutureGrid resources
– Deployable on other cloud platforms, as well as
virtualized desktops
• Community sharing – Web 2.0 portal,
appliance image repositories
– An aggregation hub for executable modules and
documentation
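Image repositories only support the reproducibility principle if a student can confirm that the image they downloaded is exactly the one the module author tested. A minimal sketch, assuming (this is not stated in the slides) that the repository publishes a SHA-256 digest next to each appliance image; the file name used in testing is hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, block_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_appliance(image: Path, published_digest: str) -> bool:
    """True if the local image is byte-for-byte the one whose digest was published."""
    return sha256_of(image) == published_digest.strip().lower()
```

Reading in fixed-size blocks keeps memory use flat even for multi-gigabyte virtual machine images.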
Virtual appliance example
• Linux, Java, Hadoop, configuration scripts
[Figure: the Hadoop image is copied and instantiated on the virtualization layer to create a Hadoop worker; repeating the copy-and-instantiate step creates another Hadoop worker, and so on.]
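The “copy” half of the loop in the figure can be sketched in a few lines. This minimal sketch gives each worker its own copy of one appliance image; the naming scheme `hadoop-worker-N` is hypothetical, and the “instantiate” step is omitted because it depends on the hypervisor or cloud in use.

```python
import shutil
from pathlib import Path
from typing import List


def clone_workers(image: Path, workdir: Path, count: int) -> List[Path]:
    """Make one private copy of the appliance image per worker ("copy", repeated).

    Each copy would then be handed to the virtualization layer to boot
    ("instantiate"); that hypervisor-specific step is not shown here.
    """
    workdir.mkdir(parents=True, exist_ok=True)
    clones = []
    for i in range(count):
        clone = workdir / f"hadoop-worker-{i}{image.suffix}"
        shutil.copyfile(image, clone)  # every worker starts from the same bytes
        clones.append(clone)
    return clones
```

Full copies are the simplest to explain; production setups often use copy-on-write disk images instead, so that many workers share one read-only base image and store only their own changes.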
Virtual cluster appliances
• Virtual appliance + virtual network
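[Figure: an image containing Hadoop + a virtual network is copied and instantiated as a virtual machine that becomes a Hadoop worker on the shared virtual network; repeating the step adds another Hadoop worker, and so on.]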
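What the virtual network adds is a private address space in which the workers can find each other regardless of which physical resource each virtual machine lands on. A minimal sketch, assuming a hypothetical private subnet `10.128.0.0/24` and hypothetical worker names, that assigns each worker an address on that network and renders hosts-file lines so the nodes can reach each other by name.

```python
import ipaddress
from typing import Dict, List


def assign_virtual_ips(workers: List[str], subnet: str = "10.128.0.0/24") -> Dict[str, str]:
    """Give each worker a distinct address on the (hypothetical) virtual network subnet.

    Addresses come from the virtual network, not from the physical hosts,
    so they do not change when a VM is placed on a different machine.
    """
    free = iter(ipaddress.ip_network(subnet).hosts())  # usable host addresses, in order
    mapping = {}
    for name in workers:
        try:
            mapping[name] = str(next(free))
        except StopIteration:
            raise ValueError(f"subnet {subnet} has too few addresses") from None
    return mapping


def hosts_file(mapping: Dict[str, str]) -> str:
    """Render /etc/hosts-style lines ("address name") for all workers."""
    return "\n".join(f"{ip} {name}" for name, ip in mapping.items())
```

For example, `assign_virtual_ips(["master", "worker-0"])` hands out the first two usable addresses of the subnet, in order.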