Cyber-Infrastructure in Education
South Carolina State University Cyberinfrastructure Day
March 3 2011
Geoffrey Fox
gcf@indiana.edu
http://www.infomall.org http://www.futuregrid.org
Director, Digital Science Center, Pervasive Technology Institute
Associate Dean for Research and Graduate Studies, School of Informatics and Computing
Indiana University Bloomington

Types of Activities
• Cyberinfrastructure ranges from a web page through a petaflop supercomputer
• Research has substantial needs such as either
– Petaflop supercomputer
– Ability to analyze many (up to 100 now) terabytes of data (on a cloud)
• Education needs
– Access to results of cyberinfrastructure research
– Broad access to scholarly information (digital library)
– Teach students about e-Science (domain science) and Cyberinfrastructure (Computer Science)
– Exploit electronic infrastructure to enhance learning

Access to results of cyberinfrastructure research
• Portals are the access points to electronic resources
– For example Amazon.com is an access point to an electronic shop
• e-Science projects have a portal interface for their scientists
– Some have education components
– Some interest in producing education-oriented interfaces by outsiders but no clear initiative?

Broad access to scholarly information (digital library)
• National Science Digital Library http://nsdl.org/
• There is an interesting discussion of the role of University libraries in preservation of “data produced by faculty”
• Curriculum libraries such as that at MIT or HPCUniversity
• Collections of articles maintained by publishers and professional societies have problems due to charges
– Role of centralized and de-centralized collections still not agreed
• Google (for example) is keen to “own all data” including digital books and even science data if it can be linked to Google Earth!
– But this has the opposite problem of preserving intellectual property (seen clearly in music piracy)
• Note MapReduce is perfect for analyzing such data

Teach students about e-Science (domain science) and Cyberinfrastructure (Computer Science)
• This can be quite sophisticated as in difficult parallel algorithms
• As in portals, one can leverage research investments
• Does not need students to run petaflop simulations
– Should be able to capture essence of computational/science issues in smaller runs
– Appliances (see later) can be used
• FutureGrid possible site
• Note clouds are very popular with students as there are many commercial jobs at companies that develop and use them
– As well as for CS research and as vehicle for domain science

Exploit electronic infrastructure to enhance learning
• Several quite old approaches are critical and dominant
– “Just a bunch of web pages” aka digital library
– Video conferencing
– Shared material as in Webex, Adobe Connect
• Note asynchronous interaction via Twitter, Blackboard, Google docs etc. is much easier (and more successful) than synchronous (Polycom, Access Grid, Webex) approaches
• Interactive web learning environments such as www.whyville.net
• Virtual worlds such as Second Life have not taken off but some think this will change as performance of clients and networks is improving dramatically (VRML failed ~1999)
• Must move to an environment consistent with world view of current students aka the “Twitter University”

C4 = Continuous Collaborative Computational Cloud
C4 Education Vision

C4 EMERGING VISION
While the internet has changed the way we communicate and get entertainment, we need to empower the next generation of engineers and scientists with technology that enables interdisciplinary collaboration for lifelong learning. Today, the cloud is a set of services that people explicitly have to access (from laptops, desktops, etc). In 2020 the C4 will be part of our lives, as a larger, pervasive, continuous experience.
The measure of success will be how “invisible” it becomes.
C4 Education will exploit advanced means of communication, for example, “Tabatar” conference tables as clients, with real-time language translation, contextual awareness of speakers, support for people with disabilities; servers supporting collaboration between learners and teachers through “virtual worlds” generalizing Twitter Clouds with MapReduce frontends, Second Life ……

C4 Society Vision
We are no prophets and can’t anticipate what exactly will work, but we expect to have high bandwidth and ubiquitous connectivity for everyone everywhere, even in rural areas (using power-efficient micro data centers the size of shoe boxes). Here the cloud will enable business, fun, destruction and creation of regimes (societies)

Higher Education 2020
[Figure labels]
• C4 Intelligence
• C4 Intelligent Society
• C4 Intelligent Economy
• C4 Intelligent People
• C4: Continuous Collaborative Computational Cloud
• Internet & Cyberinfrastructure
• Computational Thinking
• Modeling & Simulation
• C(DE)SE
– CDESE is Computational and Data-enabled Science and Engineering
• Motivating Issues
– job / education mismatch
– Higher Ed rigidity
– Interdisciplinary work
– Engineering v Science, Little v. Big science
• NSF
• Educate “Net Generation”
• Re-educate pre “Net Generation” in Science and Engineering
• Exploiting and developing C4
• C4 Curricula, programs
• C4 Experiences (delivery mechanism)
• C4 REUs, Internships, Fellowships

Educational appliances
• One component of C4
• A flexible, extensible platform for hands-on, lab-oriented education (on FutureGrid)
• Need to support appliances representing clusters of resources
• Virtual machines + social/virtual networking to create sandboxed modules
– Virtual “Grid” appliances: self-contained, pre-packaged execution environments
– Group VPNs: simple management of virtual clusters by students and educators

Why use Virtualization?
• Traditional ways of delivering hands-on training and education in parallel/distributed computing have non-trivial dependences on the environment
• Difficult to replicate same environment on different resources (e.g. HPC clusters, desktops)
• Difficult to cope with changes in the environment (e.g. software upgrades)
• Virtualization technologies remove key software dependences through a layer of indirection

Appliance Infrastructure - guiding principles
• Fidelity: activities should use full-fledged, executable software: education/training modules
– Learn using the proper tools
• Reproducibility: Creators of content should be able to install, configure, and test their modules once, and be assured of the same functional behavior regardless of where the module is deployed
– Incentive to invest effort in developing, testing and documenting new modules

Appliance Infrastructure - guiding principles
• Deployability: Students and users should be able to deploy modules in a simple manner, and in a variety of resources
– Reduce barriers to entry; avoid dependences upon a particular infrastructure
• Community-oriented: Modules should be simple to share, discover, reuse, and expand
– Create conditions for “viral” growth

Towards this vision in FutureGrid
• Executable modules: virtual appliances
– Deployable on FutureGrid resources
– Deployable on other cloud platforms, as well as virtualized desktops
• Community sharing: Web 2.0 portal, appliance image repositories
– An aggregation hub for executable modules and documentation

Virtual appliance example
• Linux, Java, Hadoop, configuration scripts
[Figure labels: Hadoop image; copy; instantiate; A Hadoop worker; Another Hadoop worker; Repeat…; Virtualization Layer]

Virtual cluster appliances
• Virtual appliance + virtual network
[Figure labels: Hadoop + Virtual Network; copy; instantiate; A Hadoop worker; Another Hadoop worker; Repeat…; Virtual machine; Virtual network]
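
The "copy, instantiate, repeat" pattern on the two appliance slides above can be illustrated with a toy model in Python. This is only a sketch under stated assumptions: the ApplianceImage, Worker and VirtualCluster classes, the build_cluster helper, the hostname scheme and the 10.10.0.0/29 address range are all hypothetical names invented for the example, and none of them is a FutureGrid or Grid appliance API. No virtual machines are started; the code only tracks which copies of an image exist and which address each one holds on the shared virtual network.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Network
from typing import List, Tuple


@dataclass(frozen=True)
class ApplianceImage:
    """Hypothetical stand-in for a pre-packaged appliance image."""
    name: str
    software: Tuple[str, ...]


@dataclass
class Worker:
    """One running copy of an image, attached to the virtual network."""
    hostname: str
    address: str
    image: ApplianceImage


@dataclass
class VirtualCluster:
    """Toy model: appliance image + virtual network = virtual cluster."""
    image: ApplianceImage
    network: IPv4Network
    workers: List[Worker] = field(default_factory=list)

    def instantiate(self) -> Worker:
        """Copy the image into a new worker and give it the next free address."""
        index = len(self.workers) + 1
        address = self.network.network_address + index
        # Stop before the broadcast address: the virtual network is full.
        if address >= self.network.broadcast_address:
            raise RuntimeError("virtual network has no free addresses")
        worker = Worker(f"{self.image.name}-worker-{index}", str(address), self.image)
        self.workers.append(worker)
        return worker


def build_cluster(image: ApplianceImage, cidr: str, size: int) -> VirtualCluster:
    """Repeat the instantiate step `size` times on one virtual network."""
    cluster = VirtualCluster(image, IPv4Network(cidr))
    for _ in range(size):
        cluster.instantiate()
    return cluster


if __name__ == "__main__":
    # Software list taken from the "Virtual appliance example" slide.
    hadoop = ApplianceImage("hadoop", ("Linux", "Java", "Hadoop", "configuration scripts"))
    for w in build_cluster(hadoop, "10.10.0.0/29", 3).workers:
        print(w.hostname, w.address)
```

Every worker refers to the same immutable image, which mirrors the reproducibility principle: a module is configured once and each copy behaves the same. Only the per-worker hostname and virtual network address differ.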