BIG DATA IN EDUCATION
Bureau of Education, Tainan City Government (臺南市政府教育局)
Director of the Information Center: 高誌健

• Technology trends
• Data analytics trends
• Cloud computing for big data in education
• Big data analytics in the cloud
• Educational practices

TECHNOLOGY TRENDS

TOP TECHNOLOGY TRENDS FOR 2014
1. Emergence of the Mobile Cloud: Mobile distributed computing paradigm will lead to explosion of new services.
2. From Internet of Things to Web of Things: Need connectivity, internetworking to link physical and digital.
3. From Big Data to Extreme Data: Simpler analytics tools needed to leverage the data deluge.
4. The Revolution Will Be 3D: New tools, techniques bring 3D printing power to masses.
5. Supporting New Learning Styles: Online courses demand seamless, ubiquitous approach.
6. Next-Generation Mobile Networks: Mobile infrastructure must catch up with user needs.
7. Balancing Identity and Privacy: Growing risks and concerns about social networks.
8. Smart and Connected Healthcare: Intelligent systems, assistive devices will improve health.
9. E-Government: Interoperability a big challenge to delivering information.
10. Scientific Cloud Computing: Key to solving grand challenges, pursuing breakthroughs.
Source: http://www.computer.org/portal/web/membership/Top-10-Tech-Trends-in-2014

BIG DATA: WHY NOW?
• 90% of the data in the world was created in the last 2 years.
• The average person today processes more data in a day than a person in the 1500s did in an entire lifetime.
• The LAPD is piloting a big data scheme to predict crime. An algorithm predicts where crime is likely to take place for police teams. In Foothill, LA, the scheme: 12% decrease in property crime, 26% decrease in burglary.
• Predictive policing is now being rolled out in 150 cities across America. The algorithm was initially developed to predict earthquakes.
• 43% of data gathered on people comes from social media.
• Twitter: 100,000 tweets every minute. Facebook: 650,000 shares every minute. That is 144,000,000 tweets and 936,000,000 Facebook shares every day.
• Netflix records 30 million user "plays" a day. It analyses when users pause, rewind, fast-forward, and search, and it also knows what users like.
• But we're just getting started. Augmented reality, the quantified self, and the internet of things will all become ubiquitous.
• Data production will be 44 times greater in 2020 than it was in 2009.
• Every day, the data mountain grows by 2.5 billion gigabytes.
• In 2013, all human knowledge is estimated to be 12 exabytes. 1 exabyte × 1,000 = 1 zettabyte = a hard drive
• "Information is the oil of the 21st century, and analytics is the combustion engine." - Peter Sondergaard, senior vice president at Gartner
Source: https://www.youtube.com/watch?v=2D8oji5EKbM

BIG DATA CHARACTERISTICS
• Volume: unfathomable
• Velocity: record-breaking
• Variety: vast
• Value: untapped
Supporting figures:
• 2.7 trillion gigabytes of data was created or replicated in 2012.
• Every day 2.5 quintillion bytes of data are created, and the total amount of data doubles every two years.
• Analytics has the potential to unlock productivity growth and innovation.
• In 2011 there were 9 billion connected devices, and that is expected to grow to 24 billion in 2020.
Sources: http://www.fico.com/en/Communities/Pages/BigData.aspx and https://www.youtube.com/watch?v=7D1CQ_LOizA

TOP 10 MOST FUNDED BIG DATA STARTUPS
Last update: March 31, 2014

Company            Funding (USD million)   Business
Cloudera           1,040                   Hadoop-based software, services and training
Palantir           650                     Analytics applications
Domo               250                     Business intelligence platform
MongoDB            231                     Document-oriented database
Mu Sigma           208                     Data-Science-as-a-Service
Hortonworks        198                     Hadoop-based software, services and training
Opera Solutions    114                     Data-Science-as-a-Service
Talend             102                     Application and business process integration platform
Guavus             89                      Big data analytics solution
DataStax           83.7                    Cassandra-based big data platform

Source: http://www.forbes.com/sites/gilpress/2013/10/30/top-10-most-funded-big-data-startups-updated/

DATA ANALYTICS TRENDS

OPPORTUNITY IN TYPES OF BIG DATA
• Sentiment: understand how your students feel about your teaching and feedback, right now
• Clickstream: capture and analyze website visitors' data trails and optimize your website
• Sensor/machine: discover patterns in data streaming automatically from remote sensors and machines
• Geographic: analyze location-based data to manage operations where they occur
• Server logs: research logs to diagnose process failures and prevent security breaches
• Unstructured (text, video, pictures, etc.): understand patterns in files across millions of web pages, emails, and documents

DATA ANALYTICS CHALLENGES
• Data capture at the user interaction level, in contrast to the client transaction level in the enterprise context
• A shift from summative to formative analysis
• As a consequence, the amount of data increases significantly
• Greater need to analyze such data to understand user behaviors
Source: EDBT 2011 Tutorial

CUSTOMER (CONSUMER) ANALYTICS
• Propensity and Best Next Action
• Sentiment analysis: https://www.youtube.com/watch?v=Ga2jMY5nzzY&feature=player_embedded
• Behavior scoring models
Source: http://www.statsoft.com/Solutions/Cross-Industry/Customer-Analytics

CLOUD COMPUTING FOR BIG DATA IN EDUCATION

PARADIGM SHIFT IN COMPUTING
Source: EDBT 2011 Tutorial

THE NIST DEFINITION OF CLOUD COMPUTING
• Essential characteristics: on-demand self-service; broad network access; resource pooling; rapid elasticity; measured service.
• Service models: Software as a Service (SaaS); Platform as a Service (PaaS); Infrastructure as a Service (IaaS).
• Deployment models: private cloud; community cloud; public cloud; hybrid cloud.
Source: http://www.nist.gov/itl/cloud/

Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics (on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service); three service models (Cloud Software as a Service (SaaS), Cloud Platform as a Service (PaaS), Cloud Infrastructure as a Service (IaaS)); and four deployment models (private cloud, community cloud, public cloud, hybrid cloud). Key enabling technologies include: (1) fast wide-area networks, (2) powerful, inexpensive server computers, and (3) high-performance virtualization for commodity hardware.

CLOUD COMPUTING: WHY NOW?
• Experience with very large datacenters
    • Unprecedented economies of scale
    • Transfer of risk
• Technology factors
    • Pervasive broadband Internet
    • Maturity in virtualization technology
• Business factors
    • Minimal capital expenditure
    • Pay-as-you-go billing model
Source: EDBT 2011 Tutorial

ECONOMICS OF CLOUD USERS
• Pay by use instead of provisioning for peak
[Figure: resources vs. time for a static data center (fixed capacity above demand, leaving unused resources) and for a data center in the cloud (capacity follows demand).]
Slide credits: Berkeley RAD Lab. Source: EDBT 2011 Tutorial

ECONOMICS OF CLOUD USERS (CONTINUED)
• Heavy penalty for under-provisioning
[Figure: three plots of resources vs. time (days 1 to 3) comparing demand with capacity; the second is labeled "Lost revenue" and the third "Lost users".]
Slide credits: Berkeley RAD Lab. Source: EDBT 2011 Tutorial

CLOUD COMPUTING MODALITIES
• "Can we outsource our IT software and hardware infrastructure?"
    • Hosted applications and services
    • Pay-as-you-go model
    • Scalability, fault-tolerance, elasticity, and self-manageability
• "We have terabytes of click-stream data. What can we do with it?"
    • Very large data repositories
    • Complex analysis
    • Distributed and parallel data processing
Source: EDBT 2011 Tutorial

BIG DATA ANALYTICS IN THE CLOUD

CHALLENGES
• Scalability to large data volumes:
    • Scan 100 TB on 1 node @ 50 MB/sec = 23 days
    • Scan on 1000-node cluster = 33 minutes
    • Hence divide-and-conquer (i.e., data partitioning)
• Cost-efficiency:
    • Commodity nodes (cheap, but unreliable)
    • Commodity network
    • Automatic fault-tolerance (fewer administrators)
    • Easy to use (fewer programmers)
Source: EDBT 2011 Tutorial

PLATFORMS FOR BIG DATA ANALYSIS
• Parallel DBMS technologies
    • Proposed in the late eighties
    • Matured over the last two decades
    • Multi-billion dollar industry: proprietary DBMS engines intended as data warehousing solutions for very large enterprises
• MapReduce
    • Pioneered by Google
    • Popularized by Yahoo! (Hadoop)
Source: EDBT 2011 Tutorial

DATA ARCHITECTURE EXAMPLE 1
Source: http://hortonworks.com/hadoop-modern-data-architecture/

DATA ARCHITECTURE EXAMPLE 2

ENTERPRISE PREDICTIVE ANALYTICS PLATFORMS
• FICO: http://www.fico.com/
• IBM SPSS: http://www-01.ibm.com/software/analytics/spss/
• KXEN: http://www.kxen.com/
• Oracle Advanced Analytics: http://www.oracle.com/us/products/database/options/advanced-analytics/overview/index.html
• Revolution Analytics: http://www.revolutionanalytics.com/
• Salford Systems: http://www.salford-systems.com/
• SAP: https://www54.sap.com/pc/analytics/business-intelligence/software/predictiveanalysis/index.html
• SAS: http://www.sas.com/
• Statsoft: http://www.statsoft.com/

EXCEL DATA MINING ADD-INS
• 11Ants Model Builder: http://www.11antsanalytics.com/
• Alyuda ForecasterXL: http://www.alyuda.com/forecasting-excelsoftware-with-neural-network.htm
• DataMinerXL: http://www.dataminerxl.com/
• Predixion Enterprise Insight: http://www.predixionsoftware.com/predixion/
• XLMiner: http://www.solver.com/xlminer-data-mining

OPEN SOURCE AND FREE DATA MINING TOOLS
• Knime: http://www.knime.org/
• R: http://www.r-project.org/
• Orange: http://orange.biolab.si/
• Rapid Miner: http://rapid-i.com/
• WEKA: http://www.cs.waikato.ac.nz/~ml/
    • Course: https://weka.waikato.ac.nz/
    • Video: http://www.youtube.com/watch?v=wCvnO96d8h4

LEARNING R
• Chinese R Software Society (中華R軟體學會): https://sites.google.com/site/zhonghuarruantixuehui/home
• Introducing R: http://data.princeton.edu/R/default.html
• Try R: http://tryr.codeschool.com/levels/1/challenges/2
• Data mining with R: http://www.dcc.fc.up.pt/~ltorgo/DataMiningWithR/
• UCLA IDRE: http://www.ats.ucla.edu/stat/r/

4 MACHINE LEARNING STARTUPS
• Alpine Data Labs: http://www.alpinedatalabs.com/
• BigML: https://bigml.com/
• SkyTree: http://www.skytree.net/
• Wise.io: http://about.wise.io/

EDUCATIONAL PRACTICES

BIG DATA IN EDUCATION

CLOUD COMPUTING IN EDUCATIONAL PRACTICES
Two issues: educational resources and necessary applications
Examples:
• Providing lower-level cloud services (such as data storage).
• Open educational resources are produced, researched, collected, and shared.
• Hosting learning management systems (LMSs) in the cloud.
• Providing individual bundled applications in the cloud (e.g., Google Apps for Education or Microsoft Live@edu with Office 365) that combine tools for communication and collaboration, office tools for working with documents, and space to store and synchronize data on demand.

CLOUD SERVICE NEEDS AND USES
Source: "Cloud Computing in Education and Student's Needs" by E. Krelja Kurelović, S. Rako, and J. Tomljanović

ABOUT CLOUD SERVICE AND COMPUTING IN TAINAN
• Single sign-on for 150,000 teachers and students: completed
• IaaS, PaaS, and SaaS: completed
• More than 168 applications and data resources in total

THANKS FOR YOUR ATTENTION
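
SUPPLEMENT: SCAN-TIME ARITHMETIC
The scan-time figures on the CHALLENGES slide (23 days on 1 node, 33 minutes on a 1000-node cluster) can be reproduced with a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), data partitioned perfectly evenly, and no coordination or network overhead. The function name scan_time_seconds is hypothetical and does not come from the original slides.

```python
# Assumption: decimal units, as is usual for storage and throughput figures.
TB = 10**12  # bytes in one terabyte
MB = 10**6   # bytes in one megabyte


def scan_time_seconds(data_bytes: float, bytes_per_sec_per_node: float, nodes: int = 1) -> float:
    """Time to scan data_bytes when it is split evenly across `nodes`,
    each node scanning its own partition in parallel at the given rate.
    Assumes ideal divide-and-conquer: no overhead, no stragglers."""
    return data_bytes / (bytes_per_sec_per_node * nodes)


single_node = scan_time_seconds(100 * TB, 50 * MB)           # one node
cluster = scan_time_seconds(100 * TB, 50 * MB, nodes=1000)   # 1000-node cluster

print(f"1 node:     {single_node / 86_400:.1f} days")    # 86,400 seconds per day
print(f"1000 nodes: {cluster / 60:.1f} minutes")
```

Under these assumptions the script prints about 23.1 days and 33.3 minutes, matching the rounded slide figures. A real cluster would be somewhat slower because of scheduling, network transfer, and node failures, which is why the slide lists automatic fault-tolerance as a separate requirement.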