Midterm Review
CSE4/587
B.Ramamurthy
5/28/2016

Exam Date
October 22, 2010
Please bring pencils, pens, and erasers.
This is a closed-book exam.
You can bring 4 sheets of any information you need.

Topics
Defining data-intensive computing
Relationship between data-intensive computing and cloud computing
Lucene
Enabling Technologies (ET):
    ET1: Web services
    ET2: Virtualization
    ET3: Special data structures and algorithms (MapReduce Engine, HDFS)
Google App Engine
Project 1

Data-intensive computing competencies
1. Describe data-intensive computing concepts.
2. Compute with data-intensive computing concepts.
3. Recognize a data-intensive problem.
4. Identify the scale of data.
5. Analyze the data requirements of a problem.
6. Describe the data layout and define the data repository format.
7. Decide on the algorithms (e.g., MapReduce).
8. Define application-specific algorithms and analytics.
9. Design the data-intensive program solution and system configuration.
10. Implement the data-intensive solution and test the solution for functional correctness and non-functional requirements.
11. Research algorithmic improvements and experiment with them.
12. Write a report summarizing the solution and results.
13. Incorporate services from cloud computing platforms.
14. Integrate semantic information in organizing data for context-awareness.
15. Apply collective intelligence methods for diverse data sources.
16. Formulate data-intensive visualization solutions for presenting the results.

Take home questions: Q1
For each competency in the list, check whether you learned it in Project 1 and/or in the course lessons.

Item# (competencies list, slide 4) | Covered in Prj1 | Covered in lessons | Where? How? | To what extent (%)

Take home questions: Q2
In order to handle the massive load we had in our recent deployment of the Pop!World application on Google App Engine (GAE), we deployed 20 instances of the application, gave the users the 20 links, and let them choose a link that was still available when one ran out of quota.
Design a front end (method), using GAE resources or others, that will (i) hide the 20 instances (or n instances) and (ii) direct the user to one of the unloaded app instances. (The policy could even be a simple round robin.)

Format of in-class exam
4 questions
Enabling technologies: Web services, virtualization, MapReduce, HDFS, infrastructure management
Amazon cloud environment (read the blue book)

How to study?
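
Returning to take-home question Q2, the "simple round robin" policy mentioned there can be sketched in a few lines. This is only an illustration, not the expected answer and not a GAE API: the class name `RoundRobinFrontEnd`, the `mark_over_quota` helper, and the `example.com` URLs are all hypothetical. It assumes the front end knows the list of instance URLs and is told when an instance has run out of quota. A real front end would then redirect the user's request to the URL that `pick()` returns.

```python
class RoundRobinFrontEnd:
    """Hypothetical front end that hides n app instances behind one entry point."""

    def __init__(self, instance_urls):
        if not instance_urls:
            raise ValueError("at least one instance URL is required")
        self._urls = list(instance_urls)
        self._next = 0              # index of the next instance to try
        self._over_quota = set()    # instances currently known to be out of quota

    def mark_over_quota(self, url):
        """Record that an instance has exhausted its quota (assumed to be reported to us)."""
        self._over_quota.add(url)

    def pick(self):
        """Return the next instance in round-robin order, skipping over-quota ones."""
        for _ in range(len(self._urls)):
            url = self._urls[self._next]
            self._next = (self._next + 1) % len(self._urls)
            if url not in self._over_quota:
                return url
        raise RuntimeError("all instances are over quota")


if __name__ == "__main__":
    # Placeholder URLs; the real instance links are not given in the slides.
    urls = ["https://popworld-%d.example.com" % i for i in range(1, 4)]
    front_end = RoundRobinFrontEnd(urls)
    front_end.mark_over_quota(urls[1])
    for _ in range(4):
        print(front_end.pick())
```

The sketch keeps its state in memory and is not thread-safe. A deployed version would need shared state (for example, a datastore or cache entry) so that every front-end process sees the same counter and quota information.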