Set-01
Big Data Analytics with Hadoop / Spark
Time: 2 Hrs

Group A
Each question carries 2 marks

1. The data being captured can be in any form or structure. Which characteristic of Big Data are we talking about?
   a. Volume
   b. Velocity
   c. Variety
   d. Value

2. Which of the following is not an external data source?
   a. Data from CRM
   b. Data from web logs
   c. Data from government sources
   d. Data from market surveys

3. How can Big Data analytics help prevent fraud?
   a. Analyze all the data
   b. Detect fraud in real time
   c. Use predictive analytics
   d. All of the above

4. Identify the technologies that enable fraud identification and the predictive modeling process.
   a. Text mining
   b. Social media data analysis
   c. Regression analysis
   d. All of the above

5. Which message is generated by a DataNode to indicate its connectivity with the NameNode?
   a. Beep
   b. Heartbeat
   c. Analog pulse
   d. Map

6. Which of the following services is provided by YARN?
   a. Global resource management
   b. Record reader
   c. MapReduce engine
   d. Data mining

7. Why are Big Data applications susceptible to latency?
   a. Big Data may reside in a different location from the application
   b. The volume of Big Data is too large to be analyzed rapidly
   c. Big Data cannot use in-memory computing
   d. Big Data applications are still in the early stages of development

8. Which of the following is managed by the MapReduce environment?
   a. Web logs
   b. Images
   c. Structured data
   d. Unstructured data

9. Which of the following terms is used to denote the small subsets of a large file created by HDFS?
   a. NameNode
   b. DataNode
   c. Blocks
   d. Namespace

10. In an HDFS cluster, which of the following manages the cluster metadata?
   a. NameNode
   b. DataNode
   c. Inode
   d. Namespace

11. Which of the following options most aptly explains the reason behind the creation of MapReduce? Select all that apply.
   a. Need to increase the processing power of new hardware
   b. Need to perform complex analysis of structured data
   c. Need to increase the number of web users
   d. Need to spread distributed computing

12. Which of the following describes the Mapper function?
   a. It processes data to create a list of key-value pairs
   b. It indexes the data to list all the words occurring in it
   c. It converts a relational database to key-value pairs
   d. It tracks data across multiple tables and clusters in Hadoop

13. Which of the following describes the Reduce function?
   a. It analyzes the map function results to show the most frequently occurring values
   b. It combines the map function results to return a list of the best matches for the query
   c. It adds the results of the map function to convert the key-value pair lists to a columnar database
   d. It processes the map function results and creates a new key-value pair list to answer the query

14. In the MapReduce framework, the map and reduce functions can be run in any order. Do you agree, and why?
   a. Yes, because in functional programming the order of execution is not important
   b. Yes, because the functions use key-value pairs as input and output, and order is not important
   c. No, because the output of the map function is the input for the reduce function
   d. No, because the output of the reduce function is the input for the map function

15. Which framework enables storing large volumes of data in a distributed manner across multiple clusters of machines?
   a. Hadoop
   b. Hive
   c. Pig
   d. Sqoop

16. Which of the following types of jobs is MapReduce suitable for?
   a. Graph processing
   b. Batch processing
   c. Real-time processing
   d. Stream processing

17. YARN stands for
   a. Yet Another Resource Name
   b. Yet Another Resource Manager
   c. Yet Another Recovery Manager
   d. Yet Another Resource Navigator

Group B
Each question carries 3 marks

18. Discuss three advantages of YARN over MapReduce.
19. List some major functions of the Big Data architecture model.
20. What is metadata? What information does it provide?
21. List and define the four basic elements of Big Data.
22. Define the various data types of Big Data.
23. Define the functions of Sqoop and Hive.