19UCS942 BIG DATA ANALYTICS

L T P C
2 0 2 3

PRE-REQUISITE:

Objectives:
1. To study the basic technologies that form the foundations of Big Data.
2. To understand the specialized aspects of big data, including big data applications and big data analytics.
3. To study different types of case studies on current research and applications of Hadoop and big data in industry.

UNIT I INTRODUCTION TO BIG DATA ANALYTICS 10
Data – Data Life Cycle – History or Evolution and Definition of Big Data – Sources, Characteristics and Benefits of Big Data – Traditional Data Versus Big Data – Applications of Big Data – Types of Digital Data: Structured, Unstructured and Semi-structured. Data Analytics: Overview – Classification of Analytics – Analytical Tools – Importance of Big Data Analytics – Data Science and the Data Scientist – Responsibilities of the Data Scientist – Terminology Used in the Big Data Environment.

UNIT II BIG DATA TECHNOLOGIES 10
Hadoop Overview – Hadoop Cluster: Architecture – Components – Workflow of a File Stored in Hadoop. Hadoop Distributed File System [HDFS]: HDFS Daemons – Anatomy of File Read – Anatomy of File Write – Replica Placement Strategy – Working with HDFS Commands – Special Features of HDFS. Processing Data with Hadoop using MapReduce: Job Tracker and Task Tracker Interaction – Workflow and Architecture – Examples. Interaction with Hadoop Ecosystem: Pig – Hive – HBase.

LAB EXERCISE 15
1. Install and configure Hadoop on Ubuntu.
2. Work with the following HDFS basic shell commands:
   i. Create a directory in HDFS.
   ii. Display the summary of file lengths.
   iii. Append the text of an existing file to another existing file and save the updated file.
   iv. Copy a file from the local file system to HDFS.
   v. Copy a file from HDFS to the local file system.
   vi. Copy a single source or multiple sources from the local file system to the destination file system.
3. Work with the following HDFS basic shell commands:
   i. Count the number of directories, files and bytes under the paths that match the specified file pattern.
   ii. Remove a file from HDFS.
   iii. Remove a directory from HDFS.
   iv. Display the disk usage of all the files available under a given directory.
4. Develop a Hadoop application to count the number of characters and words, and the frequency of each character, for the given input. The content of the file is as follows:
   Hello I am GeeksforGeeks
   How can I help you
   How can I assist you
   Are you an engineer
   Are you looking for coding
5. Write a Python program to count the occurrences of similar words in a file. Use a partitioner to partition the keys based on alphabets.
   Input data: Welcome to hadoop session Introduction to hadoop Introducing HIVE HIVE Session
6. Write a Python program to count the occurrences of similar words.
   Input data: Welcome to SIT
7. Write a Python program to count the number of characters in a given word.
   Input data: Bigdata analytics
8. Write a Python program to count the number of characters and words, and the frequency of each character, in the given input.
   Input data: Student coordinator of SIT
9. Write a Python program to count the occurrences of similar words.
   Input data: Welcome to CSE

UNIT III BIG DATA DATABASE 10
MongoDB: Introduction – Database Model – Installation and Configuration – Mongo Shell Commands: Import CSV, TSV or JSON Data into MongoDB; Create, Insert, Update, Delete and Read the Database and Collection – Query Documents: Selector, Projector – MongoDB Document Data Model Approach: Document Databases: Commands for Setting Up and Running MongoDB, MongoDB Shell Commands, Reading and Writing Data to MongoDB.

LAB EXERCISE 15
1. MongoDB commands for creating a database, showing the current working database, inserting items into the database, showing the existing databases and dropping a database.
2. MongoDB commands to create a collection, show the collections and drop a collection.
3. MongoDB commands to insert values into a document and to find values in a document.
4. MongoDB commands to update and delete documents.
5. MongoDB commands to perform logical query operations.
6. Adding elements to an array.
7. Consider a restaurant and write MongoDB queries for various operations.
8. Sorting in ascending and descending order, and using the limit and skip methods.
9. Create a database Emp, make a collection named "Empl", and perform update, add and remove operations.
10. MapReduce using MongoDB.

TOTAL: 60 Periods

CO1  Elaborate the Big Data Analytics, types, tools, database and the Big Data technologies. (Understand)
CO2  Apply various techniques and tools of processing data with Hadoop to solve problems relevant to Big Data Analytics. (Apply)
CO3  Analyze various technologies, tools and databases by means of Big Data Analytics. (Analysis)
CO4  Create programs using various big data technologies and databases. (Create)
CO5  Work individually and as a member in multidisciplinary teams. (Individual and Team Work)
CO6  Practice in groups to demonstrate the Big Data technologies using any innovative tool. (Affective Domain)

Text Book
1. Seema Acharya and Subhashini Chellappan, "Big Data and Analytics", Wiley India Pvt. Ltd., 2016.

Reference Books
1. Arshdeep Bahga and Vijay Madisetti, "Big Data Science and Analytics: A Hands-On Approach", ISBN: 978-1-949978-00-1.
2. Vijay Agneeswaran, "Big Data Analytics Beyond Hadoop", Pearson Education, Inc.
3. Judith Hurwitz, Alan Nugent, Dr. Fern Halper and Marcia Kaufman, "Big Data for Dummies", Wiley Publications, 2014.
4. Tom White, "Hadoop: The Definitive Guide", O'Reilly Media, 2010.
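
Illustrative sketch for the Python lab exercises under Unit II (counting characters, words, character frequency and word occurrences). This is only one possible approach, not a prescribed solution. The function names `text_stats` and `word_occurrences` are hypothetical, and two choices are assumptions not stated in the syllabus: whitespace is not counted as a character, and words are compared case-insensitively.

```python
from collections import Counter


def text_stats(text):
    """Return (character count, word count, per-character frequency)."""
    words = text.split()
    # Assumption: whitespace is not counted as a character.
    chars = [c for c in text if not c.isspace()]
    return len(chars), len(words), Counter(chars)


def word_occurrences(text):
    """Count how often each word occurs.

    Assumption: "Session" and "session" are treated as the same word.
    """
    return Counter(word.lower() for word in text.split())


if __name__ == "__main__":
    # Input data taken from the lab exercise list.
    n_chars, n_words, freq = text_stats("Student coordinator of SIT")
    print("characters:", n_chars)
    print("words:", n_words)
    for ch, count in sorted(freq.items()):
        print(repr(ch), count)

    sample = ("Welcome to hadoop session Introduction to hadoop "
              "Introducing HIVE HIVE Session")
    for word, count in sorted(word_occurrences(sample).items()):
        print(word, count)
```

Under these assumptions, the input "Student coordinator of SIT" yields 23 characters and 4 words. The same counting logic could be split into a mapper and a reducer for the Hadoop versions of the exercises.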