Technology of Data Analytics

INTRODUCTION

OBJECTIVE
• Data analytics mindset: shallow and wide, deep when you need it
• Quick overview, useful tidbits, and a jumping-off point

AGENDA/TOPICS
• Excel
• VBA
• Access
• SQL
• Tableau
• Hadoop
• Analytical packages: R/SAS/SPSS/Minitab

SQUARE 1
• Business and Technology
• Entity
• Attributes
• Schema
• Relational Database
• ETL - Extract, Transform, Load
• Data Mining

START WITH EXCEL
It's the easiest and most widely available platform
You can teach others to maintain it
Collect, Store, Analyze, Report/Visualize
• Data Validation Drop Downs
• VLOOKUPs
• Formulas
• IF, AND
• Pivot Tables
• Charts
• Conditional Formatting
• OFFSET

VISUAL BASIC FOR APPLICATIONS
• Microsoft language
• Object oriented: noun.verb; noun.adjective = "adjective" (that is, object.Method; object.Property = value)
• Record a macro and play around
• Modules and UserForms
• Cell referencing: Cells(row, column).Select
• For loop: For index = startNumber To endNumber ... Next index
• If statement: If condition Then ... End If
• Use it for: moving data, changing charts

GOOGLE DOCS: COLLECTION
• Somebody already did everything for you
• Google people are smarter than you
• You can use the interwebs instead of a local drive

ACCESS
Beginning of databasing
Table
• Like an Excel spreadsheet
• Tightly defined values allowed
View
• Pulling info from tables using logic
• A lasting query that is used to populate reports
Form
• Data input
Report
• Generates reports

SQL
Big Boy Access
Same as Access without the bumpers and hand-holding
The real deal, used in the software world
Can be used for maintaining and diagnosing software back ends
Table
• Like an Excel spreadsheet
• Tightly defined values allowed
View
• Pulling info from tables using logic
• A lasting query that is used to populate reports
Query
• Viewing data
Stored Procedures
• Loading and moving data
• I don't really know
SSRS (SQL Server Reporting Services)
• Web-based reports

TABLEAU
• Connections
• Worksheets
• Views
• Dashboards
• Stories

HADOOP
Pools multiple computers/servers into a cluster that stores and processes data as a single unit
• Hadoop Common - contains libraries and utilities needed by other Hadoop modules.
• Hadoop Distributed File System (HDFS) - a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
• Hadoop YARN - a resource-management platform responsible for managing compute resources in clusters and using them to schedule users' applications.
• Hadoop MapReduce - a programming model for large-scale data processing.
Get started at: http://hadoop.apache.org/docs/current/

ANALYZE: SAS/R/SPSS/MINITAB
SAS
• Academic/Common
R
• Open source
SPSS
• IBM
Minitab
• Analytical Excel

OTHER
• iTunes U: Data Visualization
• Coursera: Introduction to Data Science
• Codecademy: other programming languages

EDUCATION PROJECTS
Open Source Education - BDAA Book of Knowledge
• Stats Cheat Sheet
• Excel Guide
• SQL Guide
• How-to Guides in General….
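
As a jumping-off point for the MapReduce item in the Hadoop slide, here is a minimal sketch of the programming model in plain Python. It does not use Hadoop or any Hadoop API. The function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample sentences are made up for illustration. On a real cluster the map and reduce steps would run in parallel across many machines; here they run in one process, only to show the shape of the idea.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the list of values for one key into a total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence (hypothetical helper)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Illustrative input only; not data from the presentation.
    sample = ["big data big clusters", "data on commodity machines"]
    print(word_count(sample))
```

Word count is the usual first example because each line can be mapped independently, which is what lets the work spread across commodity machines.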