Technology of Data Analytics - Big Data & Analytics Association

Technology of Data Analytics
INTRODUCTION
OBJECTIVE
 Data Analytics mindset – shallow and wide, deep when you need it
 Quick overview, useful tidbits, provide a jumping-off point
AGENDA/ TOPICS
Excel
Tableau
VBA
Hadoop
Access
Analytical Packages: R/ SAS/ SPSS/ Minitab
SQL
SQUARE 1
Business and Technology
 Entity
 Attributes
 Schema
 Relational Database
 ETL - Extract Transform Load
 Data Mining
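A minimal sketch of how these terms fit together, not a reference implementation. The customer data, the column names and the `customer` table are hypothetical and chosen only for illustration: each customer is an entity, the columns are its attributes, the CREATE TABLE statement is the schema, and the three functions are the Extract, Transform and Load steps.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: one row per customer (the entity),
# with name, city and spend as its attributes.
RAW_CSV = """name,city,spend
 alice ,Austin,120.50
BOB,austin,80
carol,Dallas,not available
"""

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: tidy names and cities, drop rows whose spend is not a number."""
    clean = []
    for row in rows:
        try:
            spend = float(row["spend"])
        except ValueError:
            continue  # skip rows with unusable spend values
        clean.append((row["name"].strip().title(), row["city"].strip().title(), spend))
    return clean

def load(rows):
    """Load: write the clean rows into a relational table with a fixed schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (name TEXT, city TEXT, spend REAL)")
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

conn = load(transform(extract(RAW_CSV)))
loaded = conn.execute("SELECT name, city, spend FROM customer ORDER BY name").fetchall()
print(loaded)
```

An in-memory SQLite database stands in for the relational database here so the sketch runs without any setup.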
START WITH EXCEL
 It’s the easiest and most available platform
 Can teach others to maintain
Collect
Store
Analyze
Report/ Visualize
•Data Validation Drop Downs
•vLookups
•Formulas
•If, And
•Pivot Table
•Charts
•Conditional Formatting
•Offset
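For readers who think in code, the logic behind two of these Excel features can be sketched outside the spreadsheet. This is only an illustration of the idea, not how Excel works internally; the price list, the order rows and the `vlookup` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical lookup range: product code -> unit price.
prices = {"A1": 2.50, "B2": 4.00}

# Hypothetical data rows: (region, product code, quantity).
orders = [
    ("East", "A1", 10),
    ("West", "B2", 5),
    ("East", "B2", 2),
]

def vlookup(key, table, default=None):
    """Exact-match lookup, roughly what VLOOKUP with an exact match does."""
    return table.get(key, default)

# Pivot-table-style summary: total revenue per region.
revenue_by_region = defaultdict(float)
for region, code, qty in orders:
    revenue_by_region[region] += qty * vlookup(code, prices, 0.0)

print(dict(revenue_by_region))
```

The lookup pulls a value from a second table by key, and the loop groups and sums rows the way a pivot table does.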
VISUAL BASIC FOR APPLICATIONS
Microsoft language
Object oriented: noun.verb (object.Method); noun.adjective = "adjective" (object.Property = value)
Record macro and play around
Modules and Userforms
Cell referencing: Cells(row, column).Select
For loop: For index = startingNumber To endingNumber ... Next index
If statement: If condition Then ... End If
Use it for:
Moving data
Changing charts
GOOGLE DOCS: COLLECTION
 Somebody already did everything for you
 Google people are smarter than you
 You can use the web instead of a local drive
ACCESS
 Beginning of databasing
Table
• Like an Excel spreadsheet
• Tightly defined values allowed
View
• Pulling info from tables using logic
• A lasting query that is used to populate reports
Form
• Data input
Report
• Generates reports
SQL
 Big Boy Access
 Same as Access without the bumpers and hand holding
 Real deal use in software world
 Can be used for maintenance and diagnosing software back ends
Table
• Like an Excel spreadsheet
• Tightly defined values allowed
View
• Pulling info from tables using logic
• A lasting query that is used to populate reports
Query
• Viewing data
Stored Procedures
• Loading and moving data
• I don’t really know
SSRS (SQL Server Reporting Services)
• Web-based reports
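The table, view and query pieces above can be sketched with Python's built-in sqlite3 module. This is an assumption-laden illustration: SQLite stands in for a full SQL server, the `orders` table and `region_totals` view are hypothetical names, and stored procedures are left out because SQLite does not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table: typed columns plus a CHECK constraint ("tightly defined values allowed").
conn.execute("""
    CREATE TABLE orders (
        id     INTEGER PRIMARY KEY,
        region TEXT NOT NULL,
        amount REAL NOT NULL CHECK (amount >= 0)
    )
""")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("East", 25.0), ("West", 20.0), ("East", 8.0)],
)

# View: a saved ("lasting") query that reports can read from.
conn.execute("""
    CREATE VIEW region_totals AS
    SELECT region, SUM(amount) AS total
    FROM orders
    GROUP BY region
""")

# Query: view the data through the view.
totals = conn.execute(
    "SELECT region, total FROM region_totals ORDER BY region"
).fetchall()
print(totals)

# The table rejects a value that breaks its definition.
try:
    conn.execute("INSERT INTO orders (region, amount) VALUES ('West', -1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

The failed insert shows what "without the bumpers" means in practice: the database enforces only the rules you wrote into the table definition.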
TABLEAU
 Connections
 Worksheets
 Views
 Dashboards
 Stories
HADOOP
 Pools multiple computers/ servers into a cluster for distributed storage and processing
 Hadoop Common – contains libraries and utilities needed by other Hadoop modules.
 Hadoop Distributed File System (HDFS) – a distributed file-system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
 Hadoop YARN – a resource-management platform responsible for managing compute resources in clusters and using them for scheduling of users' applications.
 Hadoop MapReduce – a programming model for large scale data processing.
 Get started at: http://hadoop.apache.org/docs/current/
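The MapReduce programming model can be sketched in a few lines of plain Python. This is a single-machine toy and does not use the Hadoop API; the function names and the sample lines are hypothetical. On a real cluster the map and reduce steps would run in parallel on many machines, with the framework handling the grouping in between.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

lines = ["big data big analytics", "data analytics"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Word counting is the usual first example because each line can be mapped independently, which is what lets the work spread across a cluster.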
Analyze: SAS/ R/ SPSS/ Minitab
SAS
• Academic/ Common
R
• Open source
SPSS
• IBM
Minitab
• Analytical Excel
Other
 iTunes U: Data Visualization
 Coursera: Introduction to Data Science
 Codecademy: other programming languages
EDUCATION PROJECTS
Open Source Education – BDAA Book of Knowledge
 Stats Cheat Sheet
 Excel Guide
 SQL Guide
 How-to guides in general