Outline
• Class Intros
– What are your goals?
– What types of problems? datasets?
• Overview of Course
• Example Research Project
Breadth vs. Depth vs. Relevancy
Class
Project
Question
Hypothesis
Are height and weight related?
Data
Analytics
Charts
[Chart: scatter plot, x-axis Height (inches) from 57 to 77, y-axis from 150 to 270]
Answer
Question: Can we put a person on Mars by 2025?
Hypothesis
Data
Analytics
Charts
Answer
Question
Location Crime
Square Feet
Hypothesis
Data
Analytics
Charts
Answer
[Chart: Number of Variables Analyzed (1 to 6+) by Week (1 to 7)]
• Software
• Statistics
• Data Analysis
• Data Mining
• Predictive Analytics
• Data Visualization
• Mathematics
Mean
Standard Deviation
Correlation
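As a minimal sketch of the first two measures, the R code below computes the mean and the sample standard deviation by hand and compares them with the built-in `mean()` and `sd()`. The `height` values are hypothetical and chosen only for illustration; they are not course data.

```r
# Hypothetical sample of heights in inches (illustrative values only)
height <- c(60, 62, 65, 67, 70, 72)

n <- length(height)

# Mean: sum of the values divided by how many there are
mean_manual <- sum(height) / n

# Sample standard deviation: square root of the summed squared
# deviations from the mean, divided by n - 1
sd_manual <- sqrt(sum((height - mean_manual)^2) / (n - 1))

# Built-in equivalents
mean_builtin <- mean(height)
sd_builtin <- sd(height)

print(c(mean = mean_builtin, sd = sd_builtin))
```

Note that `sd()` divides by n - 1 (the sample formula), so the manual version does the same.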
Temperature Variation Across Cities in 2011
[Chart: one panel per city (Boston, San Francisco, San Diego, Austin, Tampa Bay), each with axis ticks at 30, 60, 90]
Normal Distribution
Distribution of Height
Normal Distribution
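One way to get a feel for the normal distribution is to simulate it. The sketch below draws hypothetical heights with `rnorm()` and checks how many fall within one and two standard deviations of the mean. The mean of 67 inches and standard deviation of 3 inches are assumptions made for illustration, not figures from the course.

```r
set.seed(42)  # fixed seed so the simulation is repeatable

# Assumed parameters for illustration only: mean 67 in, sd 3 in
height <- rnorm(10000, mean = 67, sd = 3)

# Share of simulated heights within 1 and 2 standard deviations of the mean
within_1sd <- mean(abs(height - mean(height)) <= sd(height))
within_2sd <- mean(abs(height - mean(height)) <= 2 * sd(height))
print(c(within_1sd = within_1sd, within_2sd = within_2sd))

# Theoretical shares from the normal cumulative distribution function
theory_1sd <- pnorm(1) - pnorm(-1)
theory_2sd <- pnorm(2) - pnorm(-2)
print(c(theory_1sd = theory_1sd, theory_2sd = theory_2sd))

# Histogram of the simulated heights with the fitted normal curve on top
hist(height, breaks = 40, freq = FALSE,
     main = "Distribution of Height", xlab = "Height (inches)")
curve(dnorm(x, mean = mean(height), sd = sd(height)), add = TRUE)
```

The simulated shares should land close to the theoretical ones (roughly 68% and 95%).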
Outliers
Identify
Remove?
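A common way to identify outliers is the 1.5 × IQR rule (the rule behind boxplot whiskers). The sketch below applies it to a hypothetical `height` vector. The rule and the data are assumptions chosen for illustration, and the slides do not say which method the course uses. Whether to remove a flagged value remains a judgment call.

```r
# Hypothetical heights in inches; 95 is a deliberately extreme value
height <- c(61, 63, 64, 65, 66, 67, 68, 69, 70, 71, 95)

# First and third quartiles, and the interquartile range
q <- unname(quantile(height, probs = c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Fences: anything beyond 1.5 * IQR from the quartiles is flagged
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

is_outlier <- height < lower | height > upper
outliers <- height[is_outlier]   # values flagged for review
cleaned <- height[!is_outlier]   # data with flagged values dropped

print(outliers)
print(cleaned)
```

Flagging first and removing second keeps the decision explicit: a flagged value may be a data-entry error or a genuine extreme observation.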
Correlation
• To what degree are two variables related?
[Chart: scatter plot, x-axis Height (inches) from 57 to 77, y-axis from 150 to 270]
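The degree of relationship can be sketched in R with Pearson's correlation coefficient, computed here both from its definition and with the built-in `cor()`. The `height` and `weight` values are hypothetical, and treating the second variable as weight in pounds is an assumption based on the earlier height and weight question.

```r
# Hypothetical paired measurements (illustrative values only)
height <- c(60, 62, 65, 67, 70, 72)        # inches
weight <- c(155, 160, 178, 185, 210, 225)  # assumed to be pounds

# Pearson correlation from its definition: the sum of cross-products of
# deviations, scaled by the square root of the product of squared deviations
dh <- height - mean(height)
dw <- weight - mean(weight)
r_manual <- sum(dh * dw) / sqrt(sum(dh^2) * sum(dw^2))

# Built-in equivalent
r <- cor(height, weight)
print(r)

# cor.test() adds a significance test and a confidence interval
result <- cor.test(height, weight)
print(result$p.value)

# Scatter plot of the two variables
plot(height, weight, xlab = "Height (inches)", ylab = "Weight (lb, assumed)")
```

A value of `r` near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates little or none.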
Excel Pivot Table
Excel Analysis ToolPak
R / RStudio
Write Code / Program
Input Data
Analyze
Graphics
Enter Commands
View Results
Datasets, etc.
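How many R packages are currently available?
At the R console, enter:
dim(available.packages())   # rows = number of packages, columns = fields per package
available.packages()        # lists the available packages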
Correlation Matrix
[Figure: correlation matrix relating Y (Height) to the X's (X1, X2, X3, X4)]
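A correlation matrix can be sketched in R by passing a data frame to `cor()`. The data below are simulated. The column names follow the Y and X1 to X4 labels above, but the values, and the choice to make Y depend only on X1, are assumptions made for illustration.

```r
set.seed(1)  # fixed seed so the simulation is repeatable
n <- 50

# Simulated predictors (hypothetical data)
dat <- data.frame(
  X1 = rnorm(n),
  X2 = rnorm(n),
  X3 = rnorm(n),
  X4 = rnorm(n)
)

# Y is constructed to depend on X1 only, plus random noise
dat$Y <- 67 + 2 * dat$X1 + rnorm(n)

# Pairwise Pearson correlations between all columns
cm <- cor(dat)
print(round(cm, 2))

# Correlation of Y with each of the X's
print(round(cm["Y", c("X1", "X2", "X3", "X4")], 2))
```

The matrix is symmetric with ones on the diagonal, so reading the Y row (or column) is enough to see which X's are most related to Y.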