Data - John Verostek


Outline

• Class Intros

– What are your goals?

– What types of problems? datasets?

• Overview of Course

• Example Research Project

Breadth vs. Depth vs. Relevancy

Class

Project

Question

Hypothesis

Are height and weight related?

Data

Analytics

Charts

[Chart: scatter plot with Height (inches) on the x-axis, 57 to 77; y-axis from 150 to 270]

Answer

Question

Can we put a person on Mars by 2025?

Hypothesis

Data

Analytics

Charts

Answer

Question

What determines housing prices?

Location, Crime, Square Feet

Hypothesis

Data

Analytics

Charts

Answer

[Chart: Number of Variables Analyzed (1 to 6+) by Week (1 to 7)]

Software, Statistics, Data Analysis, Data Mining, Predictive Analytics, Data Visualization, Mathematics

Mean

Standard Deviation

Correlation
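
A minimal sketch of the first two statistics in R. The height values are made up for illustration and do not come from the course data.

```r
# Hypothetical heights in inches (made-up values, illustration only)
height <- c(62, 65, 67, 68, 70, 72, 74)

height_mean <- mean(height)  # arithmetic mean
height_sd <- sd(height)      # sample standard deviation (n - 1 denominator)

print(height_mean)
print(height_sd)
```

Note that `sd()` in R divides by n - 1, so it reports the sample standard deviation rather than the population one.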

Temperature Variation Across Cities in 2011

[Charts: one panel each for Boston, San Francisco, San Diego, Austin, and Tampa Bay; axis ticks at 30, 60, 90]

Normal Distribution

Distribution of Height

Normal Distribution

Outliers

Identify

Remove?
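
One common way to identify outliers is the 1.5 x IQR rule; the slides do not say which method the course uses, so treat this as an assumed example. The weights are made up, with 400 planted as an obvious outlier.

```r
# Hypothetical weights in pounds; 400 is planted as an obvious outlier
weight <- c(150, 160, 165, 170, 175, 180, 190, 400)

# First and third quartiles (names dropped to keep the results plain numbers)
q <- unname(quantile(weight, c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Fences: anything beyond 1.5 * IQR from the quartiles is flagged
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

is_outlier <- weight < lower | weight > upper
print(weight[is_outlier])

# Removing is a separate decision; here the flagged values are dropped
weight_clean <- weight[!is_outlier]
```

Whether to remove a flagged value depends on why it is there: a data-entry error is usually dropped, while a real but unusual observation may be worth keeping.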

Correlation

• To what degree are two variables related?

[Chart: scatter plot with Height (inches) on the x-axis, 57 to 77; y-axis from 150 to 270]
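
A sketch of measuring that relationship in R with `cor()`. The data are simulated: the linear trend and the noise level are assumptions chosen for illustration, not the values behind the chart.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated heights (inches) and weights (pounds); illustration only
height <- round(runif(50, min = 57, max = 77))
weight <- 4 * height - 70 + rnorm(50, sd = 15)  # assumed linear trend plus noise

# Pearson correlation coefficient, between -1 and 1
r <- cor(height, weight)
print(r)
```

A value near 1 or -1 indicates a strong linear relationship and a value near 0 a weak one. Calling `plot(height, weight)` draws the matching scatter plot.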

Excel Pivot Table

Excel Analysis ToolPak

R / RStudio

Write Code / Program

Input Data

Analyze

Graphics

Enter Commands

View Results

Datasets, etc.

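How many R packages are currently available?

At the R command line, enter:

 dim(available.packages())   # first number = packages available, second = metadata columns

 available.packages()        # lists every available package with its metadata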

Correlation Matrix

Height
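
A sketch of building a correlation matrix in R. The data frame `people` and its columns are hypothetical and filled with simulated values.

```r
set.seed(2)  # make the simulated data reproducible
n <- 40

# Hypothetical data frame with simulated values; illustration only
people <- data.frame(height = rnorm(n, mean = 67, sd = 4))
people$weight <- 4 * people$height - 90 + rnorm(n, sd = 15)
people$age <- runif(n, min = 20, max = 60)

# cor() on a data frame returns every pairwise correlation at once
cm <- cor(people)
print(round(cm, 2))
```

The diagonal is always 1 (each variable against itself) and the matrix is symmetric, so only the values on one side of the diagonal need to be read.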

Multivariate Regression

[Diagram: one outcome Y and the predictors (X's) X1, X2, X3, X4]
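
A sketch of fitting one Y on several X's with R's `lm()`. The data are simulated and the coefficients used to generate them are assumptions for illustration; X4 is deliberately given no effect.

```r
set.seed(3)  # make the simulated data reproducible
n <- 100

# Simulated predictors; illustration only
d <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n), X4 = rnorm(n))

# Simulated outcome: X4 has no true effect on Y
d$Y <- 1 + 2 * d$X1 - 1 * d$X2 + 0.5 * d$X3 + rnorm(n, sd = 0.5)

# Fit Y as a linear function of all four predictors
fit <- lm(Y ~ X1 + X2 + X3 + X4, data = d)
print(coef(fit))
```

`summary(fit)` adds standard errors, p-values, and R-squared, which help judge which X's matter.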
