Data Analysis with MS® Excel™ 2007
(Special Session)
A short course by:
Stanley T. Schuyler, D.Sc.
Math and Computer Science Dept.
Edinboro University of PA.
S.T.Schuyler, D.Sc. 01/08/2010
Analyzing Data Using Excel (tm)
1
Data Analysis Orientation - 1
• The theme for today is data analysis using Excel™
– To understand “what does the data mean?” for
some business purpose.
– To do this requires manipulating the data set:
• To reduce the “mass” of data to a summarized set of
“useful” characterizations
• To identify patterns in the data
• To identify relationships or predictors in the data
2
Data Analysis Orientation - 2
• We will begin with a model of a “marketing-like”
data set
– It contains survey participant information
represented in two forms, along with some bad data:
• coded values (e.g. Likert scales about “something”)
• real values (e.g. a participant’s “age” in years)
• missing and erroneous values
– It contains mostly codes which represent
something else.
– The next slide displays the first 30 or so rows of
the “monster” data set
3
“Model” Data Set Situation
Location   Age   CCDebt   p(purchase)
3          21    8369     0.57
3          25    5831     0.34
3          25    5831     0.18
3          25    5831     0.18
1          25    15468    0.40
3          25    15468    0.30
3          25    15468    0.30
1          27    13230    0.21
2          27    13230    0.21
3          27    13230    0.29
3          29    16525    0.21
1          30    10882    0.27
2          31    10282    0.25
3          31    14629    0.28
2          32    9749     0.08
3          35    8465     0.20
2          36    8117     0.19
1          39    2952     0.15
3          41    9508     0.05
3          42    2706     0.10
1          48    5627     0.05
3          48    7825     0.05
1          50    0        0.06
2          51    7317     0.07
3          53    0        0.04
1          53    2141     0.05
3          56    2041     0.21
3          57    6535     0.07
3          81    4916     0.04
3          21    3969     0.47
2          22    3612     0.25
3          22    7553     0.44
3          24    6311     0.22
1          25    5831     0.18
[Table excerpt: between Age and CCDebt the sheet also holds the coded columns Gender, Fam-Size, Pres-Add, Own-Rent, Income, educ, Employ, Cleanliness, Hours, Prices, Service, and Overall-Imp for these rows; several of their cells are blank.]
The rows of data go on and on …
4
The Data Analysis and Mining Process
1. Data Set input and information mapping setup
2. Data Cleaning
3. Qualitative Analysis
A. Reinterpretation: mapping codes to meanings
B. Descriptive Transformations
C. Descriptive Summarization
4. Quantitative Analysis
A. Planning and designing the needed results and views
B. Using descriptive statistics to explore data properties
C. Using tables and graphics to explore relationships
D. Using Statistical Inference (within Excel™ limitations)
5. Synthesizing Qualitative and Quantitative Results
5
1. Data Set input and information mapping setup
• Using IE or Windows Explorer go to this website:
http://users.edinboro.edu/sschuyler
• Scroll down until you see an entry for:
Data Analysis with Microsoft Excel 2007 (Special Session)
• Locate two files:
– MrktDataBase.xls
– VariableDefinitionsDataSet.doc
• Download these two files (left click, “save as”) to a folder
you will be working in today.
• Open the 97-2003 compatible workbook and re-save it as a
2007 workbook (*.xlsx)
• Open the definitions document
• We will begin here
6
Class Procedure Note
• At this point the slides indicate what our objectives
are, not step by step instructions for “how to do it.”
• During class the required aspects of using the
Excel™ user interface needed to perform lesson
operations will be pointed out, demonstrated, and
discussed as needed.
• The details of “how to” can be looked up in the
following highly recommended text:
Grauer, Robert T., Mulbery, Keith, & Scheeren, Judy.
(2009). Microsoft® Office Excel 2007
Comprehensive, 2nd Edition. Pearson Prentice Hall,
Upper Saddle River, NJ 07458.
7
2. Data Set Cleaning Operations
• Most Data Sets have errors and/or omissions
• Therefore we need to:
– Encode and map variable codes to descriptive
definitions
– Identify missing and erroneous values
– Determine “bad data” and management strategies
• Tools to be used:
– Conditional Formatting
– Find and replace
– Using LOOKUP functions with mapping tables to
view codes descriptively
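The cleaning goals above can also be sketched outside Excel. The Python fragment below is only an illustration and is not part of the course workbook: the column names and valid code ranges come from the data set definitions, while the sample rows, the `VALID_CODES` table, and the helper name `find_problems` are assumptions made for this example. It flags blank cells and codes outside the documented range, which is roughly what a conditional formatting rule makes visible.

```python
# Valid codes per column, taken from the variable definitions
# (only two columns are shown here; extend the table as needed).
VALID_CODES = {
    "Location": {1, 2, 3},
    "Income": {1, 2, 3, 4},
}


def find_problems(rows):
    """Return (row_index, column, reason) for each missing or invalid cell."""
    problems = []
    for i, row in enumerate(rows):
        for column, valid in VALID_CODES.items():
            value = row.get(column)
            if value is None or value == "":
                problems.append((i, column, "missing"))
            elif value not in valid:
                problems.append((i, column, "invalid code"))
    return problems


# Hypothetical sample rows; None stands for a blank cell.
sample = [
    {"Location": 3, "Income": 2},
    {"Location": 1, "Income": None},
    {"Location": 7, "Income": 4},
]

for problem in find_problems(sample):
    print(problem)
```

Each reported tuple points at one cell to inspect, so the list can drive a decision about whether to correct, relabel as “unknown”, or exclude the row.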
8
Impact of Initial Conditional Formatting
(seeing the missing)
[Table: the first 31 rows of the data set shown on slide 4, with conditional formatting applied so that blank (missing) cells stand out.]
9
Transform Raw Data to avoid Errors
• The raw data is not the “Truth” (see the worksheet
named “RawData” or slide 4 above)
• For example: Income is coded 1,2,3 or 4 but represents four categorical range
descriptions
Annual household income:
1=Less than $10,000
2=$10,000-$24,999
3=$25,000-$50,000
4=Over $50,000

Income Code   Meaning
1             <$10,000
2             $10,000-$24,999
3             $25,000-$50,000
4             > $50,000
10
More “Truthful” Raw Data
[Table: the same rows after translation. Coded columns now show descriptive text, for example Gender = male; Pres-Add = 0-1 Years; Own-Rent = owns or rents; Income = $10,000-$24,999, $25,000-$50,000, or > $50,000; educ = High school diploma or equivalent, Technical or trade school, Some college, Bachelor’s degree, Graduate degree, or Beyond MS. Missing codes appear as “unknown”. Fam-Size, Employ, the rating columns (Cleanliness, Hours, Prices, Service), Overall-Imp, CCDebt, and p(purchase) keep their numeric values.]
11
How do we translate the RawData
• Link to Supplied Variable Definitions Document
• Mapping to Excel Lookup Tables (example)

Variable definitions (from the document):
Location   Store identifier (1,2,3)
Age        Customer Age (in years)
Gender     Customer Gender
Fam-Size   Family size (number of persons in household)
Pres-Add   How long has the customer resided at the present address:
           1=0-1 years
           2=2-5 years
           3=6-10 years
           4=11-20 years
           5=more than 20 years
Own-Rent   Residence ownership status

For HLOOKUP(…)
Location code   1          2           3
Meaning         Oil City   Meadville   Erie

Gender code   1      2
Meaning       male   female

For VLOOKUP(…)
Year Code   Meaning
1           0-1 Years
2           2-5 years
3           6-10 years
4           11-20 years
5           >20 years

Own-Rent code   Meaning
1               owns
2               rents
12
Using VLOOKUP and HLOOKUP to tell the Truth!
• To Produce the Real Data
– Copy the relevant text from the document into a
blank sheet and build translation tables
• Restructure to form translation tables
• Put codes in the first column (vertical lookup)
or first row (horizontal lookup)
• Put translations for each code in 2nd column (vertical) or
second row (horizontal)

Horizontal Code Translation Table
Store identifier (1,2,3)   1          2           3
                           Oil City   Meadville   Erie

Vertical Translate Table
Rent Code   Meaning
1           owns
2           rents
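Building a translation table from the definitions text can be pictured as a small parsing step. The sketch below is written in Python rather than Excel and is only an illustration: the helper name `build_translation_table` is hypothetical, and it assumes every definition line has the form code=meaning with an integer code, as the Pres-Add definitions do.

```python
def build_translation_table(definition_lines):
    """Turn lines such as '1=0-1 years' into a {code: meaning} table."""
    table = {}
    for line in definition_lines:
        code_text, separator, meaning = line.partition("=")
        if not separator:
            continue  # skip lines that are not 'code=meaning' pairs
        table[int(code_text.strip())] = meaning.strip()
    return table


# Definition text for Pres-Add, as listed in the variable definitions.
pres_add_lines = [
    "1=0-1 years",
    "2=2-5 years",
    "3=6-10 years",
    "4=11-20 years",
    "5=more than 20 years",
]

pres_add_table = build_translation_table(pres_add_lines)
for code, meaning in sorted(pres_add_table.items()):
    print(code, meaning)
```

The resulting code-to-meaning pairs are what go into the first and second column (or row) of the Excel translation table.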
13
Using VLOOKUP and HLOOKUP to tell the Truth!
– Make a copy of the encoded Data sheet
– In the second copy we replace the codes in the
cells with formulas that locate the text description
for the code represented.
– For each cell in a coded category:
• We use “IF” to bypass blank cells or cells whose
codes mean there is “no data” or that it is missing.
• We use HLOOKUP or VLOOKUP to look up the code
from the original data sheet in the appropriate
translation table for the cell’s category.
• The lookup function returns the text representing the
code (as illustrated on slide 11 above).
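The cell-by-cell logic described above (bypass blanks with IF, otherwise look the code up) can be sketched in Python. This is an analogy to the Excel formulas, not the formulas themselves: the function name `translate` and the sample coded rows are assumptions for the example, while the two translation tables and the “unknown” label follow the slides.

```python
# Translation tables as given in the variable definitions.
OWN_RENT = {1: "owns", 2: "rents"}
LOCATION = {1: "Oil City", 2: "Meadville", 3: "Erie"}


def translate(code, table, missing_label="unknown"):
    """Mimic IF(cell is blank, "unknown", lookup of the code in the table)."""
    if code is None or code == "":
        return missing_label  # the IF branch that bypasses blank cells
    # The lookup branch; codes absent from the table are also labeled missing.
    return table.get(code, missing_label)


# Hypothetical coded rows: (Location, Own-Rent); None is a blank cell.
coded_rows = [(3, 1), (1, None), (2, 2)]

descriptive_rows = [
    (translate(location, LOCATION), translate(own_rent, OWN_RENT))
    for location, own_rent in coded_rows
]
print(descriptive_rows)
```

Keeping the coded sheet untouched and producing the descriptive rows separately mirrors the slide's advice to work on a copy of the encoded data sheet.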
14
HLOOKUP and LOOKUP Basics
• Syntax:
HLOOKUP(<value to lookup>,<lookup table>,<row #>,<range lookup>)
– <value to lookup>: usually a relative cell reference
– <lookup table>: a horizontally defined table with two
or more rows, where the first row contains a set of
values to match on; the value to lookup is compared
to the values in the first row, and if a match is found,
the relative column number it is found in is noted.
– <row #>: a number relative to the table’s first row,
used to select the return value from the relative
column where the match was found.
– <range lookup>: optional; if TRUE or omitted, the first
row must be sorted in ascending alphanumeric order
and the largest value not exceeding the value to
lookup counts as the match; if FALSE, only an exact
match counts and no sorting is required.
15
VLOOKUP and LOOKUP Basics
• Syntax:
VLOOKUP(<value to lookup>,<lookup table>,<column #>,<range lookup>)
– <value to lookup>: usually a relative cell reference
– <lookup table>: a vertically defined table with two or
more columns, where the first column contains a set
of values to match on; the value to lookup is
compared to the values in the first column, and if a
match is found, the relative row number it is found in
is noted.
– <column #>: a number relative to the table’s first
column, used to select the return value from the
relative row where the match was found.
– <range lookup>: optional; if TRUE or omitted, the first
column must be sorted in ascending alphanumeric
order and the largest value not exceeding the value
to lookup counts as the match; if FALSE, only an
exact match counts and no sorting is required.
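To make the matching rules concrete, here is a small Python sketch of how the two lookups behave. It is a simplified model written for this note, not Excel's implementation: the function names `vlookup` and `hlookup` and the `range_lookup` parameter name are chosen to echo Excel, tables are plain lists of rows, and `None` stands in for the #N/A error.

```python
from bisect import bisect_right


def vlookup(value, table, column, range_lookup=True):
    """Sketch of VLOOKUP over a list of rows; `column` counts from 1.

    Returns None where Excel would return #N/A.
    """
    keys = [row[0] for row in table]
    if range_lookup:
        # Approximate match: assumes keys are sorted ascending and picks
        # the last key that does not exceed `value`.
        position = bisect_right(keys, value) - 1
        if position < 0:
            return None
        return table[position][column - 1]
    # Exact match: keys may be in any order.
    for row in table:
        if row[0] == value:
            return row[column - 1]
    return None


def hlookup(value, table, row_number, range_lookup=True):
    """HLOOKUP is the same search applied to the transposed table."""
    transposed = [list(column) for column in zip(*table)]
    return vlookup(value, transposed, row_number, range_lookup)


# Translation tables from the slides.
year_table = [[1, "0-1 Years"], [2, "2-5 years"], [3, "6-10 years"],
              [4, "11-20 years"], [5, ">20 years"]]
store_table = [[1, 2, 3], ["Oil City", "Meadville", "Erie"]]

print(vlookup(3, year_table, 2))
print(hlookup(2, store_table, 2))
print(vlookup(9, year_table, 2, range_lookup=False))
```

With approximate matching an out-of-range code such as 9 silently maps to the last entry, which is one reason to prefer exact matching (or an explicit validity check) when translating category codes.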
16
Avoiding or Detecting Function Failures
• Special Conditional functions we will employ with
“IF” statements
– ISBLANK (…)
– ISERR (…)
– ISNUMBER(…)
– ISNA (…)
– AND (…)
– OR (…)
– NOT (…)
17
Consulting Discussion
• Before proceeding with a pre-planned program I want
to engage in a discussion:
– What is the nature of the data you typically work
with?
• How does it differ from the course model data set?
– What are you typically trying to learn from the data
you get?
• What is the nature of the outputs you are producing?
• Why do you think you can get more leverage out of Excel™
since you already use it?
– What do your stakeholders want that is different from
what you already produce?
– Do you already know what you need help learning to
do with Excel™?
• Let’s itemize these needs.
18
3. Qualitative Analysis
• Descriptive Transformations (using Tables)
– Converting either or both the “RawData” or
“DescriptiveData” sheets to data tables
– Using Filters to explore data tables
• Using IF conditionals and formulas
– To manage anomalies
– To produce derived variables
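Filtering and derived variables can be illustrated with a short Python sketch. Everything specific here is an assumption made for the example: the sample rows, the helper `is_known`, and the derived column `AgeGroup` with its cut-off of 40 are hypothetical and only show the shape of the operations.

```python
# Hypothetical rows from a descriptive (translated) data sheet.
rows = [
    {"Location": "Erie", "Income": "$10,000-$24,999", "Age": 21},
    {"Location": "Oil City", "Income": "unknown", "Age": 25},
    {"Location": "Meadville", "Income": "", "Age": 27},
    {"Location": "Erie", "Income": "> $50,000", "Age": 56},
]


def is_known(value):
    """True unless the cell is blank or marked 'unknown'."""
    return value not in ("", None, "unknown")


# Filter: keep only rows whose Income is known.
filtered = [row for row in rows if is_known(row["Income"])]

# Derived variable: an IF-style rule that produces a new column.
for row in filtered:
    row["AgeGroup"] = "under 40" if row["Age"] < 40 else "40 and over"

for row in filtered:
    print(row["Location"], row["Income"], row["AgeGroup"])
```

In Excel the first step corresponds to a table filter and the second to an IF formula filled down a new column.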
19
Filtering out the “unknown” and “blank”
20
Part of the Filtered View
[Table: part of the filtered view, showing the columns Pres-Add, Own-Rent, Income, educ, Employ, Cleanliness, Hours, and Prices with the “unknown” and blank rows filtered out. The table’s Total row shows 699.]
21
4. Quantitative Analysis
• Planning and designing the required results and
views
– What questions are you trying to answer?
– What views do your stakeholders need?
• Descriptive Summarization (on Data Tables)
– Built-in summary and statistical functions
– Built-in conditional statistical functions
• Exploring relationships using charting tools
• Descriptive Summarization (using Pivot Tables)
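Conditional summary functions compute a statistic over only those rows that meet a criterion. Excel 2007 offers functions such as COUNTIF, SUMIF, and AVERAGEIF; the slide does not name the ones used in class, so treating these as the intended functions is an assumption. The Python sketch below mimics two of them with hypothetical helper names `count_if` and `average_if`, using a few illustrative Location and CCDebt values.

```python
def count_if(values, criterion):
    """Count the values that satisfy `criterion` (like COUNTIF)."""
    return sum(1 for v in values if criterion(v))


def average_if(criteria_values, criterion, average_values):
    """Average `average_values` where the paired criteria value matches."""
    selected = [a for c, a in zip(criteria_values, average_values)
                if criterion(c)]
    if not selected:
        return None  # Excel would show #DIV/0! here
    return sum(selected) / len(selected)


# Illustrative values only: store codes and credit card debt.
location = [3, 3, 3, 3, 1]
ccdebt = [8369, 5831, 5831, 5831, 15468]


def is_erie(code):
    return code == 3


print(count_if(location, is_erie))
print(average_if(location, is_erie, ccdebt))
```

The same pattern extends to sums, minimums, or maximums by changing the final aggregation step.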
22
Exploring Graphically: Hunches, Ideas
• The next few slides depict plots from a much
reduced data set (all unknowns and blanks
removed)
– Histogram of Education levels of participants
– Scatter plot of Income vs. CC Debt
– Scatter plot of Education level vs. CC Debt
– Scatter plot of Age vs. CC Debt.
23
Histogram from Data Analysis tools
Education Levels in reduced Sample
Bin    Frequency
1      33
2      238
3      76
4      116
5      80
6      61
7      56
More   0
[Chart: histogram of Frequency (y-axis, 0 to 250) by Bin (x-axis, 1 to 7 and More)]
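The binning behind a histogram like this can be sketched in Python. The rule modeled here is that a value is counted in the first bin whose boundary it does not exceed, with anything above the last boundary going to “More”. The function name `histogram` and the sample education codes are assumptions for the example, not the course data.

```python
from bisect import bisect_left


def histogram(values, bins):
    """Count values per bin; a value v falls in the first bin b with v <= b.

    The returned list has one extra slot at the end for the "More" bucket.
    """
    counts = [0] * (len(bins) + 1)
    for v in values:
        counts[bisect_left(bins, v)] += 1
    return counts


bins = [1, 2, 3, 4, 5, 6, 7]
educ_codes = [3, 1, 5, 5, 3, 5, 5, 5, 5, 3]  # hypothetical sample

labels = [str(b) for b in bins] + ["More"]
for label, count in zip(labels, histogram(educ_codes, bins)):
    print(label, count)
```

Because the education variable is a code, each bin here simply counts one category; for a continuous variable such as Age the bin boundaries would define ranges instead.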
24
Scatter Plot of Income vs. Credit Card Debt.
Income and Credit Card debt (CCDebt)
[Chart: scatter plot of CCDebt (y-axis, $0 to $25,000) against Income (x-axis, $0 to $60,000)]
25
More Exploring – Education and CCDebt
[Chart: scatter plot of CCDebt (y-axis, $0 to $25,000) against Education Level Reported (x-axis, 1 to 8)]
26
Exploring Age and CCDebt
Age vs. CCDebt
[Chart: scatter plot of CCDebt (y-axis, $0 to $25,000) against Age in Years (x-axis, 0 to 100)]
27
Pivot Table Notes
• To Produce:
– Identify patterns in data
– Alternate views of data
– Summarize data within categories
– Reorganize (“pivot”) data summaries
– Expand or collapse views
– Queries over the categories
28
Pivot Table Notes
• Input requirements
– Raw data table must have column headings
– At least one column must have duplicate text
values
• e.g. cities, states, products, departments
• These become the categories in the pivot tables
– At least one column must have numeric values
– No blank rows
29
Pivot Table Notes
• Problem Solving Forethoughts:
– You need to have the target design of the pivot
table you want sketched out!!!
• What are the headings of the target pivot table?
• What are the row headings?
• What numeric summaries do you want (sums, averages,
max, min, etc.).
– You need to anticipate data transformations you
will need.
• Such as: Likert scales or numeric codes that really
represent categorical descriptive information
• e.g. when 1, 2, 3 correspond to income levels “under
$10K”, “$10K to $30K”, and “greater than $30K”
30
Creating and Manipulating Pivot Tables
• Two approaches
– Quick trial with the raw data
– Pivot tables from translated data
• Operations to cover
– Data selection
– Selecting fields
– Selecting Areas (values, rows, columns, filters)
– Selecting variable summary calculation functions
– Sorting and ordering fields and values in tables
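The core of a pivot table is grouping rows by one or more category fields and summarizing a numeric field within each group. The Python sketch below shows that idea for an average. It is an illustration under stated assumptions: the function name `pivot_average` and the sample records are hypothetical, and blank values are skipped when averaging.

```python
from collections import defaultdict


def pivot_average(rows, row_fields, value_field):
    """Average `value_field` for each combination of `row_fields`."""
    groups = defaultdict(list)
    for row in rows:
        value = row[value_field]
        if value is None:
            continue  # blank cells are left out of the average
        key = tuple(row[field] for field in row_fields)
        groups[key].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


# Hypothetical translated records; None is a blank CCDebt cell.
records = [
    {"Location": "Erie", "Income": "<$10,000", "CCDebt": 1000},
    {"Location": "Erie", "Income": "<$10,000", "CCDebt": 1200},
    {"Location": "Erie", "Income": "> $50,000", "CCDebt": 9000},
    {"Location": "Oil City", "Income": "<$10,000", "CCDebt": None},
    {"Location": "Oil City", "Income": "<$10,000", "CCDebt": 1100},
]

table = pivot_average(records, ["Location", "Income"], "CCDebt")
for key in sorted(table):
    print(key, table[key])
```

Changing `row_fields` corresponds to dragging different fields into the row area, and swapping the averaging line for a sum or a count corresponds to choosing a different summary function.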
31
Simple Pivot Table Example
Location by Income     Average of CCDebt   Average of p(purchase)
Erie                   $ 5,108             11.2%
  unknown              $                   5.5%
  <$10,000             $ 1,140             12.8%
  $10,000-$24,999      $ 2,142             11.5%
  $25,000-$50,000      $ 6,068             11.6%
  > $50,000            $ 8,729             11.2%
Meadville              $ 5,247             9.7%
  unknown              $                   3.9%
  <$10,000             $ 1,181             12.6%
  $10,000-$24,999      $ 2,044             9.4%
  $25,000-$50,000      $ 6,189             10.7%
  > $50,000            $ 8,417             9.4%
Oil City               $ 4,798             10.1%
  unknown              $                   4.8%
  <$10,000             $ 1,082             11.1%
  $10,000-$24,999      $ 2,104             10.6%
  $25,000-$50,000      $ 6,034             11.1%
  > $50,000            $ 8,166             10.0%
Grand Total            $ 5,061             10.6%
32
A Second Pivot Table to Examine Probability of
Purchase: is it related to “Overall Impression”?
Comparing Overall Impression with Probability of Purchase (using codes)
[Pivot table: Average of p(purchase), with rows Location by Income (Erie, Meadville, and Oil City, each split into unknown, <$10,000, $10,000-$24,999, $25,000-$50,000, and > $50,000) and with the Overall-Imp codes 1 to 10 plus Grand Total as column labels. The overall Grand Total is 10.6%.]
33
Same comparison using Descriptives from VLOOKUP
Store Impression compared with Income on p(purchase)
[Pivot table: Average of p(purchase), with the same Location by Income rows and with column labels giving the descriptive Overall-Imp meanings for codes 1 to 10 (running from “very very very poor” through neutral to the positive ratings) plus Grand Total. The overall Grand Total is 10.4%.]
34
Data Mining and Statistical Inference
Problems using inferential statistics with survey data!
• Direct Approach: What question is being
addressed?
– Assumes you have a hunch or hypothesis
– Do one or more variables (the independents)
predict another (the dependent)?
– Using Regression Analysis
• Indirect Approach: Looking for covariates!
– You’re just fishing!
– You might just catch a “bottom fish!”
• Limitations using Excel™ with large data sets
35
Correlation: Exploring Age, Income, Education and
Credit Card Debt
Input: Age, Income, Education, CCDebt
Function: Excel™ Data Analysis – Correlation
Output:
Correlation   Age     Income   educ   CCDebt
Age           1
Income        -0.33   1
educ          -0.16   0.27     1
CCDebt        -0.71   0.75     0.20   1
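Each entry in a correlation matrix is a Pearson correlation coefficient between two columns. The Python sketch below computes it directly from the definition. The function names `pearson` and `correlation_matrix` and the sample columns are assumptions for illustration; they are not the course data and will not reproduce the values above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Correlate every pair of named columns; keys are (name_a, name_b)."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b])
            for a in names for b in names}


# Hypothetical sample columns.
columns = {
    "Age": [21, 25, 30, 41, 53],
    "CCDebt": [9000, 8000, 7000, 4000, 2000],
}

matrix = correlation_matrix(columns)
for pair in sorted(matrix):
    print(pair, round(matrix[pair], 2))
```

A coefficient near -1 or +1 indicates a strong linear relationship, while the sign shows its direction; correlation alone does not establish that one variable causes the other.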
36
A Regression Example: Income and CCDebt
Input:
Income (independent variable)
CCDebt (Dependent variable)
Function: Excel™ Data Analysis – Regression (this example uses a single independent variable; the tool also accepts several)
Output:
Regression Statistics
Multiple R          0.75033731
R Square            0.563006078
Adjusted R Square   0.562341954
Standard Error      2306.625707
Observations        660

ANOVA
             df    SS         MS         F          Significance F
Regression   1     4.51E+09   4.51E+09   847.7418   2.1543E-120
Residual     658   3.5E+09    5320522
Total        659   8.01E+09
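The main regression statistics can be reproduced from first principles. The Python sketch below fits a least-squares line with one predictor and computes R Square and Adjusted R Square. The function names and the sample income and debt values are assumptions for illustration; only the final call uses figures taken from the output above (R Square 0.563006078 and 660 observations).

```python
def adjusted_r_square(r_square, n, predictors=1):
    """Adjusted R Square for n observations and a number of predictors."""
    return 1 - (1 - r_square) * (n - 1) / (n - predictors - 1)


def simple_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum((y - (intercept + slope * x)) ** 2
                      for x, y in zip(xs, ys))
    r_square = 1 - ss_residual / ss_total
    return {
        "slope": slope,
        "intercept": intercept,
        "r_square": r_square,
        "adjusted_r_square": adjusted_r_square(r_square, n),
    }


# Hypothetical sample: income values and credit card debt.
income = [10000, 17500, 17500, 37500, 37500, 50000]
ccdebt = [1100, 2000, 2300, 6000, 6200, 8500]

result = simple_regression(income, ccdebt)
for name, value in result.items():
    print(name, value)

# Consistency check against the figures reported in the output above.
print(adjusted_r_square(0.563006078, 660))
```

The last line evaluates the adjustment formula on the reported R Square and observation count, which gives a way to confirm that the printed Adjusted R Square belongs to a one-predictor model.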
37
Plot of Actual vs. Predicted CCDebt using Income
[Chart: “X Variable 1 Line Fit Plot” showing Y and Predicted Y (CCDebt, y-axis $0 to $25,000) against Income (x-axis values $10,000, $17,500, $37,500, and $50,000)]
38
5. Synthesizing Qualitative and Quantitative Results
(Time dependent)
• Methods for bringing tables, charts, and
graphs into MS Word™ documents
– Paste special
– Paste Link
• Course Wrap-up
39