STATISTICAL ANALYSIS USING EXCEL
By Dr. Wisuttorn Jitaree
Faculty of Business Administration
Chiang Mai University
Thailand
1. Understanding statistical techniques
Before commencing the discussion of inferential statistics, it is necessary to introduce learners
to a few other concepts. The first issue we address is the shape of the data, examined through data
frequency tables.
2. Data Frequency Tables
In this example we have obtained 30 completed and usable questionnaires from students
from the School of Business and another 30 completed and usable questionnaires from students from
the School of Accounting. We have 60 data points in all.
The 30 respondents from the School of Business supplied the following number of months working
experience:
Table 2.1: School of Business, number of months working experience
23  28  29  34  34  39  43  44  45  45
48  48  49  54  54  54  55  56  56  65
65  65  67  73  76  76  77  78  87  92
Respondents from the School of Accounting replied with the following data:
Table 2.2: School of Accounting, number of months working experience
10  12  12  16  19  20  22  23  23  23
26  28  29  32  33  34  34  41  43  43
44  45  45  54  56  56  56  65  67  76
Excel has a function, =FREQUENCY(), that constructs data frequency tables.
Next, the upper limits of the required intervals are entered into the range I6 through I11:
Data distribution: 25, 36, 48, 60, 72, >72
=FREQUENCY(C3:C32,I6:I11)
To enter it as an array formula, select the output range, press F2, and then press CTRL + Shift + Enter.
Table 2.3: School of Business Frequency Table of the number of students
and the number of months working experience.
Months Experience    No. of Students
Under 25             1
26-36                4
37-48                7
49-60                7
61-72                4
Above 72             7
Total                30
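The counting rule behind =FREQUENCY() can be sketched outside Excel. The following is a minimal illustration in Python, not part of the Excel workflow. It assumes the upper class limits 25, 36, 48, 60 and 72 implied by the classes of Table 2.3, and the names `business`, `upper_limits` and `frequency` are chosen here for illustration only.

```python
from bisect import bisect_left

# Months of working experience, School of Business (Table 2.1).
business = [23, 28, 29, 34, 34, 39, 43, 44, 45, 45,
            48, 48, 49, 54, 54, 54, 55, 56, 56, 65,
            65, 65, 67, 73, 76, 76, 77, 78, 87, 92]

# Assumed upper limits of the first five classes; anything above the
# last limit falls into the final "Above 72" class.
upper_limits = [25, 36, 48, 60, 72]


def frequency(data, bins):
    """Count values per class: a value lands in the first class whose
    upper limit is greater than or equal to it, with one overflow class."""
    counts = [0] * (len(bins) + 1)
    for value in data:
        counts[bisect_left(bins, value)] += 1
    return counts


counts = frequency(business, upper_limits)
print(counts)  # [1, 4, 7, 7, 4, 7]
```

The six counts agree with the No. of Students column of Table 2.3 and add up to the 30 respondents.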
Table 2.4: School of Business Frequency Table with relative frequency
Months Experience    No. of Students    Relative frequency (%)
Under 25             1                  3.33
26-36                4                  13.33
37-48                7                  23.33
49-60                7                  23.33
61-72                4                  13.33
Above 72             7                  23.33
Total                30                 100
Table 2.5: School of Business Frequency Table with relative frequency and cumulative relative
frequency
Months Experience    No. of Students    Relative frequency (%)    Cumulative relative frequency (%)
Under 25             1                  3.33                      3.33
26-36                4                  13.33                     16.67
37-48                7                  23.33                     40.00
49-60                7                  23.33                     63.33
61-72                4                  13.33                     76.67
Above 72             7                  23.33                     100.00
Total                30                 100
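The relative and cumulative relative frequencies follow directly from the counts. This short Python sketch is offered only as an illustration of the arithmetic; the variable names are assumptions made here. Each count is divided by the total, and the running total of the counts is divided by the total.

```python
from itertools import accumulate

classes = ["Under 25", "26-36", "37-48", "49-60", "61-72", "Above 72"]
counts = [1, 4, 7, 7, 4, 7]  # No. of Students column of the frequency table

total = sum(counts)
# Relative frequency: share of each class, in percent.
relative = [100 * c / total for c in counts]
# Cumulative relative frequency: share of all classes up to and including this one.
cumulative = [100 * c / total for c in accumulate(counts)]

for label, n, rel, cum in zip(classes, counts, relative, cumulative):
    print(f"{label:<10}{n:>4}{rel:>8.2f}{cum:>9.2f}")
```

Computing the cumulative column from the running counts, rather than by adding rounded percentages, avoids accumulating rounding error.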
These frequency tables may be plotted as histograms. Figure 2.1 shows the results for the School of
Business and Figure 2.2 shows the results for the School of Accounting.
Figure 2.1: A histogram used to examine the shape of the data frequency table for the School of
Business (vertical axis: No. of Students; horizontal axis: months of experience, Under 25 to Above 72)
Figure 2.2: A histogram used to examine the shape of the data frequency table for the School of
Accounting (vertical axis: No. of Students; horizontal axis: months of experience, Under 25 to Above 72)
These two data sets can be plotted on the same axes using a line graph, as shown in Figure 2.3.
Figure 2.3: The School of Business (SOB) and the School of Accounting (SOA) data as traces on one graph
(chart title: Number of students with differing periods of work experience from School of
Business and School of Accounting)
3. Descriptive statistics
The Descriptive Statistics analysis tool generates a report of univariate statistics for data in the
input range, providing information about the central tendency and variability of your data.
The results are shown in Figure 3.1.
Column1
Mean                       55.3
Standard Error             3.256170094
Median                     54
Mode                       54
Standard Deviation         17.83477811
Sample Variance            318.0793103
Kurtosis                   -0.61398127
Skewness                   0.1850153
Range                      69
Minimum                    23
Maximum                    92
Sum                        1659
Count                      30
Confidence Level(95.0%)    6.659615595
Figure 3.1: The descriptive statistics of sample data
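Several of these figures can be checked by hand. The sketch below uses Python's standard `statistics` module on the School of Business data from Table 2.1. It is only an illustration of the definitions, not the procedure the analysis tool itself follows, and the variable names are assumptions.

```python
import math
import statistics

# Months of working experience, School of Business (Table 2.1).
business = [23, 28, 29, 34, 34, 39, 43, 44, 45, 45,
            48, 48, 49, 54, 54, 54, 55, 56, 56, 65,
            65, 65, 67, 73, 76, 76, 77, 78, 87, 92]

n = len(business)
mean = statistics.mean(business)
median = statistics.median(business)
variance = statistics.variance(business)  # sample variance, divides by n - 1
stdev = statistics.stdev(business)        # sample standard deviation
std_error = stdev / math.sqrt(n)          # standard error of the mean
data_range = max(business) - min(business)

print(f"Mean {mean:.1f}, Median {median}, Sum {sum(business)}, Count {n}")
print(f"Sample Variance {variance:.7f}, Standard Deviation {stdev:.8f}")
print(f"Standard Error {std_error:.9f}, Range {data_range}")
```

Note that the standard deviation and variance reported here use the sample (n - 1) divisor.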
4. Covariance
The Correlation and Covariance tools can both be used in the same setting, when you have N
different measurement variables observed on a set of individuals. The Correlation and Covariance
tools each give an output table, a matrix that shows the correlation coefficient or covariance,
respectively, between each pair of measurement variables. The difference is that correlation
coefficients are scaled to lie between -1 and +1 inclusive. Corresponding covariances are not scaled.
Both the correlation coefficient and the covariance are measures of the extent to which two variables
"vary together."
The results are shown in Figure 4.1.
            Column 1    Column 2
Column 1    307.4767
Column 2                299.8456
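The covariance calculation can be sketched as follows. This Python illustration makes two assumptions: the two samples are paired in ascending order, and the covariance is the population form that divides by N. The second assumption is consistent with the value 307.4767, which equals the sample variance 318.0793 multiplied by 29/30. The function name `population_covariance` is chosen here for illustration.

```python
# Tables 2.1 and 2.2, each in ascending order (pairing by position is an assumption).
business = [23, 28, 29, 34, 34, 39, 43, 44, 45, 45,
            48, 48, 49, 54, 54, 54, 55, 56, 56, 65,
            65, 65, 67, 73, 76, 76, 77, 78, 87, 92]
accounting = [10, 12, 12, 16, 19, 20, 22, 23, 23, 23,
              26, 28, 29, 32, 33, 34, 34, 41, 43, 43,
              44, 45, 45, 54, 56, 56, 56, 65, 67, 76]


def population_covariance(xs, ys):
    """Average product of the deviations from the two means (divides by N)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


cov_11 = population_covariance(business, business)      # variance of Column 1
cov_21 = population_covariance(accounting, business)    # covariance of the pair
cov_22 = population_covariance(accounting, accounting)  # variance of Column 2

print(f"{cov_11:.4f}  {cov_21:.4f}  {cov_22:.4f}")
```

The covariance of a variable with itself is its variance, which is why the diagonal of a covariance matrix holds the two variances.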
5. Anova
The Anova analysis tools provide different types of variance analysis. The tool that you should
use depends on the number of factors and the number of samples that you have from the
populations that you want to test.
Anova: Single Factor
This tool performs a simple analysis of variance on data for two or more samples. The
analysis provides a test of the hypothesis that each sample is drawn from the same
underlying probability distribution against the alternative hypothesis that underlying probability
distributions are not the same for all samples. If there are only two samples, you can use the
worksheet function T.TEST. With more than two samples, there is no convenient
generalization of T.TEST, and the Single Factor Anova model can be called upon instead.
The results are shown in Figure 5.1.
Anova: Single Factor
SUMMARY
Groups      Count    Sum     Average     Variance
Column 1    30       1659    55.3        318.0793
Column 2    30       1087    36.23333    310.1851
ANOVA
Source of Variation    SS          df    MS          F           P-value     F crit
Between Groups         5453.067    1     5453.067    17.35915    0.000104    4.006873
Within Groups          18219.67    58    314.1322
Total                  23672.73    59
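The sums of squares and the F statistic can be rebuilt from the group summaries alone. The sketch below is a Python illustration of the standard single-factor formulas, using the counts, sums and sample variances reported for the two columns. The dictionary layout and names are assumptions made here.

```python
# Per-group summary: (count, sum, sample variance).
groups = {
    "Column 1": (30, 1659, 318.0793103),
    "Column 2": (30, 1087, 310.1850575),
}

n_total = sum(count for count, _, _ in groups.values())
grand_mean = sum(total for _, total, _ in groups.values()) / n_total

# Between groups: squared distance of each group mean from the grand mean,
# weighted by the group size.
ss_between = sum(count * (total / count - grand_mean) ** 2
                 for count, total, _ in groups.values())
# Within groups: each sample variance multiplied by its degrees of freedom.
ss_within = sum((count - 1) * variance for count, _, variance in groups.values())

df_between = len(groups) - 1
df_within = n_total - len(groups)
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(f"SS between {ss_between:.3f} (df {df_between}), "
      f"SS within {ss_within:.2f} (df {df_within}), F {f_stat:.5f}")
```

The P-value and F crit entries require the F distribution and are not reproduced in this sketch.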
6. Correlation
The correlation coefficient, like the covariance, is a measure of the extent to which two
measurement variables "vary together." Unlike the covariance, the correlation coefficient is
scaled so that its value is independent of the units in which the two measurement variables
are expressed. (For example, if the two measurement variables are weight and height, the
value of the correlation coefficient is unchanged if weight is converted from pounds to
kilograms.) The value of any correlation coefficient must be between -1 and +1 inclusive.
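The unit independence of the correlation coefficient can be demonstrated with a short Python sketch. It assumes the two samples of Tables 2.1 and 2.2 are paired in ascending order, and it converts one variable from months to years as a stand-in for the pounds-to-kilograms example. The function name `pearson` is chosen here for illustration.

```python
import math

# Tables 2.1 and 2.2, each in ascending order (pairing by position is an assumption).
business = [23, 28, 29, 34, 34, 39, 43, 44, 45, 45,
            48, 48, 49, 54, 54, 54, 55, 56, 56, 65,
            65, 65, 67, 73, 76, 76, 77, 78, 87, 92]
accounting = [10, 12, 12, 16, 19, 20, 22, 23, 23, 23,
              26, 28, 29, 32, 33, 34, 34, 41, 43, 43,
              44, 45, 45, 54, 56, 56, 56, 65, 67, 76]


def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


r_months = pearson(business, accounting)
# Re-express one variable in years; the coefficient should not change.
business_years = [months / 12 for months in business]
r_years = pearson(business_years, accounting)

print(f"{r_months:.6f}  {r_years:.6f}")
```

Rescaling one variable multiplies the covariance and that variable's standard deviation by the same factor, so the ratio is unchanged.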
The results are shown in Figure 6.1.
            Column 1    Column 2
Column 1    1
Column 2    0.986474    1
7. F-Test Two-Sample for Variances
The F-Test Two-Sample for Variances analysis tool performs a two-sample F-test to compare
two population variances.
For example, you can use the F-Test tool on samples of times in a swim meet for each of two
teams. The tool provides the result of a test of the null hypothesis that these two samples
come from distributions with equal variances, against the alternative that the variances are
not equal in the underlying distributions.
The tool calculates the value f of an F-statistic (or F-ratio). A value of f close to 1 provides
evidence that the underlying population variances are equal. In the output table, if f < 1 "P(F
<= f) one-tail" gives the probability of observing a value of the F-statistic less than f when
population variances are equal, and "F Critical one-tail" gives the critical value less than 1 for
the chosen significance level, Alpha. If f > 1, "P(F <= f) one-tail" gives the probability of
observing a value of the F-statistic greater than f when population variances are equal, and "F
Critical one-tail" gives the critical value greater than 1 for Alpha.
The results are shown in Figure 7.1.
F-Test Two-Sample for Variances
                       Variable 1     Variable 2
Mean                   55.3           36.23333333
Variance               318.0793103    310.1850575
Observations           30             30
df                     29             29
F                      1.025450139
P(F<=f) one-tail       0.473256022
F Critical one-tail    1.860811435
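The F statistic itself is simply the ratio of the two sample variances. The Python sketch below is an illustration only, with variable names that are assumptions. It recomputes the ratio from the variances reported above and notes which one-tail case applies.

```python
# Sample variances and sample sizes as reported for Variable 1 and Variable 2.
var_1, n_1 = 318.0793103, 30
var_2, n_2 = 310.1850575, 30

f_ratio = var_1 / var_2
df = (n_1 - 1, n_2 - 1)  # numerator and denominator degrees of freedom

# f > 1 means the one-tail probability refers to the upper tail,
# f < 1 means it refers to the lower tail.
tail = "upper" if f_ratio > 1 else "lower"

print(f"F = {f_ratio:.9f}, df = {df}, one-tail probability refers to the {tail} tail")
```

The one-tail probability and critical value require the F distribution and are not computed here.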
8. Histogram
The Histogram analysis tool calculates individual and cumulative frequencies for a cell range
of data and data bins. This tool generates data for the number of occurrences of a value in a
data set.
The results are shown in Figure 8.1.
Histogram (chart of Frequency against Bin)
Bin     Frequency
28      1
40.8    4
53.6    7
66.4    9
79.2    6
More    2
9. Regression
The Regression analysis tool performs linear regression analysis by using the "least squares"
method to fit a line through a set of observations. You can analyze how a single dependent
variable is affected by the values of one or more independent variables. For example, you can
analyze how an athlete's performance is affected by such factors as age, height, and weight.
You can apportion shares in the performance measure to each of these three factors, based
on a set of performance data, and then use the results to predict the performance of a new,
untested athlete.
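The least squares idea can be illustrated with a single independent variable. The Python sketch below fits a line to the data of Tables 2.1 and 2.2, paired in ascending order. Both the pairing and the choice of these two variables are assumptions made purely for illustration, so the numbers are not those of any regression output in this handout.

```python
# Tables 2.1 (x) and 2.2 (y), each in ascending order; pairing is an assumption.
business = [23, 28, 29, 34, 34, 39, 43, 44, 45, 45,
            48, 48, 49, 54, 54, 54, 55, 56, 56, 65,
            65, 65, 67, 73, 76, 76, 77, 78, 87, 92]
accounting = [10, 12, 12, 16, 19, 20, 22, 23, 23, 23,
              26, 28, 29, 32, 33, 34, 34, 41, 43, 43,
              44, 45, 45, 54, 56, 56, 56, 65, 67, 76]

n = len(business)
mean_x = sum(business) / n
mean_y = sum(accounting) / n
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(business, accounting))
s_xx = sum((x - mean_x) ** 2 for x in business)
s_yy = sum((y - mean_y) ** 2 for y in accounting)

# Closed-form least squares estimates for the line y = intercept + slope * x.
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x
r_squared = s_xy ** 2 / (s_xx * s_yy)


def predict(x):
    """Predicted y for a new, unobserved x."""
    return intercept + slope * x


print(f"slope {slope:.6f}, intercept {intercept:.4f}, R Square {r_squared:.5f}")
```

With more than one independent variable the same principle applies, but the estimates come from solving a system of equations rather than from these two closed-form expressions.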
The results are shown in Figure 9.1.
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.986988827
R Square             0.974146945
Adjusted R Square    0.972231903
Standard Error       1.466979035
Observations         30
ANOVA
              df    SS          MS          F          Significance F
Regression    2     2189.395    1094.698    508.682    3.71E-22
Residual      27    58.10474    2.152027
Total         29    2247.5
                Coefficients    Standard Error    t Stat      P-value
Intercept       8.55842659      1.885442          4.53922     0.000105
X Variable 1    0.329390252     0.09318           3.534982    0.001493
X Variable 2    0.16126437      0.094358          1.709061    0.098912
                Lower 95%    Upper 95%    Lower 95.0%    Upper 95.0%
Intercept       -12.427      4.68982      -12.427        4.68982
X Variable 1    0.1382       0.52058      0.1382         0.52058
X Variable 2    -0.03234     0.354872     -0.03234       0.354872
10. t-Test
The Two-Sample t-Test analysis tools test for equality of the population means that underlie
each sample. The three tools employ different assumptions: that the population variances are
equal, that the population variances are not equal, and that the two samples represent
before-treatment and after-treatment observations on the same subjects.
The results are shown in Figure 10.1.
t-Test: Paired Two Sample for Means
                                Variable 1     Variable 2
Mean                            55.3           36.23333333
Variance                        318.0793103    310.1850575
Observations                    30             30
Pearson Correlation             0.98647353
Hypothesized Mean Difference    0
df                              29
t Stat                          35.7211422
P(T<=t) one-tail                7.92407E-26
t Critical one-tail             1.699127027
P(T<=t) two-tail                1.58481E-25
t Critical two-tail             2.045229642
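The paired t statistic is the mean of the pairwise differences divided by its standard error. The Python sketch below assumes the two samples of Tables 2.1 and 2.2 are paired in ascending order. It is an illustration of the formula with names chosen here, and it leaves out the probabilities and critical values, which need the t distribution.

```python
import math
import statistics

# Tables 2.1 and 2.2, each in ascending order (pairing by position is an assumption).
business = [23, 28, 29, 34, 34, 39, 43, 44, 45, 45,
            48, 48, 49, 54, 54, 54, 55, 56, 56, 65,
            65, 65, 67, 73, 76, 76, 77, 78, 87, 92]
accounting = [10, 12, 12, 16, 19, 20, 22, 23, 23, 23,
              26, 28, 29, 32, 33, 34, 34, 41, 43, 43,
              44, 45, 45, 54, 56, 56, 56, 65, 67, 76]

hypothesized_difference = 0
differences = [b - a for b, a in zip(business, accounting)]

n = len(differences)
mean_difference = statistics.mean(differences)
sd_difference = statistics.stdev(differences)  # sample standard deviation
t_stat = (mean_difference - hypothesized_difference) / (sd_difference / math.sqrt(n))
df = n - 1

print(f"mean difference {mean_difference:.4f}, t Stat {t_stat:.4f}, df {df}")
```

Because the test works on the differences, a strong positive correlation between the two columns makes the differences less variable and the t statistic larger.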
11. z-Test
The z-Test: Two Sample for Means analysis tool performs a two sample z-Test for means
with known variances. This tool is used to test the null hypothesis that there is no difference
between two population means against either one-sided or two-sided alternative hypotheses.
If variances are not known, the worksheet function Z.TEST should be used instead.
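The two-sample z statistic can be sketched as follows. The sample means in this Python illustration are taken from the earlier sections, but the "known" population variances of 318.0 and 310.0 are hypothetical values chosen for the example. The function name `two_sample_z` is also an assumption.

```python
import math
from statistics import NormalDist


def two_sample_z(mean_1, mean_2, var_1, var_2, n_1, n_2, hypothesized_difference=0.0):
    """z statistic and two-tail probability for a difference of two means
    when both population variances are treated as known."""
    standard_error = math.sqrt(var_1 / n_1 + var_2 / n_2)
    z = (mean_1 - mean_2 - hypothesized_difference) / standard_error
    p_two_tail = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tail


# Sample means from the two schools; the variances are hypothetical.
z, p_two_tail = two_sample_z(55.3, 36.23333333, 318.0, 310.0, 30, 30)
print(f"z = {z:.4f}, two-tail P = {p_two_tail:.6f}")
```

A small two-tail probability leads to rejecting the null hypothesis of no difference between the two population means.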
The results are shown in Figure 11.1.
Pivot Tables
Pivot tables are one of Excel's most powerful features. A pivot table allows you to extract the
significance from a large, detailed data set. Our data set consists of 214 rows and 6 fields: Order ID,
Product, Category, Amount, Date and Country.
Insert a Pivot Table
To insert a pivot table, execute the following steps.
1. Click any single cell inside the data set.
2. On the Insert tab, click PivotTable.
Drag fields
The PivotTable field list appears. To get the total amount exported of each product, drag the following
fields to the different areas.
1. Product Field to the Row Labels area.
2. Amount Field to the Values area.
3. Country Field to the Report Filter area.
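What this field layout computes can be sketched in Python. The rows below are hypothetical stand-ins for the data set, and the function name `pivot_sum` is an assumption. Product plays the role of the Row Labels field, Amount the Values field, and Country the Report Filter.

```python
from collections import defaultdict

# Hypothetical rows in the shape of the data set (not the real 214 rows).
orders = [
    {"Product": "Banana", "Category": "Fruit", "Amount": 600, "Country": "United States"},
    {"Product": "Apple", "Category": "Fruit", "Amount": 400, "Country": "France"},
    {"Product": "Banana", "Category": "Fruit", "Amount": 300, "Country": "Germany"},
    {"Product": "Carrots", "Category": "Vegetables", "Amount": 150, "Country": "France"},
    {"Product": "Apple", "Category": "Fruit", "Amount": 250, "Country": "France"},
    {"Product": "Banana", "Category": "Fruit", "Amount": 200, "Country": "France"},
]


def pivot_sum(rows, row_field, value_field, **filters):
    """Sum value_field for each distinct row_field, keeping only rows
    that match every filter (the Report Filter)."""
    totals = defaultdict(int)
    for row in rows:
        if all(row[field] == wanted for field, wanted in filters.items()):
            totals[row[row_field]] += row[value_field]
    return dict(totals)


totals = pivot_sum(orders, "Product", "Amount")
france = pivot_sum(orders, "Product", "Amount", Country="France")
# Sort Largest to Smallest on the total.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

print(ranked)
print(france)
```

Changing the summary calculation from Sum to Count would amount to adding 1 per matching row instead of the Amount.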
Below you can find the pivot table. Bananas are our main export product. That's how easy pivot tables
can be!
Sort
To get Banana at the top of the list, sort the pivot table.
1. Click any cell inside the Total column.
2. The PivotTable Tools contextual tab activates. On the Options tab, click the Sort Largest to
Smallest button (ZA).
Result.
Filter
Because we added the Country field to the Report Filter area, we can filter this pivot table by Country.
For example, which products do we export the most to France?
1. Click the filter drop-down and select France.
Result. Apples are our main export product to France.
Note: you can use the standard filter (triangle next to Product) to only show the totals of specific
products.
Change Summary Calculation
By default, Excel summarizes your data by either summing or counting the items. To change the type
of calculation that you want to use, execute the following steps.
1. Click any cell inside the Total column.
2. Right click and click on Value Field Settings...
3. Choose the type of calculation you want to use. For example, click Count.
4. Click OK.
Result. 16 out of the 28 orders to France were 'Apple' orders.
Two-dimensional Pivot Table
If you drag a field to the Row Labels area and Column Labels area, you can create a two-dimensional
pivot table. For example, to get the total amount exported to each country, of each product, drag the
following fields to the different areas.
1. Country Field to the Row Labels area.
2. Product Field to the Column Labels area.
3. Amount Field to the Values area.
4. Category Field to the Report Filter area.
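A two-dimensional pivot table is a cross-tabulation. The Python sketch below uses hypothetical rows and a hypothetical function name, `cross_tab`. It places Country on the rows and Product on the columns, sums Amount, and optionally filters by Category.

```python
from collections import defaultdict

# Hypothetical (Country, Product, Category, Amount) rows.
orders = [
    ("France", "Apple", "Fruit", 400),
    ("France", "Carrots", "Vegetables", 150),
    ("Germany", "Banana", "Fruit", 300),
    ("United States", "Banana", "Fruit", 600),
    ("France", "Apple", "Fruit", 250),
    ("Germany", "Carrots", "Vegetables", 100),
]


def cross_tab(rows, category=None):
    """Total amount per (country, product), optionally for one category only."""
    table = defaultdict(lambda: defaultdict(int))
    for country, product, row_category, amount in rows:
        if category is None or row_category == category:
            table[country][product] += amount
    return {country: dict(products) for country, products in table.items()}


all_categories = cross_tab(orders)
vegetables_only = cross_tab(orders, category="Vegetables")

print(all_categories)
print(vegetables_only)
```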
Below you can find the two-dimensional pivot table.
To easily compare these numbers, create a pivot chart and apply a filter. Maybe this is one step too
far for you at this stage, but it shows you one of the many other powerful pivot table features Excel
has to offer.
Pivot Chart
A pivot chart is the visual representation of a pivot table in Excel. Pivot charts and pivot tables are
connected with each other.
Below you can find a two-dimensional pivot table. Go back to Pivot Tables to learn how to create this
pivot table.
Insert Pivot Chart
To insert a pivot chart, simply insert a chart.
1. Click any cell inside the pivot table.
2. On the Insert tab, click Column and select one of the subtypes. For example, Clustered Column.
Below you can find the pivot chart. This pivot chart will amaze and impress your boss.
Note: any changes you make to the pivot chart are immediately reflected in the pivot table and vice
versa.
Filter Pivot Chart
To filter this pivot chart, execute the following steps.
1a. Use the standard filters (triangles next to Product and Country). For example, use the Country
filter to only show the total amount of each product exported to the United States.
1b. Because we added the Category field to the Report Filter area, we can filter this pivot chart (and
pivot table) by Category. For example, use the Category filter to only show the vegetables exported to
each country.
Change Pivot Chart Type
You can change to a different type of pivot chart at any time.
1. Select the chart.
2. The PivotChart tools contextual tab activates. On the Design tab, click Change Chart Type.
3. Choose Pie.
4. Click OK.
Note: pie charts always use one data series (in this case, Apple). To get a pivot chart of a country,
swap the data over the axis. Select the chart. The PivotChart tools contextual tab activates. On the
Design tab, click Switch Row/Column.
SORT
You can sort your Excel data on one column or multiple columns. You can sort in ascending or
descending order.
One Column
To sort on one column, execute the following steps.
1. Click any cell in the column you want to sort.
2. To sort in ascending order, on the Data tab, click AZ.
Result:
Note: to sort in descending order, click ZA.
Multiple Columns
To sort on multiple columns, execute the following steps.
1. On the Data tab, click Sort.
The Sort dialog box appears.
2. Select Last Name from the 'Sort by' drop-down list.
3. Click on Add Level.
4. Select Sales from the 'Then by' drop-down list.
5. Click OK.
Result. Records are sorted by Last Name first and Sales second.
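A multi-level sort compares the first key and uses the second key only to break ties. The Python sketch below illustrates this with hypothetical records. The field names follow the steps above, but the values are invented.

```python
from operator import itemgetter

# Hypothetical records with the two fields used in the Sort dialog.
records = [
    {"Last Name": "Smith", "Sales": 300},
    {"Last Name": "Jones", "Sales": 500},
    {"Last Name": "Smith", "Sales": 100},
    {"Last Name": "Jones", "Sales": 200},
]

# 'Sort by' Last Name, 'Then by' Sales, both ascending.
by_name_then_sales = sorted(records, key=itemgetter("Last Name", "Sales"))

for record in by_name_then_sales:
    print(record["Last Name"], record["Sales"])
```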
Conditional Formatting
Conditional formatting in Excel enables you to highlight cells with a certain color, depending on the
cell's value.
Highlight Cells Rules
To highlight cells that are greater than a value, execute the following steps.
1. Select the range A1:A10.
2. On the Home tab, click Conditional Formatting, Highlight Cells Rules, Greater Than...
3. Enter the value 80 and select a formatting style.
4. Click OK.
Result. Excel highlights the cells that are greater than 80.
5. Change the value of cell A1 to 81.
Result. Excel changes the format of cell A1 automatically.
Note: you can also highlight cells that are less than a value, between a low and high value, etc.
Clear Rules
To clear a conditional formatting rule, execute the following steps.
1. Select the range A1:A10.
2. On the Home tab, click Conditional Formatting, Clear Rules, Clear Rules from Selected Cells.
Top/Bottom Rules
To highlight cells that are above the average of the cells, execute the following steps.
1. Select the range A1:A10.
2. On the Home tab, click Conditional Formatting, Top/Bottom Rules, Above Average...
3. Select a formatting style.
4. Click OK.
Result. Excel calculates the average (42.5) and formats the cells that are above this average.
Note: you can also highlight the top 10 items, the top 10 %, etc. The sky is the limit!
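The two rules above boil down to a test applied to each cell. The Python sketch below is an illustration only. It uses ten hypothetical values, chosen so that their average is 42.5, and the function names are assumptions. It returns a True/False flag per cell indicating whether the cell would be formatted.

```python
# Hypothetical contents of A1:A10, chosen so that the average is 42.5.
values = [10, 20, 30, 40, 45, 50, 55, 5, 85, 85]


def greater_than(cells, threshold):
    """Highlight Cells Rules > Greater Than: flag cells above a fixed value."""
    return [cell > threshold for cell in cells]


def above_average(cells):
    """Top/Bottom Rules > Above Average: flag cells above the mean of the range."""
    average = sum(cells) / len(cells)
    return [cell > average for cell in cells]


flags_80 = greater_than(values, 80)
flags_avg = above_average(values)

# Changing A1 to 81 changes its flag on the next evaluation, as in step 5.
updated = [81] + values[1:]
flags_80_updated = greater_than(updated, 80)

print(flags_80)
print(flags_avg)
print(flags_80_updated)
```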
VLookup Function
Learn all about Excel's lookup & reference functions such as the VLOOKUP, HLOOKUP, MATCH,
and CHOOSE functions.
VLookup
The VLOOKUP (Vertical lookup) function looks for a value in the leftmost column of a table, and then
returns a value in the same row from another column you specify.
1. Insert the VLOOKUP function shown below.
Explanation: the VLOOKUP function looks for the ID (104) in the leftmost column of the range
$E$4:$G$7 and returns the value in the same row from the third column (third argument is set to 3).
The fourth argument is set to FALSE to return an exact match or a #N/A error if not found.
2. Drag the VLOOKUP function in cell B2 down to cell B11.
Note: when we drag the VLOOKUP function down, the absolute reference ($E$4:$G$7) stays the
same, while the relative reference (A2) changes to A3, A4, A5, etc.
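The exact-match behaviour described above can be sketched in Python. The lookup table below is hypothetical, with four rows and three columns to mirror the shape of $E$4:$G$7, and the function name `vlookup` is an assumption. From the explanation, the worksheet formula in cell B2 is presumably of the form =VLOOKUP(A2,$E$4:$G$7,3,FALSE).

```python
def vlookup(lookup_value, table, col_index):
    """Exact-match lookup: search the leftmost column and return the value
    from the 1-based column col_index of the matching row, or '#N/A'."""
    for row in table:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return "#N/A"


# Hypothetical lookup table: ID, product, brand.
table = [
    (101, "Computer", "IBM"),
    (102, "Keyboard", "Logitech"),
    (103, "Mouse", "Microsoft"),
    (104, "Monitor", "Dell"),
]

print(vlookup(104, table, 3))  # Dell
print(vlookup(999, table, 3))  # #N/A
```

Dragging the formula down corresponds to calling the function with a different lookup value each time while the table stays the same.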
HLookup
In a similar way, you can use the HLOOKUP (Horizontal lookup) function.
Match
The MATCH function returns the position of a value in a given range.
Note: Yellow found at position 3 in the range E4:E7. The third argument is optional. Set this argument
to 0 to return the position of the value that is exactly equal to lookup_value (A2) or a #N/A error if not
found.
Note: 97 found at position 3 in the range E4:E7.
Choose
The CHOOSE function returns a value from a list of values, based on a position number.
Note: Boat found at position 3.
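MATCH with an exact match and CHOOSE can be sketched in Python in the same way. The lists below are hypothetical, arranged so that Yellow and Boat sit at position 3, and the function names are assumptions. Positions are 1-based, as in Excel.

```python
def match_exact(lookup_value, values):
    """Position (1-based) of lookup_value in values, or '#N/A' if absent.
    Mirrors MATCH with the third argument set to 0."""
    try:
        return values.index(lookup_value) + 1
    except ValueError:
        return "#N/A"


def choose(index_num, *options):
    """Return the option at the 1-based position index_num."""
    if not 1 <= index_num <= len(options):
        return "#VALUE!"
    return options[index_num - 1]


colors = ["Red", "Green", "Yellow", "Blue"]  # hypothetical contents of E4:E7

print(match_exact("Yellow", colors))             # 3
print(choose(3, "Car", "Train", "Boat", "Plane"))  # Boat
```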
Data Validation
Use data validation in Excel to make sure that users enter certain values into a cell.
Data Validation Example
In this example, we restrict users to entering a whole number between 0 and 10.
Create Data Validation Rule
To create the data validation rule, execute the following steps.
1. Select cell C2.
2. On the Data tab, click Data Validation.
On the Settings tab:
3. In the Allow list, click Whole number.
4. In the Data list, click between.
5. Enter the Minimum and Maximum values.
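The rule configured in these steps can be expressed as a small check. The Python sketch below is an illustration under the assumption that the user's entry arrives as text. The function name `is_valid_entry` is hypothetical, and Excel's own handling of entries such as 7.0 may differ.

```python
def is_valid_entry(text, minimum=0, maximum=10):
    """Accept only text that parses as a whole number between minimum
    and maximum, inclusive."""
    try:
        value = int(text.strip())
    except ValueError:
        return False  # not a whole number, for example "3.5" or "abc"
    return minimum <= value <= maximum


for entry in ["7", "0", "10", "11", "-1", "3.5", "abc"]:
    print(entry, is_valid_entry(entry))
```

An entry that fails the check is the case in which Excel would show the error alert.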
Input Message
Input messages appear when the user selects the cell and tell the user what to enter.
On the Input Message tab:
1. Check 'Show input message when cell is selected'.
2. Enter a title.
3. Enter an input message.
Error Alert
If users ignore the input message and enter a number that is not valid, you can show them an error
alert.
On the Error Alert tab:
1. Check 'Show error alert after invalid data is entered'.
2. Enter a title.
3. Enter an error message.
4. Click OK.
Data Validation Result
1. Select cell C2.
2. Try to enter a number higher than 10.
Result:
Note: to remove data validation from a cell, select the cell, on the Data tab, click Data Validation, and
then click Clear All.
Logical Function
Learn how to use Excel's logical functions such as the IF, AND and OR functions.
If Function
The IF function checks whether a condition is met, and returns one value if TRUE and another value if
FALSE.
1. Select cell C2 and enter the following function.
The IF function returns Correct because the value in cell A1 is higher than 10.
And Function
The AND Function returns TRUE if all conditions are true and returns FALSE if any of the conditions
are false.
1. Select cell D2 and enter the following formula.
The AND function returns FALSE because the value in cell B2 is not higher than 5. As a result the IF
function returns Incorrect.
Or Function
The OR function returns TRUE if any of the conditions are TRUE and returns FALSE if all conditions
are false.
1. Select cell E2 and enter the following formula.
The OR function returns TRUE because the value in cell A1 is higher than 10. As a result the IF
function returns Correct.
General note: the AND and OR functions can check up to 255 conditions.
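The logic of the three examples can be sketched in Python. The cell values below are hypothetical, chosen so that A1 is higher than 10 and B2 is not higher than 5, as in the explanations above. The conditions themselves are assumptions inferred from those explanations.

```python
# Hypothetical cell values: A1 is higher than 10, B2 is not higher than 5.
a1, b2 = 12, 3

# IF: one condition decides between two results.
if_result = "Correct" if a1 > 10 else "Incorrect"

# AND inside IF: every condition must hold.
and_result = "Correct" if (a1 > 10 and b2 > 5) else "Incorrect"

# OR inside IF: at least one condition must hold.
or_result = "Correct" if (a1 > 10 or b2 > 5) else "Incorrect"

print(if_result, and_result, or_result)  # Correct Incorrect Correct
```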