Uploaded by ramosshana77

SASA Statistical Analysis Lesson

SASA211: Statistical Analysis
LESSON 1: Evaluating Data in the Real World
CONCEPT OF STATISTICS
• The collection and interpretation of data
• It deals with measuring and analyzing variability
Examples of characteristics that vary: height, weight, food preferences, hair color.
There are TWO kinds of Statistics:
1) Inferential Statistics
• Concerned with inferring or drawing conclusions about the population based on samples.
2) Descriptive Statistics
• Aims to describe a sample, rather than use the data to learn about the population.
TOOLS
Inferential Statistics
• Hypothesis testing
• Correlation
• Regression Analysis
Examples:
• A manager of Jollibee sampled 20 servings of large fries to check whether the correct amount of fries is served.
• A researcher wants to know if there is a relationship between study time and a student's performance in Math.
Descriptive Statistics
• Tables and Graphs
• Measures of Central Tendency
• Measures of Variability
Examples:
• Average rainfall in the Philippines last year
• Number of car thefts in Pampanga in 2017
• Percentage of males in our class
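The difference between the two kinds of statistics can be sketched with the large-fries example. The weights and the 150 g target below are made up for illustration; they are not from the lesson. The mean and standard deviation describe the 20 servings themselves (descriptive), while the t statistic is the first step of a hypothesis test about all servings (inferential).

```python
import math
import statistics

# Hypothetical weights (grams) of 20 sampled servings of large fries.
weights = [148, 152, 150, 147, 153, 149, 151, 150, 146, 154,
           150, 152, 148, 151, 149, 150, 153, 147, 150, 140]
TARGET = 150  # assumed target serving weight, for illustration only

# Descriptive statistics: summarize the sample itself.
sample_mean = statistics.mean(weights)
sample_sd = statistics.stdev(weights)  # sample standard deviation (divides by n - 1)

# Inferential statistics: a one-sample t statistic measures how far the
# sample mean is from the target, in units of the standard error.
n = len(weights)
t_stat = (sample_mean - TARGET) / (sample_sd / math.sqrt(n))

print(f"mean = {sample_mean:.2f} g, sd = {sample_sd:.2f} g, t = {t_stat:.2f}")
```

To finish the test, the t statistic would be compared with a critical value for n - 1 = 19 degrees of freedom, which this sketch leaves out.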
SAMPLE AND POPULATION:
1) Population
• The entire group of people, animals, places, things, or ideas being studied.
2) Sample
• A subgroup of the population.
What is a variable?
It is a characteristic or attribute that can assume different values.
• All experiments examine some kind of variable(s).
• A variable is not only something that we MEASURE, but also something that we can MANIPULATE and something we can CONTROL for.
TYPES OF VARIABLES:
1) Quantitative variable (Numerical)
• Quantities that can be counted,
• measured with a measuring device, or
• calculated with a mathematical formula.
2) Qualitative variable (Categorical)
• Non-measurable characteristics that cannot assume a meaningful numerical value
• but can be classified into two or more categories.
PROBABILITY
• When statisticians make decisions, they express their confidence about those decisions in terms of probability.
• They can never be certain about what they decide.
• They can only tell you how probable their conclusions are.
ACTIVITIES:
Population
• The collection of all elements of interest in a particular study.
Parameter
• A characteristic of a population.
Sample
• A subset of the population from which information is collected.
Statistic
• A characteristic of a sample.
Qualitative Variable
• A variable that may be classified into categories. Examples: hair color, religion, political party, profession.
Quantitative Variable
• A variable whose values result from counting or measuring something. Examples: height, weight, time in the 100-yard dash, number of items sold to a shopper.
Discrete Variable
• Can take either a finite or a countably infinite number of values, such as the number of people in front of you at the bank.
Continuous Variable
• Can take infinitely many values, forming an interval on the number line, such as height or weight.
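A minimal sketch of how the four terms relate, using a made-up population of 500 student heights (the numbers and the fixed random seed are assumptions for illustration). The population mean is a parameter; the mean of a random sample drawn from it is a statistic that estimates the parameter.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (cm) of all 500 students in a school.
population = [random.gauss(160, 8) for _ in range(500)]

# Parameter: a characteristic of the whole population.
population_mean = statistics.mean(population)

# Sample: a subset of the population from which information is collected.
sample = random.sample(population, 30)

# Statistic: a characteristic of the sample, used to estimate the parameter.
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.1f}")
print(f"statistic (sample mean):     {sample_mean:.1f}")
```

The two means will usually be close but not equal, which is why conclusions drawn from samples are stated in terms of probability.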
Identify the population and the sample:
a) A survey of 1353 American households found that 18% of the households own a computer.
Population – all American households
Sample – the 1353 American households surveyed
b) A recent survey of 2625 elementary school children found that 28% of the children could be classified as obese.
Population – all elementary school children
Sample – the 2625 elementary school children surveyed
c) The average weight of every sixth person entering the mall within a 3-hour period was 146 lb.
Population – all people entering the mall within the assigned 3-hour period
Sample – every 6th person entering the mall within the 3-hour period
d) The latest SWS presidential survey result
e) The percentage of vaccinated employees in OLFU
f) A company sends a survey out to a sample of 1000 recent customers, asking whether they are satisfied with the products they received. Ninety percent indicated satisfaction.
Population – all of the company's customers
Sample – the 1000 recent customers surveyed
Parameter – the percentage of all customers who are satisfied
Statistic – the 90% satisfaction rate in the sample
Identify whether the statement describes inferential statistics or descriptive statistics (D = descriptive, I = inferential):
a. The average age of the students in a statistics class is 23 years. – D
b. The chances of winning the Philippines Lottery are one chance in twenty-two million. – I
c. There is a relationship between smoking cigarettes and getting emphysema. – I
d. From past figures, it is predicted that 29% of the registered voters in the Philippines will vote in the June primary. – I
e. Percentage of males in our class. – D
Determine whether the data are qualitative or quantitative:
a) the colors of automobiles on a used car lot – qualitative
b) the numbers on the shirts of a girls' soccer team – qualitative
c) the number of seats in a movie theater – quantitative
d) the temperature recorded within 24 hours – quantitative
e) the political affiliation of the candidates – qualitative
f) a list of house numbers on your street – qualitative
g) the ages of a sample of 350 employees of a large hospital – quantitative
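One way to see the difference in practice is to ask which summaries make sense for each kind of data. In the sketch below, all values are made up for illustration: categories can be counted and their most common value found, while quantities support arithmetic such as an average.

```python
import statistics

# Qualitative: colors of automobiles on a used car lot (hypothetical values).
car_colors = ["red", "white", "black", "white", "silver", "white"]

# Quantitative: number of seats in several movie theaters (hypothetical values).
seat_counts = [120, 150, 98, 200, 150]

# Categories can be tallied, so the most common category (the mode) is meaningful.
most_common_color = statistics.mode(car_colors)

# Quantities can be added and divided, so an average is meaningful.
average_seats = statistics.mean(seat_counts)

# Shirt numbers and house numbers look numeric but act as labels:
# averaging them would run without error and still tell us nothing.
shirt_numbers = [7, 10, 23]

print(most_common_color, average_seats)
```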
What is EXCEL?
A software program created by Microsoft that uses spreadsheets to organize numbers and data with formulas and functions.
• Excel is used around the world by businesses of all sizes, often to perform financial analysis.
• Electronic spreadsheet programs were originally based on paper spreadsheets used for accounting.
• The basic layout of computerized spreadsheets is the same as the paper ones. Related data is stored in tables, which are collections of small rectangular boxes or cells organized into rows and columns.
• All versions of Excel and other spreadsheet programs can store several spreadsheet pages in a single computer file.
• The saved computer file is often referred to as a workbook, and each page in the workbook is a separate worksheet.
Spreadsheet Cells and Cell References:
• The horizontal rows are identified by numbers (1, 2, 3) and the vertical columns by letters of the alphabet (A, B, C).
• The intersection point between a column and a row is the small rectangular box known as a cell.
• A cell reference is a combination of the column letter and the row number, such as A3 or B6.
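The structure of a cell reference can be sketched as a small parser. The function name `parse_cell_reference` is hypothetical, not part of Excel. It splits a reference into its column letters and row number, and converts the letters to a column number, treating them as A=1 to Z=26 and continuing with AA=27 after Z.

```python
import re


def parse_cell_reference(ref):
    """Split a reference like 'B6' into (column_letters, column_number, row_number)."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    letters = match.group(1).upper()
    row = int(match.group(2))
    # Column letters behave like a base-26 number with digits A=1 ... Z=26.
    column = 0
    for ch in letters:
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return letters, column, row


print(parse_cell_reference("A3"))
print(parse_cell_reference("B6"))
```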
Data Types, Formulas, and Functions:
The types of data that a cell can hold include:
• Numbers
• Text
• Dates and times
• Boolean values
• Formulas
Excel and Financial Data:
Spreadsheets are often used to store financial data. Formulas and functions that are used on this type of data include:
• Performing basic mathematical operations such as summing columns or rows of numbers
• Finding values such as profit or loss
• Calculating repayment plans for loans or mortgages
• Finding the average, maximum, minimum and other statistical values in a specified range of data
• Carrying out What-If analysis on data, where variables are modified one at a time to see how the change affects other data, such as expenses and profits
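As an illustration of a repayment calculation combined with a What-If analysis, the sketch below uses the standard fixed-payment loan formula, payment = P·r / (1 − (1 + r)^(−n)), where P is the principal, r the monthly rate and n the number of monthly payments. The function name and the loan figures are assumptions for illustration. Excel's own PMT function performs a comparable calculation but reports the payment as a negative number (a cash outflow).

```python
def monthly_payment(principal, annual_rate, years):
    """Fixed monthly payment on a loan, from the standard annuity formula."""
    r = annual_rate / 12   # monthly interest rate
    n = years * 12         # number of monthly payments
    if r == 0:
        return principal / n  # no interest: the principal is split evenly
    return principal * r / (1 - (1 + r) ** -n)


# What-If analysis: change one variable (the interest rate) at a time
# and watch how the payment responds. The loan terms are hypothetical.
for rate in (0.05, 0.06, 0.07):
    payment = monthly_payment(100_000, rate, 30)
    print(f"rate {rate:.0%}: monthly payment {payment:,.2f}")
```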
Excel's Other Uses:
Other common operations that Excel can be used for include:
• Graphing or charting data to assist users in identifying data trends
• Formatting data to make important data easy to find and understand
• Printing data and charts for use in reports
• Sorting and filtering data to find specific information
• Linking worksheet data and charts for use in other programs such as Microsoft PowerPoint and Word
• Importing data from database programs for analysis
What is Excel used for?
• Data entry
• Data management
• Accounting
• Financial analysis
• Charting and graphing
• Programming
• Time management
• Task management
• Financial modeling
• Customer relationship management (CRM)
• Almost anything that needs to be organized!
Data functions, formulas, and shortcuts:
The Excel software program includes many functions, formulas, and shortcuts that can be used
to enhance its functionality.
LESSON 2: Understanding Excel’s Statistical Capabilities
Working with worksheet functions:
• The Function Library presents the categories of formulas you can use and makes it convenient for you to access them.
• The final selection of each category menu (like the Statistical Functions menu) is called Insert Function. Selecting it opens the Insert Function dialog box.
• The Insert Function dialog box enables you to search for a function that fits your needs, or to scroll through a list of Excel functions.
• To open the Insert Function dialog box, you can also press Shift+F3.
• The Formula Bar is like a clone of a cell you select: information entered into the Formula Bar goes into the selected cell, and information entered in the selected cell appears in the Formula Bar.
• The Name box is something like a running record of what you do in the worksheet.
The steps in using a worksheet function are:
1. Type your data into a data array and select a cell for the result.
2. Select the appropriate formula category and choose your function from its pop-up menu.
3. In the Function Arguments dialog box, type the appropriate values for the function’s arguments.
4. Click OK to put the result into the selected cell.
Creating a shortcut to statistical functions:
Quickly accessing statistical functions
You can get to Excel’s statistical functions by selecting Formulas | More Functions | Statistical
Getting an array of results:
Array functions
An array function calculates multiple values and puts those values into an array of cells, rather than into a single cell.
Example, using FREQUENCY:
1. Enter the scores into an array of cells and enter the intervals into an array.
2. Select an array for the frequencies.
3. From the Statistical Functions menu, select FREQUENCY to open the Function Arguments dialog box.
4. In the Function Arguments dialog box, enter the appropriate values for the arguments.
5. Press Ctrl+Shift+Enter to close the Function Arguments dialog box and put the values in the selected array. On the Mac, it's Command+Enter.
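What FREQUENCY puts into the selected array can be sketched in Python. This is an approximation under stated assumptions, not Excel's implementation: the intervals are taken to be upper bounds sorted in ascending order, each score is counted in the first interval whose upper bound it does not exceed, and one extra count at the end collects scores above the last interval. The scores and intervals are made up for illustration.

```python
def frequency(data, bins):
    """Count values per interval; bins are ascending upper bounds (inclusive)."""
    counts = [0] * (len(bins) + 1)  # one extra slot for values above the last bin
    for value in data:
        for i, upper in enumerate(bins):
            if value <= upper:
                counts[i] += 1
                break
        else:
            # No bin was large enough, so the value goes in the overflow slot.
            counts[-1] += 1
    return counts


scores = [55, 62, 70, 71, 85, 90, 99]   # hypothetical scores
intervals = [60, 70, 80, 90]            # hypothetical interval upper bounds
print(frequency(scores, intervals))
```

The result has one more entry than there are intervals, which is why the array selected for the frequencies should be one cell longer than the interval array if you want to see the overflow count.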
Naming Arrays:
Note: Applying meaningful names to these arrays helps you keep track of what you're doing.
Four rules to follow when you supply a name for a range of cells:
1. Begin a name with an alphabetic character: a letter rather than a number or a punctuation mark.
2. Make sure that the name contains no spaces or symbols. Use an underscore to denote a space between words in the name.
3. Be sure that the name is unique within the worksheet.
4. Be sure that the name doesn't duplicate any cell reference in the worksheet.
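The four rules can be expressed as a small checker. This sketch covers only the rules as listed here, not Excel's full naming rules. The function name `check_range_name`, the case-insensitive uniqueness test, and the simplified cell-reference pattern (one to three letters followed by digits) are assumptions made for illustration.

```python
import re

# Simplified pattern for something that looks like a cell reference, e.g. A3 or AB12.
CELL_REFERENCE = re.compile(r"[A-Za-z]{1,3}[0-9]+")


def check_range_name(name, existing_names=()):
    """Return a list of the rules that a proposed range name breaks."""
    problems = []
    if not name or not name[0].isalpha():
        problems.append("must begin with a letter")
    if not re.fullmatch(r"[A-Za-z0-9_]*", name):
        problems.append("must not contain spaces or symbols (use an underscore)")
    if name.lower() in (n.lower() for n in existing_names):
        problems.append("must be unique within the worksheet")
    if CELL_REFERENCE.fullmatch(name):
        problems.append("must not duplicate a cell reference")
    return problems


print(check_range_name("Revenue_Millions"))
print(check_range_name("Revenue Millions"))
print(check_range_name("A3"))
```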
How to define a name:
1. Put a descriptive name at the top of a column (or to the left of a row) you want to name.
2. Select the range of cells you want to name.
3. Right-click on the selected range.
4. From the pop-up menu, select Define Name.
5. Click OK.
Then,
1. Use the name in place of the cell range when you enter a formula. With an array named score, for example, the formula is =SUM(score)
2. To keep track of the names in a worksheet, select Formulas | Name Manager.
FORMULA
SUMIF and SUMIFS add a set of numbers if specific conditions in one cell range (SUMIF) or in more than one cell range (SUMIFS) are met.
How to use the SUMIF Formula:
1. Select a cell for the formula result.
2. Select the appropriate formula category and choose a function from its pop-up menu.
SUMIF has three arguments.
• The first, Range, is the range of cells to evaluate against the condition.
• The second, Criteria, is the condition a cell in Range must meet for its row to be included in the sum.
• The third, Sum_range, holds the values to be summed.
E.g.
=SUMIF(Region, "North", Revenue_Millions)
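The logic of the three arguments can be sketched in Python. The function `sumif` below is a hypothetical stand-in that handles only an exact-match criterion (Excel's SUMIF also accepts comparison criteria such as "<2008"), and the region and revenue values are made up for illustration.

```python
def sumif(criteria_range, criterion, sum_range):
    """Add the values in sum_range whose matching cell in criteria_range equals criterion."""
    return sum(
        value
        for cell, value in zip(criteria_range, sum_range)
        if cell == criterion
    )


# Hypothetical named ranges, one entry per worksheet row.
region = ["North", "South", "North", "East"]
revenue_millions = [10, 20, 30, 40]

north_total = sumif(region, "North", revenue_millions)
print(north_total)
```

Each entry in the Range is paired with the entry on the same row of Sum_range, so the two ranges should be the same length.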
How to use the SUMIFS Formula:
1. Select a cell for the formula result.
2. Select the appropriate formula category and choose your function from its pop-up menu.
3. In the Function Arguments dialog box, enter the appropriate values for the arguments:
   Sum_range: Revenue_Millions
   Criteria_range1: Year
   Criteria1: <2008
   Criteria_range2: Region
   Criteria2: North
4. The formula in the Formula bar is =SUMIFS(Revenue_Millions,Year,"<2008",Region,"North")
5. Click OK.
E.g.
=SUMIFS(Number_students,Campus,"Pampanga",College,"CBA")
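SUMIFS puts the sum range first and then takes pairs of criteria range and criterion; a row is included only if every criterion is met. The sketch below mirrors that argument order with a hypothetical `sumifs` function in which each criterion is written as a Python test function instead of an Excel criteria string. The data are made up for illustration.

```python
def sumifs(sum_range, *range_test_pairs):
    """Add values in sum_range for rows where every (criteria_range, test) pair passes."""
    if len(range_test_pairs) % 2 != 0:
        raise ValueError("criteria must come in (range, test) pairs")
    ranges = range_test_pairs[0::2]
    tests = range_test_pairs[1::2]
    total = 0
    for i, value in enumerate(sum_range):
        # A row counts only if it satisfies all of the criteria at once.
        if all(test(rng[i]) for rng, test in zip(ranges, tests)):
            total += value
    return total


# Hypothetical named ranges, one entry per worksheet row.
year = [2006, 2007, 2008, 2007]
region = ["North", "North", "North", "South"]
revenue_millions = [5, 7, 9, 11]

result = sumifs(
    revenue_millions,
    year, lambda y: y < 2008,          # Criteria1: <2008
    region, lambda r: r == "North",    # Criteria2: North
)
print(result)
```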
Creating your own array formulas:
1. Select the array that will hold the answers to the array formula.
2. Into the selected array, type the formula.
3. Press Ctrl+Shift+Enter (not just Enter).
Tooling around with analysis
Excel's Data Analysis Tools
Each tool is listed with what it does:
Anova: Single Factor
• Analysis of variance for two or more samples
Anova: Two Factor with Replication
• Analysis of variance with two independent variables, and multiple observations in each combination of the levels of the variables
Anova: Two Factor without Replication
• Analysis of variance with two independent variables, and one observation in each combination of the levels of the variables. It can also be used as a repeated measures analysis of variance.
Correlation
• With two or more measurements on a sample of individuals, calculates a matrix of correlation coefficients for all possible pairs of the measurements
Covariance
• With two or more measurements on a sample of individuals, calculates a matrix of covariances for all possible pairs of the measurements
Descriptive Statistics
• Generates a report of central tendency, variability, and other characteristics of values in the selected range of cells
Exponential Smoothing
• In a sequence of values, calculates a prediction based on a preceding set of values, and on a prior prediction for those values
F-Test Two Sample for Variances
• Performs an F-test to compare two variances
Histogram
• Tabulates individual and cumulative frequencies for values in the selected range of cells
Moving Average
• In a sequence of values, calculates a prediction which is the average of a specified number of preceding values
Random Number Generation
• Provides a specified number of random numbers generated from one of seven possible distributions
Rank and Percentile
• Creates a table that shows the ordinal rank and the percentage rank of each value in a set of values
Regression
• Creates a report of the regression statistics based on linear regression through a set of data containing one dependent variable and one or more independent variables
Sampling
• Creates a sample from the values in a specified range of cells
t-Test: Two Sample
• Three t-test tools test the difference between two means. One assumes equal variances in the two samples. Another assumes unequal variances in the two samples. The third assumes matched samples.
z-Test: Two Sample for Means
• Performs a two-sample z-test to compare two means when the variances are known
In order to use these tools, you first have to load them into Excel.
To start, click File | Options
Doing this opens the Excel Options dialog box. Then follow these steps:
1. In the Excel Options dialog box, select Add-Ins.
Oddly enough, this opens a list of add-ins.
2. Near the bottom of the list, you see a drop-down list labeled Manage.
From this list, select Excel Add-Ins.
3. Click Go.
This opens the Add-Ins dialog box. (See Figure 2-23.)
4. Select the check box next to Analysis ToolPak and then click OK.
Note: When Excel finishes loading the ToolPak, you'll find a Data Analysis button in the Analysis area of the Data tab.
In general, the steps for using a data analysis tool are:
1. Enter your data into an array.
2. Click Data | Data Analysis to open the Data Analysis dialog box.
3. In the Data Analysis dialog box, select the data analysis tool you want to work with.
4. Click OK (or just double-click the selection) to open the dialog box for the selected tool.
5. In the tool’s dialog box, enter the appropriate information.
6. Click OK to close the dialog box and see the results.
Sample: Descriptive Statistics tool.
1. Enter your data into an array
2. Click Data | Data Analysis to open the Data Analysis dialog box.
3. Click Descriptive Statistics and click OK (or just double-click Descriptive Statistics) to open the
Descriptive Statistics dialog box.
4. Identify the data array.
5. Select the Columns radio button to indicate that the data are organized by columns.
6. Select the Labels in First Row check box, because the Input Range includes the column heading.
7. Select the New Worksheet Ply radio button, if it isn't already selected. This tells Excel to create a new tabbed sheet within the current workbook, and to send the results to the newly created sheet.
8. Click the Summary Statistics check box and leave the others unchecked. Click OK.
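To give a sense of what a summary-statistics report contains, the sketch below computes several common measures with Python's standard library. The function name, the selection of measures, and the data are assumptions for illustration; Excel's report includes further entries, such as kurtosis and skewness, that are left out here.

```python
import math
import statistics


def summary_statistics(values):
    """Return common measures of central tendency and variability for a sample."""
    n = len(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return {
        "Mean": statistics.mean(values),
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(values),
        "Mode": statistics.mode(values),
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(values),
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }


report = summary_statistics([2, 4, 4, 5, 7, 8])  # hypothetical data array
for label, value in report.items():
    print(f"{label:20} {value}")
```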