SASA211: Statistical Analysis

LESSON 1: Evaluating Data in the Real World

CONCEPT

STATISTICS
The collection and interpretation of data.
It deals with measuring and analyzing variability.
Characteristics that vary include height, weight, food preferences, and hair color.

There are TWO kinds of Statistics:
1) Inferential Statistics
It is concerned with inferring or drawing conclusions about the population based on samples.
2) Descriptive Statistics
It aims to describe a sample, rather than use the data to learn about the population.

TOOLS

Inferential Statistics
Hypothesis testing
Correlation
Regression Analysis
Examples:
A manager of Jollibee selected 20 samples of large fries to check whether the correct amount of fries is served.
A researcher wants to know if there is a relationship between studying time and a student's performance in Math.

Descriptive Statistics
Tables and Graphs
Measures of Central Tendency
Measures of Variability
Examples:
Average rainfall in the Philippines last year
Number of car thefts in Pampanga in 2017
Percentage of Males in our class

SAMPLE AND POPULATION:
1) Population is defined as a group of people, animals, places, things, or ideas.
2) Sample is a subgroup of the population.

What is a variable?
It is a characteristic or attribute that can assume different values.
All experiments examine some kind of variable(s). A variable is not only something that we MEASURE, but also something that we can MANIPULATE and something we can CONTROL for.

TWO TYPES OF VARIABLES:
1) Quantitative variables (Numerical) are quantities that can be counted by hand, measured with a measuring device, or calculated with a mathematical formula.
2) Qualitative variables (Categorical) are non-measurable characteristics that cannot assume a numerical value but can be classified into two or more categories.

PROBABILITY
When statisticians make decisions, they express their confidence about those decisions in terms of probability.
They can never be certain about what they decide. They can only tell you how probable their conclusions are.

ACTIVITIES:

Population – the collection of all elements of interest in a particular study.
Parameter – a characteristic of a population.
Sample – a subset of the population from which information is collected.
Statistic – a characteristic of a sample.
Qualitative Variable – a variable that may be classified into categories. Examples: hair color, religion, political party, profession.
Quantitative Variable – a variable whose values result from counting or measuring something. Examples: height, weight, time in the 100-yard dash, number of items sold to a shopper.
Discrete Variable – can take either a finite or a countable number of values, like the number of people in front of you at the bank.
Continuous Variable – can take infinitely many values, forming an interval on the number line, like height or weight.

Identify the population and the sample:
a) A survey of 1353 American households found that 18% of the households own a computer.
Population – all American households
Sample – the 1353 American households surveyed
b) A recent survey of 2625 elementary school children found that 28% of the children could be classified as obese.
Population – all elementary school children
Sample – the 2625 elementary school children surveyed
c) The average weight of every sixth person entering the mall within a 3-hour period was 146 lb.
Population – all people entering the mall within the assigned 3-hour period
Sample – every 6th person entering the mall within the 3-hour period
d) The latest SWS presidential survey result
e) The percentage of vaccinated employees in OLFU
f) A company sends a survey out to a sample of 1000 recent customers, asking whether they are satisfied with the products they received. Ninety percent indicated satisfaction.
Population – all of the company's customers
Sample – the 1000 recent customers surveyed
Parameter – the percentage of all customers who are satisfied
Statistic – the 90% satisfaction rate in the sample
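The difference between a parameter and a statistic in item f) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population size (50,000 customers), the true satisfaction rate (90%), and the random seed are all made up and do not come from the lesson.

```python
import random

# Hypothetical population: 50,000 customers, 90% of whom are satisfied.
# These numbers are assumptions for illustration, not data from the lesson.
population = [True] * 45_000 + [False] * 5_000

# Parameter: a characteristic of the whole population.
parameter = sum(population) / len(population)

# Sample: a subset of the population from which information is collected.
rng = random.Random(211)  # fixed seed so the run is repeatable
sample = rng.sample(population, k=1000)

# Statistic: a characteristic of the sample.
statistic = sum(sample) / len(sample)

print(f"Parameter (all customers): {parameter:.1%}")
print(f"Statistic (1000 sampled):  {statistic:.1%}")
```

In practice the parameter is usually unknown, and the statistic is what we actually observe. The sketch fixes the parameter in advance only so the two numbers can be compared side by side.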
Identify whether the statement describes inferential statistics (I) or descriptive statistics (D):
D a. The average age of the students in a statistics class is 23 years.
I b. The chances of winning the Philippines Lottery are one chance in twenty-two million.
I c. There is a relationship between smoking cigarettes and getting emphysema.
I d. From past figures, it is predicted that 29% of the registered voters in the Philippines will vote in the June primary.
D e. Percentage of Males in our class

Determine whether the data are qualitative or quantitative:
a) the colors of automobiles on a used car lot – qualitative
b) the numbers on the shirts of a girls' soccer team – qualitative
c) the number of seats in a movie theater – quantitative
d) the temperature recorded within 24 hours – quantitative
e) the political color affiliation of the candidates – qualitative
f) a list of house numbers on your street – qualitative
g) the ages of a sample of 350 employees of a large hospital – quantitative

What is EXCEL?
A software program created by Microsoft that uses spreadsheets to organize numbers and data with formulas and functions. Excel analysis is ubiquitous around the world and is used by businesses of all sizes to perform financial analysis.

Electronic spreadsheet programs were originally based on paper spreadsheets used for accounting. The basic layout of computerized spreadsheets is the same as the paper ones. Related data is stored in tables, which are collections of small rectangular boxes or cells organized into rows and columns.

All versions of Excel and other spreadsheet programs can store several spreadsheet pages in a single computer file. The saved computer file is often referred to as a workbook, and each page in the workbook is a separate worksheet.

Spreadsheet Cells and Cell References:
The horizontal rows are identified by numbers (1, 2, 3) and the vertical columns by letters of the alphabet (A, B, C).
The intersection point between a column and a row is the small rectangular box known as a cell. A cell reference is a combination of the column letter and the row number, such as A3 or B6.

Data Types, Formulas, and Functions:
The types of data that a cell can hold include:
Numbers
Text
Dates and times
Boolean values
Formulas

Excel and Financial Data:
Spreadsheets are often used to store financial data. Formulas and functions that are used on this type of data include:
Performing basic mathematical operations such as summing columns or rows of numbers
Finding values such as profit or loss
Calculating repayment plans for loans or mortgages
Finding the average, maximum, minimum and other statistical values in a specified range of data
Carrying out What-If analysis on data, where variables are modified one at a time to see how the change affects other data, such as expenses and profits

Excel's Other Uses:
Other common operations that Excel can be used for include:
Graphing or charting data to assist users in identifying data trends
Formatting data to make important data easy to find and understand
Printing data and charts for use in reports
Sorting and filtering data to find specific information
Linking worksheet data and charts for use in other programs such as Microsoft PowerPoint and Word
Importing data from database programs for analysis

What is Excel used for?
Data entry
Data management
Accounting
Financial analysis
Charting and graphing
Programming
Time management
Task management
Financial modeling
Customer relationship management (CRM)
Almost anything that needs to be organized!

Data functions, formulas, and shortcuts:
The Excel software program includes many functions, formulas, and shortcuts that can be used to enhance its functionality.

LESSON 2: Understanding Excel's Statistical Capabilities

Working with worksheet functions:
The Function Library presents the categories of formulas you can use and makes it convenient for you to access them.
The final selection of each category menu (like the Statistical Functions menu) is called Insert Function. Selecting it opens the Insert Function dialog box. This dialog box enables you to search for a function that fits your needs, or to scroll through a list of Excel functions. To open the Insert Function dialog box, you can also press Shift+F3.

The Formula Bar is like a clone of a cell you select: Information entered into the Formula Bar goes into the selected cell, and information entered in the selected cell appears in the Formula Bar.
The Name box is something like a running record of what you do in the worksheet.

The steps in using a worksheet function are:
1. Type your data into a data array and select a cell for the result.
2. Select the appropriate formula category and choose your function from its pop-up menu.
3. In the Function Arguments dialog box, type the appropriate values for the function's arguments.
4. Click OK to put the result into the selected cell.

Creating a shortcut to statistical functions:
Quickly accessing statistical functions
You can get to Excel's statistical functions by selecting Formulas | More Functions | Statistical.

Getting an array of results:
Array functions
An array function calculates multiple values and puts those values into an array of cells, rather than into a single cell. For example, to use the FREQUENCY function:
1. Enter the scores into an array of cells and enter the intervals into an array.
2. Select an array for the frequencies.
3. From the Statistical Functions menu, select FREQUENCY to open the Function Arguments dialog box.
4. In the Function Arguments dialog box, enter the appropriate values for the arguments.
5. Press Ctrl+Shift+Enter to close the Function Arguments dialog box and put the values in the selected array. For the Mac, it's Command+Enter.

Naming Arrays:
Note: If you apply meaningful names to these arrays, it helps you keep straight what you're doing.
Four rules to follow when you supply a name for a range of cells:
1. Begin a name with an alphabetic character: a letter rather than a number or a punctuation mark.
2. Make sure that the name contains no spaces or symbols. Use an underscore to denote a space between words in the name.
3. Be sure that the name is unique within the worksheet.
4. Be sure that the name doesn't duplicate any cell reference in the worksheet.

How to define a name:
1. Put a descriptive name at the top of a column (or to the left of a row) you want to name.
2. Select the range of cells you want to name.
3. Right-click on the selected range.
4. From the pop-up menu, select Define Name.
5. Click OK.
Then:
1. Entering a formula directly into a cell opens these boxes. Using the named array, then, the formula is =SUM(score)
2. To keep track of the names in a worksheet, select Formulas | Name Manager.

FORMULA
SUMIF and SUMIFS add a set of numbers if specific conditions in one cell range (SUMIF) or in more than one cell range (SUMIFS) are met.

How to use the SUMIF Formula:
1. Select a cell for the formula result.
2. Select the appropriate formula category and choose a function from its pop-up menu.
SUMIF has three arguments.
The first, Range, is the range of cells to evaluate for the condition to include in the sum.
The second, Criteria, is the condition that a cell in Range must meet, such as a specific value.
The third, Sum_range, holds the values to be summed up.
E.g. =SUMIF(Region,"North",Revenue_Millions)

How to use the SUMIFS Formula:
1. Select a cell for the formula result.
2. Select the appropriate formula category and choose your function from its pop-up menu.
3. In the Function Arguments dialog box, enter the appropriate values for the arguments.
4. The formula in the Formula bar is =SUMIFS(Revenue_Millions,Year,"<2008",Region,"North")
5. Click OK.
The arguments in this formula are:
Sum_range: Revenue_Millions
Criteria_range1: Year
Criteria1: <2008
Criteria_range2: Region
Criteria2: North
E.g. =SUMIFS(Number_students,Campus,"Pampanga",College,"CBA")

Creating your own array formulas:
1. Select the array that will hold the answers to the array formula.
2. Into the selected array, type the formula.
3. Press Ctrl+Shift+Enter (not Enter).

Tooling around with analysis

Excel's Data Analysis Tools
Tool – What It Does
Anova: Single Factor – Analysis of variance for two or more samples
Anova: Two Factor with Replication – Analysis of variance with two independent variables, and multiple observations in each combination of the levels of the variables
Anova: Two Factor without Replication – Analysis of variance with two independent variables, and one observation in each combination of the levels of the variables. It's also a Repeated Measures Analysis of variance.
Correlation – With more than two measurements on a sample of individuals, calculates a matrix of correlation coefficients for all possible pairs of the measurements
Covariance – With more than two measurements on a sample of individuals, calculates a matrix of covariances for all possible pairs of the measurements
Descriptive Statistics – Generates a report of central tendency, variability, and other characteristics of values in the selected range of cells
Exponential Smoothing – In a sequence of values, calculates a prediction based on a preceding set of values, and on a prior prediction for those values
F-Test Two Sample for Variances – Performs an F-test to compare two variances
Histogram – Tabulates individual and cumulative frequencies for values in the selected range of cells
Moving Average – In a sequence of values, calculates a prediction which is the average of a specified number of preceding values
Random Number Generation – Provides a specified number of random numbers generated from one of seven possible distributions
Rank and Percentile – Creates a table that shows the ordinal rank and the percentage rank of each value in a set of values
Regression – Creates a report of the regression statistics based on linear regression through a set of data containing one dependent variable and one or more independent variables
Sampling – Creates a sample from the values in a specified range of cells
t-Test: Two Sample – Three t-test tools test the difference between two means. One assumes equal variances in the two samples. Another assumes unequal variances in the two samples. The third assumes matched samples.
z-Test: Two Sample for Means – Performs a two-sample z-test to compare two means when the variances are known

In order to use these tools, you first have to load them into Excel. To start, click File | Options. Doing this opens the Excel Options dialog box. Then follow these steps:
1. In the Excel Options dialog box, select Add-Ins. Oddly enough, this opens a list of add-ins.
2. Near the bottom of the list, you see a drop-down list labeled Manage. From this list, select Excel Add-Ins.
3. Click Go. This opens the Add-Ins dialog box. (See Figure 2-23.)
4. Select the check box next to Analysis ToolPak and then click OK.
Note: When Excel finishes loading the ToolPak, you'll find a Data Analysis button in the Analysis area of the Data tab.

In general, the steps for using a data analysis tool are:
1. Enter your data into an array.
2. Click Data | Data Analysis to open the Data Analysis dialog box.
3. In the Data Analysis dialog box, select the data analysis tool you want to work with.
4. Click OK (or just double-click the selection) to open the dialog box for the selected tool.
5. In the tool's dialog box, enter the appropriate information.
6. Click OK to close the dialog box and see the results.

Sample: Descriptive Statistics tool.
1. Enter your data into an array.
2. Click Data | Data Analysis to open the Data Analysis dialog box.
3. Click Descriptive Statistics and click OK (or just double-click Descriptive Statistics) to open the Descriptive Statistics dialog box.
4. Identify the data array.
5. Select the Columns radio button to indicate that the data are organized by columns.
6. Select the Labels in First Row check box, because the Input Range includes the column heading.
7. Select the New Worksheet Ply radio button, if it isn't already selected. This tells Excel to create a new tabbed sheet within the current workbook, and to send the results to the newly created sheet.
8. Click the Summary Statistics check box and leave the others unchecked. Click OK.
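To see what kind of numbers a summary-statistics report contains, the sketch below computes several common descriptive statistics in Python with the standard `statistics` module. It is not Excel's implementation. The function name `summary_statistics`, the choice of labels, and the `scores` data are assumptions made up for illustration, and the sketch leaves out shape measures such as skewness and kurtosis.

```python
import math
import statistics


def summary_statistics(values):
    """Return a dict of common descriptive statistics for a list of numbers."""
    n = len(values)
    stdev = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return {
        "Mean": statistics.mean(values),
        "Standard Error": stdev / math.sqrt(n),
        "Median": statistics.median(values),
        "Mode": statistics.mode(values),
        "Standard Deviation": stdev,
        "Sample Variance": statistics.variance(values),
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }


# Made-up scores, used only to show the call.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
report = summary_statistics(scores)
for label, value in report.items():
    print(f"{label}: {value}")
```

The mean, median, and mode are measures of central tendency. The standard deviation, variance, and range are measures of variability. These are the two groups of descriptive tools listed in Lesson 1.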