STATISTICAL ANALYSIS APPLICATIONS WITH SOFTWARE
LESSON 1 - NATURE OF STATISTICS

Lesson 1.1. Basic terminologies used in Statistics

Statistics and Accountancy: Statisticians and Accountants

• Statisticians
- work with quantitative and qualitative data of different types (e.g., drug efficacy, citizens' living conditions, consumer purchasing patterns, performance of an individual/company/institution)
- design data collection methods, such as surveys
- identify digital sources for data collection
- oversee data collection
- compile and analyze the data collected using statistical methods and software
- report findings and prevent misinterpretation of results
- provide recommendations for using results

• Accountants
- work mostly with quantitative rather than qualitative data (financial figures)
- review financial documents to make sure clients manage funds appropriately
- prepare financial documents and report them as necessary
- ensure proper procedures are being used for collecting, storing, and reporting financial information
- make recommendations for improving financial performance

DATA COLLECTION:
- Primary Data – data gathered by the researcher
- Secondary Data – data from other sources (e.g., research conducted, industry financial statements, business periodicals, government reports)
- Census survey – complete enumeration in which every member of the population is included
- Sample survey – survey of a portion of the population

Statistics and statistic
• Statistics is the science concerned with developing and studying methods for collecting, analyzing, interpreting, and presenting empirical data.
• A statistic is a descriptor of a set of sample data.
• The descriptors of population data are referred to as population parameters.

Population parameters and sample statistics and their relevant symbols

Population and Sample

Variable
• A characteristic or measurement that can be determined for each member of a population.
• Variables may be numerical or categorical.
- Numerical variables take on values with equal units, such as weight in pounds and time in hours.
- Categorical variables place the person or thing into a category.
- Arithmetic is meaningful for numerical values, but it makes no sense for categorical values.

Data
• The actual values of the variable. They may be numbers or they may be words.

Sources of Data:
- Primary data – data that come from an original source and are intended to answer specific research questions; may be gathered by interview, mail-in questionnaire, survey, or experimentation.
- Secondary data – data taken from previously recorded information, such as research conducted, industry financial statements, business periodicals, and government reports.

Descriptive and Inferential Statistics
• Descriptive statistics is the term given to the analysis of data that helps describe, show, or summarize data in a meaningful way such that, for example, patterns might emerge from the data.
  ! Not used to make conclusions beyond the data that have been analyzed
  ! Not used to reach conclusions regarding any hypotheses made
• Inferential statistics is the branch of statistics that uses various analytical tools to draw inferences about a population from sample data.

Two general types of statistics are used to describe data:
• Measures of central tendency: ways of describing the central position of a frequency distribution for a group of data. The central position may be described using the mode, median, and mean.
• Measures of spread: ways of summarizing a group of data by describing how spread out the scores are. Statistics available for this purpose include the range, quartiles, absolute deviation, variance, and standard deviation.

Two main types of inferential statistics: hypothesis test and regression analysis
• A hypothesis test is a type of inferential statistics that is used to test assumptions and draw conclusions about the population from the available sample data.
- Involves setting up a null hypothesis ($H_0$) and an alternative hypothesis ($H_a$)
- Followed by conducting a statistical test of significance. A conclusion is drawn based on the value of the test statistic, the critical value, and the confidence intervals.
- A hypothesis test can be left-tailed, right-tailed, or two-tailed.
• A regression analysis is used to quantify how one variable changes with respect to another variable. There are many types of regression, such as:
- Simple linear regression
- Multiple linear regression
- Nominal regression
- Logistic regression
- Ordinal regression
The most commonly used regression in inferential statistics is linear regression, which measures the effect of a unit change in the independent variable on the dependent variable.

Lesson 1.2. Levels of Measurement

Dependent variable and independent variable
• An independent variable is the variable whose effect on the dependent variable is tested.
- changed or controlled
- “cause”
• A dependent variable is the variable being tested and measured in a scientific experiment.
- changes in response to the independent variable
- depends upon the values and/or changes of the independent variable
- “effect”: where the result of changes in the independent variable is seen
- Ex. 1. Drug/medicine (dosage) and its effect on patients’ blood pressure
- Ex. 2. Amount of fertilizer given to plants and its effect on plant growth

NOMINAL SCALE - a scale of measurement that uses a label or category to define an attribute of an element. Nominal data may be recorded with a nonnumeric description or with a numeric code.

ORDINAL SCALE - a scale of measurement that has the properties of a nominal scale and can be used to RANK or ORDER the observations. Ordinal data may be recorded with a nonnumeric description or with a numeric code.
- Ex.
Restaurant evaluation - Variable: Customer Service

INTERVAL SCALE - a scale of measurement that has the properties of an ordinal scale and in which the interval between observations is expressed in terms of a fixed unit of measure. Interval data are always numeric.

RATIO SCALE - a scale of measurement that has the properties of an interval scale and in which the ratio of observations is meaningful. Ratio data are always numeric.
- A requirement of a ratio scale is that a ZERO value is inherently defined in the scale. Specifically, a value of zero must indicate that nothing exists for the variable at the zero point.
- Examples: distance, height, weight, time, cost
- Cost of three cars A, B, and C: 0, 1.5 million, and 3.0 million, respectively. Car A is free; Car C is twice as expensive as Car B (3,000,000/1,500,000 = 2).

Lesson 1.3. Sampling Methods

RANDOM (or PROBABILITY) SAMPLE - sample obtained from a population where all members (of the sample) are chosen without particular preference
- ! all members of the population have equal chances of being selected
- Examples: getting a sample of 5 senior officials and 5 junior officials from a population of 90 officials (45 junior and 45 senior officials); getting a sample of 10 male and 10 female respondents from a population of 208 employees (104 male and 104 female employees)
- Simple Random Sampling - Create a labeled list and choose random samples by fish bowl or roulette, OR use a fish bowl, roulette, or online random picker without creating a list
- Systematic Sampling, Stratified Proportional Sampling, Cluster Sampling
- Multi-stage Sampling - Combination of different sampling techniques

Systematic Sampling: Kth value
$$K = \frac{N}{n}$$
where N is the population size and n is the sample size.
• Example: Consider a population size of 2000 (4 digits) and sample size = 100:
$$K = \frac{N}{n} = \frac{2000}{100} = 20$$
• Fish bowl – 1, 2, 3, 4
- If 1 is drawn, then the number pointed to at the TRN will be a single-digit number. If the number pointed to is 7, then your samples are the 7th, 27th, 47th, 67th, ….
- If 2 is drawn, then the number pointed to at the TRN will be a two-digit number. If the number pointed to is 2 …

NON-RANDOM (or NON-PROBABILITY) SAMPLE - sample obtained from a population where all members of the sample are picked on the basis of some preference
- Examples: getting a sample of 10 senior officials from a population of 90 junior and senior officials; getting a sample of 10 male respondents from a population of 208 male and female employees
- Quota Sampling, Purposive Sampling, Convenience Sampling
- Quota Sampling - a sampling method of gathering representative data from a group. Applying this method ensures that the sample group represents certain characteristics of the population chosen by the researcher.
! A sample should yield a good estimate of a population parameter

Simple Random Sample Without Replacement and Simple Random Sample With Replacement
- Sampling with replacement does not change the probability of the second, third, …, nth pick.
- The number of different simple random samples of size n that can be selected from a finite population of size N is
$$\frac{N!}{n!\,(N-n)!}$$

Lesson 1.4. Summation Notations

The summation sign Σ
- Denotes the addition of a series of numbers:
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n$$
- i is called the index of summation
- 1 and n are the lower and upper limits of summation
• Ex. The sum of n observations of variable x from $x_1$ to $x_n$, that is $x_1 + x_2 + x_3 + \cdots + x_n$, is denoted as $\sum_{i=1}^{n} x_i$.

Difference between
$$\left(\sum_{i=1}^{n} x_i\right)^2 \quad \text{and} \quad \sum_{i=1}^{n} (x_i)^2$$

Adding a constant to each observation in the given data in the previous Example may be expressed as
$$\sum_{i=1}^{n} (x_i + c) = \sum_{i=1}^{n} x_i + nc$$

Example 3. Given: 156, 205, 270, 309, 311; c = 4
$$\sum_{i=1}^{5} (x_i + c) = (x_1 + c) + (x_2 + c) + (x_3 + c) + (x_4 + c) + (x_5 + c)$$
$$= (156 + 4) + (205 + 4) + (270 + 4) + (309 + 4) + (311 + 4) = 1271$$
This agrees with $\sum x_i + nc = 1251 + 5(4) = 1271$.

Example 4. Given: 156, 205, 270, 309, 311; c = 4
$$\sum_{i=1}^{5} (c x_i) = 4(156) + 4(205) + 4(270) + 4(309) + 4(311)$$
$$= 624 + 820 + 1080 + 1236 + 1244 = 5004$$

ILLUSTRATIVE PROBLEM
Electronics Associates Inc. (EAI) is an international company that manufactures a diverse line of products in plants located throughout the United States, Canada, and Europe. The firm’s director of personnel has been assigned the task of developing a profile of the company’s 2500 managers. The group includes department heads, plant superintendents, and division managers. The characteristics that are to be identified include the mean annual salary for the managers and the proportion of managers having completed the company’s management training program.

Assume:
- that the annual salary and the management training participation information for all 2500 managers have been obtained from the firm’s personnel records
- that the population mean and the population standard deviation have been computed using the following computations:

Population mean:
$$\mu = \frac{\sum x_i}{2500} = \$41{,}800$$
Population standard deviation:
$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{2500}} = \$4000$$

Assume that a review of the 2500 records shows that 1500 managers have completed the training program.
Proportion of the population having completed the training program (p):
$$p = \frac{1500}{2500} = 0.60$$

Numerical characteristics of the population → parameters
EAI Managers: $\mu = \$41{,}800$, $\sigma = \$4000$, $p = 0.60$

ILLUSTRATIVE PROBLEM
Suppose that we would like to take a sample of 100 families in a barangay that is composed of 2000 families and is not homogeneous. It consists of Low, Middle, and High-Income brackets.
Simple Random or Systematic method?

STRATIFIED RANDOM SAMPLING

INCOME BRACKET     NUMBER OF FAMILIES
High Income        400 families
Middle Income      600 families
Low Income         1000 families
TOTAL              2000 families

Proportional share/allocation ($n_i$), where $N_i$ is the number of families in bracket i, N is the total number of families, and n is the sample size:
$$n_i = \frac{N_i}{N} \cdot n$$

Sample of 100 families:

INCOME BRACKET     NUMBER OF FAMILIES     PERCENTAGE SHARE (PROPORTION)
High Income        400                    400/2000 = 0.2;  0.2 * 100 = 20
Middle Income      600                    600/2000 = 0.3;  0.3 * 100 = 30
Low Income         1000                   1000/2000 = 0.5; 0.5 * 100 = 50

Suppose that we would like to take a sample of 250 families:

INCOME BRACKET     NUMBER OF FAMILIES     PERCENTAGE SHARE (PROPORTION)
High Income        400                    400/2000 = 0.2;  0.2 * 250 = 50
Middle Income      600                    600/2000 = 0.3;  0.3 * 250 = 75
Low Income         1000                   1000/2000 = 0.5; 0.5 * 250 = 125
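
The proportional allocation used in the stratified sampling problem above can be checked with a short Python sketch. The function name `proportional_allocation` and the dictionary `strata` are illustrative names chosen here, not part of the lesson. The sketch simply multiplies each bracket's share of the population by the sample size and rounds to a whole number of families.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Split sample_size across strata in proportion to stratum size.

    Applies n_i = (N_i / N) * n to each stratum, where N_i is the stratum
    size, N is the total population size, and n is the sample size.
    """
    population_size = sum(strata_sizes.values())
    allocation = {}
    for name, stratum_size in strata_sizes.items():
        # Multiply before dividing, then round to a whole number of units.
        allocation[name] = round(stratum_size * sample_size / population_size)
    return allocation


# Income brackets of the barangay in the illustrative problem (2000 families).
strata = {"High Income": 400, "Middle Income": 600, "Low Income": 1000}

for n in (100, 250):
    print(n, proportional_allocation(strata, n))
# 100 {'High Income': 20, 'Middle Income': 30, 'Low Income': 50}
# 250 {'High Income': 50, 'Middle Income': 75, 'Low Income': 125}
```

In this problem every share works out to a whole number. With other population or sample sizes, plain rounding may give allocations that do not add up exactly to the sample size, so the totals should be checked and adjusted by hand.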