PPT ON BUSINESS STATISTICS
CLASS-BBA SECTION-3
SUBJECT CODE-BBT 201

Statistics

What is statistics?
• A branch of mathematics that provides techniques to analyze whether or not your data is significant (meaningful)
• Statistical applications are based on probability statements
• Nothing is “proved” with statistics
• Statistics are reported
• Statistics report the probability that similar results would occur if you repeated the experiment

Statistics deals with numbers
• Need to know the nature of the numbers collected
  – Continuous variables: numbers associated with measuring or weighing; any value in a continuous interval of measurement
    • Examples: weight of students, height of plants, time to flowering
  – Discrete variables: numbers that are counted or categorical
    • Examples: numbers of boys, girls, insects, plants

Sample Populations: avoiding Bias
• Individuals in a sample population
  – Must be a fair representation of the entire population
  – Therefore sample members must be randomly selected (to avoid bias)
  – Example: if you were looking at strength in students, picking students from the football team would NOT be random

Statistical Computations (the Math)
• If you are using a sample population
  – Arithmetic mean (average): the sum of all the scores divided by the total number of scores, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
  – The mean is the balance point of the data; about ½ the members of the population fall on either side of it only when the distribution is roughly symmetric
http://en.wikipedia.org/wiki/Table_of_mathematical_symbols

Mode and Median
• Mode: the most frequently seen value (if no value repeats, the data set has no mode)
• Median: the middle number of the sorted data
  – If you have an odd number of data values, the median is the value in the middle of the set
  – If you have an even number of data values, the median is the average of the two middle values in the set

Variance (s²)
• Mathematically expresses the degree of variation of scores (data) from the mean
• A large variance means that the individual scores (data) of the sample deviate a lot from the mean
• A small variance indicates the scores (data) deviate little from the mean

Calculating the variance for a whole population
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
Σ = sum of; X = score, value; µ = mean; N = total number of scores or values
OR use the VARP (VAR.P) function in Excel

Calculating the variance for a SAMPLE (corrected for bias)
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Σ = sum of; x_i = score, value; n − 1 = total number of scores or values minus 1; $\bar{x}$ (often read as “x bar”) is the mean (average value of x_i)
OR use the VAR (VAR.S) function in Excel
Note the sample variance is larger…why?
http://www.mnstate.edu/wasson/ed602calcvardevs.htm

Standard Deviation
• An important statistic that is also used to measure variation in samples
• s is the symbol for the sample standard deviation
• Calculated by taking the square root of the variance

Time Series Analysis and Forecasting

Introduction to Time Series Analysis
• A time series is a set of observations on a quantitative variable collected over time
• Examples
  – Dow Jones Industrial Averages
  – Historical data on sales, inventory, customer counts, interest rates, costs, etc.
• Businesses are often very interested in forecasting time series variables
• Often, independent variables are not available to build a regression model of a time series variable
• In time series analysis, we analyze the past behavior of a variable in order to predict its future behavior

Methods used in Forecasting
• Regression Analysis
• Time Series Analysis (TSA)
  – A statistical technique that uses time-series data for explaining the past or forecasting future events
  – The prediction is a function of time (days, months, years, etc.)
  – No causal variable; examine the past behavior of a variable and attempt to predict its future behavior

Components of TSA (Cont.)
• Cycle
  – An up-and-down repetitive movement in demand
  – Repeats itself over a long period of time
• Seasonal Variation
  – An up-and-down repetitive movement within a trend occurring periodically
  – Often weather related but could be a daily or weekly occurrence
• Random Variations
  – Erratic movements that are not predictable because they do not follow a pattern

[Figure: Time Series Plot of Actual Sales; y-axis: Sales (in $1,000s), $0 to $3,000; x-axis: Time Period, 1 to 21]

Components of TSA (Cont.)
• Difficult to forecast demand because...
  – There are no causal variables
  – The components (trend, seasonality, cycles, and random variation) cannot always be easily or accurately identified

Moving Averages
$\hat{Y}_{t+1} = \frac{Y_t + Y_{t-1} + \cdots + Y_{t-k+1}}{k}$
No general method exists for determining k. We must try out several k values to see what works best.

INDEX NUMBERS
• An index number is a statistical value that measures the change in a variable with respect to time
• Two variables that are often considered in this analysis are price and quantity
• With the aid of index numbers, the average price of several articles in one year may be compared with the average price of the same quantity of the same articles in a number of different years
• We will examine index numbers that are constructed from a single item only
• Such indexes are called simple index numbers
• Current period = the period for which you wish to find the index number
• Base period = the period with which you wish to compare prices in the current period
• The choice of the base period should be considered very carefully
• The notation we shall use is:
  – $p_n$ = the price of an item in the current period
  – $p_0$ = the price of an item in the base period
• Price relative
  – The price relative of an item is the ratio of the price of the item in the current period to the price of the same item in the base period, $p_n / p_0$
• Simple aggregate index (cont…)
  – Even though the simple aggregate index is easy to calculate, it has serious disadvantages:
    1. An item with a relatively large price can dominate the index
    2. If prices are quoted for different quantities, the simple aggregate index will yield a different answer
    3. It does not take into account the quantity of each item sold
  – Disadvantage 2 is perhaps the worst feature of this index, since it makes it possible, to a certain extent, to manipulate the value of the index

Weighted index numbers
• The use of a weighted index number or weighted index allows greater importance to be attached to some items
• Information other than simply the change in price over time can then be used, and can include such factors as quantity sold or quantity consumed for each item
• Laspeyres index
  – The Laspeyres index is also known as the average of weighted relative prices
  – In this case, the weights used are the quantities of each item bought in the base period

CONSUMER PRICE INDEX
• The measure most commonly used in Australia as a general indicator of the rate of price change for consumer goods and services is the consumer price index
• The Indian CPI assumes the purchase of a constant ‘basket’ of goods and services and measures price changes in that basket alone
• The description of the CPI commonly adopted by users is in terms of its perceived uses; hence there are frequent references to the CPI as
  – a measure of inflation
  – a measure of changes in purchasing power, or
  – a measure of changes in the cost of living

Introduction to Probability Theory
• Experiment: toss a coin twice
• Sample space: possible outcomes of an experiment
  – S = {HH, HT, TH, TT}
• Event: a subset of possible outcomes
  – A = {HH}, B = {HT, TH}
• Probability of an event: a number Pr(A) assigned to an event
  – Axiom 1: Pr(A) ≥ 0
  – Axiom 2: Pr(S) = 1
  – Axiom 3: For every sequence of disjoint events $A_1, A_2, \ldots$: $\Pr\left(\bigcup_i A_i\right) = \sum_i \Pr(A_i)$
  – Example: Pr(A) = n(A)/N: frequentist statistics

• Consider the experiment of tossing a coin twice
• Example I:
  – A = {HT, HH}, B = {HT}
  – Is event A independent of event B?
• Example II:
  – A = {HT}, B = {TH}
  – Is event A independent of event B?
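
The two examples can be checked numerically. The sketch below is only an illustration: it assumes a fair coin (so the four outcomes are equally likely) and uses the standard definition that A and B are independent when Pr(A and B) = Pr(A) × Pr(B), which these slides do not state explicitly. The helper names `pr` and `independent` are made up for this example.

```python
from fractions import Fraction

# Sample space for tossing a coin twice. A fair coin is ASSUMED,
# so each of the four outcomes has probability 1/4.
S = {"HH", "HT", "TH", "TT"}

def pr(event):
    """Pr(A) = n(A)/N under equally likely outcomes (hypothetical helper)."""
    return Fraction(len(event & S), len(S))

def independent(a, b):
    """Assumed standard definition: Pr(A and B) == Pr(A) * Pr(B)."""
    return pr(a & b) == pr(a) * pr(b)

# Example I: A = {HT, HH}, B = {HT}
example_1 = independent({"HT", "HH"}, {"HT"})
# Example II: A = {HT}, B = {TH}
example_2 = independent({"HT"}, {"TH"})

print("Example I independent: ", example_1)
print("Example II independent:", example_2)
```

In Example II the events are disjoint, so Pr(A and B) = 0 while Pr(A) × Pr(B) = 1/16; under the assumed definition, disjoint events with nonzero probability cannot be independent.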
• Disjoint ≠ Independence
• If A is independent of B, and B is independent of C, will A be independent of C?

BAYES THEOREM
$\Pr(B \mid A) = \frac{\Pr(AB)}{\Pr(A)} = \frac{\Pr(A \mid B)\,\Pr(B)}{\Pr(A)}$

RANDOM VARIABLE
• A random variable X is a numerical outcome of a random experiment
• The distribution of a random variable is the collection of possible outcomes along with their probabilities:
  – Discrete case: $\Pr(X = x)$ for each possible value x
  – Continuous case: $\Pr(a \le X \le b) = \int_a^b f(x)\,dx$ for a density function f
• The outcome of an experiment can be either success (i.e., 1) or failure (i.e., 0)
• Pr(X=1) = p, Pr(X=0) = 1−p, or equivalently $\Pr(X = x) = p^x (1-p)^{1-x}$ for x in {0, 1}
• E[X] = p, Var(X) = p(1−p)

Simple Linear Regression and Correlation

Linear Regression Analysis…
• Regression analysis is used to predict the value of one variable (the dependent variable) on the basis of other variables (the independent variables)
• Dependent variable: denoted Y
• Independent variables: denoted X1, X2, …, Xk
• If we only have ONE independent variable, the model is $Y = \beta_0 + \beta_1 X + \varepsilon$, where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon$ is the error term

Correlation Analysis… “$-1 \le \rho \le 1$”
• If we are interested only in determining whether a relationship exists, we employ correlation analysis. Example: student’s height and weight.

[Figure: four scatter plots, each titled “Plot of Height vs Weight”; x-axis: Weight (100 to 260); y-axis: Height]

• If the correlation coefficient is close to +1, you have a strong positive linear relationship
• If the correlation coefficient is close to −1, you have a strong negative linear relationship
• If the correlation coefficient is close to 0, you have little or no linear relationship
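
As a rough illustration of the correlation rules above, the sketch below computes the sample (Pearson) correlation coefficient r for a small set of weights and heights. The data values are made up for this example and do not come from the slides, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# MADE-UP illustrative data: weight in pounds, height in feet
weights = [120, 140, 160, 180, 200]
heights = [5.2, 5.5, 5.8, 6.0, 6.3]

r = pearson_r(weights, heights)
print(f"r = {r:.3f}")
```

With these invented values, taller students are also heavier, so r lands close to +1; reversing one of the lists would push it close to −1.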