Lesson 1

Objective 1: List examples in business and economics in which various statistical methods are applied.

Statistics - The science concerned with developing and studying methods for collecting, analyzing, interpreting, and presenting empirical data and numerical facts.
• Statistics refers to numerical facts such as averages, medians, percentages, and maximums that help us understand a variety of business and economic situations.
• Statistics is the art and science of collecting, analyzing, presenting, and interpreting data.
• The data collected, analyzed, and presented are often shown to decision makers or policymakers.

Applications in Business and Economics
In today's global business and economic environment, anyone can access vast amounts of statistical information.
• Accounting - Accounting firms use statistical sampling procedures when conducting audits for their clients. Auditors review a sample of the client's accounts and draw a conclusion as to whether the accounts receivable amount shown on the client's balance sheet is acceptable.
• Finance - Financial analysts use a variety of statistical information to guide their investment recommendations. Analysts review financial data such as price/earnings ratios and dividend yields. The stock market data collected give the analyst an idea of whether a stock is a good investment.
• Marketing - Electronic scanners at retail checkout counters collect data for a variety of marketing research applications. Manufacturers spend thousands of dollars per product category to obtain this scanner data, which gives a better understanding of consumer activity and of what brings consumers into the market.
• Production - A variety of statistical quality control charts are used to monitor the output of a production process.
• Economics - Economists frequently provide forecasts about the future of the economy or some aspect of it.
They use the statistical information collected to create such forecasts.
• Information Systems - Information systems administrators are responsible for the day-to-day operation of an organization's computer networks. Statistical information helps administrators assess the performance of computer networks, including local area networks (LANs), wide area networks (WANs), network segments, intranets, and other data communication systems.
Applications of statistics such as those described in this section show the importance of statistics in a wide variety of business and economic situations.

Objective 2: Identify the elements, variables, and observations of the data set for a given set of data.

Data - The facts and figures collected, analyzed, and summarized for presentation and interpretation.
Data set - All the data collected in a particular study.
Elements - The entities on which data are collected.
Variable - A characteristic of interest for the elements.
Observation - The set of measurements obtained for a particular element.
Measurements collected on each variable for every element in a study provide the data.

Objective 3: Identify whether a variable is quantitative or categorical.

Scales of Measurement:
• Nominal
• Ordinal
• Interval
• Ratio

Nominal Scale - The data for a variable consist of labels or names used to identify an attribute of the element.
Ordinal Scale - The data exhibit the properties of nominal data, and the order or rank of the data is meaningful.
• Example: credit rating letters provide labels similar to nominal data, but in addition the data can be ranked or ordered based on the credit rating.
Interval Scale - The data have all the properties of ordinal data, and the interval between values is expressed in terms of a fixed unit of measure.
• Interval data are always numerical.
Ratio Scale - The data have all the properties of interval data, and the ratio of two values is meaningful. Variables such as distance, height, weight, and time use the ratio scale of measurement.
• This scale requires that a zero value be included to indicate that nothing exists for the variable at the zero point.

Categorical and Quantitative Data
Categorical Data - Data that can be grouped by specific categories.
• This type of data uses either the nominal or ordinal scale of measurement.
Quantitative Data - Data that use numeric values to indicate how much or how many.
• This type of data is obtained using either the interval or ratio scale of measurement.
Categorical Variable - A variable with categorical data.
Quantitative Variable - A variable with quantitative data.
If the variable is categorical, the statistical analysis is limited. The data can be summarized by counting the number of observations in each category or by computing the proportion of the observations in each category. Even when categorical data are identified by numerical codes, arithmetic operations such as addition, subtraction, multiplication, and division do not provide meaningful results.

Cross-Sectional and Time Series Data
Cross-Sectional Data - Data collected at the same or approximately the same point in time.
Time Series Data - Data collected over several time periods.
Graphs of time series data are frequently found in business and economic publications. These graphs help analysts understand what happened in the past, identify trends over time, and project future values for the time series.

Data Sources
In some cases, the data needed for a particular application already exist. Companies often maintain a variety of databases about their employees, customers, and business operations.
• Companies collect and maintain detailed data about their customers and employees.
• Organizations that specialize in collecting and maintaining data make available substantial amounts of business and economic data.
• Companies are able to access data from these organizations by purchasing or leasing access to the information.
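As a small illustration of the point made under Categorical and Quantitative Data, the sketch below summarizes a categorical variable by counting the observations in each category and computing the proportion in each. The `ratings` list and the function name `summarize_categories` are hypothetical, chosen only for this example.

```python
from collections import Counter

# Hypothetical observations of a categorical variable (labels, not measurements).
ratings = ["A", "B", "A", "C", "B", "A", "A", "C", "B", "A"]

def summarize_categories(observations):
    """Return {category: (count, proportion)} for a list of category labels."""
    counts = Counter(observations)
    n = len(observations)
    return {label: (count, count / n) for label, count in sorted(counts.items())}

summary = summarize_categories(ratings)
for label, (count, proportion) in summary.items():
    print(f"{label}: count={count}, proportion={proportion:.2f}")
```

Counting and proportions are the only summaries used here because the labels carry no numeric meaning; averaging them, even if they were coded as numbers, would not be meaningful.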
Objective 4: Describe descriptive statistics and statistical inference.

Descriptive Statistics - Summaries of data, which may be tabular, graphical, or numerical.
• Statistical information in the media, company reports, and other publications consists of data that are summarized and presented in a form that is easy for the reader to understand.
• Numerical descriptive statistics are used to summarize data.
• The most common numerical measure is the average, or mean.

Statistical Inference - The process of drawing conclusions about an underlying population based on a sample or a subset of the data.
Population - The set of all elements of interest in a particular study.
Sample - A subset of the population.
Census - The process of conducting a survey to collect data for the entire population.
Sample Survey - The process of conducting a survey to collect data for a sample.
• As one of its major contributions, statistics uses data from a sample to make estimates and test hypotheses about the characteristics of a population through a process referred to as statistical inference.
• When statisticians use a sample to estimate a population characteristic of interest, they usually provide a statement of the quality of the estimate.

Objective 5: Identify the population for a given study and contrast population and sample.

A population is the entire group that you want to draw conclusions about. A sample is the specific group that you will collect data from. The size of the sample is always less than the total size of the population. A research population does not always refer to people.

Objective 6: Describe the role of random samples in statistical inference.

The use of randomization in sampling allows for the analysis of results using the methods of statistical inference.
• Random sampling ensures that results obtained from your sample should approximate what would have been obtained if the entire population had been measured.

Objective 7: Describe the importance of big data and data mining.

Big Data - The large, diverse sets of information that grow at ever-increasing rates.
• It encompasses the volume of information, the velocity or speed at which it is created and collected, and the variety or scope of the data points being covered (known as the "three v's of big data").
• Big data analytics helps organizations harness their data and use it to identify new opportunities.
• This leads to smarter business moves, more efficient operations, higher profits, and happier customers.

Data Mining - The process of finding anomalies, patterns, and correlations within large data sets to predict outcomes.
• The application of statistics in the form of exploratory data analysis and predictive models to reveal patterns and trends in very large data sets.
• Statistics form the core portion of data mining, which covers the entire process of data analysis.
• Statistics help in identifying patterns that further help identify differences between random noise and significant findings.
• Data mining is key to sentiment analysis, price optimization, database marketing, credit risk management, training and support, fraud detection, healthcare and medical diagnoses, risk assessment, and recommendation systems.

Objective 8: Identify the ethical guidelines for statistical practice.

Ethical behaviour is something we should strive for in all that we do. Ethical issues arise in statistics because of the important role statistics plays in the collection, analysis, presentation, and interpretation of data.
• When performing your own statistical work, you are encouraged to be fair, thorough, objective, and neutral as you collect data, conduct analyses, make oral presentations, and present written reports containing the information developed.
• As a consumer of statistics, you should also be aware of the possibility of unethical statistical behaviour by others.
• When you see statistics in the media, be sure to check the source: where, how, and when the data were collected.
• American Statistical Association - The nation's leading professional organization for statistics and statisticians developed the report "Ethical Guidelines for Statistical Practice" to help statistical practitioners make and communicate ethical decisions and to assist students in learning how to perform statistical work responsibly.
• The report contains 52 guidelines organized into eight topic areas: Professional Integrity and Accountability; Integrity of Data and Methods; Responsibilities to Science/Public/Funder/Client; Responsibilities to Research Subjects; Responsibilities to Research Team Colleagues; Responsibilities to Other Statisticians or Statistics Practitioners; Responsibilities Regarding Allegations of Misconduct; and Responsibilities of Employers Including Organizations, Individuals, Attorneys, or Other Clients Employing Statistical Practitioners.

Lesson 2: Descriptive Statistics: Tabular and Graphical Displays

Objective 1: Determine the frequency distribution and the relative frequency distribution for a categorical variable.

Categorical Data - Use labels or names to identify categories of like items.
Quantitative Data - Numerical values that indicate how much or how many.
Tabular or Graphical Displays - Used to summarize data for a single variable.
Data Visualization - A term often used to describe the use of graphical displays to summarize and present information about a data set.

Summarizing Data for a Categorical Variable
Frequency Distribution - A tabular summary of data showing the number (frequency) of observations in each of several nonoverlapping categories or classes.
Frequency Distribution of Soft Drink Purchases

Soft Drink     Frequency
Coca-Cola      19
Diet Coke       8
Dr. Pepper      5
Pepsi          13
Sprite          5
Total          50

Relative Frequency and Percent Frequency Distributions
A frequency distribution shows the number (frequency) of observations in each of several nonoverlapping classes. The relative frequency of a class equals the fraction or proportion of observations belonging to that class. For a data set with n observations:

\[ \text{Relative frequency of a class} = \frac{\text{Frequency of the class}}{n} \]

The percent frequency of a class is the relative frequency multiplied by 100.
Relative Frequency Distribution - A tabular summary of data showing the relative frequency for each class.
Percent Frequency Distribution - A tabular summary of data showing the percent frequency for each class.

Relative Frequency and Percent Frequency Distributions of Soft Drink Purchases

Soft Drink     Relative Frequency   Percent Frequency
Coca-Cola       .38                  38
Diet Coke       .16                  16
Dr. Pepper      .10                  10
Pepsi           .26                  26
Sprite          .10                  10
Total          1.00                 100

To get the relative frequency, divide each class frequency by the total (for example, 19/50 = .38 for Coca-Cola).

Objective 2: Draw a bar chart and pie chart, and provide basic interpretations of them.

Bar Chart - A graphical display for depicting categorical data summarized in a frequency, relative frequency, or percent frequency distribution.
• On one axis of the chart (usually the horizontal axis), we specify the labels that are used for the classes (categories).
• A frequency, relative frequency, or percent frequency scale can be used for the other axis of the chart (usually the vertical axis).
Pie Chart - A graphical display for presenting relative frequency and percent frequency distributions for categorical data.

Additional Notes
1. Often the number of classes in a frequency distribution is the same as the number of categories found in the data.
2. Most statisticians recommend that classes with smaller frequencies be grouped into an aggregate class called "other".
3. The sum of the frequencies in any frequency distribution always equals the number of observations.
4. The sum of the relative frequencies in any relative frequency distribution always equals 1.00, and the sum of the percentages in a percent frequency distribution always equals 100.

Objective 3: Describe the three steps in defining class intervals for a frequency distribution for a quantitative variable.

A frequency distribution is a tabular summary of data showing the number (frequency) of observations in each of several nonoverlapping categories or classes. The three steps necessary to define the classes for a frequency distribution with quantitative data are:
1. Determine the number of nonoverlapping classes.
2. Determine the width of each class.
3. Determine the class limits.

Number of Classes - Classes are formed by specifying ranges that will be used to group the data.
- The goal is to use enough classes to show the variation in the data.
Width of the Classes - The second step in constructing a frequency distribution for quantitative data is to choose a width for the classes.
- The same width for each class is recommended.
- A larger number of classes means a smaller class width, and vice versa.
- To determine an approximate class width, we begin by identifying the largest and smallest data values. Then, with the desired number of classes specified, we can use the following expression to determine the approximate class width.

\[ \text{Approximate class width} = \frac{\text{Largest data value} - \text{Smallest data value}}{\text{Number of classes}} \]

Class Limits - Class limits must be chosen so that each data item belongs to one and only one class.
- The lower class limit identifies the smallest possible data value assigned to the class. The upper class limit identifies the largest possible data value assigned to the class.

Objective 4: Define cumulative frequency and cumulative relative frequency distribution, and provide basic interpretations of them.

We define the relative frequency and percent frequency distributions for quantitative data in the same manner as for categorical data.
First, recall that the relative frequency is the proportion of the observations belonging to a class. With n observations:

\[ \text{Relative frequency of a class} = \frac{\text{Frequency of the class}}{n} \]

The percent frequency of a class is the relative frequency multiplied by 100.
Example: If the relative frequency is .40, the percent frequency is 40%.

Objective 5: Draw a histogram for a given frequency distribution or relative frequency distribution, and provide a basic interpretation of it.

Dot Plot - One of the simplest graphical summaries of data is a dot plot.
- A horizontal axis shows the range for the data. Each data value is represented by a dot placed above the axis.
- Dot plots show the details of the data and are useful for comparing the distributions of the data for two or more variables.
Histogram - A common graphical display of quantitative data is a histogram. This graphical display can be prepared for data previously summarized in either a frequency, relative frequency, or percent frequency distribution.
- A histogram is constructed by placing the variable of interest on the horizontal axis and the frequency, relative frequency, or percent frequency on the vertical axis.
- The frequency, relative frequency, or percent frequency of each class is shown by drawing a rectangle whose base is determined by the class limits on the horizontal axis and whose height is the corresponding frequency, relative frequency, or percent frequency.

Differing Levels of Skewness
Panel A: A histogram is said to be skewed to the left if its tail extends farther to the left.
Panel B: A histogram is said to be skewed to the right if its tail extends farther to the right.
Panel C: In a symmetric histogram, the left tail mirrors the shape of the right tail.
Panel D: Data from applications in business and economics often lead to histograms that are skewed to the right. Examples: housing prices, salaries, purchase amounts.
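The three steps for defining classes, together with the relative frequency formula, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `values` list is hypothetical, the data are whole numbers, the approximate class width is rounded up to a convenient whole number, and the first lower class limit is set to the smallest data value. The function names are made up for this example.

```python
import math

# Hypothetical data: 20 whole-number observations (for example, days to complete a task).
values = [3, 7, 8, 12, 5, 9, 14, 11, 6, 10, 13, 4, 9, 8, 17, 10, 7, 12, 9, 21]

def approximate_class_width(data, num_classes):
    """(largest data value - smallest data value) / number of classes."""
    return (max(data) - min(data)) / num_classes

def frequency_distribution(data, first_lower_limit, width, num_classes):
    """Return (lower limit, upper limit, frequency, relative frequency) per class.

    Assumes whole-number data, so class i runs from its lower limit to
    lower limit + width - 1 and each value belongs to one and only one class.
    """
    n = len(data)
    rows = []
    for i in range(num_classes):
        lower = first_lower_limit + i * width
        upper = lower + width - 1
        freq = sum(1 for v in data if lower <= v <= upper)
        rows.append((lower, upper, freq, freq / n))
    if sum(row[2] for row in rows) != n:
        raise ValueError("some values fall outside the chosen classes")
    return rows

num_classes = 4                                                  # step 1
width = math.ceil(approximate_class_width(values, num_classes))  # step 2: 4.5 rounded up
table = frequency_distribution(values, min(values), width, num_classes)  # step 3

for lower, upper, freq, rel in table:
    # One asterisk per observation gives a rough text histogram.
    print(f"{lower:>2}-{upper:<2}  {freq:>2}  {rel:.2f}  {'*' * freq}")
```

With this made-up data the bars are tallest in the lower classes and the tail extends toward the larger values, which is the skewed-to-the-right shape described in Panel B.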
Objective 6: Describe how a stem-and-leaf display is constructed, and list some of the advantages and disadvantages of the display as compared to a histogram.

Cumulative Distributions
A variation of the frequency distribution that provides another tabular summary of quantitative data is the cumulative frequency distribution. The cumulative frequency distribution uses the number of classes, class widths, and class limits developed for the frequency distribution. Rather than showing the frequency of each class, the cumulative frequency distribution shows the number of data items with values less than or equal to the upper class limit of each class.

Stem-and-Leaf Display
A stem-and-leaf display is a graphical display used to show simultaneously the rank order and shape of a distribution of data.
-To illustrate the use of a stem-and-leaf display,

Objective 7: Explain the value of scatter diagrams.

Scatter Diagrams and Trendlines
Scatter Diagram - A graphical display of the relationship between two quantitative variables.
Trendline - A line that provides an approximation of the relationship.

Side-by-Side and Stacked Bar Charts
A side-by-side bar chart is a graphical display for depicting multiple bar charts on the same display.

Additional Notes
1. A time series is a sequence of observations on a variable measured at successive points in time or over successive periods of time.
2. A scatter diagram in which the value of time is shown on the horizontal axis and the time series values are shown on the vertical axis is referred to in time series analysis as a time series plot.
3. A stacked bar chart can also be used to display frequencies rather than percent frequencies.
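Returning to the cumulative frequency distribution described under Cumulative Distributions, the running totals can be sketched as follows. The upper class limits and class frequencies are hypothetical, and the function name is made up for this example; the cumulative relative frequency is simply each cumulative frequency divided by the total number of observations.

```python
from itertools import accumulate

# Hypothetical frequency distribution: one upper class limit and frequency per class.
upper_limits = [7, 12, 17, 22]
frequencies = [6, 10, 3, 1]

def cumulative_distribution(freqs):
    """Return (cumulative frequencies, cumulative relative frequencies)."""
    cumulative = list(accumulate(freqs))   # running totals of the class frequencies
    n = cumulative[-1]                     # the last running total is the number of observations
    return cumulative, [c / n for c in cumulative]

cum_freq, cum_rel = cumulative_distribution(frequencies)
for limit, cf, cr in zip(upper_limits, cum_freq, cum_rel):
    print(f"Less than or equal to {limit}: {cf} ({cr:.2f})")
```

The last cumulative frequency always equals the number of observations, and the last cumulative relative frequency always equals 1.00, which is a quick check on the table.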