Faculty Name: Dr. M. Massarrat Ali Khan
Course Name: Introduction to Statistics
Email: mokhan@iba.edu.pk
Week 1

Business Analytics:
• What is business analytics?
• Business analytics (BA) is a set of disciplines and technologies for solving business problems using data analysis, statistical models and other quantitative methods. It involves an iterative, methodical exploration of an organization's data, with an emphasis on statistical analysis, to drive decision-making.
• Business analytics is the scientific process of transforming data into insight for making better decisions.
• Business analytics is used for data-driven or fact-based decision making, which is often seen as more objective than other approaches to decision making.
• Types of business analytics
• Different types of business analytics include the following:
• descriptive analytics, which tracks key performance indicators (KPIs) to understand the present state of a business;
• predictive analytics, which analyzes trend data to assess the likelihood of future outcomes; and
• prescriptive analytics, which uses past performance to generate recommendations for handling similar situations in the future.

Descriptive Analytics
• Descriptive analytics encompasses the set of techniques that describes what has happened in the past. Examples are data queries, reports, descriptive statistics, data visualization including data dashboards, some data-mining techniques, and basic what-if spreadsheet models.
>A data query is a request for information with certain characteristics from a database.
>Data dashboards are collections of tables, charts, maps, and summary statistics that are updated as new data become available.
>Data mining is the use of analytical techniques for better understanding patterns and relationships that exist in large data sets.
For example, by analyzing text on social network platforms like Twitter, companies use data-mining techniques (including cluster analysis and sentiment analysis) to better understand their customers.

Predictive Analytics:
• Predictive analytics consists of techniques that use models constructed from past data to predict the future or ascertain the impact of one variable on another. For example, past data on product sales may be used to construct a mathematical model to predict future sales.
>Linear regression, time series analysis, some data-mining techniques, and simulation (often referred to as risk analysis) all fall under the banner of predictive analytics.
>Simulation involves the use of probability and statistics to construct a computer model to study the impact of uncertainty on a decision.

Prescriptive Analytics:
• Prescriptive analytics differs from descriptive and predictive analytics in that prescriptive analytics indicates a course of action to take; that is, the output of a prescriptive model is a decision.
>Predictive models provide a forecast or prediction, but do not provide a decision. However, a forecast or prediction, when combined with a rule, becomes a prescriptive model.

What is Statistics?
• Meaning of the word statistics: There are three meanings of the word statistics:
1. Statistics (plural): Facts and figures themselves are called statistics, e.g. import statistics, production statistics, results statistics.
2. Statistics (singular): A subject of social science which deals with the collection, organization or presentation, and analysis of data, and with the interpretation of results about the population, based on samples, for decision making.
3. Statistic (singular): A characteristic of a sample expressed as a numerical value.

Statistics and Business Analytics:
• Statistics is an important tool of business analytics.
• The field of statistics is concerned with collecting, analyzing, interpreting, and presenting data.
The field of analytics is concerned with applying statistical methods to practical business problems.
• Business analysts often use descriptive statistics to summarize data related to the finances of companies.
• Statistics is the foundation of business analytics, since business analytics is a combination of computer science and statistics. Statistical methods such as sampling, hypothesis testing, correlation, and regression underlie the methods of business analytics.

Big Data:
• Big data is any set of data that is too large or too complex to be handled by standard data-processing techniques and typical desktop software. Big data refers to data that is so large, fast or complex that it is difficult or impossible to process using traditional methods. The act of accessing and storing large amounts of information for analytics has been around for a long time. Big data analytics describes the process of uncovering trends, patterns, and correlations in large amounts of raw data to help make data-informed decisions. These processes apply familiar statistical analysis techniques, such as clustering and regression, to more extensive datasets with the help of newer tools.

The 4 V’s of Big Data:
• Volume
>Because data are collected electronically, we are able to collect more of them. To be useful, these data must be stored, and this storage has led to vast quantities of data. Many companies now store in excess of 100 terabytes of data (a terabyte is 1,024 gigabytes).
• Velocity
>Real-time capture and analysis of data present unique challenges both in how data are stored and in the speed with which those data can be analyzed for decision making.
For example, the New York Stock Exchange collects 1 terabyte of data in a single trading session, and having current data and real-time rules for trades and predictive modeling is important for managing stock portfolios.
• Variety
>In addition to the sheer volume and speed with which companies now collect data, more complicated types of data are now available and are proving to be of great value to businesses. Text data are collected by monitoring what is being said about a company’s products or services on social media platforms such as Twitter. Audio data are collected from service calls (on a service call, you will often hear “this call may be monitored for quality control”). Video data collected by in-store video cameras are used to analyze shopping behavior. Analyzing information generated by these nontraditional sources is more complicated, in part because of the processing required to transform the data into a numerical form that can be analyzed.
• Veracity
>Veracity has to do with how much uncertainty is in the data. For example, the data could have many missing values, which makes reliable analysis a challenge. Inconsistencies in units of measure and the lack of reliability of responses in terms of bias also increase the complexity of the data.

Key Terms In Statistics
• Population: The totality of the data with which we are concerned is called the population.
• Sample: A subset or portion of the population.
• Parameter: A characteristic of the population expressed in numerical terms is called a parameter, e.g. the population arithmetic mean (A.M.), denoted µ.
• Statistic: A characteristic of a sample expressed in numerical terms is called a statistic, e.g. the sample arithmetic mean, denoted x̄.
• Variable: A characteristic or measurement that can be determined for each member of a population. A variable is denoted by X or Y.
Variables may be numerical or categorical:
1.
Numerical: A numerical variable takes values measured in equal units, such as weight in pounds or time in hours. Numerical variables are of two types:
Discrete: A variable whose value is obtained by counting, e.g. the number of calls in a day.
Continuous: A continuous variable can take any value in a given interval. Its value is obtained by measuring, e.g. a temperature between 20° and 30°.
2. Categorical: A variable whose values are assigned on the basis of some qualitative property, e.g. a person's affiliation with a political party: X = PTI, PPP, NML.

Type of Data
Data are the actual values of the variable. They may be numbers or words. So the actual data can be divided into:
• Quantitative data: values of discrete and continuous variables
• Qualitative data: values of categorical or subjective variables

Organizing Qualitative Data
• Tabular Presentation
• Frequency Distributions: A frequency distribution of qualitative data is a listing of the distinct values and their frequencies.
• To Construct a Frequency Distribution of Qualitative Data
Step 1 List the distinct values of the observations in the data set in the first column of a table.
Step 2 For each observation, place a tally mark in the second column of the table in the row of the appropriate distinct value.
Step 3 Count the tallies for each distinct value and record the totals in the third column of the table.
Example: Political party affiliations of the students in introductory statistics
Table for constructing a frequency distribution for the political party affiliations

Relative-frequency distribution
• A relative-frequency distribution of qualitative data is a listing of the distinct values and their relative frequencies.
• To Construct a Relative-Frequency Distribution of Qualitative Data
Step 1 Obtain a frequency distribution of the data.
Step 2 Divide each frequency by the total number of observations.
Example
• Figure out your first relative frequency by dividing the count by the total.
For the category of dogs we have 16 out of 56, so 16/56 ≈ 0.29.

Graphical Presentation of Qualitative Data
Pie Charts
Another method for organizing and summarizing data is to draw a picture of some kind. The old saying “a picture is worth a thousand words” has particular relevance in statistics: a graph or chart of a data set often provides the simplest and most efficient display. Two common methods for graphically displaying qualitative data are pie charts and bar charts.
-A pie chart is a disk divided into wedge-shaped pieces proportional to the relative frequencies of the qualitative data.
• To Construct a Pie Chart
Step 1 Obtain a relative-frequency distribution of the data.
Step 2 Divide a disk into wedge-shaped pieces proportional to the relative frequencies.
Step 3 Label the slices with the distinct values and their relative frequencies.
Example of Pie Chart: Political Party Affiliations of the students

Bar Chart
• Bar Charts: Another graphical display for qualitative data is the bar chart.
-A bar chart displays the distinct values of the qualitative data on a horizontal axis and the relative frequencies (or frequencies or percents) of those values on a vertical axis. The relative frequency of each distinct value is represented by a vertical bar whose height is equal to the relative frequency of that value. The bars should be positioned so that they do not touch each other.
Example of Bar Chart:

Organizing Quantitative Data
Quantitative Data Tabular Presentation
• Frequency Distribution Table
-To organize quantitative data, we first group the observations into classes (also known as categories or bins) and then treat the classes as the distinct values of qualitative data. Consequently, once we group the quantitative data into classes, we can construct frequency and relative-frequency distributions of the data in exactly the same way as we did for qualitative data.
Three important guidelines for grouping quantitative data into classes are:
1.
The number of classes should be small enough to provide an effective summary but large enough to display the relevant characteristics of the data. A rule of thumb is that the number of classes should be between 5 and 20.
Guideline for number of classes: Number of Classes = 1 + 3.3 log N, where N is the number of observations and log is the base-10 logarithm.
If N = 100, No. of classes = 1 + 3.3 log 100 = 1 + 3.3 × 2.0 = 7.6, which rounds up to 8.
2. Each observation must belong to only one class.
3. Whenever feasible, all classes should have the same width. Roughly speaking, if possible, all classes should cover the same number of possible values.
• Class Width = Range/No. of Classes

Scales of Measurement
Data collection requires one of the following scales of measurement: nominal, ordinal, interval, or ratio. The scale of measurement determines the amount of information contained in the data and indicates the most appropriate data summarization and statistical analyses.
When the data for a variable consist of labels or names used to identify an attribute of the element, the scale of measurement is considered a nominal scale. For example, referring to the data in Table 1.1, the scale of measurement for the WTO Status variable is nominal because the data “member” and “observer” are labels used to identify the status category for the nation. In cases where the scale of measurement is nominal, a numerical code as well as a nonnumerical label may be used. For example, to facilitate data collection and to prepare the data for entry into a computer database, we might use a numerical code for the WTO Status variable by letting 1 denote a member nation in the World Trade Organization and 2 denote an observer nation. The scale of measurement is nominal even though the data appear as numerical values.
• The scale of measurement for a variable is considered an ordinal scale if the data exhibit the properties of nominal data and, in addition, the order or rank of the data is meaningful.
• For example, referring to the data in Table 1.1, the scale of measurement for the Fitch Rating is ordinal because the rating labels, which range from AAA to F, can be rank ordered from best credit rating (AAA) to poorest credit rating (F). The rating letters provide the labels similar to nominal data, but in addition, the data can also be ranked or ordered based on the credit rating, which makes the measurement scale ordinal.
• Ordinal data can also be recorded by a numerical code, for example, your class rank in school.
• The scale of measurement for a variable is an interval scale if the data have all the properties of ordinal data and the interval between values is expressed in terms of a fixed unit of measure. Interval data are always numerical. College admission SAT scores are an example of interval-scaled data. For example, three students with SAT math scores of 620, 550, and 470 can be ranked or ordered in terms of best performance to poorest performance in math. In addition, the differences between the scores are meaningful. For instance, student 1 scored 620 − 550 = 70 points more than student 2, while student 2 scored 550 − 470 = 80 points more than student 3.
• The scale of measurement for a variable is a ratio scale if the data have all the properties of interval data and the ratio of two values is meaningful. Variables such as distance, height, weight, and time use the ratio scale of measurement. This scale requires that a zero value be included to indicate that nothing exists for the variable at the zero point. For example, consider the cost of an automobile. A zero value for the cost would indicate that the automobile has no cost and is free. In addition, if we compare the cost of $30,000 for one automobile to the cost of $15,000 for a second automobile, the ratio property shows that the first automobile is $30,000/$15,000 = 2 times, or twice, the cost of the second automobile.

Quantitative & Qualitative Variables
• 2. Comparing Tablet Computers.
Tablet PC Comparison provides a wide variety of information about tablet computers. The company’s website enables consumers to easily compare different tablets using factors such as cost, type of operating system, display size, battery life, and CPU manufacturer. A sample of 10 tablet computers is shown in Table 1.6 (Tablet PC Comparison website).
• a. How many elements are in this data set?
• b. How many variables are in this data set?
• c. Which variables are categorical and which variables are quantitative?
• d. What type of measurement scale is used for each of the variables?

• Data: The facts and figures collected, analyzed, and summarized for presentation and interpretation.
• Data mining: The process of using procedures from statistics and computer science to extract useful information from extremely large databases.
• Elements: The entities on which data are collected.
• Observation: The set of measurements obtained for a particular element.

Scale of Measurement:
• Nominal scale: The scale of measurement for a variable when the data are labels or names used to identify an attribute of an element. Nominal data may be nonnumeric or numeric.
• Ordinal scale: The scale of measurement for a variable if the data exhibit the properties of nominal data and the order or rank of the data is meaningful. Ordinal data may be nonnumeric or numeric.
• Interval scale: The scale of measurement for a variable if the data demonstrate the properties of ordinal data and the interval between values is expressed in terms of a fixed unit of measure. Interval data are always numeric.
• Ratio scale: The scale of measurement for a variable if the data demonstrate all the properties of interval data and the ratio of two values is meaningful. Ratio data are always numeric.
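A short sketch may help show why the ratio of two values is meaningful on a ratio scale but not on an interval scale. The temperatures in degrees Celsius are an assumed example of interval data, introduced only for this illustration; the automobile costs of $30,000 and $15,000 are the ones used in the ratio-scale discussion. The helper names are hypothetical. Changing the unit of measure leaves the cost ratio unchanged, but changes the temperature ratio, because the zero point of an interval scale is arbitrary.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def dollars_to_cents(dollars: float) -> float:
    """Convert an amount from dollars to cents."""
    return dollars * 100


# Interval scale (assumed example): two temperatures in degrees Celsius.
warm_c, cool_c = 20.0, 10.0
ratio_celsius = warm_c / cool_c
ratio_fahrenheit = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cool_c)

# Ratio scale: the two automobile costs from the notes, in dollars.
cost_1, cost_2 = 30_000.0, 15_000.0
ratio_dollars = cost_1 / cost_2
ratio_cents = dollars_to_cents(cost_1) / dollars_to_cents(cost_2)

# The temperature ratio depends on the unit; the cost ratio does not.
print(f"Temperature ratio: {ratio_celsius:.2f} (Celsius), {ratio_fahrenheit:.2f} (Fahrenheit)")
print(f"Cost ratio: {ratio_dollars:.2f} (dollars), {ratio_cents:.2f} (cents)")
```

So "20° is twice as warm as 10°" is not a meaningful statement, while "the first automobile costs twice as much as the second" is.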
The following table provides a summary of the properties of each measurement scale:

Property                          Nominal  Ordinal  Interval  Ratio
Has a natural “order”             NO       YES      YES       YES
Mode can be calculated            YES      YES      YES       YES
Median can be calculated          NO       YES      YES       YES
Mean can be calculated            NO       NO       YES       YES
Exact difference between values   NO       NO       YES       YES
Has a “true zero” value           NO       NO       NO        YES
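The table can be read as a rule for choosing summary statistics. The following is a minimal sketch using Python's standard `statistics` module, assuming Python 3.8 or later. The function name `summarize`, the list of party affiliations, and the satisfaction codes are hypothetical and introduced only for illustration; the SAT scores are the three used in the interval-scale example. Using `median_low` for ordinal data is a choice made here so that the reported median is always one of the observed values.

```python
from statistics import mean, median, median_low, mode

# Scales listed from least to most informative.
SCALES = ("nominal", "ordinal", "interval", "ratio")


def summarize(values, scale):
    """Return the measures of center that are meaningful for the given scale."""
    if scale not in SCALES:
        raise ValueError(f"unknown scale: {scale!r}")
    level = SCALES.index(scale)
    summary = {"mode": mode(values)}  # the mode is meaningful for every scale
    if level >= 1:  # ordinal or higher: the values can be ranked
        summary["median"] = median_low(values) if scale == "ordinal" else median(values)
    if level >= 2:  # interval or ratio: differences between values are meaningful
        summary["mean"] = mean(values)
    return summary


# Hypothetical nominal data: party affiliations of five students.
parties = ["PTI", "PPP", "PTI", "NML", "PTI"]
# Hypothetical ordinal data: satisfaction codes from 1 (lowest) to 5 (highest).
ranks = [1, 2, 2, 3, 4, 5]
# Interval data: the three SAT math scores from the notes.
sat_scores = [620, 550, 470]

print(summarize(parties, "nominal"))
print(summarize(ranks, "ordinal"))
print(summarize(sat_scores, "interval"))
```

For the nominal data only the mode is returned, for the ordinal data the mode and median, and for the interval data all three measures.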