History of statistics

The term statistics is ultimately derived from the New Latin statisticum collegium ("council of state") and the Italian word statista ("statesman" or "politician"). The German Statistik, first introduced by Gottfried Achenwall (1749), originally designated the analysis of data about the state, signifying the "science of state" (then called political arithmetic in English). It acquired the meaning of the collection and classification of data generally in the early 19th century. It was introduced into English in 1791 by Sir John Sinclair when he published the first of 21 volumes titled Statistical Account of Scotland. Thus, the original principal purpose of Statistik was data to be used by governmental and (often centralized) administrative bodies. The collection of data about states and localities continues, largely through national and international statistical services. In particular, censuses provide frequently updated information about the population. The first book to have 'statistics' in its title was "Contributions to Vital Statistics" (1845) by Francis G. P. Neison, actuary to the Medical Invalid and General Life Office.

Statistics, in the modern sense of the word, began evolving in the 18th century in response to the novel needs of industrializing sovereign states. The evolution of statistics was, in particular, intimately connected with the development of European states following the Peace of Westphalia (1648), and with the development of probability theory, which put statistics on a firm theoretical basis. In early times, the meaning was restricted to information about states, particularly demographics such as population. This was later extended to include all collections of information of all types, and later still it was extended to include the analysis and interpretation of such data. In modern terms, "statistics" means both sets of collected information, as in national accounts and temperature records, and analytical work which requires statistical inference. Statistical activities are often associated with models expressed using probabilities, hence the connection with probability theory. The large requirements of data processing have made statistics a key application of computing.

A number of statistical concepts have an important impact on a wide range of sciences. These include the design of experiments and approaches to statistical inference such as Bayesian inference, each of which can be considered to have its own sequence in the development of the ideas underlying modern statistics.

Development of modern statistics

Although the origins of statistical theory lie in the 18th-century advances in probability, the modern field of statistics only emerged in the late 19th and early 20th century, in three stages. The first wave, at the turn of the century, was led by the work of Francis Galton and Karl Pearson, who transformed statistics into a rigorous mathematical discipline used for analysis, not just in science, but in industry and politics as well. The second wave, in the 1910s and 20s, was initiated by William Sealy Gosset and reached its culmination in the insights of Ronald Fisher. This involved the development of better design-of-experiments models, hypothesis testing and techniques for use with small data samples. The final wave, which mainly saw the refinement and expansion of earlier developments, emerged from the collaborative work between Egon Pearson and Jerzy Neyman in the 1930s.
Today, statistical methods are applied in all fields that involve decision making, for making accurate inferences from a collated body of data and for making decisions in the face of uncertainty based on statistical methodology.

Statistics

Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as "all people living in a country" or "every atom composing a crystal". Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments. When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Representative sampling assures that inferences and conclusions can reasonably extend from the sample to the population as a whole. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine whether the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation.

Two main statistical methods are used in data analysis: descriptive statistics, which summarize data from a sample using indexes such as the mean or standard deviation, and inferential statistics, which draw conclusions from data that are subject to random variation (e.g., observational errors, sampling variation). Descriptive statistics are most often concerned with two sets of properties of a distribution (sample or population): central tendency (or location) seeks to characterize the distribution's central or typical value, while dispersion (or variability) characterizes the extent to which members of the distribution depart from its center and each other. Inferences in mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena.

A standard statistical procedure involves the collection of data leading to a test of the relationship between two statistical data sets, or between a data set and synthetic data drawn from an idealized model. A hypothesis is proposed for the statistical relationship between the two data sets, and this is compared as an alternative to an idealized null hypothesis of no relationship between the two data sets. Rejecting or disproving the null hypothesis is done using statistical tests that quantify the sense in which the null can be proven false, given the data that are used in the test. Working from a null hypothesis, two basic forms of error are recognized: Type I errors (the null hypothesis is falsely rejected, giving a "false positive") and Type II errors (the null hypothesis fails to be rejected when an actual relationship between populations exists, giving a "false negative"). Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis.

Measurement processes that generate statistical data are also subject to error. Many of these errors are classified as random (noise) or systematic (bias), but other types of errors (e.g., blunders, such as when an analyst reports incorrect units) can also occur.
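To make the testing framework described above concrete, the following Python sketch works through a hypothetical example. The two groups, sample sizes, assumed mean difference, and 0.05 significance level are all assumptions chosen purely for illustration; it is a minimal demonstration of a two-sample t-test and of estimating power (one minus the Type II error rate) by simulation, not a prescription.

    # A minimal sketch of null-hypothesis testing on two hypothetical
    # groups; all numbers are invented for illustration.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Hypothetical measurements for a control and a treatment group.
    control = rng.normal(loc=50.0, scale=10.0, size=40)
    treatment = rng.normal(loc=55.0, scale=10.0, size=40)

    # Descriptive statistics: central tendency and dispersion.
    print("control   mean/sd:", control.mean(), control.std(ddof=1))
    print("treatment mean/sd:", treatment.mean(), treatment.std(ddof=1))

    # Inferential step: test H0 "the two population means are equal".
    alpha = 0.05  # tolerated Type I error rate (false positive)
    t_stat, p_value = stats.ttest_ind(treatment, control)
    if p_value < alpha:
        print(f"p = {p_value:.4f} < {alpha}: reject H0")
    else:
        print(f"p = {p_value:.4f} >= {alpha}: fail to reject H0")

    # Estimating power (1 - Type II error rate) by simulation, under an
    # assumed true difference of 5 units between the population means.
    n_sim, rejections = 2000, 0
    for _ in range(n_sim):
        a = rng.normal(50.0, 10.0, 40)
        b = rng.normal(55.0, 10.0, 40)
        if stats.ttest_ind(b, a).pvalue < alpha:
            rejections += 1
    print("estimated power:", rejections / n_sim)

In this framing, alpha bounds the probability of a Type I error when the null hypothesis is true, while the simulated power indicates how often a real difference of the assumed size would actually be detected.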
The presence of missing data or censoring may result in biased estimates, and specific techniques have been developed to address these problems. The earliest writings on probability and statistics, statistical methods drawing from probability theory, date back to Arab mathematicians and cryptographers, notably Al-Khalil (717–786) and Al-Kindi (801–873). In the 18th century, statistics also started to draw heavily from calculus. In more recent years statistics has relied more on statistical software.[1]

Introduction

Statistics is a mathematical body of science that pertains to the collection, analysis, interpretation or explanation, and presentation of data; it is sometimes regarded as a branch of mathematics. Some consider statistics to be a distinct mathematical science rather than a branch of mathematics. While many scientific investigations make use of data, statistics is concerned with the use of data in the context of uncertainty and decision making in the face of uncertainty.

In applying statistics to a problem, it is common practice to start with a population or process to be studied. Populations can be diverse topics such as "all people living in a country" or "every atom composing a crystal". Ideally, statisticians compile data about the entire population (an operation called a census). This may be organized by governmental statistical institutes. Descriptive statistics can be used to summarize the population data. Numerical descriptors include the mean and standard deviation for continuous data (like income), while frequency and percentage are more useful for describing categorical data (like education).

When a census is not feasible, a chosen subset of the population called a sample is studied. Once a sample that is representative of the population is determined, data is collected for the sample members in an observational or experimental setting. Again, descriptive statistics can be used to summarize the sample data. However, drawing the sample contains an element of randomness; hence, the numerical descriptors from the sample are also prone to uncertainty. To draw meaningful conclusions about the entire population, inferential statistics is needed. It uses patterns in the sample data to draw inferences about the population represented while accounting for randomness. These inferences may take the form of answering yes/no questions about the data (hypothesis testing), estimating numerical characteristics of the data (estimation), describing associations within the data (correlation), and modeling relationships within the data (for example, using regression analysis). Inference can extend to forecasting, prediction, and estimation of unobserved values either in or associated with the population being studied. It can include extrapolation and interpolation of time series or spatial data, and data mining.

Statistical data

Sampling

When full census data cannot be collected, statisticians collect sample data by developing specific experiment designs and survey samples. Statistics itself also provides tools for prediction and forecasting through statistical models. To use a sample as a guide to an entire population, it is important that it truly represents the overall population. Representative sampling assures that inferences and conclusions can safely extend from the sample to the population as a whole. A major problem lies in determining the extent to which the sample chosen is actually representative.
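As a complement to the description of sampling and inferential statistics above, the following sketch uses a synthetic population whose size and parameters are assumptions made only for illustration. It draws a random sample, summarizes it with descriptive statistics, and estimates the population mean with an approximate 95% confidence interval.

    # A minimal sketch of sampling and estimation; the "population" here
    # is synthetic and its parameters are invented for the example.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    # Synthetic population, e.g. the incomes of 100,000 hypothetical people.
    population = rng.lognormal(mean=10.0, sigma=0.5, size=100_000)

    # A census would examine the whole population; instead, draw a sample.
    sample = rng.choice(population, size=500, replace=False)

    # Descriptive statistics for the sample.
    n = sample.size
    mean = sample.mean()
    sd = sample.std(ddof=1)
    print(f"sample mean = {mean:.2f}, sample sd = {sd:.2f}")

    # Inferential statistics: a 95% confidence interval for the population
    # mean, accounting for the randomness introduced by sampling.
    sem = sd / np.sqrt(n)
    t_crit = stats.t.ppf(0.975, df=n - 1)
    print(f"95% CI: ({mean - t_crit * sem:.2f}, {mean + t_crit * sem:.2f})")

    # Because the population is synthetic, the true mean is known and can
    # be compared with the interval above.
    print(f"true population mean: {population.mean():.2f}")

The other inferential tasks mentioned above (hypothesis testing, correlation, regression analysis) would be applied to the same sample data in an analogous way.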
Statistics offers methods to estimate and correct for any bias within the sample and data collection procedures. There are also methods of experimental design that can lessen these issues at the outset of a study, strengthening its capability to discern truths about the population.

Null hypothesis and alternative hypothesis

Interpretation of statistical information can often involve the development of a null hypothesis, which is usually (but not necessarily) that no relationship exists among variables or that no change occurred over time. The best illustration for a novice is the predicament encountered by a criminal trial. The null hypothesis, H0, asserts that the defendant is innocent, whereas the alternative hypothesis, H1, asserts that the defendant is guilty. The indictment comes because of suspicion of guilt. The H0 (status quo) stands in opposition to H1 and is maintained unless H1 is supported by evidence "beyond a reasonable doubt". However, "failure to reject H0" in this case does not imply innocence, but merely that the evidence was insufficient to convict. So the jury does not necessarily accept H0 but fails to reject H0. While one cannot "prove" a null hypothesis, one can test how close it is to being true with a power test, which tests for Type II errors. What statisticians call an alternative hypothesis is simply a hypothesis that contradicts the null hypothesis.

Uses of statistics in our day-to-day life

1. Governments use statistics to work out which education schemes will be beneficial for students.
2. Statistics help us prepare for emergencies.
3. Statistics help political parties estimate how many voters support them in a particular voting zone.
4. Statistics give athletes an objective picture of their performance in a particular sport.
5. Figures help us make predictions about what is going to happen in the future, based on what we observe in our daily lives.
6. Statistics tell us how many people have died from a given disease.
7. Statistics help investors decide whether to invest in a particular stock.
8. With the help of new sensory input, robots continuously update themselves and prioritize their current actions.
9. Statistics helps students clarify their doubts about the subject and complete statistics assignments.
10. Statistics plays an essential role in the work of researchers.

Characteristics of a good questionnaire

1. Short in terms of the number and length of questions
2. Simple and easily understood
3. Avoids personal and sensitive matters
4. Elicits the data and information required
5. Questions are orderly and systematically laid out
6. Clear instructions with respect to completing the questionnaire
7. Avoids ambiguous questions
8. Questions must not lead to a certain answer

Data collection

Data collection is the process of gathering and measuring information on targeted variables in an established system, which then enables one to answer relevant questions and evaluate outcomes. Data collection is a research component in all study fields, including the physical and social sciences, humanities,[2] and business. While methods vary by discipline, the emphasis on ensuring accurate and honest collection remains the same.
The goal of all data collection is to capture quality evidence that allows analysis to lead to the formulation of convincing and credible answers to the questions that have been posed. Data collection and validation consists of four steps when it involves taking a census and seven steps when it involves sampling.[3]

Data integrity issues[6]

The main reason for maintaining data integrity is to support the detection of errors in the data collection process. Those errors may be made intentionally (deliberate falsification) or unintentionally (random or systematic errors). Two approaches that may protect data integrity and secure the scientific validity of study results were proposed by Craddick, Crawford, Rhodes, Redican, Rukenbrod and Laws in 2003:

Quality assurance – all actions carried out before data collection
Quality control – all actions carried out during and after data collection

Quality assurance

Further information: quality assurance

Its main focus is prevention, which is primarily a cost-effective activity to protect the integrity of data collection. This is best demonstrated by the standardization of protocol, developed in a comprehensive and detailed procedures manual for data collection. Poorly written guidelines raise the risk of failing to identify problems and errors in the research process. Several examples of such failures are:

Uncertainty about the timing, methods and identification of the responsible person
Partial listing of the items that need to be collected
Vague description of data collection instruments instead of rigorous step-by-step instructions on administering tests
Failure to recognize the exact content and strategies for training and retraining staff members responsible for data collection
Unclear instructions for using, making adjustments to, and calibrating data collection equipment
No predetermined mechanism to document changes in procedures that occur during the investigation

Quality control

Further information: quality control

Since quality control actions occur during or after data collection, all the details should be carefully documented. A clearly defined communication structure is a precondition for establishing monitoring systems: uncertainty about the flow of information, as in a poorly organized communication structure, leads to lax monitoring and can limit the opportunities for detecting errors. Quality control is also responsible for identifying the actions necessary to correct faulty data collection practices and to minimize such future occurrences. A team is less likely to realize the necessity of performing these actions if its procedures are written vaguely and are not based on feedback or education. Data collection problems that necessitate prompt action include:

Systematic errors
Violation of protocol
Fraud or scientific misconduct
Errors in individual data items
Individual staff or site performance problems

Data collection on z/OS

z/OS is a widely used operating system for IBM mainframes. It is designed to offer a stable, secure, and continuously available environment for applications running on the mainframe. Operational data is data that a z/OS system produces when it runs. This data indicates the health of the system and can be used to identify sources of performance and availability issues in the system.
The analysis of operational data by analytics platforms provides insights and recommended actions to make the system work more efficiently and to help resolve or prevent problems. IBM Z Common Data Provider collects IT operational data from z/OS systems, transforms it to a consumable format, and streams it to analytics platforms.[7] IBM Z Common Data Provider supports the collection of the following operational data:[8]

System Management Facilities (SMF) data
Log data from the following sources:
  o Job log, the output written to a data definition (DD) by a running job
  o z/OS UNIX log file, including the UNIX System Services system log (syslogd)
  o Entry-sequenced Virtual Storage Access Method (VSAM) cluster
  o z/OS system log (SYSLOG)
  o IBM Tivoli NetView for z/OS messages
  o IBM WebSphere Application Server for z/OS High Performance Extensible Logging (HPEL) log
  o IBM Resource Measurement Facility (RMF) Monitor III reports
User application data, the operational data from users' own applications

DMPs and data collection

DMP is the abbreviation for data management platform. It is a centralized storage and analytical system for data. Mainly used by marketers, DMPs exist to compile and transform large amounts of data into discernible information.[9] Marketers may want to receive and utilize first-, second- and third-party data. DMPs enable this because they are the aggregate system of DSPs (demand-side platforms) and SSPs (supply-side platforms). When it comes to advertising, DMPs are integral for optimizing and guiding marketers in future campaigns. Their effectiveness is proof that categorized, analyzed, and compiled data is far more useful than raw data.

Commodity    Price
Eba          200
Rice         300
Beans        180
Yam          150
Dodo         70

[Pie chart: sales by commodity – Eba 200, Rice 300, Beans 180, Yam 150, Dodo 70]

6) Class interval    Class midpoint    Frequency    Cumulative frequency
   1 - 10            5.5               0            0
   11 - 20           15.5              0            0
   21 - 30           25.5              1            1
   31 - 40           35.5              3            4
   41 - 50           45.5              4            8
   51 - 60           55.5              7            15
   61 - 70           65.5              5            20
   71 - 80           75.5              3            23
   81 - 90           85.5              3            26
   91 - 100          95.5              2            28
                                       ∑ = 28

[Cumulative frequency curve and histogram drawn from the table above]
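To show how the grouped frequency table above could be processed, the short Python sketch below (not part of the original material, and purely illustrative) reproduces the class intervals and frequencies and computes the class midpoints and cumulative frequencies that underlie the histogram and the cumulative frequency curve.

    # A minimal sketch reproducing the grouped frequency table above and
    # computing the quantities behind the histogram and the ogive
    # (cumulative frequency curve).
    import numpy as np

    class_limits = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50),
                    (51, 60), (61, 70), (71, 80), (81, 90), (91, 100)]
    frequencies = np.array([0, 0, 1, 3, 4, 7, 5, 3, 3, 2])

    midpoints = [(lo + hi) / 2 for lo, hi in class_limits]    # 5.5, 15.5, ...
    upper_boundaries = [hi + 0.5 for _, hi in class_limits]   # 10.5, 20.5, ...
    cumulative = frequencies.cumsum()                         # 0, 0, 1, 4, 8, 15, ...

    print(f"{'Class':>10} {'Midpoint':>9} {'Freq':>5} {'Cum. freq':>10}")
    for (lo, hi), m, f, cf in zip(class_limits, midpoints, frequencies, cumulative):
        print(f"{lo:>4} - {hi:<4} {m:>9} {f:>5} {cf:>10}")
    print("Total frequency:", frequencies.sum())              # 28

    # The histogram plots frequency against each class interval; the
    # ogive plots cumulative frequency against the upper class boundary:
    #   import matplotlib.pyplot as plt
    #   plt.bar(midpoints, frequencies, width=10)   # histogram
    #   plt.plot(upper_boundaries, cumulative)      # cumulative frequency curve
    #   plt.show()

The plotting calls are left commented out so the sketch runs without a display; any plotting library could be substituted, since the cumulative frequencies and class boundaries are what define the two charts.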