
assignment stat one

History of statistics
The term statistics is ultimately derived from the New Latin statisticum collegium ("council of state") and
the Italian word statista ("statesman" or "politician"). The German Statistik, first introduced by Gottfried
Achenwall (1749), originally designated the analysis of data about the state, signifying the "science of
state" (then called political arithmetic in English). It acquired the meaning of the collection and
classification of data generally in the early 19th century. It was introduced into English in 1791 by Sir
John Sinclair when he published the first of 21 volumes titled Statistical Account of Scotland.
Thus, the original principal purpose of Statistik was data to be used by governmental and (often
centralized) administrative bodies. The collection of data about states and localities continues, largely
through national and international statistical services. In particular, censuses provide frequently
updated information about the population.
The first book to have 'statistics' in its title was "Contributions to Vital Statistics" (1845) by Francis GP
Neison, actuary to the Medical Invalid and General Life Office.
Statistics, in the modern sense of the word, began evolving in the 18th century in response to the novel needs of
industrializing sovereign states. The evolution of statistics was, in particular, intimately connected with
the development of European states following the peace of Westphalia (1648), and with the
development of probability theory, which put statistics on a firm theoretical basis.
In early times, the meaning was restricted to information about states, particularly demographics such
as population. This was later extended to include all collections of information of all types, and later still
it was extended to include the analysis and interpretation of such data. In modern terms, "statistics"
means both sets of collected information, as in national accounts and temperature record, and
analytical work which requires statistical inference. Statistical activities are often associated with models
expressed using probabilities, hence the connection with probability theory. The large
requirements of data processing have made statistics a key application of computing. A number of
statistical concepts have an important impact on a wide range of sciences. These include the design of
experiments and approaches to statistical inference such as Bayesian inference, each of which can be
considered to have their own sequence in the development of the ideas underlying modern statistics.
Development of modern statistics
Although the origins of statistical theory lie in the 18th-century advances in probability, the modern field
of statistics only emerged in the late-19th and early-20th century in three stages. The first wave, at the
turn of the century, was led by the work of Francis Galton and Karl Pearson, who transformed statistics
into a rigorous mathematical discipline used for analysis, not just in science, but in industry and politics
as well. The second wave of the 1910s and 20s was initiated by William Sealy Gosset, and reached its
culmination in the insights of Ronald Fisher. This involved the development of better models for the design of
experiments, hypothesis testing and techniques for use with small data samples. The final wave,
which mainly saw the refinement and expansion of earlier developments, emerged from the
collaborative work between Egon Pearson and Jerzy Neyman in the 1930s. Today, statistical methods are
applied in all fields that involve decision making, for making accurate inferences from a collated body of
data and for making decisions in the face of uncertainty based on statistical methodology.
Statistics
Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and
presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional
to begin with a statistical population or a statistical model to be studied. Populations can be diverse
groups of people or objects such as "all people living in a country" or "every atom composing a crystal".
Statistics deals with every aspect of data, including the planning of data collection in terms of the design
of surveys and experiments.
When census data cannot be collected, statisticians collect data by developing specific experiment
designs and survey samples. Representative sampling assures that inferences and conclusions can
reasonably extend from the sample to the population as a whole. An experimental study involves taking
measurements of the system under study, manipulating the system, and then taking additional
measurements using the same procedure to determine if the manipulation has modified the values of
the measurements. In contrast, an observational study does not involve experimental manipulation.
Two main statistical methods are used in data analysis: descriptive statistics, which summarize data
from a sample using indexes such as the mean or standard deviation, and inferential statistics, which
draw conclusions from data that are subject to random variation (e.g., observational errors, sampling
variation). Descriptive statistics are most often concerned with two sets of properties of
a distribution (sample or population): central tendency (or location) seeks to characterize the
distribution's central or typical value, while dispersion (or variability) characterizes the extent to which
members of the distribution depart from its center and each other. Inferences on mathematical
statistics are made under the framework of probability theory, which deals with the analysis of random
phenomena.
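As a brief illustration (the sample values below are invented for demonstration, not taken from any data set in this text), the following Python sketch computes a measure of central tendency and a measure of dispersion with the standard library's statistics module.

```python
import statistics

# A small, made-up sample of observations (e.g., exam scores out of 100).
sample = [62, 71, 58, 90, 75, 66, 84, 79]

# Central tendency: where the "typical" value of the distribution lies.
mean = statistics.mean(sample)      # arithmetic average
median = statistics.median(sample)  # middle value

# Dispersion: how far members of the sample depart from the centre.
std_dev = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

print(f"mean = {mean:.2f}, median = {median}, standard deviation = {std_dev:.2f}")
```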
A standard statistical procedure involves the collection of data leading to a test of the
relationship between two statistical data sets, or a data set and synthetic data drawn from an idealized
model. A hypothesis is proposed for the statistical relationship between the two data sets, and this is
compared as an alternative to an idealized null hypothesis of no relationship between two data sets.
Rejecting or disproving the null hypothesis is done using statistical tests that quantify the sense in which
the null can be proven false, given the data that are used in the test. Working from a null hypothesis,
two basic forms of error are recognized: Type I errors (null hypothesis is falsely rejected giving a "false
positive") and Type II errors (null hypothesis fails to be rejected and an actual relationship between
populations is missed giving a "false negative"). Multiple problems have come to be associated with this
framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis.
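A hedged sketch of such a test is given below; the data values and the 0.05 significance level are illustrative assumptions, and SciPy's two-sample t-test stands in for whichever statistical test a given problem calls for.

```python
from scipy import stats

# Two made-up samples; H0: the two groups have equal means.
group_a = [5.1, 4.9, 5.4, 5.0, 5.2, 4.8, 5.3]
group_b = [5.6, 5.8, 5.5, 5.9, 5.7, 5.4, 6.0]

# Independent two-sample t-test: the p-value measures how surprising data
# at least this extreme would be if the null hypothesis were true.
result = stats.ttest_ind(group_a, group_b)

alpha = 0.05  # conventional significance level (an assumption, not fixed by theory)
if result.pvalue < alpha:
    # Rejecting H0 when it is in fact true would be a Type I error ("false positive").
    print(f"p = {result.pvalue:.4f} < {alpha}: reject the null hypothesis")
else:
    # Failing to reject H0 when a real difference exists would be a Type II error ("false negative").
    print(f"p = {result.pvalue:.4f} >= {alpha}: fail to reject the null hypothesis")
```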
Measurement processes that generate statistical data are also subject to error. Many of these errors are
classified as random (noise) or systematic (bias), but other types of errors (e.g., blunder, such as when
an analyst reports incorrect units) can also occur. The presence of missing data or censoring may result
in biased estimates and specific techniques have been developed to address these problems.
The earliest writings on probability and statistics, statistical methods drawing from probability theory,
date back to Arab mathematicians and cryptographers, notably Al-Khalil (717–786) and Al-Kindi (801–
873). In the 18th century, statistics also started to draw heavily from calculus. In more recent years
statistics has relied more on statistical software.[1]
Introduction
Statistics is a mathematical body of science that pertains to the collection, analysis, interpretation or
explanation, and presentation of data; it can also be regarded as a branch of mathematics. Some consider statistics to be a
distinct mathematical science rather than a branch of mathematics. While many scientific investigations
make use of data, statistics is concerned with the use of data in the context of uncertainty and decision
making in the face of uncertainty.
In applying statistics to a problem, it is common practice to start with a population or process to be
studied. Populations can be diverse topics such as "all people living in a country" or "every atom
composing a crystal". Ideally, statisticians compile data about the entire population (an operation
called a census). This may be organized by governmental statistical institutes. Descriptive statistics can be
used to summarize the population data. Numerical descriptors include mean and standard
deviation for continuous data (like income), while frequency and percentage are more useful in terms of
describing categorical data (like education).
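The following Python sketch (with invented income and education values) shows both kinds of descriptor: mean and standard deviation for continuous data, frequency and percentage for categorical data.

```python
import statistics
from collections import Counter

# Continuous data (e.g., annual income): summarise with mean and standard deviation.
incomes = [28_000, 35_500, 41_200, 30_750, 52_000, 39_900]
print(f"mean income = {statistics.mean(incomes):.0f}")
print(f"standard deviation = {statistics.stdev(incomes):.0f}")

# Categorical data (e.g., highest education level): summarise with frequency and percentage.
education = ["secondary", "degree", "secondary", "primary", "degree", "degree"]
counts = Counter(education)
for level, freq in counts.items():
    print(f"{level}: frequency = {freq}, percentage = {100 * freq / len(education):.1f}%")
```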
When a census is not feasible, a chosen subset of the population called a sample is studied. Once a
sample that is representative of the population is determined, data is collected for the sample members
in an observational or experimental setting. Again, descriptive statistics can be used to summarize the
sample data. However, drawing the sample contains an element of randomness; hence, the numerical
descriptors from the sample are also prone to uncertainty. To draw meaningful conclusions about the
entire population, inferential statistics is needed. It uses patterns in the sample data to draw inferences
about the population represented while accounting for randomness. These inferences may take the
form of answering yes/no questions about the data (hypothesis testing), estimating numerical
characteristics of the data (estimation), describing associations within the data (correlation), and
modeling relationships within the data (for example, using regression analysis). Inference can extend
to forecasting, prediction, and estimation of unobserved values either in or associated with the
population being studied. It can include extrapolation and interpolation of time series or spatial data,
and data mining.
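As a minimal worked example of the last two kinds of inference, the sketch below computes the Pearson correlation coefficient and fits a least-squares regression line directly from their textbook formulas; the paired x and y values are invented for illustration.

```python
import math

# Invented paired observations, e.g., hours studied (x) and exam score (y).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [52, 55, 61, 64, 70, 72, 79, 83]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Pearson correlation: covariance scaled by the two standard deviations.
cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
var_x = sum((xi - mean_x) ** 2 for xi in x)
var_y = sum((yi - mean_y) ** 2 for yi in y)
r = cov / math.sqrt(var_x * var_y)

# Simple linear regression y = a + b*x by least squares.
b = cov / var_x
a = mean_y - b * mean_x

print(f"correlation r = {r:.3f}")
print(f"regression line: y = {a:.2f} + {b:.2f}x")
```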
Statistical data
Sampling
When full census data cannot be collected, statisticians collect sample data by developing
specific experiment designs and survey samples. Statistics itself also provides tools for prediction and
forecasting through statistical models.
To use a sample as a guide to an entire population, it is important that it truly represents the overall
population. Representative sampling assures that inferences and conclusions can safely extend from the
sample to the population as a whole. A major problem lies in determining the extent that the sample
chosen is actually representative. Statistics offers methods to estimate and correct for any bias within
the sample and data collection procedures. There are also methods of experimental design for
experiments that can lessen these issues at the outset of a study, strengthening its capability to discern
truths about the population.
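A small simulation can illustrate the idea; in the sketch below the "population" is artificial (drawn from an assumed normal distribution), and a simple random sample is taken to show how closely a representative sample's mean tracks the population mean.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# An artificial population of 10,000 'incomes' (the distribution is an assumption).
population = [random.gauss(mu=30_000, sigma=8_000) for _ in range(10_000)]

# Simple random sampling: every member has the same chance of selection,
# which is one way of obtaining a representative sample.
sample = random.sample(population, k=200)

print(f"population mean = {statistics.mean(population):.0f}")
print(f"sample mean     = {statistics.mean(sample):.0f}")
```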
Null hypothesis and alternative hypothesis
Interpretation of statistical information can often involve the development of a null hypothesis which is
usually (but not necessarily) that no relationship exists among variables or that no change occurred over
time.
The best illustration for a novice is the predicament encountered by a criminal trial. The null hypothesis,
H0, asserts that the defendant is innocent, whereas the alternative hypothesis, H1, asserts that the
defendant is guilty. The indictment comes because of suspicion of the guilt. The H0 (status quo) stands in
opposition to H1 and is maintained unless H1 is supported by evidence "beyond a reasonable doubt".
However, "failure to reject H0" in this case does not imply innocence, but merely that the evidence was
insufficient to convict. So, the jury does not necessarily accept H0 but fails to reject H0. While one cannot
"prove" a null hypothesis, one can test how close it is to being true with a power test, which tests
for type II errors.
What statisticians call an alternative hypothesis is simply a hypothesis that contradicts the null
hypothesis.
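The Monte Carlo sketch below, which is an illustration rather than a method described in this text, estimates the Type I error rate and the power (one minus the Type II error rate) of a simple two-sided test of H0: mean = 0 against an alternative with mean 0.5; the sample size, effect size and 5% level are all assumptions.

```python
import random
import statistics

random.seed(0)

def rejects_h0(true_mean, n=30):
    """Simulate one experiment and report whether a simple z-type test rejects H0: mean = 0."""
    data = [random.gauss(true_mean, 1.0) for _ in range(n)]
    z = statistics.mean(data) / (statistics.stdev(data) / n ** 0.5)
    return abs(z) > 1.96  # two-sided test at roughly the 5% level

SIMULATIONS = 5_000

# Type I error rate: how often H0 is rejected when it is actually true (true mean = 0).
type_i = sum(rejects_h0(0.0) for _ in range(SIMULATIONS)) / SIMULATIONS

# Power: how often H0 is rejected when the alternative is true (true mean = 0.5).
power = sum(rejects_h0(0.5) for _ in range(SIMULATIONS)) / SIMULATIONS

print(f"estimated Type I error rate = {type_i:.3f} (should be near 0.05)")
print(f"estimated power = {power:.3f}; Type II error rate = {1 - power:.3f}")
```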
Uses of Statistics in Our Day-to-Day Life
1: Governments use statistics to work out which education schedule will be beneficial for students.
2: It helps us get prepared for an emergency.
3: It helps political parties know how many voters support them in a particular voting zone.
4: Statistics helps sportspeople get an idea of their performance in a particular sport.
5: The figures help us make predictions about something that is going to happen in the future, based on what we face in our daily lives.
6: It helps us understand how many people have died from the same disease.
7: It helps investors decide whether to invest in a particular stock.
8: With the help of new sensory input, robots continuously update themselves and give priority to their current actions.
9: It helps students clear their doubts regarding statistics and produce better statistics assignments.
10: Statistics plays an essential role in the work of researchers.
Characteristics of a Good Questionnaire
1 Short in terms of the number and length of questions
2 Simple and easily understood
3 Avoid personal and sensitive matters
4 Elicit the data and information required
5 Questions must be orderly and systematically laid out
6 Clear instructions with respect to completing the questionnaire
7 Avoid ambiguous questions
8 Questions must not lead to a certain answer
Data collection
Data collection is the process of gathering and measuring information on targeted variables in an established system,
which then enables one to answer relevant questions and evaluate outcomes. Data collection is
a research component in all study fields, including physical and social
sciences, humanities,[2] and business. While methods vary by discipline, the emphasis on ensuring
accurate and honest collection remains the same. The goal for all data collection is to capture quality
evidence that allows analysis to lead to the formulation of convincing and credible answers to the
questions that have been posed. Data collection and validation consists of four steps when it involves
taking a census and seven steps when it involves sampling.[3]
Data integrity issues[6]
The main reason for maintaining data integrity is to support the observation of errors in the data
collection process. Those errors may be made intentionally (deliberate falsification) or non-intentionally
(random or systematic errors).
There are two approaches, described by Craddick, Crawford, Rhodes, Redican, Rukenbrod and Laws in 2003, that may
protect data integrity and secure the scientific validity of study results:
- Quality assurance – all actions carried out before data collection
- Quality control – all actions carried out during and after data collection
Quality assurance
Its main focus is prevention, which is primarily a cost-effective activity to protect the integrity of data
collection. Standardization of protocol best demonstrates this cost-effective activity, and it is developed
in a comprehensive and detailed procedures manual for data collection. Poorly written guidelines
increase the risk of failing to identify problems and errors in the research process. Listed below are
several examples of such failures:
- Uncertainty of timing, methods and identification of the responsible person
- Partial listing of items needed to be collected
- Vague description of data collection instruments instead of rigorous step-by-step instructions on administering tests
- Failure to recognize exact content and strategies for training and retraining staff members responsible for data collection
- Unclear instructions for using, making adjustments to, and calibrating data collection equipment
- No predetermined mechanism to document changes in procedures that occur during the investigation
Quality control
Since quality control actions occur during or after data collection, all the details are carefully
documented. There is a necessity for a clearly defined communication structure as a precondition for
establishing monitoring systems. Uncertainty about the flow of information is not recommended, as a
poorly organized communication structure leads to lax monitoring and can also limit the opportunities
for detecting errors. Quality control is also responsible for identifying the actions necessary for
correcting faulty data collection practices and for minimizing such future occurrences. A team is more
likely to overlook the need to perform these actions if its procedures are written vaguely and
are not based on feedback or education.
Data collection problems that necessitate prompt action:
- Systematic errors
- Violation of protocol
- Fraud or scientific misconduct
- Errors in individual data items
- Individual staff or site performance problems
Data collection on z/OS
z/OS is a widely used operating system for IBM mainframes. It is designed to offer a stable, secure, and
continuously available environment for applications running on the mainframe. Operational data is data
that a z/OS system produces when it runs. This data indicates the health of the system and can be used to
identify sources of performance and availability issues in the system. The analysis of operational data by
analytics platforms provides insights and recommended actions to make the system work more
efficiently, and to help resolve or prevent problems. IBM Z Common Data Provider collects IT
operational data from z/OS systems, transforms it to a consumable format, and streams it to analytics
platforms.[7]
IBM Z Common Data Provider supports the collection of the following operational data:[8]
- System Management Facilities (SMF) data
- Log data from the following sources:
  o Job log, the output which is written to a data definition (DD) by a running job
  o z/OS UNIX log file, including the UNIX System Services system log (syslogd)
  o Entry-sequenced Virtual Storage Access Method (VSAM) cluster
  o z/OS system log (SYSLOG)
  o IBM Tivoli NetView for z/OS messages
  o IBM WebSphere Application Server for z/OS High Performance Extensible Logging (HPEL) log
  o IBM Resource Measurement Facility (RMF) Monitor III reports
- User application data, the operational data from users' own applications
DMPs and data collection
DMP is the abbreviation for data management platform. It is a centralized storage and analytical system
for data. Mainly used by marketers, DMPs exist to compile and transform large amounts of data into
discernible information.[9] Marketers may want to receive and utilize first, second and third-party data.
DMPs enable this, because they are the aggregate system of DSPs (demand side platform)
and SSPs (supply side platform). When it comes to advertising, DMPs are integral for optimizing and
guiding marketers in future campaigns. These systems and their effectiveness are proof that categorized,
analyzed, and compiled data is far more useful than raw data.
Commodity    Commodity price
Eba          200
Rice         300
Beans        180
Yam          150
Dodo         70
[Pie chart: Sales – Eba 200, Rice 300, Beans 180, Yam 150, Dodo 70]
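Assuming the figures above are the sales values per commodity, the short matplotlib sketch below is one way to reproduce the pie chart.

```python
import matplotlib.pyplot as plt

# Sales figures taken from the commodity table above.
commodities = ["Eba", "Rice", "Beans", "Yam", "Dodo"]
sales = [200, 300, 180, 150, 70]

plt.pie(sales, labels=commodities, autopct="%1.1f%%", startangle=90)
plt.title("Sales")
plt.axis("equal")  # keep the pie circular
plt.show()
```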
6)
Class interval    Class midpoint    Frequency    Cumulative frequency
1 - 10            5.5               0            0
11 - 20           15.5              0            0
21 - 30           25.5              1            1
31 - 40           35.5              3            4
41 - 50           45.5              4            8
51 - 60           55.5              7            15
61 - 70           65.5              5            20
71 - 80           75.5              3            23
81 - 90           85.5              3            26
91 - 100          95.5              2            28
                                    ∑f = 28
[Cumulative frequency curve (ogive) drawn from the table above]
[Histogram drawn from the table above]
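The short Python sketch below (matplotlib is assumed as the plotting tool) rebuilds the cumulative frequencies from the frequency column of the table and draws both charts; the upper class boundaries 10.5, 20.5, …, 100.5 used for the ogive follow from the class intervals.

```python
import itertools
import matplotlib.pyplot as plt

# Frequency distribution from the table above.
class_midpoints = [5.5, 15.5, 25.5, 35.5, 45.5, 55.5, 65.5, 75.5, 85.5, 95.5]
frequencies     = [0, 0, 1, 3, 4, 7, 5, 3, 3, 2]

# Cumulative frequencies: running total of the frequencies (ends at 28).
cumulative = list(itertools.accumulate(frequencies))
print(cumulative)  # [0, 0, 1, 4, 8, 15, 20, 23, 26, 28]

# Histogram: bars of width 10 centred on the class midpoints.
plt.bar(class_midpoints, frequencies, width=10, edgecolor="black")
plt.xlabel("Class")
plt.ylabel("Frequency")
plt.title("Histogram")
plt.show()

# Ogive: cumulative frequency plotted against the upper class boundaries.
upper_boundaries = [m + 5 for m in class_midpoints]  # 10.5, 20.5, ..., 100.5
plt.plot(upper_boundaries, cumulative, marker="o")
plt.xlabel("Upper class boundary")
plt.ylabel("Cumulative frequency")
plt.title("Cumulative Frequency Curve")
plt.show()
```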