# Introduction to probability and statistics
Probability measures how likely an event is to occur. In probability theory, the Law of Large
Numbers says that the relative frequency of an event approaches its probability only as the number of
experiments grows without bound.
In practical statistics the number of experiments is finite and usually much smaller.
In statistics the probability is therefore estimated by the relative frequency.
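The link between relative frequency and probability can be illustrated with a small simulation. This is a minimal sketch, assuming a fair coin and Python's standard pseudorandom generator; the function name `relative_frequency` and the seed are illustrative choices, not part of the course material.

```python
import random

def relative_frequency(n_tosses, seed=0):
    """Return the fraction of heads in n_tosses simulated fair-coin tosses."""
    rng = random.Random(seed)  # seeded so that the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# The relative frequency should drift toward 0.5 as the number of tosses grows.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

With few tosses the relative frequency can be far from 0.5; with many tosses it settles close to it, which is the practical meaning of the Law of Large Numbers.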
The students are required to explain why they personally need statistics.
The students are encouraged to produce something publishable related to statistics.
Statistics is arguably the most important knowledge in life because it is the best tool for making the
right decisions under uncertainty.
In financial markets people make billions of dollars by applying statistics.
Our rector has written a great book covering an introduction to statistics, the history of statistics, the
future of statistics, etc.
A single value of a random variable is impossible or too hard to predict, but patterns appear over many
observations if the variable is not chaotic and follows some probability distribution.
Statistical data and random variables are of two main types: continuous and discrete.
The simplest distribution is the uniform distribution, for which each outcome is equally likely. Uniformly
distributed pseudorandom numbers can be generated approximately using the mod function (a linear
congruential generator). Excel can generate uniformly distributed pseudorandom numbers. The digits of
many irrational numbers, such as π, appear to be uniformly distributed, although this has not been
proven in general. The positions of the correct answers in multiple-choice exams and tests are often
close to uniformly distributed. The uniform distribution resembles complete disorder (no preferred
outcome), because every outcome is equally likely to occur.
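The use of the mod function mentioned above can be sketched as a linear congruential generator. This is a minimal sketch, not the generator Excel or Python actually uses; the constants `a`, `c` and `m` are the widely published Numerical Recipes values, and the function name `lcg` is an illustrative assumption.

```python
def lcg(seed, n, a=1664525, c=1013904223, m=2**32):
    """Yield n pseudorandom floats in [0, 1) from a linear congruential generator."""
    state = seed
    for _ in range(n):
        state = (a * state + c) % m  # the mod step keeps the state in [0, m)
        yield state / m              # scale the integer state to [0, 1)

samples = list(lcg(seed=42, n=10_000))

# Count how many samples fall into each of 10 equal-width bins.
counts = [0] * 10
for x in samples:
    counts[int(x * 10)] += 1
print(counts)
```

If the numbers are close to uniform, each bin should hold roughly 1,000 of the 10,000 samples.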
The Central Limit Theorem says that if many independent random variables with finite variance are
added up, the distribution of the sum approaches the normal distribution regardless of the distributions
of the individual variables. This is due to the cancellation of many different factors, which makes values
near the average more frequent than the extreme values.
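The effect of adding up many random variables can be checked numerically. The sketch below assumes 30 independent uniform variables per sum and 5,000 sums; both numbers and the function name `sums_of_uniforms` are illustrative choices. For a normal distribution about 68% of values lie within one standard deviation of the mean, so the last printed number should be near 0.68.

```python
import random
import statistics

def sums_of_uniforms(n_terms, n_samples, seed=0):
    """Return n_samples sums, each of n_terms independent uniform(0, 1) values."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n_terms)) for _ in range(n_samples)]

sums = sums_of_uniforms(n_terms=30, n_samples=5_000)
mean = statistics.fmean(sums)
sd = statistics.stdev(sums)

# Fraction of sums within one standard deviation of the mean.
within_one_sd = sum(abs(s - mean) <= sd for s in sums) / len(sums)
print(round(mean, 2), round(sd, 2), round(within_one_sd, 3))
```

Each term is uniform, which looks nothing like a bell curve, yet the sums cluster around 15 in an approximately normal shape.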
Analyzing a sample of the heights of approximately 20 students suggests that height approximately
follows a normal distribution.
Tossing several coins gives a binomial distribution for the number of heads, because it is a sum of
independent tosses, each with two equally likely outcomes. According to the Central Limit Theorem, the
binomial distribution in turn approaches the normal distribution as the number of coins grows.
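The coin-tossing case can be compared against the exact binomial formula. This is a minimal sketch, assuming 10 fair coins and 20,000 simulated rounds; the names `binomial_pmf` and `simulate_heads` are illustrative.

```python
import math
import random
from collections import Counter

def binomial_pmf(k, n, p=0.5):
    """Exact probability of k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def simulate_heads(n_coins, n_trials, seed=0):
    """Count how often each number of heads occurs when tossing n_coins coins."""
    rng = random.Random(seed)
    return Counter(
        sum(rng.random() < 0.5 for _ in range(n_coins)) for _ in range(n_trials)
    )

n_coins, n_trials = 10, 20_000
observed = simulate_heads(n_coins, n_trials)

# Columns: number of heads, simulated relative frequency, exact probability.
for k in range(n_coins + 1):
    print(k, round(observed[k] / n_trials, 4), round(binomial_pmf(k, n_coins), 4))
```

The simulated frequencies should track the exact probabilities closely, with the peak at 5 heads.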
Averages (mean, median and mode) are probably the most important quantities in statistics. For the
normal distribution the mean, median and mode are the same; for the binomial distribution they coincide
or nearly coincide when the distribution is symmetric.
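As a rough illustration, the three averages can be computed with Python's standard `statistics` module. The sample here is hypothetical: the number of heads in 10 fair-coin tosses, repeated 20,000 times.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical sample: heads in 10 fair-coin tosses, repeated 20,000 times.
heads = [sum(rng.random() < 0.5 for _ in range(10)) for _ in range(20_000)]

sample_mean = statistics.mean(heads)
sample_median = statistics.median(heads)
sample_mode = statistics.mode(heads)
print(sample_mean, sample_median, sample_mode)
```

For this symmetric distribution all three values should land at or very near 5.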
The normal cumulative distribution function has an S-shape similar to the arctangent, but it is a different
function. The normal density function is expressed through the natural exponential function. The
cumulative distribution function is the integral of the density function from minus infinity up to the
given value.
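The density and the cumulative distribution function can be written out directly. This is a minimal sketch using only the standard `math` module; the cumulative function is expressed through the error function `math.erf`, and `arctan_s_curve` is a hypothetical helper that rescales the arctangent to rise from 0 to 1 for comparison.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Integral of the normal density from minus infinity to x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def arctan_s_curve(x):
    """Arctangent rescaled to rise from 0 to 1, for comparison only."""
    return 0.5 + math.atan(x) / math.pi

# Columns: x, normal CDF, rescaled arctangent.
for x in (-3, -1, 0, 1, 3):
    print(x, round(normal_cdf(x), 4), round(arctan_s_curve(x), 4))
```

Both curves pass through 0.5 at zero, but the arctangent approaches 0 and 1 much more slowly. The rescaled arctangent is in fact the cumulative distribution function of the Cauchy distribution, a standard example of a heavy-tailed distribution.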
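In practice much comes down to confidence intervals, outliers, the 3-sigma rule and the tails of the
distribution. For a normal distribution about 99.7% of the values lie within three standard deviations of
the mean. Distributions with heavy tails are hard to handle in practical applications. In practical cases
the 1.5 × interquartile range (IQR) rule, popularized by John Tukey in 1977, is used: values more than
1.5 IQR below the first quartile or above the third quartile are flagged as outliers.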
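The 1.5 interquartile range rule can be sketched in a few lines. The heights below are made-up illustrative data, not the students' measurements, and the quartiles come from `statistics.quantiles` with its default method, so other quartile conventions may give slightly different fences.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # first, second, third quartile
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

# Hypothetical heights in centimetres, with one suspicious value.
heights_cm = [158, 162, 165, 167, 168, 170, 171, 172, 174, 175, 177, 180, 183, 215]
print(iqr_outliers(heights_cm))
```

On this data only the 215 cm value falls outside the fences. Unlike the 3-sigma rule, the quartiles are barely affected by the outlier itself, which is one reason the rule is popular in practice.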
Not all random variables are normally or binomially distributed: Benford’s law says that in many
real-world sets of numbers the first digit is 1 with a probability of approximately 30.1%, 2 with 17.6%,
3 with 12.5%, and so on down to 9 with 4.6%: the larger the first digit, the lower its probability. In the
USA, Benford’s law is used in auditing and forensic accounting to screen data for possible fraud.
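The first-digit probabilities come from the formula P(d) = log10(1 + 1/d). The sketch below computes them and compares them with the leading digits of the powers of 2, a classic sequence known to follow Benford's law; the choice of the first 1,000 powers and the helper names are illustrative assumptions.

```python
import math

def benford_probability(d):
    """Benford probability that the first digit equals d (1 to 9)."""
    return math.log10(1 + 1 / d)

def first_digit(n):
    """Return the leading decimal digit of a nonzero integer."""
    return int(str(abs(n))[0])

# Tally the leading digits of 2^1 ... 2^1000.
counts = {d: 0 for d in range(1, 10)}
for k in range(1, 1001):
    counts[first_digit(2**k)] += 1

# Columns: digit, Benford probability, observed relative frequency.
for d in range(1, 10):
    print(d, round(benford_probability(d), 3), counts[d] / 1000)
```

A data set whose first digits deviate strongly from these probabilities is not proof of fraud, but it is a common reason to look more closely.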
Making sure that a sample is representative of the whole population is a significant problem which does
not have an easy solution. Usually the education, age and gender of the people are very significant in
judging whether a sample is representative.
It is possible to simulate many distributions on a computer and to use them to simulate the outcomes of
complex events (games, etc.) with Monte Carlo simulation. It is used to model soccer games, financial
markets, scientific problems, etc.
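A Monte Carlo simulation of a soccer game can be sketched as follows. This is only an illustration under stated assumptions: goals for each team are modelled as independent Poisson random variables, and the scoring rates 1.6 and 1.1 goals per match are hypothetical values, not estimates from real data. The Poisson sampler uses Knuth's multiplication method because the standard `random` module has no built-in one.

```python
import math
import random

def poisson_sample(rng, rate):
    """Draw one Poisson-distributed integer using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def simulate_match_outcomes(home_rate, away_rate, n_matches, seed=0):
    """Estimate the probabilities of a home win, a draw and an away win."""
    rng = random.Random(seed)
    wins = draws = losses = 0
    for _ in range(n_matches):
        home_goals = poisson_sample(rng, home_rate)
        away_goals = poisson_sample(rng, away_rate)
        if home_goals > away_goals:
            wins += 1
        elif home_goals == away_goals:
            draws += 1
        else:
            losses += 1
    return wins / n_matches, draws / n_matches, losses / n_matches

# Hypothetical scoring rates: 1.6 goals per match at home, 1.1 away.
p_home, p_draw, p_away = simulate_match_outcomes(1.6, 1.1, n_matches=20_000)
print(round(p_home, 3), round(p_draw, 3), round(p_away, 3))
```

The same pattern (draw random inputs, compute the outcome, repeat many times, report the frequencies) carries over to financial and scientific models; only the distributions and the outcome rule change.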