Statistics Wk 1-5 Lecture Notes (Mid-Term)
Business Statistics (University of Technology Sydney)
StuDocu is not sponsored or endorsed by any college or university
Downloaded by Ben Shirley (benshirley002@gmail.com)
Statistics Lecture Notes
Stats: Week 1 Lecture
Inferential Statistics
- About 2 key ideas:
1) The science of samples → key idea of sampling distribution
2) Choosing between rare events and reinterpretation
● EG) A coin that is claimed to be unbiased is tossed 50 times. If all 50 tosses are tails, it could be a rare event (an unbiased coin will do this very rarely). OR, you may doubt that it's unbiased
Descriptive Statistics
Data Types
Qualitative/ Categorical Data
- Non-numerical data
- Categories with mutually exclusive labels (1 label can’t mean 2 things)
- If labeled with numbers, they have no mathematical meaning
Nominal Data
- Numbers have no mathematical meaning (just used as labels)
● female & male. 0=female, 1=male
● 1=pop, 2=EDM, 3=rap
Ordinal Data
- Ordering/ranking has meaning or can be interpreted
- Numerical labels → ordering
● Dissatisfied, neutral, satisfied: -1 = Dissatisfied, 0 = Neutral, 1 = Satisfied
Quantitative/ Numerical Data
- Numbers are used to record certain events
- Have mathematical meaning
Interval Data
- EG) temperature
- The difference between 2 quantities is meaningful, BUT their ratio is not
● 30℃ is 15℃ higher than 15℃, but not 2x warmer than 15℃
- 0 has no natural meaning → hard to interpret 0
● 0℃ doesn't mean there is no heat
Ratio Data
- EG) Y in dollars, height, waiting time in mins
- Besides the difference, the ratio of 2 quantities IS meaningful
● Lucy earns twice as much as her husband
- 0 is meaningful
● The waiting time at the cinema was 0 mins → don't need to wait
Working with Categorical data
- Imperative to put data into a table and visualise it
● Commonly used technique called Frequency Distribution
● You use Frequency Counts: the total number of occurrences for each category
● Relative Frequency: the fraction of the total number of data items belonging to the category
● % Frequency: relative frequency x 100 (%)
- EG) people who are single. Frequency: 102
Relative Frequency: 0.1262 (102 ÷ total 808)
% Frequency: 12.62% (0.1262 x 100)
- To visualise data, we can use a Histogram or Pie Chart
- In Histograms:
● Categories should be on the x-axis
● Frequency, relative frequency (which equals probability) or % frequency should be on y-axis
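The frequency calculations above can be sketched in a few lines of Python. This is only an illustration: the "single" count (102) and the total (808) come from the notes, while the "other" label and the helper name `frequency_table` are made up for the example.

```python
from collections import Counter


def frequency_table(labels):
    """Return frequency, relative frequency and % frequency per category."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        category: {
            "frequency": count,
            "relative": count / total,
            "percent": 100 * count / total,
        }
        for category, count in counts.items()
    }


# Hypothetical data: 102 "single" people out of 808; the rest are lumped
# into a made-up "other" category because the notes do not list them.
labels = ["single"] * 102 + ["other"] * 706
table = frequency_table(labels)
single = table["single"]
print(single["frequency"], round(single["relative"], 4), round(single["percent"], 2))
```

The relative frequencies across all categories always sum to 1, which is why they can be read as estimated probabilities.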
The Language of Statistics
Random Variable (r.v)
A variable whose values are uncertain before you collect them
● Height
● Height of females 25-40
Population
The complete pool of a certain random variable
● All human heights
● Height of all females 25-40
Sample
A random collection of a certain size from the population
● The height of 50 randomly chosen people
● Heights of 50 randomly chosen females aged 25-40
Probability Distribution
The general shape of probability for values that a random variable may assume
● EG) what is the most common number of children in a family, or the likely range of values
- Random variables (r.v) are usually named by X, Y (capital letters)
● X: number of kids in household
● Y: amount of time spent on housework per day by dad
- Realisations/ observations of a r.v are named by xi, yi (lowercase), with subscripts i ∈ {1, 2, ..., N} or i ∈ …
● x1: number of kids in household 1
● y137: amount of time dad 137 spends on housework per day
- N and n denote the size or number of observations
● N: refers to the population size (usually large)
● n: denotes the sample size (number of data points we collect in a sample)
Central Tendency
- Definition: measures of central tendency yield information about the centre of the distribution of a r.v. They give us some idea of the typical, middle or average value that a r.v can take. Sometimes called measures of location
- There are 3 measures of Central Tendency
● Mean: average value
● Median: middle value in an ordered array
● Mode: the most commonly occurring value
Mean
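- We can talk about either population mean or sample mean. If we denote the r.v. by 𝑋, we have:
● Population mean: µ = (x1 + x2 + ... + xN) / N
● Sample mean: x̄ = (x1 + x2 + ... + xn) / n
EXAMPLE: r.v. is the height of females aged between 25 and 40. John has a sample of 7 randomly chosen females aged between 25 and 40; their heights are 157cm, 163cm, 166cm, 148cm, 174cm, 165cm, 168cm. So the sample mean is x̄ = 1141 / 7 = 163cm.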
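As a quick check of the height example, the sample mean can be computed directly. A minimal sketch; the variable names are just for illustration.

```python
# Heights (in cm) of the 7 sampled females from the example.
heights_cm = [157, 163, 166, 148, 174, 165, 168]

# Sample mean: sum of the observations divided by the sample size n.
n = len(heights_cm)
sample_mean = sum(heights_cm) / n
print(n, sample_mean)
```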
Mode & Median
- The Mode is the most commonly occurring value
- The Median is the middle value in an ordered array
EXAMPLE: Waiting times of people in a queue. We record the following: 2,3,3,3,4,2,2,2,2,3,3,3,1,1 minutes.
- Mode: 3 mins → occurs 6 times (more than any other waiting time)
- Median: 2.5 mins → it lies in the middle of the 14 ordered numbers 1,1,2,2,2,2,2,3,3,3,3,3,3,4. The 7th and 8th values are 2 and 3, so the median is (2+3)/2 = 2.5
EXAMPLE: Categorical / qualitative (Nominal or Ordinal) data
University major of employees. We label 1= marketing, 2= finance, 3= economics, 4= law, 5= others.
- We record the following: 2,5,3,1,4,2,5,3,4,2,1 (ordered: 1,1,2,2,2,3,3,4,4,5,5). Mode is 2, median is 3, mean is 2.9091.
- The only one which makes sense is the Mode (most frequently occurring number), as this data isn't quantitative/ numerical (the numbers don't have mathematical meaning)
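Both examples above can be reproduced with Python's standard `statistics` module. This is only a sketch for checking the arithmetic; the variable names are illustrative.

```python
import statistics

# Waiting times (minutes) from the queue example.
waiting = [2, 3, 3, 3, 4, 2, 2, 2, 2, 3, 3, 3, 1, 1]
mode_wait = statistics.mode(waiting)      # most frequently occurring value
median_wait = statistics.median(waiting)  # average of the 7th and 8th ordered values

# University-major labels (1=marketing, ..., 5=others) from the categorical example.
majors = [2, 5, 3, 1, 4, 2, 5, 3, 4, 2, 1]
mode_major = statistics.mode(majors)
median_major = statistics.median(majors)
mean_major = statistics.mean(majors)

print(mode_wait, median_wait)
print(mode_major, median_major, round(mean_major, 4))
```

Python will happily compute a median and mean for the major labels, but as the notes say, only the mode is meaningful for nominal data.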
Variability
- Definition: Measures of variability yield information about how likely a realisation of the r.v. is to be away from the centre of its distribution. They give us some idea of fluctuation and volatility across realisations of the r.v. They are sometimes called measures of scale, spread, dispersion, or risk
- There are 3 measures of Variability:
● Variance (Var): average of squared distance from the mean
● Standard Deviation (std): square root of variance
● Coefficient of variation: std / mean x 100%
EXAMPLE: Consider two r.v.’s.
X: price of a flight ticket from Melbourne to Sydney with Jetstar
Y: price of a flight ticket from Melbourne to Sydney with Virgin Australia.
Variance
We can talk about either population variance or sample variance. If we denote the r.v. by 𝑋, we have:
● Population variance: σ² = [(x1 - µ)² + (x2 - µ)² + ... + (xN - µ)²] / N
● Sample variance: s² = [(x1 - x̄)² + (x2 - x̄)² + ... + (xn - x̄)²] / (n - 1)
Variance: it computes the average squared distance between the data points and their mean (the population mean or the sample mean, as appropriate).
EXAMPLE: X: waiting time of people in a queue (in minutes)
1) Why sum up or average out squared distance instead of distance?
Distance in different directions may cancel out, not suitable for measuring variability.
2) What is the unit? In minutes?
- Distance such as 𝑥1 - 𝜇 = (12 - 9.93) is in minutes.
- But squared distance such as (𝑥1 - 𝜇)² = (12 - 9.93)² is in minutes² (minutes squared)!
Standard Deviation
We can talk about either population standard deviation or sample standard deviation. If we denote the r.v. by 𝑋, we have:
● Population standard deviation: σ = √σ²
● Sample standard deviation: s = √s²
Standard deviation solves the problem of squared units. It has the same unit as the original data: in our example, minutes.
EXAMPLE: X, Y are the r.v.’s of time spent on work and leisure per day, respectively.
Coefficient of Variation (CV)
- We can talk about either population or sample Coefficient of Variation. If we denote the r.v. by 𝑋, we have:
● Population CV = σ / µ x 100%
● Sample CV = s / x̄ x 100%
- It is unit free, because both the numerator and denominator have the same unit as the original data and they cancel each other.
EXAMPLE: X: waiting time of people in a queue (in minutes)
- CV is unit free. It measures standard deviation per unit of mean. In finance, when the r.v. X denotes asset returns, CV measures risk per unit of expected return.
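The three measures of variability can be computed with Python's `statistics` module. The data behind the notes' own variance example is not reproduced here, so this sketch reuses the 14 waiting times from the Mode & Median example as an assumption. It also assumes the usual convention that the sample variance divides by n - 1.

```python
import statistics

# Assumption: reuse the waiting times (minutes) from the Mode & Median example.
waiting = [2, 3, 3, 3, 4, 2, 2, 2, 2, 3, 3, 3, 1, 1]

mean_wait = statistics.mean(waiting)

# Population variance divides by N; sample variance divides by n - 1.
pop_var = statistics.pvariance(waiting)   # units: minutes squared
samp_var = statistics.variance(waiting)   # units: minutes squared

# Standard deviation is the square root of variance, back in minutes.
samp_std = statistics.stdev(waiting)

# Coefficient of variation: std per unit of mean, unit free, in %.
cv_percent = samp_std / mean_wait * 100

print(round(pop_var, 4), round(samp_var, 4), round(samp_std, 4), round(cv_percent, 2))
```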
Shape
- USE EXCEL: everything can be computed with functions. In Google, search “how to calculate interquartile range in excel?”
- Central tendency and variability are useful to describe and summarise data, or the distribution of r.v.’s, BUT cannot summarise asymmetry. Skewness is a measure of asymmetry (calculating Skewness NOT EXAMINED)
- Central tendency, variability and skewness are useful to describe and summarise data, or the distribution of r.v.’s, BUT cannot summarise tail behaviour. Kurtosis is a measure of heavy tails (calculating Kurtosis not examined)
Leptokurtic distribution
Tall and thin. More probability mass in the centre and in the tails (heavy /fat tails).
Kurtosis >3 → exhibits excess kurtosis.
Mesokurtic distribution
Normal in shape. Kurtosis = 3 → it resembles normal
Platykurtic distribution
Flat and spread out. Less probability mass in the centre and in the tails (light / thin tails).
Kurtosis < 3.
Stats: Week 2 Lecture
Wk 1 Recap
- Inferential Statistics is about
● The science of sampling
● Choosing between rare events and reinterpretation, EG)
1) Friend lets you down
2) Toss a coin 50 times and got 50 tails (claimed to be unbiased)
- The study of probability is crucial to interpret evidence, but it has many other uses too
Probability Theory
- A prerequisite to stats is Probability Theory. We need to know what an event means & how we assign probability
- Event: a set of outcomes (can contain no outcome, a single outcome or multiple outcomes) of an experiment to which probability is assigned → eg) head or tail
- Sample Space: the set of all possible outcomes. So an event is a subset of the sample space
- With observed outcomes, there are 2 methods of assigning probability
1) Classical: every outcome is assigned the SAME probability
● P(outcome 1) = … = P(outcome n) = 1/n → EG) pulling the ace of spades from a deck of cards
2) Relative Frequency: outcomes receive probability corresponding to their number of occurrences
● P(outcome i) = number of occurrences of outcome i / TOTAL number of occurrences of all outcomes
Law of Addition
- Marginal Probability: describes the probability of outcomes associated with ONLY 1 Random Variable (r.v)
● EG) Day (weekends VS weekdays), or Price ($29 vs $49 vs $79) → without referring to other r.v’s
● P(weekends) and P($29)
Orange = Marginal Probability
- Joint Probability: describes the probability of outcomes associated with MORE THAN 1 Random Variable (r.v)
● EG) Day & Price → P(weekdays and $29) and P(weekends and $49)
Purple = Joint Probability
Complement of Events
- Mathematically, we can define an event by A, & define the complement of the event by A’ (pronounced as A
prime) which means “not A”
- Complement rule of Probability: P(A) + P(A’) = 1
● The probability that one of them will be true is definite (1) as we are adding all possible probabilities
Using Intersections “∩”
- When referring to joint probability, we use intersection “∩”
- The event A ∩ B (it reads: the intersection of A and B, OR A intersection B) means the event where both A
and B are true, or both A and B occur
● EG) Let A denote the event “$29” and B denote “weekdays”
- A’ then indicates “NOT $29,” which would be either $49 or $79
- A ∩ B then indicates “$29 and weekdays”
- A’ ∩ B’ therefore means “$49 or $79 is chosen on the weekend”
- Law of Probability, Version 1: P(A ∩ B) + P(A ∩ B’) = P(A)
Venn Diagrams
- Venn Diagrams show logic relations across sets
● The external rectangle indicates the whole sample space
● The internal circle indicates some event A
- The rest of the rectangle (everything outside A) = A’
- Joint Events: such as A ∩ B, the intersection (∩) of A and B
- Union of A and B: denoted by A⋃B, pronounced as “union of A and B” OR “A union B”
● P(A⋃B) indicates the probability that A or B (or both) is true, OR that A or B (or both) occur
- General Rule for Addition: P(A⋃B) = P(A) + P(B) - P(A ∩ B)
● You can't just add P(A) and P(B), because the middle (overlapping) section would be counted twice. Therefore you subtract this overlap after adding P(A) and P(B)
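The General Rule for Addition is easy to check numerically. The sketch below borrows the probabilities from the plan example that appears later in these notes (A = “$29”, B = “weekdays”); the function name `union_probability` is made up for illustration.

```python
def union_probability(p_a, p_b, p_a_and_b):
    """General rule for addition: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b


# Values taken from the $29 / weekdays example later in the notes.
p_29 = 0.0625               # P($29)
p_weekdays = 0.475          # P(weekdays)
p_29_and_weekdays = 0.0125  # P($29 ∩ weekdays)

p_union = union_probability(p_29, p_weekdays, p_29_and_weekdays)
print(round(p_union, 4))
```

Simply adding P($29) and P(weekdays) would give 0.5375, which counts the overlap of 0.0125 twice.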
- Mutually Exclusive events: if event A occurs only if event B does not occur (they cannot occur at the same time), we say A and B are mutually exclusive
- Mutually exclusive events: P(A ∩ B) = 0
- In a Venn Diagram, these events DO NOT intersect
- Any event and its complement are mutually exclusive
● Either “A occurs” OR “A does not occur” → P(A ∩ A’) = 0
- Collectively Exhaustive Events: if the occurrence of A and B covers the whole sample space, we say A and B are Collectively Exhaustive
- Collectively Exhaustive Events: P(A⋃B) = 1
- Any event and its complement are collectively exhaustive
● “A occurs” and “A does not occur” make up all possible outcomes → P(A⋃A’) = 1
Conditional Probability
- In many cases, we are interested in Conditional Probability
● What's the probability of achieving growth in the next quarter, conditional on the success of our advertising campaign?
● What's the probability of bankruptcy, conditional on the economic recession lasting longer than 3 years?
- P(A|B) denotes the probability that event A occurs, conditional on B occurring
● EG) P($29|weekdays) means the probability of a client choosing the $29 plan, conditional on her visiting the store during weekdays
- Conditional Probability can be computed using Bayes Rule, Version 1: P(A|B) = P(A∩B) / P(B)
● It is as if the sample space is only B, so everything that occurs can only be in B
● “A” can only occur in the space it shares with B: (A∩B)
- Bayes Rule, Version 2: P(A∩B) = P(A|B) P(B)
- SO, Joint Probability = Conditional Probability x Marginal Probability. This leads to:
● Law of Probability, Version 2: P(A|B) P(B) + P(A|B’) P(B’) = P(A) (same as Version 1)
Independent Events
- If A and B are independent events, whether or not B occurs should NOT affect the probability of A occurring, and vice versa.
● Independent Events, Version 1: P(A|B) = P(A) & P(B|A) = P(B)
● Independent Events, Version 2 (based on Bayes rule): P(A∩B) = P(A) x P(B)
EXAMPLE:
Are the events “client choosing $29 plan” and “client purchasing during weekdays” independent?
P($29) = 0.0625, and P(weekdays) = 0.475
So P($29) x P(weekdays) = 0.0297, which is different from P($29 ∩ weekdays) = 0.0125. So these 2 events MAY NOT be independent
● Why do we say “may”? This is a sample, and if we pick a different sample, we will get different numbers
- To see if such a hypothesis holds true we need to do a Statistical Test (Wk 5)
- To test for Independence, we need a contingency table (Wk 9)
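The independence check in the example, together with the conditional probability P($29|weekdays), can be sketched as follows. This only compares the sample numbers and is not a statistical test; the tolerance used in the comparison is an arbitrary illustrative choice.

```python
import math

# Probabilities from the plan example in the notes.
p_29 = 0.0625               # P($29)
p_weekdays = 0.475          # P(weekdays)
p_29_and_weekdays = 0.0125  # P($29 ∩ weekdays)

# Bayes Rule, Version 1: P(A|B) = P(A ∩ B) / P(B).
p_29_given_weekdays = p_29_and_weekdays / p_weekdays

# Under independence, the joint probability would equal the product.
product = p_29 * p_weekdays

# Assumption: a 1% relative tolerance, chosen only for illustration.
looks_independent = math.isclose(product, p_29_and_weekdays, rel_tol=0.01)

print(round(product, 4), round(p_29_given_weekdays, 4), looks_independent)
```

Here P($29|weekdays) is well below P($29) = 0.0625, which is the same signal seen through Version 1 of the independence definition.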
Probability Trees & Binomial Probability
- Having learnt Conditional Probability, we can draw up a Probability Tree to do scenario analysis
● EG) Let A denote the stock price (p) movement (+1, 0, -1) in day 1 and day 2
- A special case is when events come from only 2 outcomes (eg, success or failure, or binary outcomes) & are independent → so P(A|B) = P(A)
- In such cases, the Probability Tree is called a Binomial Tree
● EG) suppose we have 3 products, each of which can be defective (D) with probability p, or functional (F) with probability q = 1 - p
● EG) say there were 5 products. What's the probability of detecting 3 defects out of 5 trials (3D, 2F)?
Case 1 (D, D, F, D, F): p x p x (1-p) x p x (1-p) = p³(1-p)²
Binomial Distribution
- A r.v X taking values in {0, 1, ..., n} is said to follow the Binomial Distribution, denoted by X ~ Bin(n, p), if it describes the (random) number of successes out of n trials in a binomial experiment (meaning that successes in different trials are independent).
- The Binomial distribution has the following probability distribution function (pdf), which calculates the probability of the r.v equalling a certain number x:
P(X = x) = [n! / (x! (n - x)!)] p^x (1 - p)^(n - x)
- The term n! / (x! (n - x)!) computes the number of ways of choosing x objects from the set of n objects.
● Remember the factorial operator m! = 1 x 2 x 3 x … x (m - 1) x m
So, if the defect rate is 0.2, the probability of 3 defects out of 5 products is given by:
P(X = 3) = [5! / (3! 2!)] x 0.2³ x 0.8² = 10 x 0.008 x 0.64 = 0.0512
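The defect example can be checked with a short Python sketch of the binomial pdf. The function name `binomial_pmf` is illustrative; `math.comb` counts the number of ways of choosing x objects from n.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p): (ways to choose x of n) * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# 3 defects out of 5 products with a defect rate of 0.2.
ways = comb(5, 3)                        # number of orderings with 3 D's and 2 F's
p_three_defects = binomial_pmf(3, 5, 0.2)
print(ways, round(p_three_defects, 4))
```

Each of the 10 orderings (such as D, D, F, D, F) has the same probability p³(1-p)², which is why the single-case probability is multiplied by the number of ways.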
Properties of Binomial Distribution
- Almost all distributions have expectations (i.e mean) & variance (also standard deviation)
- Every distribution (their pdf) is characterised by some parameters
- The binomial distribution has 2 parameters, n (number of trials) and p (success probability or success rate)
Stats: Week 3 Lecture
Introduction
- Just because something is random, doesn't mean we know nothing about it
● You can know the mean and variance
● You can know more! The probability distribution/ density function (called a pdf) is a mathematical pattern which describes a random variable
● A small group of pdfs describe many things in the real world
- Any situation where there are 2 outcomes can be represented by the Bernoulli random variable
● To buy / Not to buy
● Boom/ Recession
- Many Bernoulli situations added up give us a Binomial random variable
● Defective items in a factory production run
● The number of calls that drop out in an hour
- The number of occurrences of something in a given time interval is a Poisson random variable
● The chance of 2 recessions in the next decade
Discrete Probability Distribution Function (pdfs)
- Discrete Probability Distribution: the distribution of a discrete random variable (r.v)
- Discrete Random Variable: a r.v that takes discrete values → Discrete r.v counts
- Continuous Random Variable: a r.v that takes values on (part of) the real line → Continuous r.v measures
Discrete r.v
- Number of kids in household
- Number of successes in n trials
- Number of obese kids aged 5
- Doctor visits in 1 year
- Daily counts of car accidents
- Years of schooling
Continuous r.v
- Waiting time in a queue
- Height of soldiers
- Stock returns
- Inflation rate
- Proportion of the elderly
- Distance between 2 cities
Summation Notation
- Σ xi (for i = 1 to n) = x1 + x2 + ... + xn
Reads: the sum of xi, where i runs from i = 1 to i = n
Discrete Probability Distribution
- Can be defined by means of a probability distribution function (pdf), which assigns a probability within (0-1) to each possible outcome such that all probabilities sum up to 1
- Suppose that r.v X can take n possible values. Its pdf is such that for all i ∈ {1, 2, …, n}, we have: 0 ≤ P(X = xi) ≤ 1, and Σ P(X = xi) (for i = 1 to n) = 1
Reads: P(X = x1) + P(X = x2) + P(X = x3) + ... + P(X = xn) = 1
EXAMPLE: r.v. X denotes the number of doctors visits in 1 month
Cumulative Distribution Function (cdf)
- Given a pdf, we can define the Cumulative Distribution Function (cdf), which calculates the probability that the r.v is smaller than or equal to a certain value. That is, the cdf gives: P(X ≤ x)
cdf: less than or equal to a particular value
pdf VS cdf
Special Discrete Probability Distributions
- Rather than tabulating the Probability Distribution Function (pdf) and Cumulative Distribution Function (cdf), we can mathematically model some Special Discrete Distributions
- Learn the following Discrete Distributions (each of these describes a discrete r.v):
● Binomial Distribution
● Bernoulli Distribution
● Discrete Uniform Distribution
● Poisson Distribution
- Recall from last week: every distribution has a pdf which can be characterised by some parameters. Let's go through these distributions 1 by 1
Binomial Distribution
- A r.v X taking values in {0, 1, …, n} is said to follow the Binomial Distribution, denoted by X ~ Bin(n, p), if it describes the (random) number of successes out of n trials in a binomial experiment. The Probability Distribution Function (pdf) is:
P(X = x) = [n! / (x! (n - x)!)] p^x (1 - p)^(n - x)
- There are 2 parameters
● n: number of trials
● p: probability of success, p ∈ (0,1)
- E(X) = np → Expected value of r.v is n x p
- Var(X) = np(1-p) → Variance of X
- We can also write down the Cumulative Distribution Function (cdf): P(X ≤ x) = P(X = 0) + P(X = 1) + ... + P(X = x)
EXAMPLE: a Call Centre is assessing its hotline connectivity. For each call a customer makes, there is a dropout rate of p = 0.3. What is the probability of 4 dropouts among 10 calls?
- Call dropouts are independent events (one dropout doesn't affect another)
- p = 0.3
- x = 4
- n = 10
r.v. X: number of dropouts out of 10 calls, with individual dropout rate of 0.3. So, X ~ Bin(n = 10, p = 0.3)
What is the probability of fewer than 5 dropouts amongst 10 calls? So we need P(X < 5) = P(X ≤ 4)
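Both call-centre questions can be answered with a small Python sketch of the binomial pdf and cdf. The function names are illustrative and not part of the notes.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_cdf(x, n, p):
    """P(X <= x): sum the pdf over 0, 1, ..., x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))


n_calls, dropout_rate = 10, 0.3

# Probability of exactly 4 dropouts among 10 calls.
p_four = binomial_pmf(4, n_calls, dropout_rate)

# Fewer than 5 dropouts means 0, 1, 2, 3 or 4, i.e. P(X <= 4).
p_fewer_than_five = binomial_cdf(4, n_calls, dropout_rate)

# Mean and variance from the formulas E(X) = np and Var(X) = np(1-p).
mean_dropouts = n_calls * dropout_rate
var_dropouts = n_calls * dropout_rate * (1 - dropout_rate)

print(round(p_four, 4), round(p_fewer_than_five, 4))
```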
Bernoulli Distribution
- A r.v taking values in {0, 1} is said to follow the Bernoulli Distribution, denoted by X 〜 Ber(p), if it describes the binary outcome of 0 (failure) or 1 (success), with probability of success p, or P(X = 1) = p
- The pdf is: P(X = 1) = p and P(X = 0) = 1 - p
- Bernoulli Distribution is a special case of the binomial distribution where the number of trials n = 1
EXAMPLE: consider rolling a die only once. Let the r.v indicate whether a 6 appears. It follows X 〜 Ber(⅙)
Discrete Uniform Distribution
- A r.v X taking values in {a, a+1, …, b-1, b} is said to follow the Discrete Uniform Distribution if all potential outcomes (realisations) between a & b are equally likely.
- The pdf is: P(X = x) = 1 / (b - a + 1) for each x in {a, a+1, …, b}
EXAMPLE:
Poisson Distribution
- A r.v X taking values in {0, 1, …} is said to follow the Poisson Distribution if it describes the random number of arrivals of events within a given time period.
- The pdf is: P(X = x) = λ^x e^(-λ) / x!
with 1 parameter
● λ: (pronounced Lambda) the intensity of arrivals (average number of arrivals) within the given period of time: λ > 0
● E(X) = λ; Var(X) = λ
- Because E(X) = λ, the intensity parameter λ can also be interpreted as the mean arrival rate.
● Example: We can use the Poisson Distribution to model
- The number of cars passing through a toll every min. If λ = 5, it means the intensity (mean arrival rate) of cars passing through the toll is 5 cars per min
- The number of customers coming into a KFC store every hour. If λ = 140, it means the intensity (mean arrival rate) of customers coming to the store is 140 people per hour
EXAMPLE: the number of claims for missing baggage in a small city averages 12 per day (16 operating hours). What is the probability that in any given hour, there will be fewer than 2 claims?
- Notice that the time period is in a different unit → convert into the same unit as the question
- On average, 12 claims per 16 hours (mean arrival rate of missing baggage claims is 12 claims per 16 hours) means 0.75 claims per hour
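The baggage example can be worked through in Python once the rate is converted to claims per hour. A minimal sketch; the function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam): lam^x * e^(-lam) / x!."""
    return lam**x * exp(-lam) / factorial(x)


# Convert 12 claims per 16-hour day into claims per hour.
lam_per_hour = 12 / 16

# Fewer than 2 claims in an hour means 0 or 1 claims.
p_fewer_than_two = poisson_pmf(0, lam_per_hour) + poisson_pmf(1, lam_per_hour)
print(lam_per_hour, round(p_fewer_than_two, 4))
```

Using λ = 12 without converting units would answer a different question (fewer than 2 claims in a whole day).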
- Poisson Distribution has 1 very important property. That is “Poisson arrivals see exponential waiting time”
● It means, if we know the number of arrivals within a given time period follows the Poisson Distribution, the inter-arrival waiting time (time between successive arrivals) will follow the exponential distribution
- Poisson Distribution is a Discrete distribution. A Poisson distributed r.v counts the number of arrivals within a given period of time
● What is the probability of fewer than 2 claims within an hour?
- Exponential Distribution (next week) is a Continuous distribution (time can be any fractional number, eg, 1.2434856393 mins). An exponentially distributed r.v measures the length of time until the next arrival
● What is the probability of less than 1 hour between claims being made?
Wk 3 Summary
Stats: Week 4 Lecture
Introduction
- What's the probability that a woman in Business Stats is exactly 5 feet (152cm) tall? It's 0!
- Continuous Probability Density functions are built on a line on an axis, & that line has so many values squeezed onto it that the probability that it takes 1 particular value is actually 0
● We're used to a probability of 0 meaning something impossible → here it means ‘infinitely infrequent’
Continuous pdfs
- Discrete Probability Distribution: the distribution of a discrete r.v
- Discrete Random Variable: a r.v that takes discrete values, eg) the difference between consecutive values is a fixed constant → Discrete r.v counts
- Continuous Random Variable: a r.v that takes values on the real line, eg) fractional numbers with infinitely many digits after the decimal point → Continuous r.v measures
Discrete r.v
- Number of kids in household
- Number of successes in n trials
- Number of obese kids aged 5
- Doctor visits in 1 year
- Daily counts of car accidents
- Years of schooling
Continuous r.v
- Waiting time in a queue
- Height of soldiers
- Stock returns
- Proportion of LGBTQ+ people
- Distance between 2 cities
- Winning probability
Discrete Probability Distribution (LAST WK)
- Can be defined by means of a Probability Distribution Function (pdf), which assigns a probability within (0-1) to each possible outcome such that all probabilities sum up to 1
Continuous Probability Distribution (THIS WK)
- Can be defined by means of a Probability Density Function (pdf), which assigns a positive value to each possible outcome such that all densities integrate to 1
Features about Continuous r.v X
- X can assume any value, including fractional numbers (waiting time in mins, altitude of a GPS location, relative speed of 2 objects)
- The distribution covers the real line (real line means real numbers put on a line) or part of it (waiting time cannot be negative, altitude cannot exceed the diameter of earth, speed of any object cannot exceed that of light)
- It's useless to talk about P(X = x), where x is some specific value
Formula
- For any continuous r.v X and any specific value x: P(X = x) = 0.
- What's the probability that a rock you throw hits exactly the middle of the blackboard? If you keep zooming in, the probability is 0.
BUT, what's the probability that a rock you throw hits the left part of the blackboard? Non-0
- Same as for a Discrete Probability Distribution, P(X < x) for a Continuous r.v defines the Cumulative Probability Distribution
- EXAMPLE: if the r.v is waiting time in mins, denoted by X, it makes perfect sense to ask for P(X < 5)
● What is the probability of waiting in line less than 5 mins?
- Different from the pdf (d = distribution) for discrete r.v’s, denoted by P(X = x), which can be plotted using bar charts or histograms, the pdf (d = density) of continuous r.v’s, denoted by fX(x), is called the density of r.v X evaluated at x.
- The area under the pdf fX(x) and to the left of x is the cdf P(X < x)
Taking any values
- Suppose that r.v X can take any values on the real line.
- Some properties of its pdf include:
3. Just like discrete r.v’s, continuous r.v’s have a mean and variance (& thus, standard deviation)
Special Continuous Distributions
- Continuous distributions are much more widely used than discrete distributions.
- Learn the following continuous distributions:
● Continuous Uniform Distribution
● Exponential Distribution
● Normal Distribution
● Standard Normal Distribution
- Similar to discrete distributions, every continuous distribution can be characterised by a pdf with some parameters
Uniform Distribution
- A r.v X taking any value within (a,b) is said to follow the Continuous Uniform Distribution, X ~ Unif(a,b), if all potential outcomes (realisations) between a and b are equally likely
- With 2 parameters:
● a: the minimum value that X can assume
● b: the maximum value that X can assume
● E(X) = (a + b) / 2, Var(X) = (b - a)² / 12
- Example: X ~ Unif(1,5) means any value between 1 and 5 is equally likely to occur. The pdf of a continuously uniformly distributed r.v is a rectangle
- Example: It is determined that the cost of a committee meeting in a company is continuously uniformly distributed, with a minimum cost of $50 and a maximum cost of $120.
- What is the probability that the committee meeting will cost exactly $70?
● Let X denote the r.v, we have X ~ Unif(a = 50, b = 120)
● Because X is continuously distributed, we know P(X = 70; a = 50, b = 120) = 0.
- What is the probability that a committee meeting will cost less than $110?
● P(X < 110; a = 50, b = 120) = (110 - 50) / (120 - 50) = 0.8571
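The committee-meeting example can be reproduced with a short sketch of the continuous uniform cdf, mean and variance. The function name `uniform_cdf` is made up for illustration.

```python
def uniform_cdf(x, a, b):
    """P(X < x) for X ~ Unif(a, b): share of the interval (a, b) lying below x."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


a, b = 50, 120  # minimum and maximum meeting cost in dollars

p_less_than_110 = uniform_cdf(110, a, b)
mean_cost = (a + b) / 2        # E(X) = (a + b) / 2
var_cost = (b - a) ** 2 / 12   # Var(X) = (b - a)^2 / 12

print(round(p_less_than_110, 4), mean_cost, round(var_cost, 2))
```

Because the density is a rectangle, the cdf is just the width of the shaded part divided by the total width.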
Exponential Distribution
- Poisson distribution has one important property. That is:
● “Poisson arrivals see exponential waiting time”
- It means if we know the number of arrivals within a given time period follows the Poisson distribution, the inter-arrival waiting time (time between successive arrivals) will follow the exponential distribution
- Poisson Distribution: is a Discrete Distribution. A Poisson-distributed r.v counts the number of arrivals within a given time period
● What's the probability of fewer than 2 customers within an hour?
- Exponential Distribution: is a Continuous Distribution. An exponentially distributed r.v measures the length of time until the next arrival.
● What's the probability of less than 1 hour between customers coming to the shop?
- A r.v X taking values in (0,∞) is said to follow the Exponential Distribution: X ~ Exp(λ) with λ > 0
EXAMPLE: Telstra has tested its first 5G network in Sydney. The IT department has determined that the number of customers participating in the test who attempt to connect to the 5G network per hour is Poisson distributed with 30 customers on average.
Because their system is under its initial test, if connection requests are too close (less than 45 seconds, or less than 45/60 = 0.75 mins, apart), connection failure will occur.
Management has posed the question of whether to purchase expensive equipment to fix this problem. They need to know the probability that a customer will fail to connect.
- Answer: P(connection failure) = P(connections too close) = P(X < 0.75 mins), where X denotes the r.v of time between successive connection requests
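The Telstra answer can be evaluated numerically. The notes do not reproduce the exponential cdf, so this sketch assumes the standard form P(X < x) = 1 - e^(-λx), with λ expressed in the same time unit as x (here, per minute). The function name `exponential_cdf` is illustrative.

```python
from math import exp


def exponential_cdf(x, lam):
    """P(X < x) for X ~ Exp(lam), assuming the standard cdf 1 - e^(-lam * x)."""
    if x <= 0:
        return 0.0
    return 1.0 - exp(-lam * x)


# 30 connection requests per hour on average -> 0.5 per minute.
rate_per_min = 30 / 60

# Requests less than 45 seconds (0.75 minutes) apart cause a failure.
threshold_min = 45 / 60

p_connection_failure = exponential_cdf(threshold_min, rate_per_min)
print(rate_per_min, threshold_min, round(p_connection_failure, 4))
```

The key step is the same as in the Poisson baggage example: convert the arrival rate and the time threshold to the same unit before plugging in.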
Normal Distribution
- A r.v X taking any value on the real line is said to follow the Normal Distribution:
X ~ N(µ, σ) with µ ∈ (-∞, ∞) and σ > 0
- With 2 parameters:
● µ: the mean value of this r.v
● σ: the standard deviation of this r.v
● E(X) = µ, Var(X) = σ²
- Normal Distribution is symmetric around µ
- Probability fades away as we move further away from the mean
- Due to symmetry: mean = mode = median
Snowflake Problem: there are infinitely many normal distributions characterised by different pairs of (µ, σ)
Standard Normal Distribution
- A r.v Z that is standard normally distributed means Z ~ N(µ𝑧, σ𝑧), with µ𝑧 = 0, σ𝑧 = 1, or Z ~ N(0,1)
- Standardisation transforms some normally distributed r.v X into the r.v Z, which is Standard Normally Distributed, by subtracting the mean µ𝑥 from X and dividing by the std σ𝑥: Z = (X - µ𝑥) / σ𝑥
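Example: Daily revenue Y is normally distributed with mean 1000 and std 200 → Y 〜 N(1000, 200). What is the probability that revenue tomorrow is between 900 and 1400?
We compute: P(900 < Y < 1400; µ = 1000, σ = 200). Standardising, (900 - 1000) / 200 = -0.5 and (1400 - 1000) / 200 = 2, so this equals P(-0.5 < Z < 2).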
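The revenue probability can be computed in Python with `statistics.NormalDist` (available from Python 3.8), both directly and via standardisation. This sketch replaces the table lookup used in class; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 1000, 200
revenue = NormalDist(mu=mu, sigma=sigma)

# Direct calculation: P(900 < Y < 1400) = cdf(1400) - cdf(900).
p_direct = revenue.cdf(1400) - revenue.cdf(900)

# Standardisation: Z = (Y - mu) / sigma, then use the standard normal N(0, 1).
z_low = (900 - mu) / sigma
z_high = (1400 - mu) / sigma
standard_normal = NormalDist()  # defaults to mean 0, std 1
p_standardised = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

print(z_low, z_high, round(p_direct, 4))
```

Both routes give the same probability, which is the point of standardising: one standard normal table serves every pair of (µ, σ).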
………..
Wk 4 Summary
- Continuous Random variable: a r.v that takes values on the real line, i.e fractional numbers with infinitely
many digits after the decimal point.
- 3 key continuous random variables are
● Continuous Uniform Distribution
● Exponential Distribution
● Normal Distribution (& Standard Normal Distribution)
- The Normal is very important because of the Central Limit Theorem
- We Standardise the normal because we need to use the same table for all different means and variances (the ‘snowflake problem’).
Stats: Week 5 Lecture
The Core of Inferential Statistics
- One of the most important ideas in stats is the Sampling Distribution
- The Sampling Distribution is the pdf of a statistical calculation
- Even if we can't control the variance of the original data, we CAN control the variance of the sampling distribution by using a larger sample!
1000 or 10?
- n=1000 is better than n=10 → Statistics tells us how much better exactly
- The standard deviation of the sampling distribution has a special name → The Standard Error
- The standard error of the sampling distribution is proportional to the inverse of the square root of n
- So, if n rises 100-fold, from 10 to 1000, the standard error falls 10-fold
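The 10-fold claim can be verified with a few lines of arithmetic. This is a minimal sketch assuming the standard error of the sample mean is σ/√n. The value of `sigma` is hypothetical and cancels out of the ratio.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 1.0  # hypothetical population std, chosen only for illustration
se_10 = standard_error(sigma, 10)
se_1000 = standard_error(sigma, 1000)

print(f"SE with n=10:   {se_10:.4f}")
print(f"SE with n=1000: {se_1000:.4f}")
print(f"ratio:          {se_10 / se_1000:.1f}")
```

The ratio is √100 = 10 regardless of the value chosen for `sigma`.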
We can do 2 things with Sampling Distribution
1. It will allow us to draw a reasonable range for the measure of central tendency, called a Confidence Interval,
OR
2. It will allow us to test beliefs, called a Hypothesis Test
- Confidence Intervals are next week's topic. The rest of the subject is devoted to Hypothesis tests
- Today's material is a really important foundation
Introduction to Sampling Distribution
Example: Helen, a research assistant at Burger King, reads a recent industry report. It indicates that the mean
amount spent by consumers on fast food per year is $311. Helen doesn't believe this report, as business at many
branches is in bad shape → there are usually not many customers.
- What questions could Helen have about the report's average spending figure?
1. How did the report collect the data?
2. What sample was used?
3. Was the sampling conducted randomly?
4. How accurate is such a number?
5. Why not use the whole population? Many reasons:
● Costly: there's 7.7 billion people
● Time consuming: by the time measurements have finished, the population has changed
● Destructive: some measurements risk killing the entire population
● Population size can be infinite: to measure the average spinning speed of an atom, you would need to find
all atoms in the universe
- Statistical inference/analysis uses information from a sample to infer properties about a population
- Statistical inference goes from the sample to the population. We use information from a sample to
summarise/ report/ estimate/ describe/ test parameters in the population
- Oftentimes, we cannot compute the population parameters.
- BUT, we can compute sample statistics. We hope a sample statistic can serve as an estimate of the population
parameter. This is the case if the sample is collected randomly.
- Collecting different samples leads to different sample statistics. Sampling error means the discrepancy
between a sample statistic and the corresponding population parameter.
- A sample statistic is also called (serves as) an estimate of the corresponding population parameter.
EXAMPLE:
- Because the sample is chosen randomly, the sample statistic (the estimate) is itself a random variable, which
means it follows some distribution.
- The variability of this distribution helps us understand how accurate our estimate is.
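A small simulation makes the idea concrete. The sketch below draws many random samples from a normal population and records each sample mean. The population mean of 311 echoes the fast-food example, but the std of 50, the sample size and the number of repetitions are hypothetical values picked purely for illustration.

```python
import random
import statistics

random.seed(0)                      # fixed seed so the run is repeatable
mu, sigma = 311.0, 50.0             # sigma is a hypothetical value (illustration only)
n, n_samples = 25, 5000             # sample size and number of repeated samples

# Draw many random samples and record the sample mean of each one
sample_means = [
    statistics.fmean([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(n_samples)
]

print(f"first three sample means: {[round(m, 1) for m in sample_means[:3]]}")
print(f"mean of sample means:     {statistics.fmean(sample_means):.2f}")
print(f"std of sample means:      {statistics.stdev(sample_means):.2f}")
print(f"sigma / sqrt(n):          {sigma / n ** 0.5:.2f}")
```

Each sample gives a different estimate. The spread of those estimates should come out close to σ/√n = 10, which is far smaller than the spread of the original data.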
EXAMPLE:
The Population is the stock returns (price today / price yesterday − 1) of IBM from 04/16/2014 to 04/16/2019.
- We assume all random variables involved are normally distributed: X ~ N(µ𝑥, σ𝑥)
We will study the sampling distribution of:
Sampling Distribution: X̄
- Assuming X ~ N(µ𝑥, σ𝑥) and we have a sample of size n, i.e. we have x1, x2, ..., xn, the sample mean X̄ is also
normally distributed: X̄ ~ N(µ𝑥, σ𝑥/√n)
- Because X̄ is an estimate (for µ𝑥), we call its standard deviation (std) the standard error. Std
is for the original r.v.; standard error is for any estimate or sample statistic.
EXAMPLE:
Suppose Hourly wages in the Australian retail industry are normally distributed, with a mean of $22.50 and a
standard deviation of $ 4.80. A random sample of 25 retail employees is invited to undertake an online survey
about their wages.
Compare the following two questions:
Q1: What is the probability that a randomly selected individual will earn less than $21? → This is a question about the
random variable itself.
Q2: What is the probability that the sample mean will be less than $21? → This is a question about the sample mean
computed from a sample of observations of this random variable.
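The two questions differ only in which standard deviation is used. As a hedged sketch (not the lecture's own table-based working), Q1 uses the std of X itself, while Q2 uses the standard error σ/√n of the sample mean. Standard-library `NormalDist` stands in for the Z-table.

```python
import math
from statistics import NormalDist

mu, sigma, n = 22.50, 4.80, 25      # values from the example

# Q1: one individual, so use the std of X itself
p_individual = NormalDist(mu, sigma).cdf(21)

# Q2: the sample mean, so use the standard error sigma / sqrt(n)
standard_error = sigma / math.sqrt(n)          # 4.80 / 5 = 0.96
p_sample_mean = NormalDist(mu, standard_error).cdf(21)

print(f"Q1: P(X < 21)     = {p_individual:.4f}")
print(f"Q2: P(X-bar < 21) = {p_sample_mean:.4f}")
```

Q1 comes out at roughly 0.38 and Q2 at roughly 0.06. A single wage below $21 is quite common, but an average of 25 wages below $21 is much rarer because the sampling distribution is narrower.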
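- Assuming X ~ N(µ𝑥, σ𝑥) and we have a sample of size n, i.e. we have x1, x2, ..., xn, but σ𝑥 is unknown and
replaced by the sample std s𝑥, then (X̄ − µ𝑥)/(s𝑥/√n) follows Student's t distribution with v = n − 1 degrees of freedom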
- Student's t distribution is a continuous distribution characterised by one parameter, the degrees of freedom (dof) 𝑣.
It looks a lot like the standard normal distribution (i.e. 𝑡𝑣 looks like 𝑍) and is symmetric around zero. But the
smaller the dof 𝑣 is, the fatter the tails are (compared with the standard normal distribution).
EXAMPLE:
Suppose Hourly wages (𝑋) in the Australian retail industry are normally distributed, with a mean (𝜇𝑋) of $22.50 and
an unknown standard deviation. A random sample of 𝑛 = 20 retail employees is invited to undertake an online survey
about their wages. It turns out the sample standard deviation is 𝑠𝑋 = $4.80.
Q3: What is the probability that the sample mean is less than $21?
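One way to approach Q3 without a t-table is simulation. The sketch below assumes the usual t statistic (X̄ − 𝜇𝑋)/(𝑠𝑋/√𝑛) with 𝑣 = 𝑛 − 1 = 19, computes its observed value, and estimates the left-tail probability by Monte Carlo. The std of 1.0 used to generate data is an arbitrary choice, since the distribution of the t statistic does not depend on the true σ. The number of simulations is also arbitrary.

```python
import math
import random

random.seed(1)                      # fixed seed so the run is repeatable
mu, s_x, n = 22.50, 4.80, 20        # values from the example; v = n - 1 = 19
t_obs = (21 - mu) / (s_x / math.sqrt(n))   # observed t value, about -1.40

def t_statistic(sample, mu):
    """(sample mean - mu) / (sample std / sqrt(n)) for one sample."""
    size = len(sample)
    x_bar = sum(sample) / size
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (size - 1))
    return (x_bar - mu) / (s / math.sqrt(size))

n_sims = 20000                      # arbitrary number of simulated samples
hits = 0
for _ in range(n_sims):
    # sigma = 1.0 is arbitrary: the t statistic does not depend on it
    sample = [random.gauss(mu, 1.0) for _ in range(n)]
    if t_statistic(sample, mu) < t_obs:
        hits += 1

p_estimate = hits / n_sims
print(f"observed t = {t_obs:.3f}")
print(f"estimated P(t_19 < {t_obs:.3f}) = {p_estimate:.3f}")
```

The estimate should fall between 0.05 and 0.10, consistent with a t-table for 𝑣 = 19, where 1.40 lies between the critical values 1.328 (right tail 0.10) and 1.729 (right tail 0.05).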
T-Table
What does Student's t-table do? It finds 𝑥 such that the right-tail probability 𝑃(𝑡𝑣 > 𝑥; 𝑣) = 0.1, 0.05, 0.025, 0.01, 0.005 or
0.001. For example, if we have a r.v. 𝑡 that is Student's 𝑡-distributed with n = 5 and dof 𝑣 = 4:
- The value of 𝑥 such that 𝑃(𝑡𝑣 > 𝑥; 𝑣 = 4) = 0.025 equals 2.776
- As discussed, the pdf of 𝑡(𝑣) is symmetric around 0, so we also know:
the value of 𝑥 such that 𝑃(𝑡𝑣 < 𝑥; 𝑣 = 4) = 0.025 equals −2.776.
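The table entry can be cross-checked by integrating the t density directly. The following is a rough sketch using only the standard library. The helper names `t_pdf` and `t_right_tail` and the Simpson's-rule step count are illustrative choices, not part of the lecture.

```python
import math

def t_pdf(x: float, v: int) -> float:
    """Density of Student's t distribution with v degrees of freedom."""
    c = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return c * (1 + x * x / v) ** (-(v + 1) / 2)

def t_right_tail(x: float, v: int, steps: int = 2000) -> float:
    """P(t_v > x) for x >= 0: 0.5 minus the area between 0 and x (Simpson's rule)."""
    h = x / steps                   # steps must be even for Simpson's rule
    area = t_pdf(0, v) + t_pdf(x, v)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, v)
    return 0.5 - area * h / 3

right = t_right_tail(2.776, 4)
left = right   # by symmetry, P(t_v < -2.776) equals P(t_v > 2.776)

print(f"P(t_4 > 2.776)  = {right:.4f}")
print(f"P(t_4 < -2.776) = {left:.4f}")
```

Both tails come out at about 0.025, matching the table row for 𝑣 = 4.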
Sampling Distribution: 𝑠𝑋²
- Assuming 𝑋 ∼ 𝑁(𝜇𝑋, 𝜎𝑋) and we have a sample of size 𝑛, i.e. we have 𝑥1, 𝑥2, ..., 𝑥𝑛, then (𝑛 − 1)𝑠𝑋²/𝜎𝑋² follows a
𝜒² distribution with 𝑣 = 𝑛 − 1 degrees of freedom
- The 𝜒² distribution is a continuous distribution characterised by one parameter, the dof 𝑣. It is asymmetric and
takes only positive values (because variance is always positive).
- 𝑃(𝜒𝑣² < 𝑥; 𝑣 = 5) = 0.1 gives the right-tail probability 𝑃(𝜒𝑣² > 𝑥; 𝑣 = 5) = 1 − 0.1 = 0.9, so 𝑥 can be found in the table
with 𝑣 = 5 and right-tail probability 0.9, i.e. 𝑥 = 1.61.
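The value 𝑥 = 1.61 can also be checked by simulating the statistic (𝑛 − 1)𝑠𝑋²/𝜎𝑋² itself. In this sketch, 𝑛 = 6 is chosen so that 𝑣 = 𝑛 − 1 = 5. The population std of 3.0 and the number of simulations are hypothetical values, and any positive std should give the same answer.

```python
import random

random.seed(2)             # fixed seed so the run is repeatable
n = 6                      # sample size, so v = n - 1 = 5
sigma = 3.0                # hypothetical population std (any positive value works)
n_sims = 20000             # arbitrary number of simulated samples

def chi2_statistic(sample, sigma):
    """(n - 1) * s^2 / sigma^2 for one sample."""
    x_bar = sum(sample) / len(sample)
    sum_sq = sum((x - x_bar) ** 2 for x in sample)   # equals (n - 1) * s^2
    return sum_sq / sigma ** 2

# Count how often the simulated statistic falls below the table value 1.61
below = sum(
    chi2_statistic([random.gauss(0.0, sigma) for _ in range(n)], sigma) < 1.61
    for _ in range(n_sims)
)
p_left = below / n_sims

print(f"estimated P(chi2_5 < 1.61) = {p_left:.3f}")
print(f"estimated P(chi2_5 > 1.61) = {1 - p_left:.3f}")
```

The left-tail estimate should land near 0.1 and the right-tail estimate near 0.9, in line with the table lookup.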
For Exam: you will not be asked to compute expressions such as 𝑃(𝜒𝑣² < 𝑥; 𝑣), because the 𝜒² table does not provide them
- But you will be asked what distribution 𝜒𝑣² = (𝑛 − 1)𝑠𝑋²/𝜎𝑋² follows, what 𝑠𝑋² equals, and what the degrees of
freedom 𝑣 is.
- Also, you may need to compute this using Excel for the assignment! See tutorial exercises.
EXAMPLE: