Uploaded by Jordan Chen

Statistical Methods

Quiz every class – it will take 10 minutes at the beginning of every single class.
Use the ppt folder without the final answers
Mod is used for checking the answer
Section 1.1
What is Statistics?
o Statistics is the study of how to collect, organize, analyze, and interpret numerical information
from data.
o What is the best way to interpret the numbers?
Individuals are the people or objects included in the study.
A variable is a characteristic of the individual to be measured or observed.
o A variable describes the individual, for example weight, a measurement, or color.
o A quantitative variable has a value or numerical measurement for which operations such
as addition or averaging make sense (weight, measurements, numbers).
o A qualitative variable describes an individual by placing the individual into a category or
group, such as male or female (color, name, anything non-numeric). It is also known as a
categorical variable.
In Population data, the data are from every individual of interest
In sample data, the data are from only some of the individuals of interest
A population parameter is a numerical measure that describes an aspect of a population (for example,
a mean, variance, or standard deviation).
Sample statistic is a numerical measure that describes an aspect of a sample
Levels of measurement: Nominal, Ordinal, Interval, Ratio
Nominal – names, labels, or categories: nationality, eye color, zip code, major. (There is no "best"
version and no meaningful order.)
Ordinal – data that can be arranged in order: rankings, grades (A, B, C), rating scales (poor,
acceptable, good, excellent).
Interval – data that can be arranged in order, with precise and meaningful differences between values,
but no true zero (SAT score, IQ, temperature). Zero is not meaningful because it does not mean "nothing".
Even a zero IQ would not really mean zero intelligence; it is just a score.
Ratio – interval plus a true zero (height, weight, time, salary, age). Zero really means zero, aka
nothing.
Descriptive statistics – involves methods of organizing, picturing, and summarizing information from
samples or populations.
Inferential statistics – involves methods of using information from a sample to draw conclusions
regarding the population. It goes one step further than descriptive statistics, because you take the
sample information and draw a conclusion about the whole population.
Section 1.2
Random Samples
Simple random sample – a simple random sample of n measurements from a population is a subset of the
population selected so that every sample of size n has an equal chance of being selected.
As a result, every individual in the population also has an equal chance of ending up in the sample.
Random sampling: use a simple random sample drawn from the entire population, so everyone has an
equal chance of being selected.
Stratified sampling: divide the entire population into subgroups called strata, based on characteristics
such as age, income, or education level. Members of a stratum share similar characteristics. Then draw
a random sample from each stratum.
Systematic sampling: number all members of the population sequentially. Then, from a random starting
point, include every kth member of the population in the sample (for example, every 106th person).
Cluster sampling: divide the entire population into pre-existing segments or clusters, which are often
geographic. Make a random selection of clusters, then include every member of each selected cluster in
the sample. The randomness is in which clusters are chosen, not in which people are chosen inside a
cluster.
Multistage sampling: use a variety of sampling methods to create successively smaller groups
at each stage. The final sample consists of clusters. (You usually start with clusters, so you are using
more than one method to get the sample.)
Convenience sampling: create a sample by using data from population members that are
readily available (for example, reusing a sample from previous research or from someone else's
study).
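The first four sampling methods above can be sketched in a few lines of Python. This is only an illustration: the population of ID numbers 1 to 100, the two strata, the five "blocks", and all function names are made up for the example, and the seed is fixed just so the results are repeatable.

```python
import random

def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely to be chosen."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Random start among the first k members, then every kth member."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample from each stratum (subgroup)."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every member of each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(0)            # fixed seed (assumption, for repeatability)
population = list(range(1, 101))  # hypothetical population: IDs 1..100

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)

# Hypothetical strata: split the IDs into two groups.
strata = {"under_50": population[:49], "50_and_over": population[49:]}
strat = stratified_sample(strata, 5, rng)

# Hypothetical clusters: five "blocks" of 20 IDs each.
clusters = {f"block_{i}": population[i * 20:(i + 1) * 20] for i in range(5)}
clus = cluster_sample(clusters, 2, rng)
```

Note how the cluster sample always has 40 members (two full blocks), while the stratified sample takes only a few members from every group.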
There are ALWAYS going to be errors when taking samples from a population, because individuals differ
and a sample never matches the population exactly.
Sampling error – the difference between measurements from a sample and the corresponding measurements
from the population. It happens because the sample does not perfectly represent the population.
Nonsampling error – the result of poor sample design, sloppy data collection, faulty
measurement instruments, bias in questionnaires, and so on.
Section 1.3
Introduction to Experimental Design
Basic Guidelines for Planning a statistical study
1. First, identify the individuals and/or objects of interest.
2. Specify the variables as well as the protocols for taking measurements or making
observations
3. Determine if you will use an entire population or a representative sample. If using a sample,
decide on a viable sampling method.
4. In your data collection plan, address issues of ethics, subject confidentiality, and privacy. If
you are collecting data at a business, store, college, or other institution, be sure to be
courteous and to obtain permission, as necessary.
5. Collect the data.
6. Use appropriate descriptive statistics methods and make decisions using appropriate
inferential statistics methods.
7. Finally, note any concerns you might have about your data collection methods and list any
recommendations for future studies.
Census – measurements or observations from the entire population are used.
Sample – measurements or observations from only part of the population of interest are used.
Gathering data for a statistical study
Observational study – observations and measurements of individuals are conducted in a way
that does not change the response or the variable being measured (you don't do anything to change the
result).
Experiment – a treatment is deliberately imposed on the individuals in order to observe a possible
change in the response or variable being measured (you purposely do something to trigger some sort of
response).
Control group – receives no treatment; everything proceeds as normal, which gives a baseline for comparison.
Treatment group (or experimental group) – receives the treatment being studied, for example the medication.
Chapter 2 9/14/2020
Organizing Data
Section 2.1 Frequency Distributions, Histograms, and Related Topics
4 assignments due
Frequency Table
The first step in building a frequency table is to decide how many classes (groups) to use.
o Five to 15 classes are usually used.
o With fewer than 5 classes you lose too much information and the table is too general.
o With more than 15 classes the data are not really summarized; you might as well list all of
the data.
How to figure out the class width (how many values go in each class)
Compute
Class width = (largest data value – smallest data value) / (desired number of classes)
Then increase the result to the next whole number.
The lower class limit is the lowest data value that can fit in a class.
The upper class limit is the highest data value that can fit in a class.
Midpoint = (lower class limit + upper class limit) / 2
The class frequencies must add up to the number of data points.
Relative frequency = f / n, where f is the class frequency and n is the total of all frequencies
(the sample size).
For example, with n = 60, classes with frequencies 14 and 21 have relative frequencies 14/60 and 21/60.
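A rough sketch of the frequency-table steps in Python, assuming whole-number data. The data set and the function name `frequency_table` are made up for illustration; "increase to the next whole number" is implemented as integer division plus one, so the classes always cover the largest value.

```python
def frequency_table(data, num_classes):
    """Return the class width and one row per class for whole-number data."""
    low, high = min(data), max(data)
    # (largest - smallest) / classes, increased to the next whole number.
    width = (high - low) // num_classes + 1
    rows = []
    for i in range(num_classes):
        lower = low + i * width   # lower class limit
        upper = lower + width - 1 # upper class limit
        freq = sum(1 for x in data if lower <= x <= upper)
        rows.append({
            "lower": lower,
            "upper": upper,
            "midpoint": (lower + upper) / 2,
            "frequency": freq,
            "relative_frequency": freq / len(data),
        })
    return width, rows

# Hypothetical data set of 20 whole-number measurements.
data = [1, 3, 4, 7, 8, 8, 10, 12, 13, 15,
        16, 18, 19, 21, 22, 24, 25, 27, 28, 30]

width, rows = frequency_table(data, 5)
for row in rows:
    print(row)
```

Here the width works out to 6, giving the classes 1-6, 7-12, 13-18, 19-24, and 25-30, and the frequencies add back up to the 20 data points.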
Histograms and Relative-Frequency Histograms
For histograms, the height of the bar is the class frequency, whereas for relative-frequency
histograms, the height of the bar is the relative frequency of that class.
Always use class boundaries, not class limits, for histograms. The boundaries fill the gap between the
upper limit of one class and the lower limit of the next, so the bars touch.
There is usually a gap between the vertical axis and the first bar, because the first class boundary
usually does not start at 0.
Typical mound-shaped symmetrical histogram
Looks like a mountain
Typical uniform or rectangular histogram
Literally looks like a rectangle
Typical skewed histogram
Looks like a Mario staircase: one tail is stretched out longer than the other
Skewed left means the longer tail is on the left, with more data on the right side
Skewed right means the longer tail is on the right, with more data on the left side
Typical bimodal histogram
Two camel humps
Cumulative Frequency Tables and Ogives
The cumulative frequency for a class is the sum of the frequencies for that class and all previous
classes. You just add the current frequency to the running total of the previous frequencies.
When building the ogive, plot each cumulative frequency against the upper class boundary of its
class.
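A small sketch of the running total and the ogive points. The frequencies and upper class limits are hypothetical, and the boundaries assume whole-number data, where the upper class boundary is the upper class limit plus 0.5.

```python
from itertools import accumulate

# Hypothetical class frequencies and upper class limits.
frequencies = [3, 5, 4, 4, 4]
upper_limits = [6, 12, 18, 24, 30]

# Running total: each entry is the current frequency plus all previous ones.
cumulative = list(accumulate(frequencies))  # [3, 8, 12, 16, 20]

# Assumption: whole-number data, so boundary = upper limit + 0.5.
upper_boundaries = [limit + 0.5 for limit in upper_limits]

# Points to plot for the ogive: (upper class boundary, cumulative frequency).
ogive_points = list(zip(upper_boundaries, cumulative))
print(ogive_points)
```

The last cumulative frequency always equals the total number of data points, which is a quick check on the table.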
Section 2.2 Bar Graphs, Circle Graphs, and Time-series Graphs
The issue with histograms is that the data must be quantitative
Stem and leaf display
Averages and variation
3.1 – Measures of Central Tendency: Mode, Median, and Mean
Measures of Variation
To measure the spread of the data we can use the RANGE, which is the difference between the largest
and smallest values.
Variance and Standard Deviation
The square root of the variance is the standard deviation.
Mean = expected value
The sample mean is written x̄ (x bar).
The population mean is written μ (mu).
Find STANDARD DEVIATION
To calculate the standard deviation of a set of numbers:
1. Work out the mean (the simple average of the numbers).
2. For each number, subtract the mean and square the result.
3. Work out the average of those squared differences: divide their sum by N for a population, or by
n – 1 for a sample. This is the variance.
4. Take the square root of that and we are done!
The only difference between the sample and the population formulas is the denominator: the sample
formula divides by (n – 1), while the population formula divides by N.
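The four steps can be followed literally in Python. This is a sketch with a made-up data set; the function name `std_dev` and its `sample` flag are just for illustration. The result is cross-checked against the standard library's `statistics.stdev` (sample) and `statistics.pstdev` (population).

```python
import math
import statistics

def std_dev(values, sample=True):
    """Standard deviation computed step by step."""
    mean = sum(values) / len(values)                 # step 1: the mean
    squared_diffs = [(x - mean) ** 2 for x in values]  # step 2: squared differences
    # Step 3: divide by n - 1 for a sample, by N for a population.
    divisor = len(values) - 1 if sample else len(values)
    variance = sum(squared_diffs) / divisor
    return math.sqrt(variance)                       # step 4: square root

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5

pop_sd = std_dev(data, sample=False)   # sum of squares 32, 32 / 8 = 4, sqrt = 2.0
samp_sd = std_dev(data, sample=True)   # 32 / 7, a little larger than 2

print(pop_sd, statistics.pstdev(data))
print(samp_sd, statistics.stdev(data))
```

The sample value is always a bit larger than the population value for the same numbers, because it divides by a smaller denominator.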
Coefficient of Variation (CV)
CV = (standard deviation / mean) * 100, expressed as a percent
The higher the percent, the more VARIABLE the data are relative to the mean.
The lower the percent, the more concentrated the data are around the mean.
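A quick sketch comparing the CV of two made-up data sets that share the same mean. The function name and both data sets are hypothetical, and the sample standard deviation is used here as an assumption.

```python
import statistics

def coefficient_of_variation(values):
    """CV as a percent, using the sample standard deviation (assumption)."""
    return statistics.stdev(values) / statistics.mean(values) * 100

steady = [48, 50, 52, 50, 50]  # hypothetical: tightly packed around 50
spread = [10, 90, 50, 20, 80]  # hypothetical: same mean, much more spread

cv_steady = coefficient_of_variation(steady)
cv_spread = coefficient_of_variation(spread)

print(round(cv_steady, 2))  # 2.83
print(round(cv_spread, 1))  # 70.7
```

Both sets have a mean of 50, but the second one has a far higher CV, so it is the more variable of the two.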
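Chebyshev’s Theorem
For any set of data, the proportion of the data that lies within k standard deviations of the mean is
at least
1 – 1/(k^2)
k is the number of standard deviations from the mean (k > 1).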
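As a quick check of the formula, here is a tiny sketch; the function name `chebyshev_lower_bound` is made up for illustration.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is only informative for k > 1")
    return 1 - 1 / k**2

print(chebyshev_lower_bound(2))  # 0.75
print(chebyshev_lower_bound(3))
```

So at least 75% of any data set lies within 2 standard deviations of the mean, and at least 8/9 (about 88.9%) within 3.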
Section 3.3 Percentiles and Box-and-Whisker Plots
Quartiles split the ordered data into four quarters:
Lowest – Q1 – Q2 (Median or 50th Percentile) – Q3 – Highest
Order the data and find the median to get Q2.
Then Q1 is the median of the lower half of the data (between the lowest value and Q2), and Q3 is the
median of the upper half (between Q2 and the highest value).
IQR is the interquartile range: IQR = Q3 – Q1, the spread of the middle 50 percent of the data.
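A sketch of the median-of-halves method for quartiles. The data set and function name are made up, and when the number of data points is odd the median itself is left out of both halves (an assumption; other software may use slightly different quartile conventions and give slightly different answers).

```python
import statistics

def quartiles(values):
    """Q1, Q2, Q3 using the median of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # For odd n, skip the middle value so it belongs to neither half.
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    q1 = statistics.median(lower_half)
    q2 = statistics.median(ordered)
    q3 = statistics.median(upper_half)
    return q1, q2, q3

data = [2, 4, 5, 7, 8, 10, 12, 13, 15]  # hypothetical, already ordered

q1, q2, q3 = quartiles(data)
iqr = q3 - q1

# Five-number summary used for a box-and-whisker plot.
print(min(data), q1, q2, q3, max(data))  # 2 4.5 8 12.5 15
print(iqr)                               # 8.0
```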
4.1 09/28/2020
What is probability?
It is all about the likelihood of an event.
A probability is never negative; it is always a number between 0 and 1.
P(A), read “P of A,” denotes the probability of event A.
If P(A) = 1, event A is certain to occur.
If P(A) = 0, event A is certain not to occur.
Probability of event = relative frequency = f/n
f is the frequency of the event's occurrence in a sample of n observations.
Probability of event = (number of outcomes favorable to the event) / (total number of outcomes),
when all outcomes are equally likely.
Intuition based probability
Law of large numbers
A statistical experiment or statistical observation can be thought of as any random activity that results in
a definite outcome
An event is a collection of one or more outcomes of a statistical experiment or observation
A simple event is one particular outcome of a statistical experiment.
The set of all simple events constitutes the sample space of an experiment.
The complement of event A, written A^c, is the event that A does not occur.
P(A) + P(A^c) = 1
P(A^c) = 1 – P(A)
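A small sketch tying these ideas together with a fair six-sided die (a made-up example). It computes the equally-likely-outcomes probability of rolling a six, its complement, and then a relative frequency f/n from simulated rolls, which by the law of large numbers should land close to the theoretical value. The seed and number of rolls are arbitrary choices.

```python
import random
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]  # all simple events for one roll
event_a = [outcome for outcome in sample_space if outcome == 6]

# Favorable outcomes / total outcomes (outcomes are equally likely).
p_a = Fraction(len(event_a), len(sample_space))  # 1/6
p_not_a = 1 - p_a                                # complement rule: 5/6

# Relative frequency f/n from a simulation.
rng = random.Random(42)  # fixed seed (assumption, for repeatability)
n = 100_000
f = sum(1 for _ in range(n) if rng.choice(sample_space) == 6)
relative_frequency = f / n

print(p_a, p_not_a)
print(relative_frequency)  # should be close to 1/6
```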
Section 4.2
Conditional Probability and Multiplication Rules
Independent events
Two events are independent if the occurrence or nonoccurrence of one event does not change the
probability that the other event will occur.
Multiplication rule for independent events
P(A and B) = P(A) * P(B)
Conditional probability is used when the events are dependent.
The notation P(A, given B), also written P(A | B), denotes the probability that event A will occur
given that event B has occurred.
For dependent events, P(A and B) uses a conditional probability:
P(A | B) – B must happen first
P(B | A) – A must happen first
P(A and B) = P(A) * P(B | A)
P(A and B) = P(B) * P(A | B)
P(A | B) = P(A and B) / P(B)
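A worked sketch of both multiplication rules using exact fractions. The examples (rolling two dice, and drawing two aces from a standard 52-card deck without replacement) are made up for illustration.

```python
from fractions import Fraction

# Independent events: two separate die rolls both show a six.
p_six = Fraction(1, 6)
p_two_sixes = p_six * p_six  # P(A and B) = P(A) * P(B) = 1/36

# Dependent events: draw two cards without replacement, both aces.
p_first_ace = Fraction(4, 52)               # P(A)
p_second_ace_given_first = Fraction(3, 51)  # P(B | A): one ace is already gone
p_both_aces = p_first_ace * p_second_ace_given_first  # P(A and B) = 1/221

# Going the other way: conditional probability from the joint probability.
recovered = p_both_aces / p_first_ace  # P(B | A) = P(A and B) / P(A)

print(p_two_sixes, p_both_aces, recovered)  # 1/36 1/221 1/17
```

If the first card were put back before the second draw, the two draws would be independent and the answer would be (4/52) * (4/52) instead.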
Section 4.3 Trees and Counting Techniques
0! = 1
1! = 1
5! = 5 * 4 * 3 * 2 * 1
The number of ways to arrange n objects taken r at a time, where order matters, is given by the
permutation formula:
Pn,r = n!/(n-r)!
The permutations rule is really another version of the multiplication rule.
Counting Rule for Combinations
Order does not matter
The number of combinations of n objects taken r at a time is
Cn,r = n!/(r!(n-r)!)
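A short sketch of the counting rules built from factorials, using the standard permutation formula n!/(n-r)! alongside the combination formula. The function names and the example of choosing 3 out of 8 objects are made up; the results are cross-checked against `math.perm` and `math.comb` (available in Python 3.8 and later).

```python
import math

def permutations_count(n, r):
    """Ordered arrangements of n objects taken r at a time: n! / (n - r)!"""
    return math.factorial(n) // math.factorial(n - r)

def combinations_count(n, r):
    """Unordered selections of n objects taken r at a time: n! / (r! (n - r)!)"""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(math.factorial(0), math.factorial(1), math.factorial(5))  # 1 1 120

# Hypothetical example: pick 3 out of 8 objects.
print(permutations_count(8, 3))  # 336 (order matters: 8 * 7 * 6)
print(combinations_count(8, 3))  # 56  (order does not matter: 336 / 3!)
```

The permutation count is just the multiplication rule in action: 8 choices for the first spot, 7 for the second, 6 for the third.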
Chapter 5
The Binomial Probability Distribution and Related Topics
Section 1 Introduction to Random Variables and Probability distributions
A quantitative variable x is a random variable if the value that x takes on in a given experiment or
observation is a chance or random outcome
A discrete random variable can take on only a finite number of values or a countable number of values,
for example WHOLE-NUMBER counts such as 25, 26, 27.
A continuous random variable can take on any of the countless number of values in an interval.
Probability Distribution of Discrete Random Variable
A random variable has a probability distribution whether it is discrete or continuous
A probability distribution is an assignment of probabilities to each distinct value of a discrete random
variable or to each interval of values of a continuous random variable.
Features of the probability distribution of a discrete random variable
1. The probability distribution has a probability assigned to each distinct value of the random
variable
2. The sum of all assigned probabilities must be 1
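The two features can be checked mechanically. In this sketch a distribution is stored as a dictionary mapping each value to its probability; the function name and both example distributions are made up, and `math.isclose` is used so that tiny floating-point rounding does not cause a false failure.

```python
import math

def is_valid_distribution(dist):
    """Check that each probability is in [0, 1] and that they sum to 1."""
    probs = list(dist.values())
    each_ok = all(0 <= p <= 1 for p in probs)
    return each_ok and math.isclose(sum(probs), 1.0)

# Hypothetical distributions: value -> probability.
fair_die = {face: 1 / 6 for face in range(1, 7)}
incomplete = {0: 0.5, 1: 0.3, 2: 0.1}  # probabilities only add up to 0.9

print(is_valid_distribution(fair_die))    # True
print(is_valid_distribution(incomplete))  # False
```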
Binomial Probabilities
P(at least one) = 1 – P(none)
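A sketch of the "at least one" shortcut, assuming n independent trials that each succeed with the same probability p (the binomial setting). Under that assumption P(none) = (1 – p)^n by the multiplication rule for independent events. The function name and the dice example are made up.

```python
from fractions import Fraction

def p_at_least_one(p_success, n_trials):
    """P(at least one success) = 1 - P(no successes) for independent trials."""
    p_none = (1 - p_success) ** n_trials
    return 1 - p_none

# Hypothetical example: at least one six in four rolls of a fair die.
result = p_at_least_one(Fraction(1, 6), 4)
print(result)         # 671/1296
print(float(result))  # roughly 0.518
```

This is much quicker than adding up the probabilities of exactly one, two, three, and four sixes.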