Describing and Summarizing Data

 Describing a research question quantitatively
 Research questions
 Social media use:
 Do consumers use social media?
 What proportion of the target audience use social media?
 Consumer attitude
 Corresponding variables
 X1 (binary)
 X2 (7 point scale)
 Measurement (Sample/Data)
 1, 0,…,1
 6,2, …,5
 Describing a discrete variable
 Discrete variable: takes a finite (or countable) number of distinct values
 Example: an itemized variable with scale 1-7
 What is the gender of my consumer?
 Do you like my product or not?
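 Age depends on how it is measured: it is discrete when recorded in brackets or whole years, and continuous when treated as exact elapsed time.
 A discrete variable has only a limited number of categories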
 The frequency distribution of a discrete variable (displayed as a histogram) is the list of counts of observations for each value of the variable
 The most fundamental concept in stats
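 The frequency distribution described above can be sketched in a few lines. This is an illustration only: the ratings are hypothetical 1-7 responses, and the names freq_table and rel_freq are chosen here, not taken from the notes.

```python
from collections import Counter

# Hypothetical responses on an itemized 1-7 scale.
ratings = [6, 2, 5, 7, 3, 5, 6, 4, 2, 5]

counts = Counter(ratings)
n = len(ratings)

# List every scale point, including those nobody chose (count 0).
freq_table = {value: counts.get(value, 0) for value in range(1, 8)}
# Relative frequencies express the same distribution as shares of n.
rel_freq = {value: count / n for value, count in freq_table.items()}

# Text histogram: one '#' per observation.
for value in range(1, 8):
    bar = "#" * freq_table[value]
    print(f"{value}: {freq_table[value]:2d} ({rel_freq[value]:.0%}) {bar}")
```

 Keeping the zero-count values in the table matters for scale items: an unused scale point is information about the distribution, not a missing row.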
 Summarizing data of an itemized variable
 Measures of Location (1st moment measures)
 Mean
 Median
 Mode
 Measures of Variability (how far the data typically lie from the mean; 2nd moment measures)
 Range
 Standard deviation/variance
 Standard error (of the sample mean: standard deviation divided by √n)
 Measures of Shape (3rd and 4th moment measures)
 Skewness (asymmetry)
 Kurtosis (tail weight)
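 As a rough sketch of how the summary measures listed above are computed, the example below uses Python's standard statistics module on hypothetical 1-7 ratings. Two assumptions are worth flagging: the variance and standard deviation use the sample (n - 1) form, while skewness and kurtosis use the simple moment-based (divide by n) form. Statistical packages may apply different small-sample corrections.

```python
import math
import statistics

# Hypothetical responses on an itemized 1-7 scale.
ratings = [6, 2, 5, 7, 3, 5, 6, 4, 2, 5]
n = len(ratings)

# Measures of location.
mean = statistics.mean(ratings)
median = statistics.median(ratings)
mode = statistics.mode(ratings)

# Measures of variability.
data_range = max(ratings) - min(ratings)
variance = statistics.variance(ratings)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(ratings)       # square root of the sample variance
std_error = std_dev / math.sqrt(n)        # standard error of the mean

# Measures of shape, from central moments (dividing by n).
m2 = sum((x - mean) ** 2 for x in ratings) / n
m3 = sum((x - mean) ** 3 for x in ratings) / n
m4 = sum((x - mean) ** 4 for x in ratings) / n
skewness = m3 / m2 ** 1.5                 # 0 for a symmetric distribution
excess_kurtosis = m4 / m2 ** 2 - 3        # 0 for a normal distribution

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={data_range}, variance={variance:.3f}, sd={std_dev:.3f}, se={std_error:.3f}")
print(f"skewness={skewness:.3f}, excess kurtosis={excess_kurtosis:.3f}")
```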
 Describing a continuous variable
 If we separate the values of the variable into multiple “small intervals” and treat each small interval as a
single value, then we can construct a relative frequency distribution (i.e., in terms of %, not counts) for
the “discretized” variable.
 As the intervals shrink, the limit of the relative frequency distribution of the “discretized” variable, with each relative frequency divided by its interval width, is called the Probability Density Function (PDF) of the variable.
 The probability that the variable is less than or equal to a certain value, viewed as a function of that value, is called the Cumulative Distribution Function (CDF) of the variable
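 The discretization idea can be tried numerically. The sketch below is an illustration under stated assumptions: the data are simulated from a normal distribution (a hypothetical stand-in for a continuous measurement such as minutes per day on social media), and the helper names relative_frequencies and empirical_cdf are made up for this example. It bins the data with progressively narrower intervals, converts shares to densities by dividing by the interval width, and estimates the CDF as the share of observations at or below a value.

```python
import random

random.seed(0)
# Hypothetical continuous measurements, simulated for illustration.
values = [random.gauss(60, 15) for _ in range(1000)]

def relative_frequencies(data, width):
    """Bin data into intervals of the given width; return {left_edge: share}."""
    counts = {}
    for x in data:
        left_edge = (x // width) * width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return {edge: c / len(data) for edge, c in sorted(counts.items())}

def empirical_cdf(data, x):
    """Share of observations less than or equal to x."""
    return sum(1 for v in data if v <= x) / len(data)

for width in (20, 10, 5):
    rel = relative_frequencies(values, width)
    # Dividing each share by the interval width gives a density estimate,
    # so that the bar areas (density * width) always sum to 1.
    densities = {edge: share / width for edge, share in rel.items()}
    total_area = sum(d * width for d in densities.values())
    print(f"width={width:2d}: {len(rel)} intervals, "
          f"peak density={max(densities.values()):.4f}, area={total_area:.2f}")

for x in (45, 60, 75):
    print(f"Estimated P(X <= {x}) = {empirical_cdf(values, x):.3f}")
```

 Note that the shares themselves shrink as the intervals narrow; it is the share divided by the width that settles toward a stable curve.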
 Describing two discrete variables together
 A histogram describes the frequency distribution of one variable.
 A cross-tabulation describes the frequency distribution of two or more variables simultaneously.
 Cross-tabulation results in tables that reflect the joint distribution of two or more variables with a
limited number of categories or distinct values.
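 A small cross-tabulation can be built by counting pairs of values. The sketch below is illustrative only: the paired responses are hypothetical, and the labels ("user", "like", and so on) are invented for the example. Each cell is a joint count; the row and column totals recover the one-variable frequency distributions.

```python
from collections import Counter

# Hypothetical paired responses: (social media use, product opinion).
responses = (
    [("user", "like")] * 4
    + [("user", "dislike")] * 2
    + [("non-user", "like")] * 1
    + [("non-user", "dislike")] * 3
)

# Joint distribution: count of each (row value, column value) pair.
joint_counts = Counter(responses)
rows = sorted({r for r, _ in responses})
cols = sorted({c for _, c in responses})

# Marginal totals are the single-variable frequency distributions.
row_totals = {r: sum(joint_counts[(r, c)] for c in cols) for r in rows}
col_totals = {c: sum(joint_counts[(r, c)] for r in rows) for c in cols}

# Print the table with totals.
print(f"{'':10}" + "".join(f"{c:>9}" for c in cols) + f"{'total':>9}")
for r in rows:
    cells = "".join(f"{joint_counts[(r, c)]:>9}" for c in cols)
    print(f"{r:10}{cells}{row_totals[r]:>9}")
print(f"{'total':10}" + "".join(f"{col_totals[c]:>9}" for c in cols)
      + f"{len(responses):>9}")
```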