Describing and Summarizing Data

Describing a research question quantitatively
  Research question: Social media use. Do consumers use social media? What proportion of the target audience uses social media?
    Corresponding variable: X1 (binary)
    Measurement (sample/data): 1, 0, …, 1
  Research question: Consumer attitude
    Corresponding variable: X2 (7-point scale)
    Measurement (sample/data): 6, 2, …, 5

Describing a discrete variable
  Discrete variable: takes a finite number of values; there is only a certain number of categories.
  Examples:
    An itemized variable with a 1-7 scale
    What is the gender of my consumer?
    Do you like my product or not?
  Age depends on how it is recorded: it can be treated as discrete or continuous.
  A frequency distribution for a discrete variable (drawn as a histogram) is the list of counts for each value of the variable. It is the most fundamental concept in statistics.

Summarizing data of an itemized variable
  Measures of location (1st-moment measures)
    Mean
    Median
    Mode
  Measures of variability (2nd-moment measures; roughly, how far the data lie from the mean on average)
    Range
    Standard deviation/variance
    Standard error
  Measures of shape (3rd- and 4th-moment measures)
    Skewness (asymmetry)
    Kurtosis (tail weight)

Describing a continuous variable
  If we separate the values of the variable into many small intervals and treat each small interval as a single value, we can construct a relative frequency distribution (i.e., in terms of %, not counts) for the "discretized" variable.
  As the intervals get smaller, the limit of the relative frequency per unit of interval width is called the Probability Density Function (PDF) of the variable.
  The probability that the variable is less than or equal to a certain value, viewed as a function of that value, is called the Cumulative Distribution Function (CDF) of the variable.

Describing two discrete variables together
  A histogram describes the frequency distribution of one variable.
  A cross-tabulation describes the frequency distribution of two or more variables simultaneously.
Cross-tabulation results in tables that reflect the joint distribution of two or more variables with a limited number of categories or distinct values.
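
To make the ideas above concrete, here is a minimal Python sketch using only the standard library. It builds a frequency distribution, a cumulative relative frequency (the sample analogue of a CDF), the location and variability measures, and a two-way cross-tabulation. Everything in it is an assumption for illustration: the eight data points are made up, the function names (frequency_distribution, cumulative_relative_frequency, crosstab) are hypothetical, and the "likes" cutoff of 5 on the 7-point scale is arbitrary. Skewness and kurtosis are left out because the standard library does not provide them.

```python
from collections import Counter
from math import sqrt
from statistics import mean, median, mode, stdev, variance

# Hypothetical sample (not from the notes): 8 respondents.
x1 = [1, 0, 1, 1, 0, 1, 1, 0]   # X1: uses social media (1) or not (0)
x2 = [6, 2, 5, 7, 3, 6, 4, 6]   # X2: attitude on a 1-7 scale


def frequency_distribution(values):
    """Count of each distinct value, ordered by value."""
    return dict(sorted(Counter(values).items()))


def cumulative_relative_frequency(values):
    """Share of observations less than or equal to each observed value."""
    n = len(values)
    running = 0
    result = {}
    for value, count in frequency_distribution(values).items():
        running += count
        result[value] = running / n
    return result


def crosstab(rows, cols):
    """Joint counts keyed by (row value, column value)."""
    return dict(Counter(zip(rows, cols)))


freq_x2 = frequency_distribution(x2)
cum_x2 = cumulative_relative_frequency(x2)

# For a 0/1 variable, the mean is the proportion of ones.
share_users = mean(x1)

summary_x2 = {
    "mean": mean(x2),
    "median": median(x2),
    "mode": mode(x2),
    "range": max(x2) - min(x2),
    "variance": variance(x2),                 # sample variance (n - 1)
    "std_dev": stdev(x2),                     # sample standard deviation
    "std_error": stdev(x2) / sqrt(len(x2)),   # standard error of the mean
}

# Assumed cutoff, for illustration only: a score of 5 or more counts as "likes".
likes = [1 if score >= 5 else 0 for score in x2]
joint = crosstab(x1, likes)

print(freq_x2)
print(cum_x2)
print(share_users)
print(summary_x2)
print(joint)
```

Note that the mean of the binary variable X1 answers the "what proportion" research question directly, and that the cells of the cross-tabulation always sum to the sample size.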