L1 THEORY

DATA AND DATASET
1. Data are the facts collected, summarized, analyzed and interpreted. All the data collected in a particular study are referred to as the data set.
2. Elements are the entities on which data are collected, such as individuals, firms, countries, etc.
3. A variable is a characteristic of interest for the elements, such as shoe size, number of employees, GDP, etc.
4. An observation is the set of measurements collected for a particular element, such as shoe size 44, 10 employees, 583 billion USD, etc.
5. The total number of data values in a data set is the number of elements multiplied by the number of variables.

Example: a simple table with names and subject grades for each student
● The elements correspond to individuals (the students).
● The variables are the number of points on tests in mathematics, physics, etc.
● The observations are the actual points that each individual received, e.g. for Matt: 38 (mathematics), 58 (physics), 66 (chemistry), 49 (biology).
● The data set consists of 32 data values (everything all together).

SCALES OF MEASUREMENT - 4 different scales of measurement
The scale of measurement determines the amount of information contained in the data and indicates the statistical analyses that are most appropriate.
● Nominal: Data is divided into different categories. There is no natural ranking of the categories. The "values" of the variable can only be described in words, not numbers. A nonnumeric label or numeric code may be used, and the only meaningful mathematical symbols are = and ≠.
● Ordinal: Data is divided into different categories. There exists a natural ranking of the categories, but it isn't possible to indicate in any meaningful way differences or distances between the values. A nonnumeric label or numeric code may be used.
● Interval: Data is always numeric. There exists a natural ranking of the data. Variables measured on the interval scale have fixed measurement units, so it makes sense to specify differences or distances between values. The zero point is arbitrary, which means
that interval-scaled variables do not have a true or absolute zero point. Because of this, it is technically incorrect to declare that something is so many times larger or smaller than something else.
● Ratio: Data is always numeric. There exists a natural ranking of the data. Variables measured on the ratio scale have fixed measurement units, and it makes sense to specify both differences and ratios between values, because the scale has a true zero point.

● Data measured on the nominal and ordinal scales are usually called qualitative or categorical data.
● Data measured on the interval and ratio scales are usually called quantitative data.
● There are more options for statistical analysis when the data are quantitative.
● Statistical inference: the process of drawing conclusions about an underlying population based on a sample or subset of the data.

Graphical methods
● Pie chart: a chart type where circle sectors show proportions of a total.
● Bar chart: a chart type that shows the values of different groups by the height of the bars.
● Histogram: a chart type that shows how many observations there are in each interval.
● Scatter plot: a chart type that uses dots to represent values for two different numeric variables. It is used to observe relationships between the two variables.

Measures of central tendency
A measure of central tendency is a central or typical value for data or a probability distribution.
● Mode is the most frequent value in the data set.
● Median is the middle value that separates the higher half from the lower half of the data set.
● Mean is the average value, calculated as the sum of all values divided by the number of values: x̄ = (x_1 + x_2 + ... + x_n) / n.

Measures of variability - Percentiles
● The pth percentile is the value such that at least p percent of the observations are less than or equal to this value. The percentiles that divide the observations into four parts (P25, P50 and P75) are called quartiles. P25 is called the first quartile, P50 is called the second quartile or the median, and P75 is called the third quartile.
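
The measures of central tendency and the quartiles can be illustrated with a short Python sketch. The scores below are hypothetical numbers invented for illustration, and `percentile` is a hypothetical helper (not a library function) that takes the percentile definition literally: it returns the smallest data value such that at least p percent of the observations are less than or equal to it. Textbooks and software use several slightly different percentile conventions, so other tools may give somewhat different quartiles.

```python
import math
import statistics

# Hypothetical test scores, invented for illustration only.
scores = [38, 58, 66, 49, 58, 71, 45, 62]


def percentile(values, p):
    """Smallest data value with at least p percent of observations <= it.

    Assumption: p is a whole number between 0 and 100. This is one simple
    convention; it does no interpolation between neighbouring values.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Position (1-based) of the first value that covers p percent of the data.
    rank = max(math.ceil(p * n / 100), 1)
    return ordered[rank - 1]


mode = statistics.mode(scores)      # most frequent value
median = statistics.median(scores)  # middle value (average of the two middle values when n is even)
mean = statistics.mean(scores)      # sum of values divided by number of values

# Quartiles: the 25th, 50th and 75th percentiles.
q1, q2, q3 = (percentile(scores, p) for p in (25, 50, 75))

print("mode:", mode)
print("median:", median)
print("mean:", mean)
print("quartiles:", q1, q2, q3)
```

For these scores P50 and the median coincide. With this simple rule they can differ when the number of observations is even and the two middle values are not equal, because `statistics.median` averages the two middle values while `percentile` always returns an actual data value.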