REVIEW QUESTIONS

1. Define the following terms:
a. Population
A population is the pool of individuals from which a statistical sample is drawn for a study.
b. Sample
A sample is a subset of a larger population that is selected for analysis. The use of samples allows researchers to conduct their studies with more manageable data and in a timely manner. Randomly drawn samples do not have much bias if they are large enough, but achieving such a sample may be expensive and time-consuming.

2. What are the two major types of statistics? Describe them in detail.
The two major areas of statistics are descriptive statistics, which describes the properties of sample and population data, and inferential statistics, which uses those properties to test hypotheses and draw conclusions.

3. Identify some methods used in descriptive statistics.
  1. Measures of Frequency
  * Count, Percent, Frequency
  * Shows how often something occurs
  * Use this when you want to show how often a response is given
  2. Measures of Central Tendency
  * Mean, Median, and Mode
  * Locates the distribution by various points
  * Use this when you want to show an average or most commonly indicated response
  3. Measures of Dispersion or Variation
  * Range, Variance, Standard Deviation
  * Identifies the spread of scores by stating intervals
  * Range = difference between the highest and lowest scores
  * Variance or Standard Deviation = how far the observed scores typically fall from the mean
  * Use this when you want to show how "spread out" the data are. It is helpful to know when your data are so spread out that the spread affects the mean
  4. Measures of Position
  * Percentile Ranks, Quartile Ranks
  * Describes how scores fall in relation to one another. Relies on standardized scores
  * Use this when you need to compare scores to a normalized score (e.g., a national norm)

4. Explain two ways in which descriptive statistics and inferential statistics are interrelated.

5. Define the following terms: a. Observational study b.
Designed experiment
Observational study
In an observational study, the researchers only observe the subjects and do not interfere or try to influence the outcomes. In other words, the researchers do not control the treatments or assign subjects to experimental groups.
Designed experiment
Design of experiments (DOE) is a branch of applied statistics that deals with planning, conducting, analyzing, and interpreting controlled tests to evaluate the factors that control the value of a parameter or group of parameters.

6. Explain why a census is often not the best way to obtain information about a population.
In a census, the researcher obtains information from the entire population of interest. Conducting a census of the entire population is often time-consuming, costly, impractical, or even impossible.

7. Identify two methods other than a census for obtaining information.
* Questionnaires and surveys
* Observations
* Documents and records
* Focus groups

8. In sampling, why is obtaining a representative sample important?
Representative samples are important because they ensure that as many relevant types of people as possible are included in your sample, and that the right mix of people is interviewed or surveyed. This helps ensure your results aren't tainted by bias.

9. Regarding probability sampling:
a. What is it?
Probability sampling is the selection of a sample from a population based on the principle of randomization, that is, random selection or chance. Probability sampling is more complex, more time-consuming, and usually more costly than non-probability sampling.
b. Does probability sampling always yield a representative sample? Explain your answer.
c. Identify some advantages of probability sampling.
* The absence of systematic error and sampling bias
* Higher level of reliability of research findings
* Increased accuracy of sampling error estimation
* The possibility to make inferences about the population
* Effective for choosing samples from a broad population base
* Cost-effectiveness
* Simple and straightforward in application

10. Regarding simple random sampling:
a. What is simple random sampling?
Simple random sampling is a type of probability sampling in which the researcher randomly selects a subset of participants from a population. Each member of the population has an equal chance of being selected. Data is then collected from as large a percentage as possible of this random subset.
b. What is a simple random sample?
A simple random sample is a subset of a statistical population, chosen so that each member of the population has an equal probability of being selected.
c. Identify two forms of simple random sampling and explain the difference between the two.
The two forms are simple random sampling with replacement, in which a member of the population can be selected more than once, and simple random sampling without replacement, in which a member can be selected at most once.
More broadly, probability sampling involves random selection, allowing you to make strong statistical inferences about the whole group. Non-probability sampling involves non-random selection based on convenience or other criteria, allowing you to easily collect data. A simple random sample is similar to a random sample. The difference between the two is that with a simple random sample, each object in the population has an equal chance of being chosen. With random sampling, each object does not necessarily have an equal chance of being chosen.

11. Identify two methods for obtaining a simple random sample.
a. A table of random numbers
b. A random-number generator

SET 2

1. Explain the meaning of data.
Data are individual pieces of factual information recorded and used for the purpose of analysis. They are the raw information from which statistics are created. Statistics are the results of data analysis: its interpretation and presentation.

2. Explain the meaning of variables. On what basis do we classify variables?
A variable is a characteristic that can be measured and that can assume different values.
Height, age, income, province or country of birth, grades obtained at school, and type of housing are all examples of variables.
Classification:
* Qualitative
* Quantitative
  * Discrete
  * Continuous

3. Provide a reason why the classification of data is important.
In statistics, the type of data (qualitative or quantitative, discrete or continuous) determines which methods of summary and analysis are appropriate. In an organizational setting, data classification also helps maintain the confidentiality, ease of access, and integrity of data. It lowers the danger of unstructured sensitive information becoming vulnerable to hackers, and it saves companies from steep data storage costs.

4. What is the frequency distribution of qualitative data and why is it useful?
It is a listing of the distinct values and their frequencies. It is useful because it provides a table of the values of the observations and how often they occur.

5. Explain the difference between
a. frequency and relative frequency.
Frequency is the number of times a result occurs, while relative frequency is the number of times the result occurs divided by the number of times the experiment is repeated.
b. percentage and relative frequency.
Relative frequency is a proportion between 0 and 1. The percentage is the relative frequency multiplied by 100.

6. Identify an important reason for grouping data.

2.35 Do the concepts of class limits, marks, cut points, and mid-points make sense for qualitative data? Explain your answer.

7. State three of the most important guidelines in choosing the classes for grouping a quantitative data set.
* Classes should be exhaustive: there should be a class for every data value in the set.
* Classes should be mutually exclusive: no data value should fall into more than one class.
* Classes should be continuous: there should be no gaps between classes.

8.
With regard to grouping quantitative data into classes in which each class represents a range of possible values, we discussed two methods for depicting the classes. Identify the two methods and explain the relative advantages and disadvantages of each method.
The two methods for depicting the classes are:
* Limit grouping
* Cut-point grouping
Limit grouping works well for discrete data with many distinct values because it reduces the number of classes, which makes the distribution easier to understand. It is difficult to apply to continuous data, however, because an observation can fall between the upper limit of one class and the lower limit of the next. Cut-point grouping, on the other hand, is designed specifically for continuous data.

9. For quantitative data, we examined three types of grouping: single-value grouping, limit grouping, and cut-point grouping. For each type of data given, decide which of these three types is usually best. Explain your answers.
a. Continuous data displayed to one or more decimal places
Cut-point grouping
b. Discrete data in which there are relatively few distinct observations
Single-value grouping

10. We used slightly different methods for determining the "middle" of a class with limit grouping and cut-point grouping. Identify the methods and the corresponding terminologies.

11. Explain the difference between a frequency histogram and a relative-frequency histogram.
The only difference is the vertical axis: a relative-frequency histogram shows the relative (proportional) frequency of each class, whereas a frequency histogram shows the simple frequency.

12. Explain the advantages and disadvantages of frequency histograms versus frequency distributions.
The frequency histogram shows the distribution graphically, which is clearer than the numbers given by the frequency distribution. The frequency distribution gives the exact frequencies, while in the frequency histogram you can only estimate them.

13. Suppose that you have a data set that contains a large number of observations. Which graphical display is generally preferable: a histogram or a stem-and-leaf diagram?
Explain your answer.
A histogram. A stem-and-leaf diagram displays every observation as a leaf, so it becomes unwieldy when the number of observations is large.

14. Suppose that you have constructed a stem-and-leaf diagram and discover that it is only moderately useful because there are too few stems. How can you remedy the problem?

15. Clearly differentiate between skewness and symmetry.
Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. Kurtosis, by contrast, is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution.

16. Explain the significance of modality.
In statistics, the modality of a distribution is its number of peaks: a distribution is unimodal if it has one peak, bimodal if it has two, and multimodal if it has three or more. Outside statistics, modality shares its root with the word mode, meaning "the way in which something happens or is experienced." A sensory modality is a way of sensing, like vision or hearing. Modality in someone's voice gives a sense of the person's mood.

SET 3

1. What is forecasting?
Forecasting refers to the practice of predicting what will happen in the future by taking into consideration events in the past and present.

2. How do we execute qualitative forecasting? Which method is preferred?

3. Clearly differentiate between Time Series and Causal data.
In a time series, the independent variable is time, so every quantity in the series is recorded against time. In a causal model, one variable is the cause and the other is the effect.

4. How do we apply forecasting for time series?

5. Clarify the concept of smoothing.
The aim of smoothing is to give a general idea of relatively slow changes in value, with little attention paid to closely matching the individual data values.

6. Regarding a scatterplot,
a. identify one of its uses.
It helps to get a general idea of the data. It also helps to identify the level in the frequency of data.
c. what property should it have to obtain a regression line for the data?

7. Regarding the criterion used to decide on the line that best fits a set of data points,
a. what is that criterion called?
b. specifically, what is the criterion?

8.
Regarding the line that best fits a set of data points,
a. what is that line called?
Regression line
b. what is the equation of that line called?
Regression equation

9. What is the significance of the correlation and regression constants?
The most commonly used techniques for investigating the relationship between two quantitative variables are correlation and linear regression. Correlation quantifies the strength of the linear relationship between a pair of variables, whereas regression expresses the relationship in the form of an equation.
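
To make the last answer concrete, here is a minimal Python sketch that computes both quantities for one small data set. It assumes the usual Pearson correlation coefficient and a least-squares fit, which the notes above do not state explicitly. The function name `correlation_and_regression` and the `hours`/`scores` data are hypothetical and chosen only for illustration.

```python
from math import sqrt


def correlation_and_regression(x, y):
    """Return (r, slope, intercept) for paired observations x and y.

    r is the Pearson correlation coefficient (assumed here); slope and
    intercept define the least-squares line y = intercept + slope * x.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squared deviations and of cross-products about the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")

    r = sxy / sqrt(sxx * syy)             # strength of the linear relationship
    slope = sxy / sxx                     # change in y per unit change in x
    intercept = mean_y - slope * mean_x   # the line passes through the means
    return r, slope, intercept


# Hypothetical data: hours studied and quiz score for five students
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r, slope, intercept = correlation_and_regression(hours, scores)
print(f"r = {r:.3f}")
print(f"regression equation: y = {intercept:.1f} + {slope:.1f}x")
```

For this data the sketch gives r of about 0.775 and the regression equation y = 2.2 + 0.6x. The correlation is a single number summarizing how strongly the points follow a line, while the regression equation can be used to predict y for a given x.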