Math 120

Math 135
Review for Test 1
Data Analysis and Summary
Part I: Short Answer
1. You are conducting a study at North Trust Bank to determine whether there is a relationship between job classification and job
satisfaction among the company's employees. Job satisfaction is measured by a survey score on a 100-point scale (a
score of 100 points indicates that an employee is extremely satisfied with their job). The information below shows a random
sample of employee survey scores selected from each job classification:
Job Classification | Sample of job satisfaction survey scores | x-bar | s
Hourly - Craft     | 32, 38, 45, 55, 65, 66, 68, 70           |       |
Hourly - Clerical  | 44, 52, 55, 56, 65, 68, 76, 94           |       |
Salaried           | 37, 48, 48, 65, 77, 78, 88, 89           |       |
a) What type of data relationship is being investigated here (e.g., Q-Q, C-C, C-Q)?
b) Construct an appropriate graph to help you determine if a clear relationship exists between job classification and
job satisfaction.
c) Summarize your findings from your graph. Does a clear relationship exist? Defend your answer.
d) Calculate the mean, x-bar, and the standard deviation, s, of the job satisfaction scores for each job class. You can list
these values in the table above. Does this information support your conclusion in part c? Explain.
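One way to check the hand calculations in part d is a short script. The sketch below uses Python's standard `statistics` module and assumes that s means the sample standard deviation (divisor n - 1); the dictionary name `scores` is only illustrative.

```python
from statistics import mean, stdev

# Survey scores copied from the Question 1 table.
scores = {
    "Hourly - Craft": [32, 38, 45, 55, 65, 66, 68, 70],
    "Hourly - Clerical": [44, 52, 55, 56, 65, 68, 76, 94],
    "Salaried": [37, 48, 48, 65, 77, 78, 88, 89],
}

# stdev() is the sample standard deviation (divides by n - 1).
summary = {job: (mean(vals), stdev(vals)) for job, vals in scores.items()}

for job, (xbar, s) in summary.items():
    print(f"{job:<18} x-bar = {xbar:.2f}, s = {s:.2f}")
```

Comparing the three means against the spread within each group is what lets you judge whether the differences seen in the graph are meaningful.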
2.
a) Using the information from Question 1, construct an appropriate graph of just the hourly clerical job
satisfaction survey scores. Describe the shape and report the appropriate numerical summaries for center and
spread. Justify your description of the shape of the graph with a computation. Are there any outliers? Justify
your answer with a computation. Explain.
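The computations asked for in part a can be sketched as follows. This is only one reasonable approach: it assumes quartiles are found as the medians of the lower and upper halves of the sorted data (software often uses a different convention), and it uses the usual 1.5 x IQR rule to flag outliers.

```python
from statistics import mean, median

clerical = sorted([44, 52, 55, 56, 65, 68, 76, 94])

half = len(clerical) // 2
q1 = median(clerical[:half])    # median of the lower half
q3 = median(clerical[-half:])   # median of the upper half
med = median(clerical)
iqr = q3 - q1

# 1.5 x IQR rule: values beyond the fences are suspected outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in clerical if x < low_fence or x > high_fence]

print("five-number summary:", clerical[0], q1, med, q3, clerical[-1])
print("mean:", mean(clerical), "median:", med)
print("fences:", low_fence, high_fence, "outliers:", outliers)
```

Comparing the mean with the median gives a numerical hint about skewness, and the fences give the computation needed to justify the outlier answer.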
b) The following table shows a count of employees from North Trust Bank by job classification. Make an
appropriate graph/chart to display the information from the table. Summarize your findings.
Job Classification | # employees
Hourly - Craft     | 85
Hourly - Clerical  | 20
Salaried           | 15
c) Based on the information in your graph from part (b) and the survey scores from Question # 1, what strategy
would you think North Trust management might want to consider with regard to their employees and job
satisfaction?
3. North Trust Bank also wants to determine if a relationship exists between job tenure (# of years with the
company) and salary (for the employees in the internal audit department). The following table displays a random
sample of employees taken from the North Trust database:
Employee # | Job tenure (in years) | Annual Salary (current year only; in thousands)
1          | 15                    | 50
7          | 5                     | 35
13         | 0.7                   | 38
24         | 25                    | 75
32         | 12                    | 66
45         | 8                     | 48
47         | 38                    | 47
83         | 22                    | 52
a) What is the relationship type investigated in this study? Which variable would be the explanatory variable? Which is the
response variable?
b) Make an appropriate graph to investigate whether a relationship seems to exist between these 2 variables.
Describe the relationship shown in the graph in terms of form, direction and strength.
c) Fit a linear regression model based on the data and add the line to your graph. Report the regression line and
appropriate numerical summaries. How strong is the linear relationship numerically? Describe in words what
the slope of the regression line represents in this situation.
d) Are there any potential outliers in your plot? Are the outliers influential? Remove any suspected outliers (there
should be one) and recalculate the regression line and correlation. Use the new regression line to predict the salary
amount for someone who has worked for North Trust for 17 years.
e) Construct a plot of the residuals (using the new regression line) using Excel. Based on the residual plot, does the
regression line provide a “good fit” for the data? Explain.
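Parts c through e call for Excel, but the arithmetic can be cross-checked with a short script. The sketch below is an assumption-laden illustration: the helper `fit_line` is a hypothetical name, the (tenure, salary) pairs are as read from the table in this question, and the suspected outlier is taken to be the employee with 38 years of tenure.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (intercept, slope, r) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r

# (tenure in years, salary in thousands) as read from the Question 3 table.
data = [(15, 50), (5, 35), (0.7, 38), (25, 75),
        (12, 66), (8, 48), (38, 47), (22, 52)]

tenure, salary = zip(*data)
a_all, b_all, r_all = fit_line(tenure, salary)

# Assumption: the suspected outlier is the point with 38 years of tenure.
trimmed = [(x, y) for x, y in data if x != 38]
t_tenure, t_salary = zip(*trimmed)
a_trim, b_trim, r_trim = fit_line(t_tenure, t_salary)

# Prediction at 17 years and residuals, both from the refitted line.
predicted_17 = a_trim + b_trim * 17
residuals = [y - (a_trim + b_trim * x) for x, y in trimmed]

print(f"all data: y = {a_all:.2f} + {b_all:.2f}x, r = {r_all:.3f}")
print(f"trimmed:  y = {a_trim:.2f} + {b_trim:.2f}x, r = {r_trim:.3f}")
print(f"predicted salary at 17 years: {predicted_17:.1f} (thousands)")
```

Plotting `residuals` against tenure gives the residual plot for part e; a patternless scatter around zero suggests the line is a reasonable fit.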
4. Is there a relationship between the treatment used for cocaine addiction and whether the patient has a relapse or not? A
study was conducted and the following data were collected:
Treatment   | Cocaine relapse? Yes | Cocaine relapse? No
Desipramine | 10                   | 14
Lithium     | 18                   | 6
Placebo     | 20                   | 4
a) Identify the explanatory and response variables.
b) Calculate the conditional distribution of cocaine relapse given the treatment type and draw the appropriate bar
graphs (i.e. draw a bar graph for each value of the explanatory variable).
c) Does there appear to be a relationship? Explain.
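The conditional distribution in part b is just each row's counts divided by that row's total. A minimal sketch, with the dictionary names `counts` and `conditional` chosen only for illustration:

```python
# Counts copied from the Question 4 table: treatment -> relapse outcome.
counts = {
    "Desipramine": {"Yes": 10, "No": 14},
    "Lithium": {"Yes": 18, "No": 6},
    "Placebo": {"Yes": 20, "No": 4},
}

# Divide each count by its row total to condition on the treatment.
conditional = {}
for treatment, row in counts.items():
    total = sum(row.values())
    conditional[treatment] = {outcome: n / total for outcome, n in row.items()}

for treatment, dist in conditional.items():
    print(f"{treatment:<12} relapse: {dist['Yes']:.1%}, no relapse: {dist['No']:.1%}")
```

Each row of percentages sums to 100%, and these are the heights of the bars in the three bar graphs.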
5. You purchase a shipment of 60-watt bulbs to be used in a variety of your products. You want to determine whether the
mean wattage of the bulbs in the shipment differs from 60 watts. You measure the wattage of a random sample of 20 bulbs. Set up the
appropriate null and alternative hypotheses for this scenario.
6. Classify each variable in the table below as either categorical or quantitative:
Patient Name  | Illness type (1=Heart; 2=Lung) | Pain Level (1-10) | Pulse Rate (bpm) | Insurance
Creek, Martin | 1                              | 9                 | 76               | MVP
Dade, Susan   | 2                              | 3                 | 68               | CDPHP
Kidman, Bart  | 1                              | 2                 | 82               | MVP
7. Suppose you have a list of temperatures (x_i) measured in degrees Celsius and you want to convert the
values to degrees Fahrenheit. What effect would this conversion have on the original mean and the original
standard deviation?
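A quick numerical experiment can confirm the reasoning for a linear change of units. The sketch below uses the conversion F = 1.8C + 32 and a made-up list of Celsius readings (the values are hypothetical and chosen only for illustration).

```python
import math
from statistics import mean, stdev

# Hypothetical Celsius readings, made up purely for illustration.
celsius = [12.0, 15.5, 20.0, 22.5, 30.0]
fahrenheit = [1.8 * c + 32 for c in celsius]

old_mean, old_sd = mean(celsius), stdev(celsius)
new_mean, new_sd = mean(fahrenheit), stdev(fahrenheit)

# The mean is both rescaled and shifted; the standard deviation is only rescaled.
mean_matches = math.isclose(new_mean, 1.8 * old_mean + 32, rel_tol=1e-9)
sd_matches = math.isclose(new_sd, 1.8 * old_sd, rel_tol=1e-9)

print("new mean = 1.8 * old mean + 32:", mean_matches)
print("new sd   = 1.8 * old sd:       ", sd_matches)
```

Adding a constant moves every value by the same amount, so it changes the center but not the spread; multiplying by a constant stretches both.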
8. According to Current Population Reports, self-employed individuals in the US work an average of 44.6 hours per
week with a standard deviation of 14.5 hours. If this variable is approximately normally distributed,
a) What percent of the self-employed work more than 40 hours per week?
b) What percent of the self-employed work less than 50 hours per week?
c) What percent of the self-employed work between 50 and 60 hours per week?
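Table lookups for this question can be checked against software. The sketch below uses `statistics.NormalDist` from Python's standard library (available in Python 3.8 and later); the variable names are illustrative, and results may differ slightly from table answers because tables round z to two decimal places.

```python
from statistics import NormalDist

# Weekly hours worked, modeled as Normal(mean 44.6, sd 14.5).
hours = NormalDist(mu=44.6, sigma=14.5)

p_more_than_40 = 1 - hours.cdf(40)             # part a
p_less_than_50 = hours.cdf(50)                 # part b
p_50_to_60 = hours.cdf(60) - hours.cdf(50)     # part c

print(f"P(X > 40)      = {p_more_than_40:.4f}")
print(f"P(X < 50)      = {p_less_than_50:.4f}")
print(f"P(50 < X < 60) = {p_50_to_60:.4f}")
```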
Part II: Multiple Choice
1. High levels of glucose in the blood are indications of diabetes, which is becoming more prevalent in
the United States. Diabetes can lead to many complications such as blindness and heart disease. A
random sample of 180 individuals had their blood sugar level measured. The results are displayed
in the graph.
The shape of the distribution of blood glucose levels is
a. Unimodal, left skewed.
b. Bimodal.
c. Unimodal, right skewed.
2. The 5-number summary of scores on a test is
35   60   65   70   90
Based on this information
a. There are no outliers.
b. There are low outliers.
c. There are both high and low outliers.
3. Too much cholesterol in the blood increases the risk of heart disease. The cholesterol levels of
young women aged 20 to 34 vary approximately normally with mean 185 milligrams per deciliter
(mg/dl) and standard deviation 39 mg/dl. About what percent of young women in this age group
will have cholesterol levels less than 150 mg/dl?
a. 90%.
b. 18.5%.
c. 81.5%.
4. Too much cholesterol in the blood increases the risk of heart disease. The cholesterol levels of
young women aged 20 to 34 vary approximately normally with mean 185 milligrams per deciliter
(mg/dl) and standard deviation 39 mg/dl. Cholesterol levels for middle-aged men vary normally
with mean 222 mg/dl and standard deviation 37 mg/dl. Sandy is a young woman with a cholesterol
level of 220. Her father has a cholesterol level of 250. Who has relatively higher cholesterol?
a. Sandy.
b. Sandy's father.
c. Impossible to tell because of the scaling.
5. The lifetime of a 2-volt non-rechargeable battery in constant use has a normal distribution with a
mean of 516 hours and a standard deviation of 20 hours. The proportion of batteries with lifetimes
exceeding 520 hours is approximately
a. 0.2000.
b. 0.5793.
c. 0.4207.
6. The lifetime of a 2-volt non-rechargeable battery in constant use has a normal distribution with a
mean of 516 hours and a standard deviation of 20 hours. 90% of all batteries have a lifetime less
than
a. 517.28 hours.
b. 536.00 hours.
c. 541.60 hours.
7. The most common intelligence quotient (IQ) scale is normally distributed with mean 100 and
standard deviation 15. Many school districts across the country seek to identify "Gifted and
Talented" children for special enrichment programs. Typically, these children must have IQ scores
in the top 5%. What is the minimum score to qualify a child for these programs?
a. 130.
b. 125.
c. 115.
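Questions 6 and 7 run the normal calculation backward: given a proportion, find the value that cuts it off. One way to check a table-based answer is the `inv_cdf` method of `statistics.NormalDist` (Python 3.8 and later). The variable names below are illustrative, and the results will not match the rounded answer choices exactly.

```python
from statistics import NormalDist

battery = NormalDist(mu=516, sigma=20)   # battery lifetime in hours (Question 6)
iq = NormalDist(mu=100, sigma=15)        # IQ scores (Question 7)

# Value with 90% of lifetimes below it.
battery_90th = battery.inv_cdf(0.90)

# Top 5% of IQ scores means 95% of scores fall below the cutoff.
iq_cutoff = iq.inv_cdf(0.95)

print(f"90th percentile of battery lifetime: {battery_90th:.2f} hours")
print(f"IQ cutoff for the top 5%: {iq_cutoff:.2f}")
```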
8. The scores on the Survey of Study Habits and Attitudes (SSHA) for a sample of 150 first-year
college women produced the following boxplot and descriptive statistics using MINITAB.
The number of women with scores between 93.26 and 129.23 is
a. about 75.
b. about 50.
c. about 36.
9. A teacher gave a 25 question multiple choice test. After scoring the tests, she computed a mean
and standard deviation of the scores. The standard deviation was 0. Based on this information
a. All the students had the same score.
b. She must have made a mistake.
c. About half the scores were above the mean.
10. A major study examined the relationship between cause of death (heart attack, cancer, stroke,
accident, etc.) and age. A good way to graphically represent the relationship is with
a. side-by-side boxplots.
b. back-to-back stemplots.
c. a scatterplot.
11. At a large department store, the amount a shopper spent and the shopper's gender (male or
female) were recorded. To determine if gender is useful in explaining the amount of money a
shopper spends at the store we could
a. make side-by-side boxplots of the distribution of the amount spent by males and the
distribution of the amount spent by females.
b. compute the correlation between the amount spent and gender.
c. compute the least-squares regression line of amount spent on gender.
12. The regression line to predict average exam grade from hours of study is y = 15 + 5.6*x. The
slope of the regression line indicates
a. for any student, an extra hour of study increases the grade 5.6 points.
b. on average, an extra hour of study will increase the grade 5.6 points.
c. an extra hour of study will increase the grade 15 points.
13. A survey of 1000 adults ages 30 to 35 is conducted. The number of years of schooling and the
annual salary for each person in the survey are recorded. The correlation between years of schooling
and annual salary is found to be 0.27. Suppose instead the average salary of all individuals in the
survey with the same number of years of schooling was calculated and the correlation between
these averages and years of schooling was computed. This correlation would most likely be
a. equal to 0.27.
b. larger than 0.27.
c. less than 0.27.
14. High levels of glucose in the blood are indications of diabetes, which is becoming more prevalent in
the United States. Diabetes can lead to many complications such as blindness and heart disease. A
random sample of 180 individuals had their blood sugar level measured. The 5-number summary
was
52   79   91   119   220
How many of the people in the sample had glucose levels above 119?
a. 25.
b. 135.
c. 45.
15. Below is a plot of the Olympic gold medal winning performance in the high jump (in inches) for the
years 1900 to 1996.
From this plot, the correlation between the winning height and year of the jump is
a. about 0.95.
b. about 0.10.
c. about -0.50.