Homework #1
Due: October 23, 2024
Statistics for the Sciences – STAT1001
University of the West Indies- Department of Mathematics
Shevanese Thomas (620159195)
Adrianna Je- Annie (620154882)
O’shaunalee Duncan (620153962)
Dasia-Ann Lyons (620171936)
Shawn Henry (03021392)
Do all questions and show all workings.
1. Based on the situations below, identify whether each variable is quantitative or qualitative.
i) The current temperature inside the classrooms on campus.
Ans: Quantitative
ii) The types of beverages preferred by the students of UWI, Mona.
Ans: Qualitative
iii) The rank held by the officers at a military base.
Ans: Qualitative
iv) The amount of air inside balloons at a party.
Ans: Quantitative
2. Stem plot showing the weight of 22 members of the cricket team.
3. The dataset is shown below:
9, 10, 9, 7, 12.5, 14, 13.5, 11, 12, 14, 13, 15
Calculate the following on the dataset:
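a) Find the median
Ordered data: 7, 9, 9, 10, 11, 12, 12.5, 13, 13.5, 14, 14, 15
n = 12 (even), so the median is the average of the values in positions n/2 = 6 and n/2 + 1 = 7.
The 6th value is 12 and the 7th value is 12.5.
M = (12 + 12.5) ÷ 2 = 24.5 ÷ 2 = 12.25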
b) Find the average
n = 12
x̄ = (7 + 9 + 9 + 10 + 11 + 12 + 12.5 + 13 + 13.5 + 14 + 14 + 15) ÷ 12
= 140 ÷ 12 = 11.67
c) Find the range
Range= 15-7=8
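The hand calculations in parts (a) to (c) can be cross-checked with a short script. This is only a minimal sketch in Python using the standard library; the variable names are illustrative and are not part of the assignment.

```python
from statistics import mean, median

# Dataset from question 3
weights = [9, 10, 9, 7, 12.5, 14, 13.5, 11, 12, 14, 13, 15]

ordered = sorted(weights)
n = len(ordered)

# With an even n, the median is the average of the two middle values
# (positions n/2 and n/2 + 1 when counting from 1).
middle_pair = (ordered[n // 2 - 1], ordered[n // 2])
med = median(ordered)

avg = mean(ordered)                       # sum of the values divided by n
data_range = max(ordered) - min(ordered)  # largest value minus smallest value

print("middle values:", middle_pair)
print("median:", med)
print("mean:", round(avg, 2))
print("range:", data_range)
```

Note that the position (n + 1)/2 = 6.5 only locates the median between the 6th and 7th ordered values; the median itself is the average of those two values.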
d) Find the standard deviation
Mean = 11.67
Xi        Xi − x̄                    (Xi − x̄)²
7         (7-11.67) = -4.67         (-4.67)² = 21.8089
9         (9-11.67) = -2.67         (-2.67)² = 7.1289
9         (9-11.67) = -2.67         (-2.67)² = 7.1289
10        (10-11.67) = -1.67        (-1.67)² = 2.7889
11        (11-11.67) = -0.67        (-0.67)² = 0.4489
12        (12-11.67) = 0.33         (0.33)² = 0.1089
12.5      (12.5-11.67) = 0.83       (0.83)² = 0.6889
13        (13-11.67) = 1.33         (1.33)² = 1.7689
13.5      (13.5-11.67) = 1.83       (1.83)² = 3.3489
14        (14-11.67) = 2.33         (2.33)² = 5.4289
14        (14-11.67) = 2.33         (2.33)² = 5.4289
15        (15-11.67) = 3.33         (3.33)² = 11.0889
Σ(Xi − x̄)² = 67.1668
s² = Σ(Xi − x̄)² ÷ (n − 1)
= 67.1668 ÷ (12 − 1)
= 6.1061
s = √6.1061
= 2.4710
e) Find the variance
s² = 67.1668 ÷ 11 = 6.1061 (equivalently, 2.4710² ≈ 6.106)
f) Produce the five number summary
Ordered data: 7, 9, 9, 10, 11, 12, 12.5, 13, 13.5, 14, 14, 15
Min: 7
Q1: (9 + 10) ÷ 2 = 9.5 (median of the lower half: 7, 9, 9, 10, 11, 12)
Q2 (median): (12 + 12.5) ÷ 2 = 12.25
Q3: (13.5 + 14) ÷ 2 = 13.75 (median of the upper half: 12.5, 13, 13.5, 14, 14, 15)
Max: 15
g)
h)
Q1 = 9.5, Q2 = 12.25, Q3 = 13.75
IQR = Q3 − Q1 = 13.75 − 9.5 = 4.25
U = Q3 + 1.5 ∗ IQR = 13.75 + 1.5 ∗ 4.25 = 20.125
L = Q1 − 1.5 ∗ IQR = 9.5 − 1.5 ∗ 4.25 = 3.125
Low outlier(s): none, because no value lies below 3.125.
High outlier(s): none, because no value lies above 20.125.
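Parts (d) to (h) can be cross-checked in the same way. The sketch below is an illustration in Python's standard library, not part of the submitted working; it assumes the quartiles are taken as the medians of the lower and upper halves of the ordered data, which is one common convention among several. Because it uses the unrounded mean, its results may differ from the hand calculation in the fourth decimal place.

```python
from statistics import median, stdev, variance

# Dataset from question 3
weights = [9, 10, 9, 7, 12.5, 14, 13.5, 11, 12, 14, 13, 15]
ordered = sorted(weights)
n = len(ordered)

s2 = variance(ordered)  # sample variance: squared deviations summed, divided by n - 1
s = stdev(ordered)      # sample standard deviation: square root of s2

# Quartiles as medians of the two halves (assumed convention).
half = n // 2
lower_half = ordered[:half]
upper_half = ordered[half + n % 2:]  # skips the middle value when n is odd
q1, q2, q3 = median(lower_half), median(ordered), median(upper_half)

# 1.5 * IQR rule for flagging outliers
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in ordered if x < low_fence or x > high_fence]

print("variance:", round(s2, 4), "std dev:", round(s, 4))
print("five-number summary:", (ordered[0], q1, q2, q3, ordered[-1]))
print("fences:", (low_fence, high_fence), "outliers:", outliers)
```

A negative IQR is a useful warning sign when checking work by hand: since Q3 can never be smaller than Q1, it signals an arithmetic slip in one of the quartiles.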
4.
(a) Describe the overall shape of the distribution of the monthly returns.
Ans:
The highest bars in the histogram are to the right, with a tail of smaller bars to the left, so the
distribution is skewed to the left. The distribution appears to have three outliers: two bars (one with a
frequency of 1 month and the other with a frequency of 2 months) are separated from the other bars of
the histogram by gaps. The gaps are roughly at -20 and -13, with two potential outliers between -25 and
-22.5 and one other between -17.5 and -15.
(b) What is the approximate center of this distribution?
Ans:
The center of a distribution is typically described by the mean or median. Since the highest bar covers
monthly percent returns from 0% to 2.50%, we estimate the center of this distribution to be
between 0% and 2.50%.
(c) Approximately what were the smallest and largest monthly returns, leaving out the outliers?
Ans:
• The leftmost bar (excluding the outliers) in the histogram represents monthly percent returns on
common stocks ranging from -12.5 to -10 percent, implying that the minimum return is between
-12.5 and -10 percent.
• The rightmost bar in the histogram represents returns ranging from 12.5 to 15 percent,
which means the maximum return is between 12.5 and 15 percent.
(d) A return less than zero means that stocks lost value in that month. About what percent of all
months had returns less than zero?
Ans: Consider the number of months with percent returns less than zero:
Monthly percent (interval)      Number of months (frequency)
-25 to < -22.5                  2
-22.5 to < -20                  0
-20 to < -17.5                  0
-17.5 to < -15                  1
-15 to < -12.5                  0
-12.5 to < -10                  4
-10 to < -7.5                   6
-7.5 to < -5                    11
-5 to < -2.5                    28
-2.5 to < 0                     50
Hence the total number of months with negative returns is: 2 + 0 + 0 + 1 + 0 + 4 + 6 + 11 + 28 + 50 = 102
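The tally of negative-return months can be checked with a few lines of code. This is a sketch only: the interval endpoints and frequencies are the ones read from the histogram in the answer, and `percent_of_total` is a hypothetical helper whose `total_months` argument must be read from the full histogram, since the total is not stated in this part of the answer.

```python
# Frequencies for the bars below zero; each key is (lower bound, upper bound).
negative_bins = {
    (-25.0, -22.5): 2,
    (-22.5, -20.0): 0,
    (-20.0, -17.5): 0,
    (-17.5, -15.0): 1,
    (-15.0, -12.5): 0,
    (-12.5, -10.0): 4,
    (-10.0, -7.5): 6,
    (-7.5, -5.0): 11,
    (-5.0, -2.5): 28,
    (-2.5, 0.0): 50,
}

negative_months = sum(negative_bins.values())


def percent_of_total(count, total_months):
    """Return count as a percent of total_months (hypothetical helper)."""
    return 100 * count / total_months


print("months with negative returns:", negative_months)
```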
Now:
5. Among persons aged 15 to 24 years in the United States, the leading causes of death and
number of deaths in 2008 were: accidents – 14,020; homicide – 5285; suicide – 4297; cancer –
1659; heart disease -1059; congenital defects – 466.
(a) Make a diagram of the above data.
(b) Describe the diagram.
• Accidents are by far the leading cause of death in this age group, with 14,020 deaths,
making it the most significant factor compared to other causes.
• Homicide and suicide follow as the second and third leading causes, with 5,285 and
4,297 deaths, respectively.
• Cancer and heart disease account for far fewer deaths, with 1,659 and 1,059 deaths, respectively.
• Congenital defects have the lowest count, contributing 466 deaths in this group.
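The ranking described above can be reproduced as a rough text bar chart. The sketch below is illustrative only; the scale of one `#` per roughly 500 deaths is an arbitrary choice, and the percentages are shares of the six listed causes, not of all deaths in this age group.

```python
# Counts given in question 5 (persons aged 15 to 24, United States, 2008)
deaths_2008 = {
    "accidents": 14020,
    "homicide": 5285,
    "suicide": 4297,
    "cancer": 1659,
    "heart disease": 1059,
    "congenital defects": 466,
}

# Sort from the largest count to the smallest
ranked = sorted(deaths_2008.items(), key=lambda item: item[1], reverse=True)
listed_total = sum(deaths_2008.values())

for cause, count in ranked:
    bar = "#" * round(count / 500)      # one '#' per roughly 500 deaths
    share = 100 * count / listed_total  # share of the six listed causes only
    print(f"{cause:<20}{count:>7,} {share:5.1f}%  {bar}")
```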
6. It appears that people who are mildly obese are less active than leaner people. One study
looked at the average number of minutes per day that people spend standing or walking.
Among mildly obese people, minutes of activity varied according to the N(373, 67) distribution.
Minutes of activity for lean people had the N(526, 107) distribution. Within
what limits do the active minutes for about 95% of the people in each group fall? Use the 68-95-99.7 rule. [5]
For mildly obese people:
Mean (μ) = 373 minutes
Standard deviation (σ) = 67 minutes
95% of the active minutes fall within:
μ ± 2σ = 373 ± 2(67) = 373 ± 134 = [239, 507]
For lean people:
Mean (μ) = 526 minutes
Standard deviation (σ) = 107 minutes
95% of the active minutes fall within:
μ ± 2σ = 526 ± 2(107) = 526 ± 214 = [312, 740]
∴ For mildly obese people, about 95% of the active minutes fall between 239 minutes and 507
minutes.
For lean people, about 95% of the active minutes fall between 312 minutes and 740 minutes.
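The same μ ± 2σ calculation can be written as a small function. This is a minimal sketch; the function name `central_95` is made up for illustration.

```python
def central_95(mu, sigma):
    """Limits holding about 95% of a Normal(mu, sigma) population: mu ± 2*sigma."""
    return (mu - 2 * sigma, mu + 2 * sigma)


mildly_obese = central_95(373, 67)
lean = central_95(526, 107)

print("mildly obese:", mildly_obese)
print("lean:", lean)
```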
7. The length of human pregnancies from conception to birth varies according to a distribution
that is approximately Normal with mean 266 days and a standard deviation 16 days. Draw an
appropriately shaded and labeled Normal curve to accompany your answer to EACH of the
questions below:
• Mean (μ) = 266 days
• Standard deviation (σ) = 16 days
Convert the given time intervals into z-scores using the formula:
z = (x - μ) / σ
where x is the specific value (240 days, 300 days, 270 days)
Part (a): What percent of pregnancies last less than 240 days?
• Calculate the z-score for 240 days:
z = (240 - 266) / 16 = -26 / 16 = -1.625
Using a z-table or normal distribution calculator, the proportion corresponding to z = −1.625 is
approximately 0.052 or 5.2%. This means about 5.2% of pregnancies last less than 240 days.
Part (b): What percent of pregnancies last more than 300 days?
• Calculate the z-score for 300 days:
z = (300 - 266) / 16 = 34 / 16 = 2.125
Using a z-table or calculator, the proportion corresponding to z = 2.125 is approximately 0.983
or 98.3%. Since the question asks for the percentage that lasts more than 300 days, we subtract
this from 1:
1− 0.983=0.017
About 1.7% of pregnancies last more than 300 days.
Part (c): What percent of pregnancies last between 240 and 270 days?
• Calculate the z-score for 240 days (as done in part (a)):
z = −1.625
• Calculate the z-score for 270 days:
z = (270 - 266) / 16 = 4 / 16 = 0.25
Using the z-table:
• The proportion corresponding to z = −1.625 is 0.052.
• The proportion corresponding to z = 0.25 is 0.5987.
To find the percentage of pregnancies lasting between 240 and 270 days, subtract the proportion
for 240 days from that of 270 days:
0.5987 − 0.052 = 0.5467
Approximately 54.7% of pregnancies last between 240 and 270 days.
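The three table look-ups can be checked numerically. The sketch below uses `statistics.NormalDist` from Python's standard library as a stand-in for the z-table; it is a cross-check under the stated Normal(266, 16) model, and small differences from table values come from rounding.

```python
from statistics import NormalDist

# Pregnancy length model from the question: mean 266 days, SD 16 days
pregnancy = NormalDist(mu=266, sigma=16)

# z-scores, z = (x - mu) / sigma
z_240 = (240 - 266) / 16
z_300 = (300 - 266) / 16
z_270 = (270 - 266) / 16

p_less_240 = pregnancy.cdf(240)                         # part (a): area to the left of 240
p_more_300 = 1 - pregnancy.cdf(300)                     # part (b): area to the right of 300
p_240_to_270 = pregnancy.cdf(270) - pregnancy.cdf(240)  # part (c): area between 240 and 270

print("z-scores:", z_240, z_300, z_270)
print("P(X < 240):", round(p_less_240, 3))
print("P(X > 300):", round(p_more_300, 3))
print("P(240 < X < 270):", round(p_240_to_270, 3))
```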
8. The time until recharge for a battery in a laptop computer under common conditions is
normally distributed with a mean of 260 minutes and a standard deviation of 50 minutes. What
value of life in minutes is exceeded with 95% probability? [6]
x = values in minutes
μ = mean = 260 minutes
σ = standard deviation = 50 minutes
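We need the value x that is exceeded with probability 0.95, that is, P(X > x) = 0.95,
so the area to the left of x is 100% - 95% = 5% = 0.05.
z = z-score with an area of 0.05 to its left = -1.645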
z = (x - μ) / σ
x = z * σ + μ = (-1.645) * 50 + 260 = 177.75
∴ The battery life that is exceeded with 95% probability is approximately 177.75 minutes.
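The backward look-up (from a probability to a value) can be checked with the inverse CDF. This is a minimal sketch using `statistics.NormalDist` from Python's standard library in place of the z-table; the result differs slightly from the hand calculation because the table value -1.645 is rounded.

```python
from statistics import NormalDist

# Battery life model from the question: mean 260 minutes, SD 50 minutes
battery = NormalDist(mu=260, sigma=50)

# A value exceeded with probability 0.95 has 0.05 of the area to its left.
z = NormalDist().inv_cdf(0.05)  # standard Normal z-score for a left-tail area of 0.05
x = battery.inv_cdf(0.05)       # same as z * sigma + mu

print("z:", round(z, 3))
print("x:", round(x, 2), "minutes")
```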
9. Summary of data using RStudio:
summary(diabetes)
id                chol              stab.glu          hdl               ratio             glyhb             location
Min.   : 1000     Min.   : 78.0     Min.   : 48.0     Min.   : 12.00    Min.   : 1.500    Min.   : 2.68     Length:403
1st Qu.: 4792     1st Qu.:179.0     1st Qu.: 81.0     1st Qu.: 38.00    1st Qu.: 3.200    1st Qu.: 4.38     Class :character
Median :15766     Median :204.0     Median : 89.0     Median : 46.00    Median : 4.200    Median : 4.84     Mode  :character
Mean   :15978     Mean   :207.8     Mean   :106.7     Mean   : 50.45    Mean   : 4.522    Mean   : 5.59
3rd Qu.:20336     3rd Qu.:230.0     3rd Qu.:106.0     3rd Qu.: 59.00    3rd Qu.: 5.400    3rd Qu.: 5.60
Max.   :41756     Max.   :443.0     Max.   :385.0     Max.   :120.00    Max.   :19.300    Max.   :16.11
                  NA's   :1                           NA's   :1         NA's   :1         NA's   :13

age               gender            height            weight            frame             bp.1s             bp.1d
Min.   :19.00     Length:403        Min.   :52.00     Min.   : 99.0     Length:403        Min.   : 90.0     Min.   : 48.00
1st Qu.:34.00     Class :character  1st Qu.:63.00     1st Qu.:151.0     Class :character  1st Qu.:121.2     1st Qu.: 75.00
Median :45.00     Mode  :character  Median :66.00     Median :172.5     Mode  :character  Median :136.0     Median : 82.00
Mean   :46.85                       Mean   :66.02     Mean   :177.6                       Mean   :136.9     Mean   : 83.32
3rd Qu.:60.00                       3rd Qu.:69.00     3rd Qu.:200.0                       3rd Qu.:146.8     3rd Qu.: 90.00
Max.   :92.00                       Max.   :76.00     Max.   :325.0                       Max.   :250.0     Max.   :124.00
                                    NA's   :5         NA's   :1                           NA's   :5         NA's   :5

bp.2s             bp.2d             waist             hip               time.ppn
Min.   :110.0     Min.   : 60.00    Min.   :26.0      Min.   :30.00     Min.   :   5.0
1st Qu.:138.0     1st Qu.: 84.00    1st Qu.:33.0      1st Qu.:39.00     1st Qu.:  90.0
Median :149.0     Median : 92.00    Median :37.0      Median :42.00     Median : 240.0
Mean   :152.4     Mean   : 92.52    Mean   :37.9      Mean   :43.04     Mean   : 341.2
3rd Qu.:161.0     3rd Qu.:100.00    3rd Qu.:41.0      3rd Qu.:46.00     3rd Qu.: 517.5
Max.   :238.0     Max.   :124.00    Max.   :56.0      Max.   :64.00     Max.   :1560.0
NA's   :262       NA's   :262       NA's   :2         NA's   :2         NA's   :3
10. List the qualitative variables from the dataset.
Ans:
Location
Frame
Gender
11.
Since the highest bars in the histogram are to the left, with a tail of smaller bars to the right, the
distribution is skewed to the right, meaning that there are more younger persons in the study than older
persons.