EX2-01_solution - Institute for Applied Ecology

Trawl Catch Statistics
Surname:
Student No.
Introduction
Many fisheries agencies keep detailed statistics on fish stocks, sampling specifically for that purpose from
research vessels. We have at hand trawl catch statistics for a coastal estuary for the years 1999 and 2000. The
data are in the form:
SMALLMOUTH_FLOUNDER   1999   JUN    84
SPOT                  1999   JUL   180
BLUE_CRAB             1999   MAY    27
BAY_ANCHOVY           1999   AUG    43
ATLANTIC_CROAKER      1999   FEB   253
where the first column is the fish species, the second column is the year, the third column is the month and the
fourth column is fish length in mm. There are data for 53,856 fish in the dataset. In this exercise, you are asked
to interrogate the dataset to answer some questions of specific interest.
Part 1: Data Entry
Input the data to a workfile called TRAWL.
You will need to notify SAS that the species variable is a character variable of up to 21 characters, otherwise
your species names will be truncated to 8 characters. Do this with a
LENGTH SPECIES $ 21;
statement in the DATA step immediately before the INPUT statement.
Transform the length measurements from mm to cm with an assignment statement. Add a label to the length
variable reading "TOTAL LENGTH IN CM".
Paste your program code here.
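DATA TRAWL;
* Location of the raw data file;
INFILE "C:\AAAAA\TRAWL.DAT";
* Allow species names of up to 21 characters;
LENGTH SPECIES $ 21;
INPUT SPECIES $ YEAR MONTH $ LENGTH;
* Convert length from mm to cm;
LENGTH=LENGTH/10;
LABEL LENGTH="TOTAL LENGTH IN CM";
RUN;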
© Copyright Arthur Georges 2002
Confirm that the data have been correctly input.
Outline what measures you took to confirm that the data had been correctly input here.
1. Referred to the LOG Window to confirm that 53,856 data lines were read.
2. Perused the data in the EXPLORER Window to confirm contents.
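As a programmatic complement to the two checks above, the following sketch (not part of the original solution) lists the variable attributes and prints the first few observations. It assumes the workfile TRAWL created in the DATA step; the choice of 10 observations is arbitrary.

```sas
/* List variable names, types, lengths and labels for TRAWL */
PROC CONTENTS DATA=TRAWL;
RUN;

/* Print the first 10 observations for a visual check */
PROC PRINT DATA=TRAWL (OBS=10);
RUN;
```

The PROC CONTENTS listing should show SPECIES as a character variable of length 21 and report 53,856 observations if the data were read as intended.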
Part 2: Summary by Species
Perform an appropriate analysis to yield summary statistics for each species, including only sample size,
minimum, maximum and mean fish size. Your programming solution to this question should include only a single
PROC step, and should make use of the BY statement. Do not forget to sort your data first.
Paste your program code here.
PROC SORT DATA=TRAWL; BY SPECIES; RUN;
PROC MEANS DATA=TRAWL N MEAN MIN MAX;
VAR LENGTH;
BY SPECIES;
RUN;
Paste an extract of the tabular output from your program here.
------------------------------------- SPECIES=American_eel -------------------------------------
The MEANS Procedure
Analysis Variable : LENGTH
        Mean         Minimum         Maximum
--------------------------------------------
  26.0024510      13.3000000      64.4000000
--------------------------------------------
----------------------------------- SPECIES=Atlantic_croaker -----------------------------------
Analysis Variable : LENGTH
        Mean         Minimum         Maximum
--------------------------------------------
  12.3799480       0.4000000      40.3000000
--------------------------------------------
---------------------------------- SPECIES=Atlantic_menhaden -----------------------------------
Analysis Variable : LENGTH
        Mean         Minimum         Maximum
--------------------------------------------
   9.4760417       1.7000000      32.1000000
--------------------------------------------
Part 3: Graphic Summary by Species
Perform an appropriate analysis to yield a barchart showing the relative abundance of the different species in the
trawl dataset.
Your analysis should yield a high quality barchart. Be sure to add a title to your graph.
Paste your program code here.
GOPTIONS RESET=ALL;
TITLE "BARCHART OF FISH SPECIES COUNTS";
PROC GCHART DATA=TRAWL;
HBAR SPECIES / TYPE=PCT;
RUN;
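A frequency table can back up the barchart with exact counts and percentages, and it also identifies the most abundant species, which is needed in Part 4. The sketch below is one possible approach and is not part of the original solution; it assumes the TRAWL workfile, and ORDER=FREQ lists the species from most to least abundant.

```sas
/* Counts and percentages per species, most abundant first */
PROC FREQ DATA=TRAWL ORDER=FREQ;
TABLES SPECIES;
RUN;
```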
Paste graphic output from your program here.
Part 4: Size Distribution
Perform an appropriate analysis to yield a histogram showing the size distribution for the most abundant fish
species in the dataset. Use a WHERE statement to select only data for that fish species.
Your analysis should yield a high quality histogram. Be sure to add a title to your graph.
Paste your program code here.
GOPTIONS RESET=ALL;
PROC GCHART DATA=TRAWL;
TITLE "ATLANTIC CROAKER";
VBAR LENGTH / TYPE= PCT SPACE=0;
WHERE SPECIES="ATLANTIC_CROAKER";
RUN;
Paste graphic output from your program here.
Describe in words what you see.
1. The size distribution for Atlantic Croaker is bimodal, and certainly not normal.
2. The bimodality could have arisen from a recruitment event, or may represent sexual size dimorphism. More background on the species is needed for a reasonable interpretation.
Calculate a full set of summary statistics for length of the above species.
Paste your program code here.
PROC UNIVARIATE DATA=TRAWL;
VAR LENGTH;
WHERE SPECIES="ATLANTIC_CROAKER";
RUN;
Paste the tabular output from your program here.
The UNIVARIATE Procedure
Variable: LENGTH

Moments
N                        9236    Sum Weights              9236
Mean                12.379948    Sum Observations     114341.2
Std Deviation      8.10755145    Variance           65.7323905
Skewness           0.26513208    Kurtosis           -1.2480235
Uncorrected SS     2022576.74    Corrected SS       607038.626
Coeff Variation    65.4893819    Std Error Mean     0.08436217

Basic Statistical Measures
Location                 Variability
Mean      12.37995       Std Deviation           8.10755
Median    12.10000       Variance               65.73239
Mode       3.00000       Range                  39.90000
                         Interquartile Range    15.70000

Tests for Location: Mu0=0
Test           -Statistic-        -----p Value------
Student's t    t    146.7476      Pr > |t|    <.0001
Sign           M        4618      Pr >= |M|   <.0001
Signed Rank    S    21328233      Pr >= |S|   <.0001

Quantiles (Definition 5)
Quantile      Estimate
100% Max          40.3
99%               29.7
95%               24.4
90%               22.5
75% Q3            20.0
50% Median        12.1
25% Q1             4.3
10%                3.0
5%                 2.5
1%                 1.6
0% Min             0.4

Extreme Observations
----Lowest----      ----Highest---
Value     Obs       Value     Obs
  0.4    5925        38.0    1331
  0.6    5926        38.0    2299
  0.6    4481        38.4    1421
  0.6    3733        38.7    2860
  0.6    3732        40.3    1422
Prepare a complete statistical summary for fish length of the above species. Make sure that your summary
conforms to the standard outlined in the worked examples.
Paste your summary here.
1. Present the smallest and largest observed values.
2. Present the mean, standard error and sample size.
3. As the data are not normal, give also the median and mode(s).
4. The interquartile range is a useful measure of spread for non-normal data.
5. Define an extreme event, or an exceptionally large fish, in terms of percentiles (95th or 99th).
6. Do not forget to give the units of measurement.
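Most of the values called for in the summary can be requested in a single step. The following is a sketch only, not part of the original solution; it assumes the TRAWL workfile and the Atlantic Croaker selection used earlier. The mode is not requested here and can be read from the PROC UNIVARIATE output instead.

```sas
/* Range, mean with standard error, median, interquartile range */
/* and upper percentiles, rounded to two decimal places         */
PROC MEANS DATA=TRAWL N MIN MAX MEAN STDERR MEDIAN QRANGE P95 P99 MAXDEC=2;
VAR LENGTH;
WHERE SPECIES="ATLANTIC_CROAKER";
RUN;
```

The results should agree with the PROC UNIVARIATE output for this species, for example N = 9236 and a median of 12.1 cm.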
Part 5: Transformations
Perform an appropriate analysis to yield a histogram showing the size distribution for Spotted Hake.
Your analysis should yield a high quality histogram. Be sure to add a title to your graph.
Paste your program code here.
GOPTIONS RESET=ALL;
PROC GCHART DATA=TRAWL;
TITLE "SPOTTED HAKE";
VBAR LENGTH / TYPE=PCT SPACE=0;
WHERE SPECIES="SPOTTED_HAKE";
RUN;
Paste graphic output from your program here.
Describe in words what you see.
1. Non-normal unimodal distribution, strongly skewed to the right.
Clearly fish length for Spotted Hake is not normally distributed, but it is unimodal. Repeat the analysis on this
variable following a standard square root transformation and a log transformation.
The transformations are: Y' = LOG10(Y + 1) and Y' = SQRT(Y + ½)
Paste your program code here.
DATA NEW;
SET TRAWL;
LG=LOG10(LENGTH+1);
SQ=SQRT(LENGTH+0.5);
LABEL LG="LOG10(LENGTH)"
SQ="SQUARE ROOT (LENGTH)";
RUN;
GOPTIONS RESET=ALL;
PROC GCHART DATA=NEW;
TITLE "SPOTTED HAKE";
VBAR LENGTH SQ LG / TYPE=PCT SPACE=0;
WHERE SPECIES="SPOTTED_HAKE";
RUN;
Paste the graphic output from your program here.
[Histogram panels: SQUARE ROOT; LOG BASE 10]
What was the effect of the transformation in each case?
1. Square root transformation reduced the skewness, but was not strong enough to remove it altogether.
2. Log transformation converted the distribution to a bell-shaped curve. Suspect that the size distribution of Spotted Hake may be normalized by a log transformation.
Calculate a full set of summary statistics for length of the above species after applying the transformation that
was most effective in normalizing the data.
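The tabular output, normality tests and plots for this step are consistent with a PROC UNIVARIATE call using the NORMAL and PLOT options. The original code for this step is not shown, so the following is a reconstruction under that assumption, using the NEW workfile and the LG variable created above.

```sas
/* NORMAL requests the tests for normality;               */
/* PLOT requests the histogram, boxplot and normal        */
/* probability plot in the listing output                 */
PROC UNIVARIATE DATA=NEW NORMAL PLOT;
TITLE "SPOTTED HAKE";
VAR LG;
WHERE SPECIES="SPOTTED_HAKE";
RUN;
```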
Paste your tabular output here.
The UNIVARIATE Procedure
Variable: LG (LOG10(LENGTH))

Moments
N                         890    Sum Weights               890
Mean                1.0953837    Sum Observations   974.891489
Std Deviation      0.10780875    Variance           0.01162273
Skewness           0.21127011    Kurtosis           0.03109041
Uncorrected SS     1078.21285    Corrected SS       10.3326041
Coeff Variation    9.84209931    Std Error Mean     0.00361376

Basic Statistical Measures
Location                 Variability
Mean      1.095384       Std Deviation           0.10781
Median    1.093422       Variance                0.01162
Mode      1.093422       Range                   0.65642
                         Interquartile Range     0.14732

Tests for Location: Mu0=0
Test           -Statistic-        -----p Value------
Student's t    t    303.1149      Pr > |t|    <.0001
Sign           M         445      Pr >= |M|   <.0001
Signed Rank    S    198247.5      Pr >= |S|   <.0001

Tests for Normality
Test                  --Statistic---      -----p Value------
Shapiro-Wilk          W       0.996005    Pr < W      0.0219
Kolmogorov-Smirnov    D       0.031558    Pr > D      0.0304
Cramer-von Mises      W-Sq    0.138289    Pr > W-Sq   0.0359
Anderson-Darling      A-Sq     0.94168    Pr > A-Sq   0.0186

Quantiles (Definition 5)
Quantile      Estimate
100% Max      1.434569
99%           1.369216
95%           1.278754
90%           1.240549
75% Q3        1.164353
50% Median    1.093422
25% Q1        1.017033
10%           0.963788
5%            0.929419
1%            0.857332
0% Min        0.778151

Extreme Observations
------Lowest------      -----Highest-----
   Value     Obs          Value     Obs
0.778151     890        1.38382     886
0.778151     210        1.41330     146
0.792392     286        1.42651     108
0.832509     889        1.43297     132
0.832509     367        1.43457     133

                      Histogram                      #   Boxplot
1.425+*                                              4      0
     .**                                             8      |
     .*****                                         20      |
     .***********                                   44      |
     .******************                            71      |
     .****************************                 109   +-----+
     .*******************************************  171   |     |
     .***********************************          139   *--+--*
     .*****************************************    162   +-----+
     .***********************                       90      |
     .************                                  47      |
     .*****                                         17      |
     .**                                             5      |
0.775+*                                              3      0
      ----+----+----+----+----+----+----+----+--
      * may represent up to 4 counts

[Normal Probability Plot of LG: vertical axis 0.775 to 1.425, horizontal axis -2 to +2]
Succinctly summarise what you conclude about the Normality of fish length for the above species following
transformation. Include reference to supporting evidence.
Paste your summary here.
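1. The log transformation brought the size distribution of Spotted Hake close to normal, as evidenced by the histogram, the probability plot, and the near-zero skewness (0.21) and kurtosis (0.03). The Shapiro-Wilk statistic is close to 1 (W = 0.996), but with a sample of 890 fish the test still detects a small departure from normality (p = 0.0219).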
Part 6: More complex graphics
Use the GROUP option on the VBAR statement to compare the size distributions of the two most common species
in the dataset.
Paste your program code here.
GOPTIONS RESET=ALL;
PROC GCHART DATA=TRAWL;
TITLE "ATLANTIC CROAKER VERSUS HOGCHOKER";
VBAR LENGTH / TYPE=PCT SPACE=0 GROUP=SPECIES;
WHERE SPECIES="HOGCHOKER" OR SPECIES="ATLANTIC_CROAKER";
RUN;
Paste graphic output from your program here.
Describe in words what you see.
1. Clearly the size distributions of these species are very different. You should have commented on the bimodality versus the unimodality, perhaps on the differing maximum sizes of the two species, and on the possible reasons for the differences.
Use the GROUP option on the VBAR statement to compare the size distributions of the most common species in
1999 and 2000.
Paste your program code here.
GOPTIONS RESET=ALL;
PROC GCHART DATA=TRAWL;
TITLE "ATLANTIC CROAKER 1999 VERSUS 2000";
VBAR LENGTH / TYPE=PCT SPACE=0 GROUP=YEAR;
WHERE SPECIES="ATLANTIC_CROAKER";
RUN;
Paste graphic output from your program here.
Describe in words what you see.
1. The size distributions in the two consecutive years are very similar, though the total catch in 2000 was somewhat less than in 1999.
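The comparison of total catch between the two years can be checked directly with a frequency count rather than read off the chart. A minimal sketch, not part of the original solution, assuming the TRAWL workfile:

```sas
/* Number of Atlantic Croaker measured in each year */
PROC FREQ DATA=TRAWL;
TABLES YEAR;
WHERE SPECIES="ATLANTIC_CROAKER";
RUN;
```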
Source
The length frequency data were kindly provided by the Virginia Institute of Marine Science, Juvenile Fish and
Blue Crab Trawl Survey. The web-based data retrieval system appears online
[http://www.fisheries.vims.edu/vimstrawldata/]