CSE 7331 Fall 2011
Project I Dataset Description
The training and test datasets used in Project I are CSV files. Each row in a file holds the feature
values for one storm at one point in time. The rows for each storm are grouped together and separated
from other storms by a row of all zeros. The data were originally produced by NOAA and then preprocessed
by Professor Mark DeMaria at Colorado State. The dataset for this project was then reformatted slightly.
Data for each storm are obtained at 6-hour intervals and listed in order in the dataset.
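The file layout described above can be read with a few lines of Python. This is only a minimal sketch: the function name split_storms is hypothetical, and it assumes the CSV has no header row and that every field is numeric, neither of which is stated in this description.

```python
import csv
import io


def split_storms(csv_text):
    """Group CSV rows into storms, treating each all-zero row as a separator.

    Assumptions (not confirmed by the dataset description): no header row,
    and every field parses as a number.
    """
    storms = []
    current = []
    for fields in csv.reader(io.StringIO(csv_text)):
        if not fields:  # skip blank lines
            continue
        row = [float(value) for value in fields]
        if all(value == 0.0 for value in row):
            # Separator row: close the storm collected so far.
            if current:
                storms.append(current)
                current = []
        else:
            current.append(row)
    if current:  # the last storm may not be followed by a separator
        storms.append(current)
    return storms
```

Each element of the returned list is one storm, given as a list of rows in file order, so consecutive rows within a storm are 6 hours apart.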
The training dataset contains data for the named Atlantic tropical cyclones of the 1982 to 2010
seasons. Each storm in the dataset is recorded every 6 hours. We use the same feature set as the 2009
version of SHIPS, which is composed of 23 numerical features, both independent and dependent.
Among them, VMAX, the initial storm maximum intensity in knots, is our target feature for
prediction.
Early in our research, we found that the current state is related not only to the single previous
state but also to several of the most recent states. To improve the prediction of PIIH, we added six new
features that provide more historical information. We first create features PER12, PER18, and PER24,
which are the changes in intensity over the previous 12, 18, and 24 hours. Based on these, features
VPER12, VPER18, and VPER24 are defined as PER12*VMAX, PER18*VMAX, and PER24*VMAX. We also create one
extra feature named VPC20, defined as VMAX*PC20, because we found that the feature PC20 receives a high
weight during the learning process. In total, 7 extra features are added to the original 23, giving a
dataset with 30 features, so each row in the dataset contains 30 columns. The table below summarizes
these features.
No.  Feature  Description
1    VMAX     Current maximum intensity in knots
2    ADAY     Gaussian function of Julian day - peak value
3    SPDX     Zonal component of initial storm motion
4    PSLV     Vertical depth
5    POT      Possible intensity - initial intensity
6    T200     Average 200 mb temperature within 1000 km of storm center
7    EPOS     Surface - 200 hPa deviation of lifted parcel
8    Z850     Average 850 mb vorticity within 1000 km of storm center
9    D200     Average 200 mb divergence within 1000 km of storm center
10   VSHR     Quadratic variable VMAX x SHRD
11   POT2     Quadratic variable POT x POT
12   T250     Same as T200 but for 250 mb temperature (deg C *10)
13   LHRD     SHR times the sine of the initial storm latitude
14   TWAT     GFS mean tangential wind
15   SHDC     Same as SHRD but with vortex removed and averaged from 0-500 km relative
              to 850 mb vortex center
16   RHMD     Same as RHLO for 700-500 mb
17   PC20     GOES predictor
18   GSTD     VMAX x GOES predictor
19   RHCN     Ocean heat content (kJ/cm2) from satellite altimetry data
20   SDIR     Reference direction for shear direction predictor (sdp)
21   SHGC     Same as SHRG but with vortex removed and averaged from 0-500 km relative
              to 850 mb vortex center
22   PER6     Change in intensity over the previous 6 hours
23   VPER6    Quadratic variable (PER6 x VMAX)
24   PER12    Change in intensity over the previous 12 hours
25   VPER12   Quadratic variable (PER12 x VMAX)
26   PER18    Change in intensity over the previous 18 hours
27   VPER18   Quadratic variable (PER18 x VMAX)
28   PER24    Change in intensity over the previous 24 hours
29   VPER24   Quadratic variable (PER24 x VMAX)
30   VPC20    VMAX x PC20 predictor
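
The seven added history features (PER12, PER18, PER24, their products with VMAX, and VPC20) can be illustrated with a short Python sketch. The function name add_history_features is hypothetical. Two conventions here are assumptions, not taken from this description: a change in intensity is computed as the current VMAX minus the VMAX that many hours earlier, and the value is set to 0.0 when a storm does not yet have enough history.

```python
def add_history_features(vmax, pc20):
    """Compute PER12/18/24, VPER12/18/24 and VPC20 for one storm.

    vmax and pc20 are lists of values at consecutive 6-hour steps for a
    single storm. Returns one dict of features per time step.
    """
    # Number of 6-hour steps that make up each look-back window.
    lags = {"PER12": 2, "PER18": 3, "PER24": 4}
    rows = []
    for t in range(len(vmax)):
        row = {"VMAX": vmax[t], "PC20": pc20[t]}
        for name, steps in lags.items():
            # Assumed convention: current intensity minus the earlier one,
            # and 0.0 when the storm has fewer than `steps` earlier records.
            change = vmax[t] - vmax[t - steps] if t >= steps else 0.0
            row[name] = change
            row["V" + name] = change * vmax[t]  # e.g. VPER12 = PER12 * VMAX
        row["VPC20"] = vmax[t] * pc20[t]
        rows.append(row)
    return rows
```

Because the windows must not reach across the all-zero separator rows, the function is meant to be called once per storm rather than on the whole file.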