CSE 7331 Fall 2011 Project I

Dataset Description

The training and test datasets used in Project I are CSV files. Each row corresponds to the feature values for a storm at a point in time. The rows for each storm are grouped together, listed in time order at 6-hour intervals, and separated from other storms by a row of all zeros. The data was originally produced by NOAA and then preprocessed by Professor Mark DeMaria at Colorado State. The dataset for this project was then reformatted slightly.

The training dataset contains data for the named Atlantic tropical cyclones of the 1982 to 2010 seasons. We use the same feature set as the 2009 version of SHIPS, which is composed of 23 independent and dependent numerical features. Among them, VMAX, the initial maximum storm intensity in knots, is our target prediction feature.

Early in our research, we found that the current state is related not only to the single previous state but also to several of the most recent states. To improve the prediction of PIIH, we added six new features that provide more historical information. We first created features PER12, PER18, and PER24, which are the changes in intensity over the previous 12, 18, and 24 hours. Based on these, features VPER12, VPER18, and VPER24 are defined as PER12*VMAX, PER18*VMAX, and PER24*VMAX. We also created one extra feature, VPC20, defined as VMAX*PC20, because we found that the feature PC20 has a high weight during the learning process.

In total, 7 extra features are added to the original 23, so each row in the dataset contains 30 columns. The table below summarizes these features.
No.  Feature  Description
1    VMAX     Current maximum intensity in knots
2    ADAY     Gaussian function of Julian day - peak value
3    SPDX     Zonal component of initial storm motion
4    PSLV     Vertical depth
5    POT      Possible intensity - initial intensity
6    T200     Average 200 mb temperature within 1000 km of storm center
7    EPOS     Surface - 200 hPa deviation of lifted parcel
8    Z850     Average 850 mb vorticity within 1000 km of storm center
9    D200     Average 200 mb divergence within 1000 km of storm center
10   VSHR     Quadratic variable VMAX x SHRD
11   POT2     Quadratic variable POT x POT
12   T250     Same as above for 250 mb temperature (deg C * 10)
13   LHRD     SHR times the sine of the initial storm latitude
14   TWAT     GFS mean tangential wind
15   SHDC     Same as SHRD but with vortex removed and averaged from 0-500 km relative to 850 mb vortex center
16   RHMD     Same as RHLO for 700-500 mb
17   PC20     GOES predictor
18   GSTD     VMAX x GOES predictor
19   RHCN     Ocean heat content (kJ/cm2) from satellite altimetry data
20   SDIR     Reference direction for shear direction predictor (sdp)
21   SHGC     Same as SHRG but with vortex removed and averaged from 0-500 km relative to 850 mb vortex center
22   PER6     Change in intensity over the previous 6 hours
23   VPER6    Quadratic variable (PER6 x VMAX)
24   PER12    Change in intensity over the previous 12 hours
25   VPER12   Quadratic variable (PER12 x VMAX)
26   PER18    Change in intensity over the previous 18 hours
27   VPER18   Quadratic variable (PER18 x VMAX)
28   PER24    Change in intensity over the previous 24 hours
29   VPER24   Quadratic variable (PER24 x VMAX)
30   VPC20    VMAX x PC20 predictor
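
The file layout and the derived features described above can be sketched in Python as follows. This is only an illustrative sketch under several assumptions that the handout does not state: that the CSV has no header row, that the 30 columns appear in the order of the feature table, that a change in intensity means the current VMAX minus the VMAX the given number of hours earlier, and that the change is set to 0 when a storm has too little history. The function names (split_storms, read_storms, persistence_features) are hypothetical and are not part of the project materials.

```python
import csv

# Assumed column order: the 30 features in the order of the feature table.
FEATURES = [
    "VMAX", "ADAY", "SPDX", "PSLV", "POT", "T200", "EPOS", "Z850", "D200",
    "VSHR", "POT2", "T250", "LHRD", "TWAT", "SHDC", "RHMD", "PC20", "GSTD",
    "RHCN", "SDIR", "SHGC", "PER6", "VPER6", "PER12", "VPER12", "PER18",
    "VPER18", "PER24", "VPER24", "VPC20",
]


def split_storms(rows):
    """Group rows into storms; a row of all zeros ends the current storm."""
    storms, current = [], []
    for row in rows:
        if not row:
            continue  # ignore blank lines
        values = [float(v) for v in row]
        if all(v == 0.0 for v in values):
            if current:
                storms.append(current)
                current = []
        else:
            current.append(dict(zip(FEATURES, values)))
    if current:  # last storm may not be followed by a separator row
        storms.append(current)
    return storms


def read_storms(path):
    """Read a CSV file (assumed to have no header row) and split it into storms."""
    with open(path, newline="") as f:
        return split_storms(csv.reader(f))


def persistence_features(vmax_series, pc20_series):
    """Recompute PER12/18/24, VPER12/18/24 and VPC20 for one storm.

    Inputs are the storm's VMAX and PC20 values in time order, one value
    per 6-hour record. Assumption: PERh = VMAX(t) - VMAX(t - h hours),
    and 0.0 when the storm has fewer than h hours of history.
    """
    out = []
    for i, (vmax, pc20) in enumerate(zip(vmax_series, pc20_series)):
        row = {}
        for hours in (12, 18, 24):
            lag = hours // 6  # number of 6-hour records to look back
            per = vmax - vmax_series[i - lag] if i >= lag else 0.0
            row[f"PER{hours}"] = per
            row[f"VPER{hours}"] = per * vmax
        row["VPC20"] = vmax * pc20
        out.append(row)
    return out
```

For example, read_storms("train.csv") (with a hypothetical file name) would return a list of storms, each a list of per-record dictionaries keyed by feature name. Passing a storm's VMAX and PC20 columns to persistence_features gives values that can be compared against the PER and VPER columns already stored in the file, which is one way to check the assumed definitions.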