Assignment 1: Data and their representation

Description of data sets
1. Airline
The number of passengers who did not show up (business class and economy class) for 100
flights (simulated data): 2 variables, 100 observations.
Variables:
NS1: number of business class passengers that did not show
NSE: number of economy class passengers that did not show
2. Baguette
Information about baguettes sold at 74 locations in Belgium in March 2005, collected by
Test-Aankoop, a Belgian consumer organization: 5 variables, 74 observations.
Variables:
weight: the weight of the baguette in gram;
length: the length of the baguette in cm;
price: the price of the baguette in euro;
salt: the level of saltiness of the baguette: VS (very salty), S (salty), RS (reasonably salty),
RG (reasonable to good), G (good), VG (very good);
taste: quality of the baguette: P (poor), poor to average, A (average),
AG (average to good), G (good), VG (very good).
3. Breaking strength
Breaking strength and the logarithm of breaking strength of wool in kg/20 threads: 2
variables, 50 observations (published data).
Variables:
breaking strength
ln(breaking strength)
4. Cars
11 characteristics of 38 small cars (data collected by Testaankoop, a Belgian consumer
organization): 11 variables, 38 observations.
Variables:
type: berline, monovolume, break
fuel: diesel, gasoline
cylinder: in cm³
power: kilowatt
fiscal horsepower
doors: number of doors
load capacity: in dm³
length: length of the car in meter
mileage: liter/100km
price(2003): catalog price in May 2003
cost/100km: assuming 15000km/year, amortization over 7 years
5. Decathlon2011
Best results of the top 130 male decathlon athletes in 2011 on the 10 disciplines of a
decathlon: 15 variables, 130 observations.
Variables:
Name: name of the athlete
Nationality: nationality of the athlete
100 m: time over 100 meter dash in seconds
long jump: distance in meter
shot put: distance in meter
high jump: height in meter
400 m: time over 400 meter in seconds
110 m hurdles: time over 110 meter hurdles in seconds
disk throwing: distance in meter
pole vault: height in meter
javelin: distance in meter
1500 m: time over 1500 meter in seconds
day 1 points: points collected after the first day (first 5 disciplines)
day 2 points: points collected on the second day (remaining 5 disciplines)
total points: total number of points over the two days of competition
Points are computed by applying a formula to the result in each discipline. As an
example, the points collected for a long jump of y meter (100 ∗ y cm) are given by the
Excel expression = TRUNC(0.14354 ∗ POWER(100 ∗ y − 220; 1.4)). All the details for the
remaining disciplines can be found in the data set.
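The long jump term above can be sketched in Python as follows. This is a minimal illustration, not part of the data set: the function name `long_jump_points` is hypothetical, y is taken in meters (as the long jump variable is recorded), and returning 0 points for jumps of 2.20 m or less is an assumption made here so that the power is never taken of a negative number.

```python
import math


def long_jump_points(distance_m: float) -> int:
    """Points for a long jump, following TRUNC(0.14354 * POWER(100*y - 220; 1.4)).

    distance_m is the jump distance y in meters.
    """
    centimeters_over_base = 100 * distance_m - 220
    # Assumption: jumps at or below 2.20 m score 0 points.
    if centimeters_over_base <= 0:
        return 0
    # math.trunc drops the fractional part, like Excel's TRUNC.
    return math.trunc(0.14354 * centimeters_over_base ** 1.4)


print(long_jump_points(7.76))
```

The other nine disciplines would need their own coefficients, which are documented in the data set rather than here.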
6. Euroweight
Weight of 2000 coins of €1 from 8 batches, data as described in Ziv Shkedy, Marc
Aerts and Herman Callaert, The weight of Euro coins: its distribution may not be as
normal as you would expect, Journal of Statistics Education, vol.14, number 2 (2006): 3
variables, 2000 observations.
Variables:
coin: number of the coin
weight: weight in gram
class: batch in which the coin was produced, 8 batches of 250 coins each.
7. Forbes2010
The data set is based on the Forbes list of the 2000 largest companies worldwide as
ranked by Forbes in 2010. For each company two qualitative variables
are listed (country and industry) and four quantitative variables (in billions of dollars):
Sales, Profits, Assets and Market Value. Four companies were dropped from the list
because the information was either incomplete (no profit figure for CIT Group at rank
1041, for Charter Common at rank 1408 and for Lear at rank 1860) or inconsistent (Sales
of 0 and positive profit for OGC at rank 1461). As a result the list contains 1996
companies: 8 variables, 1996 observations.
Variables:
Rank: rank of the company in the list
Company: name of the company
Country: location of the company
Industry: sector in which the company operates
Sales: sales in 2010 (billions of dollars)
Profits: profits made by the company in 2010 (billions of dollars)
Assets: total assets of the company in 2010 (billions of dollars)
Market value: market value (billions of dollars)
8. Pils2011
Characteristics of 44 Pilsner beers sold on the Belgian market, collected by Testaankoop,
a Belgian consumer organization: 8 variables, 44 observations.
Variables:
brand: brand of the beer
price: price per bottle or can (€)
content: content of the bottle or can (cl)
recipient: bottle (b) or can (c)
alcohol: percentage alcohol of the beer
label: quality of the label to correctly and completely describe the content
taste: evaluation of the taste by a panel: good (G), average (A), bad (B)
score: overall score assigned by Testaankoop
9. Rice
Weight of the rice content of boxes filled by different fillers. The objective is to fill the boxes
with 50 gram of rice in addition to other ingredients. 20 observations
are available for each filler (data provided by a company): 5 variables, 20 observations.
Variables:
Observation: observation 1 to 20
filler 1: weight of boxes filled by filler 1 (in gram)
filler 2: weight of boxes filled by filler 2 (in gram)
filler 3: weight of boxes filled by filler 3 (in gram)
filler 4: weight of boxes filled by filler 4 (in gram)
10. Sabena
The data set contains information about all incoming SABENA flights into Brussels from
other European airports during one spring and summer season in the late 1990s. The set
contains 3854 flights: 16 variables, 3854 observations.
Variables:
INDEX: flights numbered 1 to 3854
DATE: date of the flight
FLIGHT NUMBER: the SABENA flight number
AIRCRAFT-ID: code identifying the aircraft
AIRCRAFT-TYPE: type of aircraft
LINE STATION-DEP: the airport the flight is coming from:
BHX: Birmingham
BOD: Bordeaux
BRS: Bristol
BUD: Budapest
CPH: Copenhagen
DUS: Dusseldorf
EDI: Edinburgh
FLR: Florence
GLA: Glasgow
HAJ: Hannover
HAM: Hamburg
LBA: Leeds-Bradford
LCY: London City
MRS: Marseille
NAP: Naples
NCL: Newcastle
SXB: Strasbourg
THF: Tempelhof (Berlin)
TLS: Toulouse
TRN: Turin
STD: scheduled time of departure
ATD: actual time of departure
DELAY TIME DEP: delay time at departure (in minutes)
DR1: code for (first) cause of delay at departure
DR1-LENGTH: delay time as a result of DR1
DR2: code for (second) cause of delay at departure
DR2-LENGTH: delay time as a result of DR2
STA: scheduled time of arrival at Brussels
ATA: actual time of arrival at Brussels
DELAY TIME ARR: delay time at arrival in Brussels (in minutes)
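The delay variables relate a scheduled time to an actual time (STD and ATD at departure, STA and ATA at arrival). A minimal Python sketch of that relation follows. It rests on assumptions that the description above does not confirm: that times are stored as "HH:MM" strings, that a negative value means the flight was early, and that no flight is more than 12 hours off schedule, which is used to handle times that cross midnight. The function name `delay_minutes` is hypothetical.

```python
from datetime import datetime


def delay_minutes(scheduled: str, actual: str) -> int:
    """Delay in minutes between a scheduled and an actual clock time.

    Both arguments are assumed to be "HH:MM" strings (hypothetical format).
    """
    fmt = "%H:%M"
    diff = (datetime.strptime(actual, fmt)
            - datetime.strptime(scheduled, fmt)).total_seconds() / 60
    # Assumption: delays lie within 12 hours, so a larger gap means
    # the two times fall on either side of midnight.
    if diff < -720:
        diff += 1440
    elif diff > 720:
        diff -= 1440
    return int(diff)


print(delay_minutes("14:05", "14:32"))
```

Comparing such a computed value with DELAY TIME DEP or DELAY TIME ARR is one way to check the recorded delays for consistency.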
11. Tennis balls
Diameter of 30 tennis balls (simulated data): 1 variable, 30 observations
Variables:
diameter: diameter of tennis balls (in mm)
12. TV2011
Characteristics of 46 brands of TV sets. Data collected by Testaankoop, a Belgian
consumer organization: 12 variables, 46 observations.
Variables:
TV: name of the brand
size: size of the screen (inches)
min price: lowest price recorded by Testaankoop
max price: highest price recorded by Testaankoop
screen: type of screen (lcd, led-lcd)
scart connections: number of scart connections
HDMI connections: number of HDMI connections
diversity: quantity and quality of options (1 to 5)
reflection: reflection of the screen (1 to 5)
picture HD: quality of the picture of high definition broadcasts (1 to 5)
kwh/year: electricity used per year on a basis of 4 hours of viewing per day
evaluation: global evaluation by Testaankoop (score of 0 to 100)