Description of data sets

1. Airline
The number of passengers who did not show up (business class and economy class) for 100 flights (simulated data): 2 variables, 100 observations.
Variables:
  NS1: number of business class passengers who did not show up
  NSE: number of economy class passengers who did not show up

2. Baguette
Information about baguettes sold at 74 locations in Belgium in March 2005, collected by Test-Aankoop, a Belgian consumer organization: 5 variables, 74 observations.
Variables:
  weight: weight of the baguette in grams
  length: length of the baguette in cm
  price: price of the baguette in euro
  salt: level of saltiness of the baguette: VS (very salty), S (salty), RS (reasonably salty), RG (reasonable to good), G (good), VG (very good)
  taste: quality of the baguette: P (poor), PA (poor to average), A (average), AG (average to good), G (good), VG (very good)

3. Breaking strength
Breaking strength and the logarithm of breaking strength of wool in kg/20 threads (published data): 2 variables, 50 observations.
Variables:
  breaking strength
  ln(breaking strength)

4. Cars
11 characteristics of 38 small cars (data collected by Testaankoop, a Belgian consumer organization): 11 variables, 38 observations.
Variables:
  type: berline, monovolume, break
  fuel: diesel, gasoline
  cylinder: in cm³
  power: kilowatt
  fiscal horsepower
  doors: number of doors
  load capacity: in dm³
  length: length of the car in meters
  mileage: liter/100km
  price(2003): catalog price in May 2003
  cost/100km: assuming 15000km/year, amortization over 7 years

5.
Decathlon2011
Best results of the top 130 male decathlon athletes in 2011 in the 10 disciplines of a decathlon: 15 variables, 130 observations.
Variables:
  Name: name of the athlete
  Nationality: nationality of the athlete
  100 m: time over the 100 meter dash in seconds
  long jump: distance in meters
  shot put: distance in meters
  high jump: height in meters
  400 m: time over 400 meters in seconds
  110 m hurdles: time over the 110 meter hurdles in seconds
  disk throwing: distance in meters
  pole vault: height in meters
  javelin: distance in meters
  1500 m: time over 1500 meters in seconds
  day 1 points: points collected after the first day (first 5 disciplines)
  day 2 points: points collected on the second day (remaining 5 disciplines)
  total points: total number of points over the two days of competition
Points are computed as a polynomial of the results in the different disciplines. As an example, the points collected for a long jump of y meters are given by the term (Excel):
  = TRUNC(0.14354 * POWER(100 * y - 220; 1.4))
where the factor 100 converts the distance to centimeters. All the details for the remaining disciplines can be found in the data set.

6. Euroweight
Weight of 2000 coins of €1 from 8 batches, data as described in Ziv Shkedy, Marc Aerts and Herman Callaert, The weight of Euro coins: its distribution may not be as normal as you would expect, Journal of Statistics Education, vol. 14, number 2 (2006): 3 variables, 2000 observations.
Variables:
  coin: number of the coin
  weight: weight in grams
  class: batch in which the coin was produced, 8 batches of 250 coins each

7. Forbes2010
The data set is based on the Forbes list of the 2000 major companies worldwide, as perceived by the Forbes Company in 2010. For each company, two qualitative variables are listed (country and industry) and four quantitative variables (in billions of dollars): Sales, Profits, Assets and Market Value.
Four companies were dropped from the list because the information was either incomplete (no profit figure for CIT Group at rank 1041, for Charter Common at rank 1408 and for Lear at rank 1860) or inconsistent (sales of 0 and a positive profit for OGC at rank 1461). As a result, the list contains 1996 companies.
Variables:
  Rank: rank of the company in the list
  Company: name of the company
  Country: location of the company
  Industry: sector in which the company operates
  Sales: sales in 2010 (billions of dollars)
  Profits: profits made by the company in 2010 (billions of dollars)
  Assets: total assets of the company in 2010 (billions of dollars)
  Market value: market value (billions of dollars)

8. Pils2011
Characteristics of 44 Pilsner beers sold on the Belgian market, collected by Testaankoop, a Belgian consumer organization: 8 variables, 44 observations.
Variables:
  brand: brand of the beer
  price: price per bottle or can (€)
  content: content of the bottle or can (cl)
  recipient: bottle (b) or can (c)
  alcohol: alcohol percentage of the beer
  label: quality of the label, i.e. how correctly and completely it describes the content
  taste: evaluation of the taste by a panel: good (G), average (A), bad (B)
  score: overall score assigned by Testaankoop

9. Rice
Weight of the rice content of boxes filled by different fillers. The objective is to fill each box with 50 grams of rice, in addition to other ingredients. 20 observations are available for each filler (data provided by a company): 5 variables, 20 observations.
Variables:
  Observation: observation 1 to 20
  filler 1: weight of boxes filled by filler 1 (in grams)
  filler 2: weight of boxes filled by filler 2 (in grams)
  filler 3: weight of boxes filled by filler 3 (in grams)
  filler 4: weight of boxes filled by filler 4 (in grams)

10. Sabena
The data set contains information about all incoming SABENA flights into Brussels from other European airports during one spring and summer season in the late 1990s.
The set contains 3854 flights: 16 variables, 3854 observations.
Variables:
  INDEX: flights numbered 1 to 3854
  DATE: date of the flight
  FLIGHT NUMBER: the SABENA flight number
  AIRCRAFT-ID: code identifying the aircraft
  AIRCRAFT-TYPE: type of aircraft
  LINE STATION-DEP: the airport the flight is coming from:
    BHX: Birmingham
    BOD: Bordeaux
    BRS: Bristol
    BUD: Budapest
    CPH: Copenhagen
    DUS: Dusseldorf
    EDI: Edinburgh
    FLR: Florence
    GLA: Glasgow
    HAJ: Hannover
    HAM: Hamburg
    LBA: Leeds-Bradford
    LCY: London City
    MRS: Marseille
    NAP: Naples
    NCL: Newcastle
    SXB: Strasbourg
    THF: Tempelhof (Berlin)
    TLS: Toulouse
    TRN: Turin
  STD: scheduled time of departure
  ATD: actual time of departure
  DELAY TIME DEP: delay time at departure (in minutes)
  DR1: code for (first) cause of delay at departure
  DR1-LENGTH: delay time as a result of DR1
  DR2: code for (second) type of delay at departure
  DR2-LENGTH: delay time as a result of DR2
  STA: scheduled time of arrival at Brussels
  ATA: actual time of arrival at Brussels
  DELAY TIME ARR: delay time at arrival in Brussels (in minutes)

11. Tennis balls
Diameter of 30 tennis balls (simulated data): 1 variable, 30 observations.
Variables:
  diameter: diameter of tennis balls (in mm)

12. TV2011
Characteristics of 46 brands of TV sets. Data collected by Testaankoop, a Belgian consumer organization: 12 variables, 46 observations.
Variables:
  TV: name of the brand
  size: size of the screen (inches)
  min price: lowest price recorded by Testaankoop
  max price: highest price recorded by Testaankoop
  screen: type of screen (lcd, led-lcd)
  scart connections: number of scart connections
  HDMI connections: number of HDMI connections
  diversity: quantity and quality of options (1 to 5)
  reflection: reflection of the screen (1 to 5)
  picture HD: quality of the picture of high definition broadcasts (1 to 5)
  kwh/year: electricity used per year on the basis of 4 hours of viewing per day
  evaluation: global evaluation by Testaankoop (score of 0 to 100)
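
As a worked illustration of the long jump scoring term quoted under Decathlon2011, the following is a minimal Python sketch of the same calculation. The function name long_jump_points is hypothetical, the distance is assumed to be given in meters as in the data set, and returning 0 for jumps of 2.20 m or less is an assumption made here so that the power is never taken of a negative number; the data set itself remains the reference for the exact rules.

```python
import math


def long_jump_points(distance_m: float) -> int:
    """Points for a long jump of distance_m meters.

    Mirrors the Excel term TRUNC(0.14354 * POWER(100 * y - 220; 1.4)).
    """
    # Convert meters to centimeters and subtract the 220 cm offset.
    centimeters_over_offset = 100 * distance_m - 220
    # Assumption (not stated in the description): jumps at or below
    # 2.20 m score zero instead of raising a negative base to a power.
    if centimeters_over_offset <= 0:
        return 0
    # math.trunc drops the fractional part, like Excel's TRUNC.
    return math.trunc(0.14354 * centimeters_over_offset ** 1.4)


if __name__ == "__main__":
    # Print the points for a few illustrative distances.
    for jump in (6.50, 7.00, 7.50):
        print(jump, long_jump_points(jump))
```

Applying such a function to the long jump column and comparing the result with the points listed in the data set is one way to check that the units have been read correctly.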