Construction of a database
Per Weidenman, PAR AB

Database
• A collection of data
• The data belongs together
• It models the "world"

Database management system (DBMS)
• The database (a collection of interrelated data)
• Software to manage and access the data

DBMS requirements
[Diagram: input (transactions) goes into the DBMS, which holds the organised data (a "database", a data warehouse, etc.). The data is used by users for searching and reporting, and for statistical analysis.]

Database management systems (DBMS)
• Microsoft Access
• Microsoft SQL Server
• DB2
• Oracle
• MySQL
• FirebirdSQL
• etc.

SQL: Structured Query Language
A computer language to define and search data.

Relational databases
• Tables containing data, organised in rows and columns
• Keys, used for linking data in different tables

Example
A simple database for collecting and organising statistical papers, created in Microsoft Access:
• Paper name and details
• Link to the document (PDF file)
• Authors

A database with four tables, linked by keys.

One of the tables contains the paper name and details: one paper on each row, identified by a key.

The keys are used to link data in the four tables:
• Table "artiklar", key: artikel_id (1, 2, 3, … 9)
• Table "författare2", keys: artikel_id and person_id (pairs 1-1, 1-2, 1-3, …, 5-4, 6-4)
• Table "personer2", key: person_id (1, 2, 3, 4: Aaaa, Bbbb, Cccc, Dddd)

One paper has 3 authors; one person is the author of 2 papers.

A query: the result of asking the database about papers and authors shows one paper with its 3 authors, and one author with his or her 2 papers.

[Diagram: the DBMS overview repeated, now with the searching/reporting users labelled "Business" users.]
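A minimal sketch of the paper database, assuming SQLite through Python's built-in sqlite3 module instead of Microsoft Access. It builds three of the four tables with the keys named on the slides (artikel_id, person_id) and asks the two questions from the query slide: the authors of one paper, and the papers of one author. The columns title and name, the placeholder titles, and the mapping of person 4 to Dddd are assumptions; only the key pairs visible on the slide are loaded, so the rows hidden behind "…" are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artiklar (
    artikel_id INTEGER PRIMARY KEY,
    title      TEXT                 -- hypothetical column for the paper name
);
CREATE TABLE personer2 (
    person_id INTEGER PRIMARY KEY,
    name      TEXT                  -- hypothetical column for the author name
);
-- Link table: one row per (paper, author) pair
CREATE TABLE "författare2" (
    artikel_id INTEGER REFERENCES artiklar(artikel_id),
    person_id  INTEGER REFERENCES personer2(person_id),
    PRIMARY KEY (artikel_id, person_id)
);
""")
conn.executemany("INSERT INTO artiklar VALUES (?, ?)",
                 [(i, f"Paper {i}") for i in range(1, 10)])
conn.executemany("INSERT INTO personer2 VALUES (?, ?)",
                 [(1, "Aaaa"), (2, "Bbbb"), (3, "Cccc"), (4, "Dddd")])
# Only the key pairs visible on the slide
conn.executemany('INSERT INTO "författare2" VALUES (?, ?)',
                 [(1, 1), (1, 2), (1, 3), (5, 4), (6, 4)])

# The keys link the three tables together
JOIN_SQL = """
SELECT a.artikel_id, a.title, p.name
FROM artiklar AS a
JOIN "författare2" AS f ON f.artikel_id = a.artikel_id
JOIN personer2 AS p ON p.person_id = f.person_id
"""

# One paper and its authors
authors_of_paper_1 = [name for _, _, name in conn.execute(
    JOIN_SQL + " WHERE a.artikel_id = ? ORDER BY p.name", (1,))]
# One author and his or her papers
papers_by_dddd = [aid for aid, _, _ in conn.execute(
    JOIN_SQL + " WHERE p.name = ? ORDER BY a.artikel_id", ("Dddd",))]

print(authors_of_paper_1)  # ['Aaaa', 'Bbbb', 'Cccc']
print(papers_by_dddd)      # [5, 6]
```

The link table is what lets one paper have several authors and one author have several papers without repeating any paper or person details.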
[Diagram labels: "IT Department" on the DBMS side, and "Statistical analysis".]

DBMS requirements from a statistical / analytical viewpoint
• Data quality
• Data types
• Performance
• Maximum information
• Historical data
• Regulation and secrecy

Data quality: valid values
[Sales System X input form: "Enter customer name:" replaced by "Choose customer name:"]
Instead of entering text/data by typing, use, if possible, selection from a list of valid values:
• Volvo Personvagnar AB
• Volvo Lastvagnar AB
• Volvo Construction AB
• Volvo Bussar AB
• Volvo Logistics AB
• …

Data quality: rules for valid input
[Sales System X input form: "Enter customer age:"]
Define rules for valid input (values, intervals, etc.). We don't want:
• Negative values
• "40+"
• "1982" (a birth year instead of an age)

Data quality: handling of missing values
Missing values should be stored as "null" in the database, not as 0 (the digit zero).

Data types
• Text
• Numeric

Performance
• Users: searching for individual records, and creating "prepared" reports by counting or summing
• Statistical analysis: large datasets, multivariate methods, iterative estimation, etc.

Maximum information
[Sales System X input form: "Enter customer age: 34"]
"We need to report on age groups: 20-29, 30-39, 40-49, …"
"Thus we store age as an interval, not as a value!"
This is the fallacy of being too user oriented: the interval can always be derived from the stored value, but the value can never be recovered from the interval.
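The data-quality rules and the maximum-information point can be sketched together, again assuming SQLite via Python's sqlite3. The list of valid customer names becomes a foreign key, the age rule becomes a CHECK constraint, a missing age is stored as NULL, and the age groups are derived at query time from the exact age. The table and column names, the helper try_insert, and the upper age limit of 120 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
-- The list of valid values that the input form offers
CREATE TABLE valid_customers (name TEXT PRIMARY KEY);
CREATE TABLE sales (
    sale_id       INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL REFERENCES valid_customers(name),
    -- exact age in whole years, NULL means missing (never 0)
    customer_age  INTEGER CHECK (customer_age IS NULL OR
                                 (typeof(customer_age) = 'integer'
                                  AND customer_age BETWEEN 0 AND 120))
);
""")
conn.executemany("INSERT INTO valid_customers VALUES (?)",
                 [("Volvo Personvagnar AB",), ("Volvo Lastvagnar AB",)])

def try_insert(name, age):
    """Return True if the row passes the database's validation rules."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO sales (customer_name, customer_age) VALUES (?, ?)",
                         (name, age))
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    "valid age": try_insert("Volvo Personvagnar AB", 34),
    "another valid age": try_insert("Volvo Lastvagnar AB", 47),
    "missing age": try_insert("Volvo Lastvagnar AB", None),
    "negative": try_insert("Volvo Personvagnar AB", -3),
    "'40+'": try_insert("Volvo Personvagnar AB", "40+"),
    "birth year": try_insert("Volvo Personvagnar AB", 1982),
    "typo in name": try_insert("Volvo Persnvagnar AB", 34),
}

# Age groups are derived from the exact value when reporting
groups = conn.execute("""
    SELECT (customer_age / 10) * 10 AS group_start, COUNT(*)
    FROM sales WHERE customer_age IS NOT NULL
    GROUP BY group_start ORDER BY group_start
""").fetchall()
# AVG ignores NULL; storing the missing age as 0 would give (34 + 47 + 0) / 3 = 27.0
avg_age = conn.execute("SELECT AVG(customer_age) FROM sales").fetchone()[0]

print(groups)   # [(30, 1), (40, 1)]
print(avg_age)  # 40.5
```

Because the exact age is stored, another grouping (for example 5-year bands) only needs a new query; the reverse is impossible once ages are stored as intervals.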
DBMS requirements from a statistical / analytical viewpoint: Historical data
[Sales System X input form: customer name, customer address, order value, order date]
Each new order for a specific customer will be added to the table Orders and stored as a new row.
Table: Orders (Customer ID, Order date, Order value)

Historical data (continued)
[Sales System X input form: customer name, customer address, order date, order value]
But a new address will probably UPDATE the existing record (row) for the specific customer.
Table: Customers (Customer ID, Customer name, Customer address)
Thus, the old value of "customer address" will be deleted and replaced with the new value. But this will do fine for users focusing on searching / reporting!

Historical data (continued)
Create a new table to contain historic records:
• Table: Customers (Customer ID, Customer name, Customer address)
• Table: Customers_history (Customer ID, Customer name, Customer address, From, To)
Each time a value is UPDATED for a certain customer, the complete (previous) record is transferred to the table Customers_history.

Historical data (continued)
[Diagram: table Customers next to table Customers_history]
This structure will make analysis of processes possible. But not easy!
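One way to implement the Customers_history idea is a trigger, sketched here assuming SQLite via Python's sqlite3. The trigger name, the valid_from/valid_to columns (standing in for the slide's From and To), the customer data, and the rule that every update also sets valid_from to the date of the change are assumptions. The hypothetical helper address_on shows why the structure makes analysis possible but not easy: even a simple "as of" question needs a query over both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (
    customer_id      INTEGER PRIMARY KEY,
    customer_name    TEXT,
    customer_address TEXT,
    valid_from       TEXT   -- hypothetical: ISO date when the current values took effect
);
CREATE TABLE Customers_history (
    customer_id      INTEGER,
    customer_name    TEXT,
    customer_address TEXT,
    valid_from       TEXT,  -- From on the slide
    valid_to         TEXT   -- To on the slide
);
-- On every UPDATE, copy the complete previous record (OLD) to the history table.
-- Assumption: each update also sets valid_from to the date of the change.
CREATE TRIGGER customers_keep_history AFTER UPDATE ON Customers
BEGIN
    INSERT INTO Customers_history
    VALUES (OLD.customer_id, OLD.customer_name, OLD.customer_address,
            OLD.valid_from, NEW.valid_from);
END;
""")

def change_address(conn, customer_id, new_address, change_date):
    """Overwrite the current address; the trigger keeps the old record."""
    with conn:
        conn.execute(
            "UPDATE Customers SET customer_address = ?, valid_from = ? WHERE customer_id = ?",
            (new_address, change_date, customer_id))

def address_on(conn, customer_id, day):
    """Address valid on an ISO date. History rows and the current row cover
    non-overlapping periods, so at most one row matches."""
    row = conn.execute("""
        SELECT customer_address FROM Customers_history
        WHERE customer_id = ? AND valid_from <= ? AND ? < valid_to
        UNION ALL
        SELECT customer_address FROM Customers
        WHERE customer_id = ? AND valid_from <= ?
    """, (customer_id, day, day, customer_id, day)).fetchone()
    return row[0] if row else None

with conn:
    conn.execute("INSERT INTO Customers VALUES (1, 'Aaaa AB', 'Storgatan 1', '2001-01-01')")
change_address(conn, 1, "Kungsgatan 5", "2004-06-01")
change_address(conn, 1, "Vasagatan 9", "2007-03-15")

print(address_on(conn, 1, "2005-01-01"))  # Kungsgatan 5
```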
DBMS requirements from a statistical / analytical viewpoint: Regulation and secrecy

DBMS requirements from a statistical / analytical viewpoint
• Users (searching, reporting): current data, operating on individual records
• Statistical analysis: current + historical data, operating on many records

Next on this channel…
[Diagram: input transactions go into a DBMS holding a database of historic transactions, which supplies organised data for statistical analysis.]

PAR / Bisnode database tables
• FTG: basic company data. One record per company. Contains name, address, startdate, enddate, line of business, etc.
• FTG_H: historic company data. Many records per company. Contains the accumulated historic records from table FTG.
• BOKSLUT: balance sheet data. One record per annual report (thus many records per company). Turnover, profit, key ratios, etc.
• FUNKTION_PERIOD: board member data. Many records per company and person.
• And many more tables!
[Diagram: the tables feed Serrano, which is used for sampling for time series statistics and for statistical analysis. Annotations: "Historic names etc.", "How?", "Board data".]

END

Serrano
• Balance sheet data: data from different periods transformed to yearly data records
• Historic company data: historic transactions from FTG_H transformed to yearly data records
• Board member data: any mix of startdate, enddate and period length transformed to yearly data records

Summing up register data to annual figures
Example (company A). A register containing balance sheet data:
• Number of employees
• Turnover
• Profit
• Tangible assets
• Etc.
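The Serrano step of turning FTG_H-style history into yearly data records can be illustrated as below. This is a hypothetical sketch, not Serrano's actual rule: it assumes each history row has a valid_from and a valid_to date (valid_to None meaning still current), counts a company as active in every calendar year one of its periods overlaps, and keeps the state of the period that started last within or before that year. The data and the function name yearly_records are made up for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical FTG_H-style history: one row per period with unchanged attributes.
# valid_to = None means the row is still current.
history = [
    # (company, line_of_business, valid_from, valid_to)
    ("A", "road transport", date(1995, 3, 1), date(2000, 6, 30)),
    ("A", "sea transport", date(2000, 7, 1), None),
    ("B", "taxi", date(1998, 1, 15), date(2001, 9, 30)),
    ("C", "air transport", date(1996, 5, 1), date(1999, 2, 28)),
]

def yearly_records(history, first_year, last_year):
    """One record per company and calendar year.

    A company counts as active in every year that one of its periods overlaps.
    If several periods overlap a year, the one that started last wins."""
    latest = {}  # (company, year) -> (valid_from, line_of_business)
    for company, lob, start, end in history:
        stop = end.year if end is not None else last_year
        for year in range(max(start.year, first_year), min(stop, last_year) + 1):
            key = (company, year)
            if key not in latest or start > latest[key][0]:
                latest[key] = (start, lob)
    return sorted((c, y, lob) for (c, y), (_, lob) in latest.items())

records = yearly_records(history, 1998, 2001)
active_per_year = Counter(year for _, year, _ in records)
print(dict(active_per_year))  # {1998: 3, 1999: 3, 2000: 2, 2001: 2}
```

Company A changes line of business in mid-2000, so its 2000 record carries the later state; company C, which ceased in February 1999, still counts as active in 1999.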
[Diagram: company A's annual reports on a time axis: year 3, year 2, year 1, now.]

Summing up register data to annual figures
[Diagram: company B added, with a broken (non-calendar) fiscal year.]

Summing up register data to annual figures
[Diagram: company C added, with a changed fiscal year.]

Summing up register data to annual figures
[Diagram: company D added, with missing data.]

Summing up register data to annual figures: company B
Proposal: break down the flow variables (turnover, profit, etc.) into monthly values…
…and sum the monthly values into a "fictitious" calendar-year value…
…and impute for full coverage during the last year.
[Diagram: company B's values summed into calendar years 3, 2 and 1 and stored in the database.]

First example
Register-based transport statistics for SIKA:
• Decreased response burden
• Increased understanding of the transport companies (as a complement to the "usual" focus on type of goods)
• Time series describing economic status and change

Objective: describing economic status and change in transport companies during the last ten years.
…total number of employees and turnover…
[Figure: number of employees (left axis) and net turnover in SEK million (right axis), 1997-2006]

…or turnover growth compared to GDP…
[Figure: growth of the transport industries compared to GDP, 1997-2006]

…or profit development for different types of transport companies…
[Figure: profit development (%), 1997-2007, for road, sea, air and rail transport, public transport, taxi, and in total]

…or the number of employees in a cohort of new companies.
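The proposal for company B (monthly values, a fictitious calendar-year sum, imputation for the last year) might be sketched like this. The assumptions are hypothetical: each report's flow value is spread evenly over the months it covers, only the latest year is imputed (by scaling its covered months up to twelve), and earlier years that are not fully covered are dropped. Because each report carries its own length in months, the same function also handles a changed fiscal year (company C).

```python
from collections import defaultdict

def month_sequence(start_year, start_month, n_months):
    """(year, month) pairs for n consecutive months from the given start month."""
    first = start_year * 12 + (start_month - 1)
    return [(i // 12, i % 12 + 1) for i in range(first, first + n_months)]

def calendar_year_values(reports):
    """Turn annual reports with arbitrary fiscal periods into calendar-year values.

    reports: (start_year, start_month, n_months, value) for one company and one
    flow variable. Each value is spread evenly over its months (an assumption),
    monthly values are summed per calendar year, and the last year is scaled
    up to twelve months. Earlier years with gaps are left out."""
    monthly = {}
    for start_year, start_month, n_months, value in reports:
        for ym in month_sequence(start_year, start_month, n_months):
            monthly[ym] = value / n_months
    totals, covered = defaultdict(float), defaultdict(int)
    for (year, _month), v in monthly.items():
        totals[year] += v
        covered[year] += 1
    last_year = max(totals)
    result = {}
    for year in sorted(totals):
        if covered[year] == 12:
            result[year] = totals[year]
        elif year == last_year:
            # impute the uncovered months with the mean of the covered months
            result[year] = totals[year] / covered[year] * 12
    return result

# Company B: fiscal year May-April
print(calendar_year_values([(2004, 5, 12, 1200.0), (2005, 5, 12, 2400.0)]))
# {2005: 2000.0, 2006: 2400.0}

# Company C: calendar year, then a 6-month transition period, then July-June
print(calendar_year_values([(2004, 1, 12, 1200.0), (2005, 1, 6, 600.0),
                            (2005, 7, 12, 2400.0)]))
# {2004: 1200.0, 2005: 1800.0, 2006: 2400.0}
```

An even monthly spread ignores seasonality; it is the simplest choice that makes the calendar-year sum well defined.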
[Figure: employees in a cohort of new companies, in percent, 1997-2007, split into remaining employees and additional employees]

Tables based on balance sheet data from each company

Year | Active companies: total | Active companies: of which limited companies | Active limited companies: number of employees | Active limited companies: net turnover (SEK million) | GDP, current prices (SEK million)
1997 | 12912 | 10599 | 98259 | 120284 | 1927001
1998 | 12788 | 10626 | 100663 | 127745 | 2012091
1999 | 12547 | 10543 | 102531 | 133078 | 2123971
2000 | 12562 | 10704 | 106811 | 145496 | 2249987
2001 | 12383 | 10659 | 112685 | 163418 | 2326176
2002 | 12432 | 10741 | 114426 | 168214 | 2420761
2003 | 12616 | 10935 | 115135 | 178294 | 2515150
2004 | 12689 | 11067 | 118015 | 188913 | 2624964
2005 | 12709 | 11100 | 119387 | 209819 | 2735218
2006 | 12514 | 11012 | 121683 | 224225 | 2899653

What data is needed?
• Company data including micro-level history: exactly which companies were active in transport during each year?
• Balance sheet data from all transport companies for each year
• Faster access to last year's data than the taxation-based registers provide

Sampling companies for time series statistics
[Diagram: companies A, B, C and D on a time axis: year 3, year 2, year 1, now.]
[Diagram: the set of active companies changes between years; the slide shows the sets A, C, D / A, B, C, D / A, B, C against years 3, 2 and 1.]
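A sketch of per-year sampling frames, assuming each company's activity can be reduced to a first and last active year. The activity periods are hypothetical, chosen to reproduce the sets on the last diagram, with 2004, 2005 and 2006 standing in for years 3, 2 and 1. The function prn_sample uses permanent random numbers, a standard technique for coordinated business samples; it is swapped in here for illustration and is not necessarily how PAR draws its samples.

```python
import random

# Hypothetical first and last active year per company
activity = {
    "A": (2004, 2006),
    "B": (2005, 2006),  # enters during the period
    "C": (2004, 2006),
    "D": (2004, 2005),  # leaves during the period
}

def frames_by_year(activity):
    """Sampling frame per year: the set of companies active in that year."""
    first = min(start for start, _ in activity.values())
    last = max(end for _, end in activity.values())
    return {year: {c for c, (s, e) in activity.items() if s <= year <= e}
            for year in range(first, last + 1)}

def changes(frames):
    """Between consecutive years: companies that remained, entered and left."""
    years = sorted(frames)
    return {curr: {"remaining": frames[prev] & frames[curr],
                   "entered": frames[curr] - frames[prev],
                   "left": frames[prev] - frames[curr]}
            for prev, curr in zip(years, years[1:])}

def prn_sample(frames, fraction, seed=1):
    """Coordinated yearly samples with permanent random numbers (PRN).

    Each company gets one fixed uniform random number. In every year the
    active companies with a number below `fraction` are sampled, so a sampled
    company stays in the sample for as long as it is active."""
    rng = random.Random(seed)
    companies = sorted(set().union(*frames.values()))
    prn = {c: rng.random() for c in companies}
    return {year: {c for c in frame if prn[c] < fraction}
            for year, frame in frames.items()}

frames = frames_by_year(activity)
print({year: sorted(f) for year, f in frames.items()})
# {2004: ['A', 'C', 'D'], 2005: ['A', 'B', 'C', 'D'], 2006: ['A', 'B', 'C']}
```

Keeping the random number fixed per company means the samples for different years overlap as much as possible, which is what makes year-on-year change estimates stable, while entering companies are picked up automatically.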