Stata - Experiences and Needs of a Professional Services Organisation

Dr. Andreas Stiehler
Berlecon Research GmbH
Oranienburger Str. 32
10117 Berlin
as@berlecon.de
Phone/Fax: +49 30 285296-0/-29

Outline

I. Intro: Berlecon Research
II. Requirements for statistics software
III. Deploying Stata 8: main challenges
IV. Conclusions and wish list

I. Intro: Berlecon Research GmbH

We analyze the potential of new technologies.
  - Analyses are typically published as reports.

Our clients are demanding and have high standards.
  - IT executives from large and medium-sized companies
  - Technology and consulting companies
  - Public institutions (EU Commission, BMWA etc.)

Empirical work makes up a significant part of the research.
  - About 3 to 5 surveys per year (increasing importance)
  - Data typically delivered by market research companies

Our requirements for statistics software differ from those of typical Stata users (scientists).

II. Requirements for statistics software

We need less sophisticated statistics…
  - More than 90% binomial and ordinal data
  - Mainly descriptive statistics needed (tabulate, tabsum)

…but high-quality graphs and tables!
  - The layout of the reports is a competitive factor.

We only have a limited amount of time…
  - for both exploring what the software can do and computing

…but need highly reliable data!
  - Errors in data analysis would seriously damage our reputation.

Stata 8 has the potential to fulfil many of these requirements.

July 2004: Start of an internal project with two objectives:
  - Choosing a new statistics package and, based on it,
  - Finding a way to streamline the creation of tables and graphs

Decision in favour of Stata 8; the decisive features are:
  - Programming opportunities: allow for customisation of outputs and exploiting economies of scale
  - Intuitive user interface: facilitates the exploration of the possibilities that Stata offers
  - Graphs: high quality, no additional software needed
  - Further features: Statalist, Mac support, price

But: exploiting this potential for our needs is challenging.

III. Tables and Graphs: main challenges

III. Tables: Main challenges

Goal: Preparation of data for custom tables
  - Overview tables typically compile results from several tabulate commands.
  - Often further transformation of the results is needed (e.g., summary data for general approval = fully agree + somewhat agree).

Typical layout of an overview table (cells left empty here):

    Companies          Solution 1   Solution 2   Solution 3
    All
    By country
      Germany
      UK
      ....
    By education
      High-skilled
      Low-skilled

Copying and pasting from individual tabulate outputs is too time-consuming and error-prone.

Solution today: Processing of data via matrices
  1. Tabulate results to matrix: tab A B, col nofreq matcell(AB)
  2. Transformation of data using matrix operations: percentages, aggregation, compiling for the presentation in the table
  3. Matrix to data table: svmat AB, names(newsol1)
  4. Copying of table-ready data to Excel or Word

Advantages
  - Significant time savings (half of the time needed compared with transferring results case by case)
  - Error reduction (copying only once)

Stata 8 allows table-ready results to be created efficiently.

Programming of table-ready data

    /* Row vector of ones: e * M gives the column totals of a 5-row matrix M */
    matrix def e = 1,1,1,1,1
    foreach var of varlist q31* {
        /********* Country: column percentages *********/
        tabulate `var' country, matcell(c`var')
        matrix define gesc`var' = e * c`var'
        matrix define c`var' = c`var' * inv(diag(gesc`var')) * 100
        /********* Education: column percentages *********/
        tabulate `var' edu, matcell(e`var')
        matrix define gese`var' = e * e`var'
        matrix define e`var' = e`var' * inv(diag(gese`var')) * 100
        /********* All together *********/
        /* Stack the transposed matrices: one row per country and education group */
        matrix define `var' = c`var'' \ e`var''
        /* Add up the first two answer categories (general approval) */
        matrix define `var' = `var' * (1,1,0,0,0)'
        svmat `var', names(z`var')
        /* Inner loop uses its own macro name so the outer `var' is not shadowed */
        foreach zvar of varlist z`var'* {
            replace `zvar' = round(`zvar')
        }
    }

III. Graphs: Main challenges

Graphs at Berlecon:
  - About 7 different graph types in use
  - Customisation time: about 2 weeks (the GUI "Submit" button was a big help)
  - Use of program files makes it economical (reuse of do-files)
  - Extensive use of overlaid two-way charts

Challenge: Many graphs needed by Berlecon Research are non-standard in Stata!

IV. Conclusions and wish list

IV. Conclusions

Stata has the potential to serve as a helpful tool for the specific needs of professional services organisations.
  - Streamlined and less error-prone generation of table-ready data
  - High-quality graphs, no additional software needed
  - Intuitive user interface and collaboration via Statalist facilitate programming for non-experts.

There are several challenges:
  - Standard output is typically not sufficient for further processing in tables and graphs.
  - Finding appropriate solutions is challenging, but it's worth it.

Tools facilitating the customisation process could help to win further professional services organisations as customers.

IV. Wish list

Tables: Option to:
  - Design tables within the Stata program!!!
  - Save percentages in matrices
  - Include "zero" rows/columns in matrices in case of empty cells

Graphs:
  - Option to define colours for value labels in stacked bar charts as standard (as is possible for pie charts)
  - "Text" options for labels on the y/x-axis (including line breaks, size…)
  - Book "A visual guide to Stata" for non-standard graphs

Thanks for your attention!
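The wish to save percentages in matrices arises because matcell() returns raw cell frequencies, so the conversion has to be done by hand. The following is a minimal sketch of that conversion, not the code used at Berlecon. It runs on the auto dataset shipped with Stata, and the matrix names C, ones, tot and P and the variable stub pct are assumptions chosen for illustration.

```stata
* Sketch only: auto.dta and the names C, ones, tot, P, pct are illustrative assumptions
sysuse auto, clear

* matcell() stores the cell frequencies (not percentages) of the two-way table
tabulate rep78 foreign, matcell(C)

* Row vector of ones sized to the table instead of a hard-coded 1,1,1,1,1
matrix ones = J(1, rowsof(C), 1)

* Column totals, then column percentages: P = C * diag(totals)^-1 * 100
matrix tot = ones * C
matrix P   = C * inv(diag(tot)) * 100
matrix list P

* Save the percentages as variables pct1 and pct2 for copying into a report
svmat P, names(pct)
```

Sizing the vector of ones with rowsof() is a design choice: matcell() only contains rows for answer categories that actually occur, so a fixed-length vector stops being conformable as soon as one category is empty.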
Programming of graph-ready data

    /*** Tabulate results to data editor ***/
    matrix define e7 = 1,1,1,1,1,1,1
    foreach var of varlist f7* {
        tabulate `var' [fweight=rweight], matcell(`var')
        matrix define ges`var' = e7 * `var'
        matrix define `var' = `var' * inv(ges`var') * 100
    }
    matrix define y7 = f7a,f7b,f7c
    matrix define y7 = y7'
    svmat y7

    /* Stack bars */
    gen y7stack7 = y71+y72+y73+y74+y75+y76+y77
    gen y7stack6 = y71+y72+y73+y74+y75+y76
    gen y7stack5 = y71+y72+y73+y74+y75
    gen y7stack4 = y71+y72+y73+y74
    gen y7stack3 = y71+y72+y73
    gen y7stack2 = y71+y72
    gen y7stack1 = y71

    /* Counting */
    matrix input zaehl3 = (1\2\3)
    svmat zaehl3

    /* Rounded scatter values */
    forvalues i = 1/7 {
        gen y7scat`i' = y7stack`i'-y7`i'/2
        replace y7`i' = round(y7`i')
    }

Programming of overlaid charts

    twoway ///
        (bar y7stack7 zaehl31, horizontal blcolor(black) bfcolor(gs13) barwidth(0.5)) ///
        (scatter zaehl31 y7scat7, msymbol(none) mlabel(y77) mlabsize(medsmall) ///
            mlabcolor(black) mlabposition(0)) ///
        ///
        (bar y7stack6 zaehl31, horizontal blcolor(black) bfcolor(gs3) barwidth(0.5)) ///
        (scatter zaehl31 y7scat6, msymbol(none) mlabel(y76) mlabsize(medsmall) ///
            mlabcolor(white) mlabposition(0)) ///
        ///
        (bar y7stack5 zaehl31, horizontal blcolor(black) bfcolor("204 255 204") barwidth(0.5)) ///
        (scatter zaehl31 y7scat5, msymbol(none) mlabel(y75) mlabsize(medsmall) ///
            mlabcolor(black) mlabposition(0)) ///
        ///
        (bar y7stack4 zaehl31, horizontal blcolor(black) bfcolor("0 204 0") barwidth(0.5)) ///
        (scatter zaehl31 y7scat4 .........

Labelling of bars

    .... ///
    , ylabel( ///
        0.85 "Cost " ///
        1.15 "leadership " ///
        1.4 "..........................." ///
        1.85 "Quality " ///
        2.15 "leadership " ///
        2.4 "..........................." ///
        2.85 "Innovation " ///
        3.15 "leadership " ///
        3.4 "..........................." ///
        3.85 "Market " ///
        4.15 "leadership " ///
        4.4 "..........................." ///
        4.85 "Vertical " ///
        5.15 "focus " ///
        5.4 "..........................." ///
        5.85 "Focus on " ///
        6.16 "enterprise size" ///
    , .....