Michael Lokshin, Sergiy Radyakin and Zurab Sajaia
World Bank

Analytical work at the World Bank
Each year the World Bank produces:
- 10-15 poverty assessments
- 5-10 labor market studies
- 10 education and health assessments
- gender studies
- nutritional studies
- reports on social protection and benefit-incidence analysis, etc.
Most analytical work for these reports is done in Stata.
The Research Department (DECRG) of the World Bank develops new methods and tools that are used in these reports and need to be made accessible to a wide audience of practitioners of applied economic analysis.

Stata in the World Bank
- Stata is the main statistical package used in the Bank.
- Hundreds of users, both at headquarters (HQ) and in regional offices.
- Many users are short-term consultants with limited Stata programming skills. Consultants are hired for a project and leave the Bank after the project is completed.
- It is difficult to impose rules on programming style, code documentation, and archiving.
- Many Stata programs are lost or undocumented and are difficult to reuse.
- There is a need to automate the analytical work conducted in the Bank.

Stata routines developed in DECRG
Poverty analysis toolkit:
- Growth-inequality decomposition (gedecomposition.ado)
- Sectoral poverty decomposition (sedecomposition.ado)
- Growth-incidence curves (gicurves.ado)
- Stochastic dominance analysis (pov_robust.ado)
- egen extension for inequality and poverty measures
- Fast algorithm for calculation of Gini coefficients (fastgini.ado)
Applied economic research:
- FIML algorithm for two-equation ordered probit models with endogeneity
- FIML estimation of the endogenous switching regression model
- Selection models based on ordered probit
- Semi-parametric difference-based estimation of partial linear regression models
- Selecting a subset of variables providing the model's best fit
- Efficient estimation of regressions based on pseudo-panel data

- LOOKFOR_ALL: an extension of Stata's lookfor command
- xml_tab.ado: saving the outputs of Stata estimation procedures in Microsoft Excel
- usespss.ado; use10.ado: read SPSS files into Stata; read Stata 10 files in Stata 9
- Many other Stata routines

Automated Economic Analysis
- Speed up production of basic (required) results
- Minimize human errors
- Free resources for more meaningful and interesting tasks
- Easily introduce new techniques and methods
- Allow easy replication of previous results
- Generate standard, comparable results across countries and years
- A tool for simulations
- A tool for sensitivity analysis and training
- Helpful in situations of limited data access
- Simple checking of previous reports/results
- Minimize training time and skill requirements

ADePT: Software platform for automated economic analysis
The ADePT user interface sends a request for computations to the Stata computation kernel, which produces output in XLS (via xml_tab.ado) or PDF format.
- User interface, version 3: customized Stata dialogs, classes
- User interface, version 4: written in C#; ~100,000 lines of code; multiple version support; team development
- Computation kernel: a set of Stata and Mata routines; plug-ins

ADePT Solutions
- ADePT offers users a solution to a particular problem.
- Modules of ADePT: a set of analytical results (tables, graphs) sufficient to answer a particular question.
- A combination of software tools and substantive contributions from experts in the field:
  Gary Fields (Cornell): Labor
  Martin Ravallion (WB): Poverty
  Adam Wagstaff (WB): Health
- Two main directions of ADePT:
  assessments of the current situation
  projections and simulations

ADePT V4.0
- Accepts individual-level and household data in Stata and SPSS format.
- Uses Stata for computations.
- Possibility of remote computing.
- No prior knowledge of Stata is required.
- Minimal data preparation.
- Extensive checks for possible problems with the data.
- Control for influential outliers.
- Tested on datasets from more than 50 countries: LSMS, HBS, DHS.
- An estimated 500 users in the WB, international research institutions, universities, and government agencies.
- The number of users is expected to increase when new modules are released.

ADePT V4.0: The roadmap
- ADePT Poverty: public release, June 2007
- ADePT MAPS: public release, October 2007
- ADePT Labor: public release, November 2007
- ADePT Gender: public release, November 2008
- ADePT Social Protection: public release, June 2009
- ADePT Education: public release, June 2009
- ADePT Targeting: planned release, August 2009
- ADePT PLINES: development stage
- ADePT HEALTH: planned release, August 2009
- ADePT Inequality: planned release, August 2009

ADePT: Website
www.worldbank.org/adept
Download: installation and updates, documentation, examples.

Practical issues
- Interface
- Performance (-ftabstat2-)
- Interaction/communication with other programs (IniFile.class, -smtp-)
- Graphics (-twoway parea-, -amap-)
- Custom file formats (-usespss-, -use10-)
- Installation and updates (-pkg2script-)
- Certification

Practical issues: Interface
Dialogs in Stata can be created to facilitate the use of custom-written commands. However, they are oriented toward forming a command line (a command with parameters and options), not toward building a full application interface. Some additional features were added in Stata 10 to expand the dialog possibilities, but they are still very limited, and we were constrained to remain compatible with Stata 9.2. After exhausting the standard dialog features of Stata, we decided to move the interface part into an external application written in C# (Microsoft Visual Studio). The released version 3.0 of ADePT used Stata dialogs.

Practical issues: Interface
The current version 4.0 of ADePT uses Windows Forms for dialogs.

Practical issues: Performance
Stata's built-in routines seem to be very efficient, but code implemented in *.ado files is often quite slow. In particular, -tabstat- showed inadequate performance for our tasks despite its simple nature.
It was rewritten in C++ (Microsoft Visual Studio) as a plugin, -ftabstat2-, and modified to suit our particular needs: it now returns matrices of means, totals, counts, and various proportions for each specified variable, with support for by()-rows and by()-cols.
Trade-off: no Stata/MP parallelism, because plugins are (currently?) single-threaded.

Practical issues: Communication
For interaction and communication with other programs, we needed to solve two problems:
1. Provide an easy-to-handle job file containing the description of all the parameters and options for a large project (not everything fits on a command line). We moved from txt files to ini files (IniFiles.class).
2. Provide communication between Stata and another program: while the computations are performed in Stata, the external interface needs to be updated on the status of the calculations. We solved this by writing a C++ plugin, -smtp- (SendMessageToPipe), which uses Windows pipes for interprocess communication (IPC).

Practical issues: Graphics
We faced some limitations of Stata graphics. Some of them were circumvented with custom graphics commands or adaptations of existing commands (-twoway parea-). We did not find any way to interact with the mouse in Stata graphics (version 9.2), so we moved our mapping program, -amap-, out of Stata into an external program and communicate with it seamlessly via ini files.
[-amap- screenshot: demonstration only, not actual data]

Practical issues: File formats
We needed to support SPSS files in ADePT. We developed the -usespss- plugin to import SPSS data into Stata. -usespss- was presented at SNASUG 2008 in Chicago and made available to the public immediately afterwards.
We needed to give Stata 9 users the ability to process datasets saved in Stata 10 format. We developed (using Mata) a new command, -use10-, for this purpose. Available at SSC.
http://repec.org/snasug08/radyakin_usespss.ppt
findit usespss
findit use10

Practical issues: Installation and updates
We experienced problems with installing and updating packages from our web site into Stata. The problem was not caused by Stata, but StataCorp's Tech Support team nevertheless gave us a number of very helpful responses on this issue. Effectively, this problem ruled out -net install-.
We developed a tool, -pkg2script-, to create autonomous installations from one or more Stata packages with the help of the NSIS installation system. The tool works in Windows only; if the path is left empty, the package is taken from SSC. In theory, all of SSC could be packed into one installer like the one shown here.

Practical issues: Certification
We faced the problem of verifying results: checking the numbers by hand is slow and unreliable. We included a test mode in ADePT, in which ADePT is launched from an external application (the tests manager), runs the requested jobs, and verifies the output against a predefined set of benchmarks that were themselves verified (confirmed by non-team members).
We monitor:
- whether the test succeeds (results are produced),
- whether the results are correct, and
- how long it takes to produce them.
If the benchmark for the current test does not exist, ADePT generates it from the current results and verifies against this saved output next time.

Practical issues: Wishes for Stata 12
- Access to the registry (at least read-only) to detect the presence, versions, and location of other programs (currently solved with a plugin).
- IPC: pipes (currently solved with a plugin).
- Preserve/restore to RAM (currently solved with a RAM drive).
- Extend plugin capabilities: allow plugins to execute commands, as Mata can with stata("command").
- Support for Cyrillic/local fonts
- Unicode?
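The slides do not show the source or plugin interface of -ftabstat2- (Practical issues: Performance). As a rough illustration of why a compiled single pass helps, the hypothetical C++ sketch below does two things:
- it accumulates weighted counts, totals, and means for every by()-row x by()-col cell in one loop over the observations;
- it computes one proportion, a cell's share of its row total.

The names `Cell`, `tabulate`, and `row_share` are invented for this example. A real Stata plugin would read the data through Stata's plugin interface (stplugin.h) rather than from `std::vector`s.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// One cell of a by(row) x by(col) table (hypothetical layout, not ftabstat2's).
struct Cell {
    double count = 0.0;  // sum of weights (a plain count when all weights are 1)
    double total = 0.0;  // weighted sum of the variable
    double mean() const { return count > 0 ? total / count : 0.0; }
};

// Keyed by (row group, column group); only combinations present in the data are stored.
using Table = std::map<std::pair<int, int>, Cell>;

// Single pass over the observations: accumulate weighted counts and totals
// for every (row group, column group) combination.
Table tabulate(const std::vector<double>& x,
               const std::vector<int>& row,
               const std::vector<int>& col,
               const std::vector<double>& w) {
    assert(x.size() == row.size() && x.size() == col.size() && x.size() == w.size());
    Table t;
    for (std::size_t i = 0; i < x.size(); ++i) {
        Cell& c = t[{row[i], col[i]}];
        c.count += w[i];
        c.total += w[i] * x[i];
    }
    return t;
}

// Share of a cell's total in the total of its row group: one example of the
// "proportion" statistics such a command could report.
double row_share(const Table& t, int r, int c) {
    double row_total = 0.0;
    for (const auto& kv : t)
        if (kv.first.first == r) row_total += kv.second.total;
    auto it = t.find({r, c});
    if (it == t.end() || row_total == 0.0) return 0.0;
    return it->second.total / row_total;
}
```

After one call to `tabulate`, every cell statistic is a lookup such as `t.at({r, c}).mean()`. The data are never re-scanned per by-group, which is where an interpreted loop over groups can lose time.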
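The job files and the -amap- exchange both rely on ini files. IniFiles.class itself is written in Stata's class language, and its methods are not shown in the slides. The C++ sketch below only illustrates the assumed file format:
- `[section]` headers;
- `key = value` pairs;
- lines starting with `;` or `#` treated as comments.

The function name `parse_ini` is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// section -> (key -> value)
using IniData = std::map<std::string, std::map<std::string, std::string>>;

// Strip leading and trailing spaces, tabs, and carriage returns.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Parse "[section]" headers and "key = value" lines; ';' and '#' start comment lines.
IniData parse_ini(std::istream& in) {
    IniData data;
    std::string section;  // keys that appear before any header go to section ""
    std::string line;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == ';' || line[0] == '#') continue;
        if (line.front() == '[' && line.back() == ']') {
            section = trim(line.substr(1, line.size() - 2));
            continue;
        }
        std::size_t eq = line.find('=');
        if (eq == std::string::npos) continue;  // ignore malformed lines
        data[section][trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return data;
}
```

Take a job file with a section `[output]` containing `format = xls` (hypothetical names). It parses so that `data["output"]["format"] == "xls"`. A flat key/value format like this is easy for the C# interface to write and for Stata-side code to read back.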
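-smtp- uses Windows pipes, whose handling the slides do not show. To keep this sketch portable, it swaps in a POSIX anonymous pipe (`pipe()`, `write()`, `read()` from `<unistd.h>`). Each status update is framed as one newline-terminated line. The message text (for example "progress 25") and the function names are a hypothetical protocol, not ADePT's.

```cpp
#include <cassert>
#include <string>
#include <unistd.h>
#include <vector>

// Write one newline-terminated status message to the pipe's write end.
// Returns false if the write fails or is short.
bool send_message(int fd, const std::string& msg) {
    std::string framed = msg + '\n';
    ssize_t n = write(fd, framed.data(), framed.size());
    return n == static_cast<ssize_t>(framed.size());
}

// Read from the pipe's read end until EOF (all write ends closed)
// and split the stream into individual messages.
std::vector<std::string> read_messages(int fd) {
    std::vector<std::string> out;
    std::string buf;
    char chunk[256];
    ssize_t n;
    while ((n = read(fd, chunk, sizeof chunk)) > 0)
        buf.append(chunk, static_cast<std::size_t>(n));
    std::size_t start = 0, nl;
    while ((nl = buf.find('\n', start)) != std::string::npos) {
        out.push_back(buf.substr(start, nl - start));
        start = nl + 1;
    }
    return out;
}
```

In ADePT the writer would be the plugin running inside Stata and the reader the separate C# interface. Exercising both ends in one process works only because a few short messages fit in the pipe buffer. On POSIX systems, writes of at most PIPE_BUF bytes are atomic, so one status line is never interleaved with another writer's.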
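-use10- is written in Mata, and the slides do not describe how it parses files. Any such reader first has to identify the format from the fixed .dta header. The C++ sketch below decodes the first ten bytes according to our understanding of the older (pre-Stata 13) layout:

| Bytes | Field | Values |
|---|---|---|
| 0 | release | 113 for Stata 8/9 files, 114 for Stata 10 |
| 1 | byte-order flag | 1 = big-endian HILO, 2 = little-endian LOHI |
| 2 | file type | 1 |
| 3 | unused | |
| 4-5 | number of variables | in the file's byte order |
| 6-9 | number of observations | in the file's byte order |

`DtaHeader` and `parse_dta_header` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

struct DtaHeader {
    int release;         // 113 = Stata 8/9 format, 114 = Stata 10 format
    bool hilo;           // true if numbers are stored big-endian
    std::uint16_t nvar;  // number of variables
    std::uint32_t nobs;  // number of observations
};

// Decode an unsigned integer of `len` bytes at p, honoring the file's byte order.
static std::uint32_t read_uint(const unsigned char* p, int len, bool hilo) {
    std::uint32_t v = 0;
    for (int i = 0; i < len; ++i) {
        int k = hilo ? i : len - 1 - i;  // visit the most significant byte first
        v = (v << 8) | p[k];
    }
    return v;
}

// Parse the fixed part of an old-style .dta header (first 10 bytes).
DtaHeader parse_dta_header(const std::vector<unsigned char>& b) {
    if (b.size() < 10) throw std::runtime_error("file too short");
    DtaHeader h;
    h.release = b[0];
    if (b[1] != 1 && b[1] != 2) throw std::runtime_error("bad byte-order flag");
    h.hilo = (b[1] == 1);
    if (b[2] != 1) throw std::runtime_error("unexpected file type");
    h.nvar = static_cast<std::uint16_t>(read_uint(&b[4], 2, h.hilo));
    h.nobs = read_uint(&b[6], 4, h.hilo);
    return h;
}
```

As we understand the format, the main change from release 113 to 114 is that each display-format entry grows from 12 to 49 bytes. A reader that knows the release can therefore compute the correct offsets for the rest of the variable descriptors.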
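The slides do not show how -pkg2script- generates its NSIS script. A minimal sketch of the idea turns each file listed in a package's .pkg description into NSIS `SetOutPath` and `File` commands. It rests on several assumptions, none taken from -pkg2script- itself:
- files appear on lines starting with `f `, as plain names without directories;
- ado-files go into a subdirectory named after their first letter, with `_` for other characters;
- `$INSTDIR` stands in for the target directory;
- the function names are invented.

A real installer would still have to locate the user's PLUS directory.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Collect file names from the "f " lines of a Stata .pkg description.
std::vector<std::string> pkg_files(std::istream& pkg) {
    std::vector<std::string> files;
    std::string line;
    while (std::getline(pkg, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.size() > 2 && line[0] == 'f' && line[1] == ' ')
            files.push_back(line.substr(2));
    }
    return files;
}

// Letter subdirectory for a file (assumed rule: lowercase first letter, else "_").
std::string subdir(const std::string& file) {
    unsigned char c = file.empty() ? '_' : static_cast<unsigned char>(file[0]);
    return std::isalpha(c) ? std::string(1, static_cast<char>(std::tolower(c)))
                           : std::string("_");
}

// Emit NSIS commands that copy each file into $INSTDIR\<letter>.
std::string nsis_section(const std::vector<std::string>& files) {
    std::ostringstream s;
    for (const auto& f : files) {
        s << "SetOutPath \"$INSTDIR\\" << subdir(f) << "\"\n";
        s << "File \"" << f << "\"\n";
    }
    return s.str();
}
```

Several packages can be combined into one installer by concatenating the sections produced for each .pkg file, which is the idea behind packing many SSC packages into a single distribution.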
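The certification logic (compare against a saved benchmark, or create the benchmark when none exists) can be sketched in C++ as follows. The following are all assumptions, not ADePT's actual test mode:
- the plain-text `key value` benchmark file, which requires keys without whitespace;
- the tolerance rule: a difference is accepted up to `rel_tol` times the larger of 1 and the benchmark value's magnitude;
- all names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <fstream>
#include <map>
#include <string>

using Results = std::map<std::string, double>;  // statistic name -> value

enum class Outcome { BenchmarkCreated, Passed, Failed };

// Compare current results with the benchmark saved at `path`. If no benchmark
// exists yet, save the current results as the new benchmark.
Outcome check_against_benchmark(const Results& current, const std::string& path,
                                double rel_tol = 1e-9) {
    std::ifstream in(path);
    if (!in) {
        std::ofstream out(path);
        if (!out) return Outcome::Failed;  // could not create the benchmark
        out.precision(17);                 // enough digits to round-trip a double
        for (const auto& kv : current) out << kv.first << ' ' << kv.second << '\n';
        return Outcome::BenchmarkCreated;
    }
    Results bench;
    std::string key;
    double value;
    while (in >> key >> value) bench[key] = value;
    if (bench.size() != current.size()) return Outcome::Failed;
    for (const auto& kv : bench) {
        auto it = current.find(kv.first);
        if (it == current.end()) return Outcome::Failed;
        double scale = std::fmax(1.0, std::fabs(kv.second));
        if (std::fabs(it->second - kv.second) > rel_tol * scale) return Outcome::Failed;
    }
    return Outcome::Passed;
}

// Delete a benchmark so that it is regenerated on the next run
// (e.g. after an intentional, reviewed change in results).
bool discard_benchmark(const std::string& path) {
    return std::remove(path.c_str()) == 0;
}
```

The elapsed time, which ADePT also monitors, could be recorded by wrapping the job in `std::chrono::steady_clock::now()` calls. A tests manager would then report three things per job: whether results were produced, the `Outcome`, and the duration.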