Michael Lokshin, Sergiy Radyakin and Zurab Sajaia World Bank

Analytical work at the World Bank
 Each year the World Bank produces:
 10-15 poverty assessments
 5-10 Labor market studies
 10 Education and Health assessments
 Gender studies
 Nutritional Studies
 Reports on Social protection and Benefit-Incidence analysis,
etc.
 Most analytical work for these reports is done in Stata
 The Research Department (DECRG) of the World Bank
develops new methods and tools that are used in these
reports and need to be made accessible to a wide audience
of practitioners of applied economic analysis
Stata in the World Bank
 Stata is the main statistical package used in the Bank
 Hundreds of users both in the HQ and regional offices
 Many users are short-term consultants with limited
skills in Stata programming
 Consultants are hired on a project and leave the Bank
after the project is completed
 It is difficult to impose rules on programming style, code
documentation, and archiving
 Many Stata programs are lost or undocumented and
are difficult to reuse
 There is a need to automate the analytical work
conducted in the Bank
Stata routines developed in DECRG
 Poverty analysis toolkit:
 Growth-inequality decomposition (gedecomposition.ado)
 Sectoral poverty decomposition (sedecomposition.ado)
 Growth-incidence curves (gicurves.ado)
 Stochastic dominance analysis (pov_robust.ado)
 egen extension for inequality and poverty measures
 Fast algorithm for calculation of Gini coefficients (fastgini.ado)
 Applied Economic Research:
 FIML algorithm of two-equation ordered probit models with endogeneity
 FIML estimation of the endogenous switching regression model
 Selection models based on ordered probit
 Semi-parametric difference-based estimation of partial linear regression models
 Selecting a subset of variables providing the model’s best fit
 Efficient estimation of regressions based on pseudo-panel data
 LOOKFOR_ALL - an extension of the Stata program lookfor
 xml_tab.ado: Saving the outputs from Stata estimation procedures in Microsoft
Excel
 usespss.ado; use10.ado – read SPSS files into Stata; read Stata 10 files in Stata 9.
 Many other Stata routines
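The slides do not describe how the fast Gini routine works. As a rough illustration only, a sort-based calculation (one sort, then a single pass along the Lorenz curve) can be sketched in Python as below. The function name `weighted_gini` and its arguments are hypothetical, and this is not the code of fastgini.ado.

```python
from typing import Optional, Sequence


def weighted_gini(values: Sequence[float],
                  weights: Optional[Sequence[float]] = None) -> float:
    """Gini coefficient from the area under the Lorenz curve.

    Illustrative sketch: sort once, then accumulate trapezoids in one pass.
    """
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    if weights is None:
        weights = [1.0] * n
    if len(weights) != n:
        raise ValueError("values and weights must have the same length")

    # Sort observations by value; this is the only O(n log n) step.
    pairs = sorted(zip(values, weights))
    total_w = sum(w for _, w in pairs)
    total_wx = sum(x * w for x, w in pairs)
    if total_w <= 0 or total_wx <= 0:
        raise ValueError("weights and weighted total must be positive")

    cum_wx = 0.0   # running weighted sum of values
    area = 0.0     # area under the Lorenz curve
    for x, w in pairs:
        lorenz_before = cum_wx / total_wx
        cum_wx += x * w
        lorenz_after = cum_wx / total_wx
        # Trapezoid with width equal to this observation's population share.
        area += (w / total_w) * (lorenz_before + lorenz_after) / 2.0

    return 1.0 - 2.0 * area


if __name__ == "__main__":
    print(weighted_gini([1, 2, 3, 4]))
```

With equal values the Lorenz curve coincides with the diagonal and the function returns 0; survey weights enter only through the population shares.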
Automated Economic Analysis
 Speed up production of basic (required) results
 Minimize human errors
 Free resources for more meaningful and interesting tasks
 Easily introduce new techniques and methods
 Allow easy replication of previous results
 Generate standard, comparable results across countries and years
 A tool for simulations
 A tool for sensitivity analysis and training
 Helpful in situations of limited data access
 Simple checking of previous reports/results
 Minimize training time and skills requirements
ADePT: Software platform for automated economic analysis
 ADePT User Interface
 Version 3: customized Stata dialogs, classes
 Version 4: user interface in C#; ~100,000 lines of code; multiple version support; team development
 Request for computations
 Stata Computation Kernel: set of Stata and Mata routines; plug-ins
 Output in XLS or PDF format (xml_tab.ado)
ADePT Solutions:
 ADePT offers users a solution to a particular problem.
 Modules of ADePT: set of analytical results (tables,
graphs) sufficient to give an answer to a particular
question.
 Combination of software tools and the substantive
contributions from the experts in a field.
 Gary Fields (Cornell): Labor
 Martin Ravallion (WB): Poverty
 Adam Wagstaff (WB) : Health
 Two main directions of ADePT:
 Assessments of the current situation
 Projections and simulations
ADePT V4.0
 Accepts individual-level and household data in Stata and SPSS format. Uses Stata for computations.
 Possibility of remote computing
 No prior knowledge of Stata is required
 Minimal data preparation
 Extensive checks on possible problems with the data
 Control for influential outliers
 Tested on datasets from more than 50 countries: LSMS, HBS, DHS
 Estimated 500 users in the WB, international research institutions, universities, government agencies
 Expected increase in the number of users when new modules are released
ADePT V4.0: The roadmap
 ADePT Poverty: Public Release – June 2007
 ADePT MAPS: Public Release – October 2007
 ADePT Labor: Public Release – November 2007
 ADePT Gender: Public Release – November 2008
 ADePT Social Protection: Public Release – June 2009
 ADePT Education: Public Release – June 2009
 ADePT Targeting: Planned Release – August 2009
 ADePT PLINES: Development stage
 ADePT HEALTH: Planned Release – August 2009
 ADePT Inequality: Planned Release – August 2009
ADePT: Website
www.worldbank.org/adept
Download: installation and updates,
documentation, examples.
Practical issues
 Interface
 Performance (-ftabstat2-)
 Interaction/communication with other programs
(IniFile.class, -smtp-)
 Graphics (-twoway parea-, -amap-)
 Custom file formats (-usespss-, -use10-)
 Installation and updates (-pkg2script-)
 Certification
Practical issues: Interface
 Dialogs in Stata can be created to facilitate the use of custom-written commands. But they are strongly oriented toward forming a command line (a command with parameters and options), not a full application interface.
 Some additional features were added in Stata 10 to expand the dialog possibilities, but they are still very limited, and we were constrained to remain compatible with Stata 9.2.
 After exhausting the standard dialog features of Stata, we decided to move the interface into an external application written in C# (Microsoft Visual Studio).
Released version 3.0 of ADePT used Stata dialogs
Practical Issues: Interface
Current version 4.0 of ADePT uses Windows forms for dialogs
Practical Issues: Performance
 Stata’s built-in routines seem to be very efficient, but code implemented in *.ado files is often quite slow.
 In particular, -tabstat- was too slow for our tasks despite its simple nature.
 It was rewritten in C++ (Microsoft Visual Studio) as a plugin, -ftabstat2-, and modified to suit our particular needs: it now returns matrices of means, totals, counts, and various proportions for each specified variable, with support for by()-rows and by()-cols.
 Trade-off: no MP because plugins are (currently?) single-threaded.
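To make the idea of a two-way tabulation concrete, here is a minimal Python sketch that accumulates weighted counts, totals, and means for every row-by-column cell in a single pass over the data. It is an illustration under assumptions, not the -ftabstat2- plugin: the function name `grouped_stats`, its arguments, and the returned layout are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Optional, Sequence, Tuple


def grouped_stats(values: Sequence[Optional[float]],
                  rows: Sequence[Hashable],
                  cols: Sequence[Hashable],
                  weights: Optional[Sequence[float]] = None
                  ) -> Dict[Tuple[Hashable, Hashable], Dict[str, float]]:
    """Weighted count, total, and mean for each (row, col) cell in one pass."""
    if weights is None:
        weights = [1.0] * len(values)

    # Each cell holds [sum of weights, weighted sum of values].
    acc = defaultdict(lambda: [0.0, 0.0])
    for v, r, c, w in zip(values, rows, cols, weights):
        if v is None:          # skip missing values
            continue
        cell = acc[(r, c)]
        cell[0] += w
        cell[1] += w * v

    return {
        key: {
            "count": n,
            "total": t,
            "mean": t / n if n else float("nan"),
        }
        for key, (n, t) in acc.items()
    }


if __name__ == "__main__":
    stats = grouped_stats(
        values=[10.0, 20.0, 30.0, 40.0],
        rows=["urban", "urban", "rural", "rural"],
        cols=["male", "female", "male", "male"],
    )
    for cell, s in sorted(stats.items()):
        print(cell, s)
```

A single pass with per-cell accumulators avoids re-reading the data once per group, which is the general reason a compiled implementation of this pattern can be fast.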
Practical Issues: Communication
Interaction/communication with other programs: we needed to solve two problems:
1. Provide an easy-to-handle job file containing the description of all the parameters and options for a large project (it is not possible to fit everything in a command line). We moved from txt files to ini-files (IniFiles.class).
2. Provide communication between Stata and another program: while the computations are performed in Stata, the external interface needs to be kept informed of the status of the calculations. We solved this by writing a C++ plugin, -smtp- (SendMessageToPipe), which uses Windows pipes for IPC.
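The slides do not show the layout of an ADePT job file. As a hedged illustration of why the ini format suits this purpose, the Python sketch below parses a small job description with the standard `configparser` module. Every section name, key, and value in it is invented for the example.

```python
import configparser
from typing import Dict

# Hypothetical job file: the sections and keys are assumptions,
# not the actual ADePT job-file layout.
JOB_TEXT = """
[job]
module = poverty
dataset = survey2005.dta

[variables]
welfare = pcexp
weight = hhweight

[output]
format = xls
"""


def load_job(text: str) -> Dict[str, Dict[str, str]]:
    """Parse ini-formatted text into a {section: {key: value}} dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


job = load_job(JOB_TEXT)

if __name__ == "__main__":
    for section, options in job.items():
        print(section, options)
```

Named sections keep a large set of parameters readable and let both sides (interface and computation kernel) read only the parts they need, which a single long command line cannot offer.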
Practical Issues: Graphics
 We have faced some limitations of the Stata graphics. Some of them were circumvented with custom graphics commands or adaptations of existing commands (-twoway parea-).
 We didn’t find any way to interact with the mouse in Stata graphics (version 9.2).
 We decided to move our mapping program -amap- out of Stata into an external program and communicate with it seamlessly via ini-files.
Demonstration only, not actual data
Practical Issues: File Formats
 We needed support for SPSS files in ADePT
 We developed the -usespss- plugin to import SPSS data to Stata
 -usespss- was presented at SNASUG 2008 in Chicago and made available to the public immediately afterwards
 We needed to give Stata 9 users the possibility to process datasets saved in Stata 10 format
 We developed (using Mata) a new command, -use10-, for this purpose. Available at SSC.
http://repec.org/snasug08/radyakin_usespss.ppt
findit usespss
findit use10
Practical Issues: Installation and
Updates
 We have experienced problems with installing and updating packages from our web site into Stata.
 The problem was not due to Stata, but we received a number of very helpful responses from StataCorp’s Tech Support Team on this issue.
 Effectively, this problem ruled out -net install-.
 We have developed a tool, -pkg2script-, to create autonomous installers from one or more Stata packages with the help of the NSIS installation system.
 The tool works in Windows only; if the path is left empty, the package is taken from SSC.
 In theory, all of SSC could be packed into a single installer like the one shown here:
Practical Issues: Certification
 We have faced the problem of verifying results. Checking the numbers by hand is slow and unreliable.
 We have included a test mode in ADePT, in which it:
 is launched from an external application (tests manager),
 runs the requested jobs, and
 verifies the output against a predefined set of benchmarks, which were verified (confirmed by non-team members).
 We monitor whether the test succeeds (results are produced), whether the results are correct, and how long it takes to produce them.
 If the benchmark for the current test does not exist, ADePT will generate it from the current results and verify against this saved output next time.
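The test-mode logic described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the ADePT tests manager: the function `run_test`, the JSON benchmark files, the numeric tolerance, and the result layout (a flat dictionary of named numbers) are all hypothetical.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Callable, Dict


def run_test(name: str,
             job: Callable[[], Dict[str, float]],
             benchmark_dir: str,
             tol: float = 1e-9) -> Dict[str, object]:
    """Run one job, time it, and compare its results with a saved benchmark."""
    start = time.perf_counter()
    try:
        results = job()
    except Exception as exc:  # the test did not produce results
        return {"name": name, "status": "failed", "error": str(exc),
                "seconds": time.perf_counter() - start}
    seconds = time.perf_counter() - start

    path = Path(benchmark_dir) / f"{name}.json"
    if not path.exists():
        # No benchmark yet: save the current results for the next run.
        path.write_text(json.dumps(results))
        status = "benchmark created"
    else:
        expected = json.loads(path.read_text())
        same = (expected.keys() == results.keys() and
                all(abs(expected[k] - results[k]) <= tol for k in expected))
        status = "ok" if same else "mismatch"

    return {"name": name, "status": status, "seconds": seconds}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        print(run_test("poverty", lambda: {"headcount": 0.31}, folder))
        print(run_test("poverty", lambda: {"headcount": 0.31}, folder))
```

A benchmark generated automatically on the first run has not been independently confirmed, so it only guards against later changes in the results; it does not replace the externally verified benchmarks.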
Practical Issues: Wishes for Stata12
 Access to registry (at least read-only) to detect
presence of other programs, their versions, and
location. (Currently solved with a plugin).
 IPC – pipes (currently solved with a plugin).
 Preserve/restore to RAM (currently solved with a
RAMDrive).
 Extend plugin capabilities: allow plugins to execute Stata commands, as Mata can with stata(“command”).
 Support for Cyrillic/local fonts
 Unicode??