
Small team workflow in government analytics
Peter Ellis
Manager Sector Performance
18 March 2014
Today’s talk
• Who are we and why is our experience important?
• What are “data-intensive economic reports”?
• The challenge
• The solution
• Reflections on analytics in government
The Sector Performance team
• 9-10 staff
• $5 million budget – mostly for outsourced data collection
• One of 3, 4 or 9 analytical teams in MBIE
• Depending on definitions
• But diverse approaches from different teams
• Variety of roles
• Manage collection of tourism and science and innovation data
• Analyse and publicly disseminate tourism data
• Analyse data on all sectors for policy teams and Ministers
• Support policy teams in other areas
• Midway through the five-year Tourism Data Improvement Programme
• Since MBIE’s creation, we have been applying the tools, skills and techniques to a wider range of data
Whatever the terminology, tools and content, your organisation’s “analytics” team/s need to be in this space (the data science Venn diagram):
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Capability building for an analytical team
• Five key areas needed
1. Workflow, document management and teamwork
2. Analytical techniques
3. Tools
4. Data reshaping and management
5. Data storage
• Many programmes don’t take all five into account…
• IT-led BI programmes may focus on only #3 and #5
• Universities typically only teach #2
Data-intensive economic reports
http://www.mbie.govt.nz/what-we-do/business-growth-agenda
The challenge – update the draft overview Sectors Report
• Current version had evolved over 24 months – over 200 plots and 50 tables of data
• Not all the data sources fully defined
• Some of the Excel workbooks lost
• Some data was custom-cut by Statistics New Zealand
• Home-grown (and inconsistent) concordances to “sector”
• Some data hard keyed in, and not clear what was original, what was analysis, and what was grooming/reshaping
• Tight timeframe
• High profile, and quality guarantee essential
This is just one worksheet of around 30 – only 20 of which we could find…
Principles for a solution
• Separate the data from the grooming and analysis
• Reproducibility
• Systemised constant teamwork and peer review, requiring:
• Repository-based version control
• Centralised and disciplined folder and file structure
• Modular code with custom functions, palettes and themes
• Frequent integration and continuous testing
• Cut the dependencies on externals
• Extreme code-based plot polishing
• And for our next project (Small Business Report):
• Frequent iteration with the client (policy team and Minister)
• Separate exploratory analysis from polishing
The toolkit
[Diagram: flow from data sources to final products]
• DATA SOURCES: MBIE data; Statistics NZ data; international data; other regularly acquired data; one-off data slices from various sources; messy custom data
• DATA MANAGEMENT (SQL Server): data warehouse (future warehouse) holding tidy data in datamarts; project specific database; “ad hoc ETL” adds data to the datamart if suitable
• DATA PROCESSING AND ANALYSIS (R and Git): reproducible grooming of data; exploratory data analysis; production of visuals and text. Code-based / auditable / reproducible / version-controlled
• INTERMEDIATE OUTPUTS: hard copy plots, tables and text; data for web version
• Design, build and touch up for final products: LaTeX (preferred), or MS Word / MS PowerPoint; Adobe Illustrator; HTML, JavaScript
• FINAL PRODUCTS: hard copy and PDF; interactive web version
The folder structure
• raw_data
  • concordances
  • NZ.Stat
  • Infoshare
  • custom
• grooming_code
• data
• analysis_code
• output
  • Part I
  • Part II – dashboards
• R
• .git
Held together with key files in the project’s root directory:
• integrate.r (in future to replace with makefile)
• sector_report.rproj
• .Rprofile
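The layout above can be created in one go. The following is a minimal sketch rather than the team's actual setup script: the `project_root` location is hypothetical (a temporary directory), the nesting of the four source folders under `raw_data` and of the two parts under `output` is an assumption read off the slide, and an ASCII hyphen stands in for the dash in the dashboards folder name. The `.git` folder is left out because `git init` creates it.

```shell
# Hypothetical location; a temp dir keeps the sketch away from real work.
project_root="$(mktemp -d)/sector_report"

# Raw inputs, kept apart from anything derived from them.
# Assumption: the four source folders sit under raw_data.
mkdir -p "$project_root/raw_data/concordances" \
         "$project_root/raw_data/NZ.Stat" \
         "$project_root/raw_data/Infoshare" \
         "$project_root/raw_data/custom"

# Code and derived data, one folder per stage, plus the R folder.
mkdir -p "$project_root/grooming_code" \
         "$project_root/data" \
         "$project_root/analysis_code" \
         "$project_root/R"

# Assumption: the two report parts sit under output.
mkdir -p "$project_root/output/Part I" \
         "$project_root/output/Part II - dashboards"

# Key files in the project root (created empty here).
touch "$project_root/integrate.r" \
      "$project_root/sector_report.rproj" \
      "$project_root/.Rprofile"

# List what was created.
find "$project_root" | sort
```

Keeping `raw_data` separate from `data` is what lets the grooming code be rerun from scratch: nothing in `raw_data` is ever edited by hand.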
[Diagram: the team's Git workflow]
• Shared master repository on the fileserver (P:/OTSP/somewhere): init, add, commit, branch, merge
• John’s PC and Jane’s PC each hold a clone of master; John’s and Jane’s memory sticks also appear as locations
• On each PC the cycle is: change and save repeatedly, then add and commit, then pull and push
• Commits are labelled A to G
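The clone, commit, pull and push cycle in the diagram can be sketched with plain Git commands. This is an illustrative sketch under stated assumptions, not the team's recorded commands: a temporary directory stands in for both the fileserver path (P:/OTSP/somewhere) and the two PCs, the shared copy is a bare repository so that it accepts pushes, and the file names, commit messages and email addresses are made up.

```shell
set -e
work=$(mktemp -d)   # stands in for the fileserver and both PCs

# Set the commit identity through the environment, so no global config is needed.
as_user() {
  export GIT_AUTHOR_NAME="$1" GIT_COMMITTER_NAME="$1"
  export GIT_AUTHOR_EMAIL="$2" GIT_COMMITTER_EMAIL="$2"
}

# 1. Start the project once: init, add, commit.
as_user "John" "john@example.invalid"
git init -q "$work/seed"
( cd "$work/seed"
  echo "Sector report project" > README.txt
  git add README.txt
  git commit -q -m "Start project" )

# 2. Publish it as the shared master copy (bare, so it can receive pushes).
git clone -q --bare "$work/seed" "$work/shared.git"

# 3. John and Jane each clone the shared copy to their own PC.
git clone -q "$work/shared.git" "$work/john"
git clone -q "$work/shared.git" "$work/jane"

# 4. John: change, save, add, commit, push.
( cd "$work/john"
  mkdir grooming_code
  echo "# groom tourism data" > grooming_code/tourism.R
  git add grooming_code/tourism.R
  git commit -q -m "Add tourism grooming script"
  git push -q origin HEAD )

# 5. Jane: pull John's work first, then commit her own and push.
as_user "Jane" "jane@example.invalid"
( cd "$work/jane"
  git pull -q --ff-only
  mkdir analysis_code
  echo "# plot tourism spend" > analysis_code/tourism_plots.R
  git add analysis_code/tourism_plots.R
  git commit -q -m "Add tourism plots"
  git push -q origin HEAD )

# 6. John pulls and now has Jane's commit as well.
( cd "$work/john" && git pull -q --ff-only )
git -C "$work/john" log --oneline
```

Pulling before pushing, as Jane does in step 5, is what keeps the shared history linear in this sketch; when two people have committed in parallel, the pull becomes a merge instead.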
Particular things that make this approach hum
• Git
• RStudio projects are a great way of organising the work
• But Notepad++ users can still participate if they use R shortcuts in the root folder of the repo
• Clean, pared back, modular scripts essential for readability
• Create your own palette, ggplot2 themes, font variables and functions for image dimensions and resolution
• Resource for oversight, coordination, ensuring the build works
• Manager needs to be technical enough to dive into the repo
• You wouldn’t have a policy manager who couldn’t use Word
• Clear spec – or ability to have agile iterative approach with client
Joel’s 12 point test for software developer teams
1. Do you use version control for your code?*
2. Can you make a build in one step?
3. Do you make frequent builds (at least daily)?
4. Do you use an issues tracking system?*
5. Do you fix bugs before writing new code?
6. Do you have an up-to-date schedule?
7. Do you have a spec?
8. Do programmers have quiet working conditions?
9. Do you use the best tools available?*
10. Do you have testers? (not sure this one’s relevant)
11. Do new candidates write code during their selection?
12. Do you do hallway usability testing?
Tweaked (*) from http://www.joelonsoftware.com/articles/fog0000000043.html
Surprisingly relevant for analytics teams too
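Point 2, a build in one step, is the job the deck gives to integrate.r. As a rough illustration of the idea only, and not a reconstruction of that file, the sketch below defines a hypothetical `integrate` shell function. It assumes that scripts end in `.R`, that they live in the `grooming_code` and `analysis_code` folders, that running them in name order is sufficient, and that `Rscript` is the runner (overridable through a made-up `RUNNER` variable).

```shell
# Run every grooming script, then every analysis script, in name order.
# Stops at the first failure so a broken build is noticed straight away.
integrate() {
  integrate_root=$1
  integrate_runner=${RUNNER:-Rscript}   # assumption: scripts are run with Rscript
  for integrate_stage in grooming_code analysis_code; do
    for integrate_script in "$integrate_root/$integrate_stage"/*.R; do
      # An unmatched glob stays literal; skip it when the folder is empty.
      [ -e "$integrate_script" ] || continue
      echo "Running $integrate_script"
      "$integrate_runner" "$integrate_script" || {
        echo "Build failed at $integrate_script" >&2
        return 1
      }
    done
  done
  echo "Build complete"
}
```

Calling `integrate /path/to/sector_report` from a terminal, or from a scheduled job, would then rebuild everything from raw data, which also covers point 3 (frequent builds).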
Five things needed for successful capability building
1. External demand
2. Sustained management commitment
3. Resourcing for trialling, experiments and intensive customised training
4. Supportive IT team and environment
5. Preparedness for the process to take years rather than months
Different needs and roles
IT BAU
· Web support
· Network support
· Applications packaging
Policy world
· Instantaneous needs
· Fast delivery time
· Unclear and changing data needs
· Non-specialist tools
IT project land
· Capital projects
· Waterfall projects; big design up front
· 12 month + delivery time
· Deliver vital infrastructure to empower analytics teams
· Tools like SSIS
· Roles: project office, programme manager, PM, BA, developer, architect
Analytics team
· Relies on the infrastructure provided by IT
· Bilingual in IT and Policy languages
· Agile, scrum and “extreme programming” project methods; iterate as it goes
· Tools like R, SQL, Shiny, JavaScript
· Translate policy needs into demands for basic infrastructure from IT
· Use the infrastructure to deliver flexible products
Some particular issues in government
• Demand from Ministers and senior management essential
• Courage required to raise the expectations
• Need to push some boundaries
• Work with, not against, your ICT team
• Common goals
• Recognise where ICT projects are needed and when to use “BAU”
• Balance of waterfall v. agile and beyond
• But - be prepared to use personal machines as a trial environment for new tools and techniques
• Only way to know what you want to invest in – high costs in packaging up new software for locked down networks
• A significant sized team essential to build momentum
• Recent developments only possible for us with the creation of MBIE