NCSA Parsons 090905 - Grid Computing at NCSA

The mechanics of EPCC
How EPCC learnt to do technology transfer and software engineering the hard way
Dr Mark Parsons
Commercial Director, EPCC
m.parsons@epcc.ed.ac.uk
+44 131 650 5022
Structure of talk
• What is EPCC today
• Learning to deliver on time
• The structure of a commercial project
• Software development and OGSA-DAI
• Project management
• Questions and discussion
EPCC Activities
• Europe’s largest, most successful supercomputing centre – 15 years old
• Vital statistics:
– 65 staff
– £3.2M turnover (almost) all from external sources
• Multidisciplinary and multi-funded
– with a large spectrum of activities
– and a critical mass of expertise
• Strong engagement with industry
– from local SMEs to large multinationals
– project based consultancy services
• Supports research at University of Edinburgh via
– access to facilities
– training and support
– TRACS visitor programme
• NeSC
– founding partner of National e-Science Centre
• Wide variety of leading-edge systems
– 1,600 processor HPCx system
– 2,000 processor IBM Blue Gene/L
– 12,000 processor QCDOC
• New investment in Advanced Computing Facility
• EPCC has a unique breadth of expertise in high performance computing
[Diagram labels: Facilities, HPC Research, Technology Transfer, Training, European Coordination, Visitor Programme]
Commercial activities today
• Bespoke software development and software project management for
business
– network, cluster and high-performance computing
– novel application areas
– from mushrooms to internet packets
• Start-to-finish projects
– full software development lifecycle, 3 - 12+ months
– most commercial projects are < 6 months
• Operate like a business
– Commercial Group brings in business, Software Development Group delivers
– charge at commercial rates ($1,000 per day)
– very delivery focused
– all commercial contracts are fixed cost
– funded by cash contracts, public funds and the European Commission (EU)
– many of the smaller projects are supported by Scottish Enterprise (SE)
Clients 2000 - 2004
USA:
o Cisco Systems Inc
o Sun Microsystems Inc
o IBM Corporation
o Oracle Corporation
o Hewlett Packard
o Microsoft
o Xilinx Corporation
UK:
o Almond Engineering Ltd
o Altamira Ltd
o Arran Aromatics Ltd
o Callanders Sawmills Ltd
o Calman Ltd
o CB Technology Ltd
o Centre for Customer Awareness Ltd
o CERN
o Cheltenham & Gloucester plc
o DTI
o Digital Bridges Ltd
o Elektrobit Ltd
o First Group plc
o Golden Crumb Ltd
o High Speed Productions Ltd
o Integriti Solutions Ltd
o IP Technology Ltd
o Ironside Farrar Ltd
o Jardine Technology Ltd
o Pepper’s Ghost Productions Ltd
o Radar World Ltd
o Red Lemon Ltd
o Rosti (Scotland) Ltd
o Quadstone Ltd
o SCI Ltd
o Scottish Enterprise
o The Crown Office
o TSB Bank Scotland Ltd
o UK Meteorological Office
o Upstream Systems Ltd
o Alpha Data Parallel Systems Ltd
o Nallatech Ltd
Europe:
o European Commission
+ many EU project partners
Japan:
o Hitachi
o NEC Europe
o Fujitsu Labs Europe
Business Strategy
• to solve business problems NOT sell technology
• … individual solutions for clients
[Diagram: “Technology Push Down” from Academic Research, with projects placed by project size (£ X,000,000; £ X00,000; £ X,000). Projects shown: OGSA-DAI, SunDCG, PGPGrid, First Group, CCA, IPO, Autoscreen, Microsoft, C&G]
How do we work?
No.
How do we work?
• Take pride in a professional approach
– Work in small project teams
– Project leader, 1-6 developers, technical reviewers
• Use documented engineering & management processes
– Project management based on PRINCE2
– Engineering using agile methods
• Built from experience and industry best practice
– Iterative/staged development techniques
– Requirements triage
– Test-driven development
– Tuned to the leading edge of innovative software development
Who does the work?
• Currently around 4 business development staff
– 2 focus on business development
– 2 focus on marketing and publicity
• Currently around 20 engineering staff
– Three full-time project managers, two software architects
– c. 15 consultants and principal consultants
– Staff backgrounds – maths, physics, computer & life sciences
– Over 100 staff-years of experience, over 1/3 from industry
• Typical skills
– Java, C/C++, Visual Basic/C#, Perl, Fortran
– Distributed computing, web services, XML, J2EE, MPI, OpenMP
– Databases, SQL, JDBC, XML-DB
– Software engineering, OO design, UML
EPCC’s early history
• Established in 1990
– focus for interest in parallel computing within Physics and CS
• Early years largely supported by UK Government “Parallel
Applications Programme”
– made lots of money working with large UK corporations to
optimise/parallelise their codes
• How did our funding model come about?
– from a belief in the self-funding of University
research
– we’ve shown it can be done but it’s very
difficult
– it did mean we had to work with industry from
the beginning
EPCC history (continued)
• 1990-1994
– funded by UK Government Parallel Applications Programme
– grew to 65 staff
– many parallelisation projects with UK industry – aerospace, nuclear,
oil & gas etc etc
– spun out a company – Quadstone
• 1995-1996
– as Gov money dried up so did projects
– had to move from long term projects (18 months) to much shorter
projects (3-6 months)
– major problem – project / cost overruns
– nearly had to make many staff redundant
EPCC history (continued)
• 1997-2000
– successfully moved markets from large-scale industry to SMEs
– opportunities focussed around successful EU TTN project
– projects 3-6 months in duration
– embarked on having a repeatable process
• 2000-now
– over the past few years moved into Grid computing
– continued to work with industry
– wide variety of projects:
– OGSA-DAI – data access & integration for the Grid
– Intersim – packet level modelling of differentiated services
– Golden Crumb – automatic mushroom selection in factory
– Cheltenham & Gloucester – data mining for mortgage industry
How does EPCC work today?
• We have well developed project processes
• Two linked processes
– software development process
– project management process
• Will illustrate software development process using OGSA-DAI as example
• Recently moved to PRINCE2 project management
methodology
The project lifecycle
• Commercial Group identifies clients and initiates discussions
• Following initial discussions CD and technical staff visit
company to discuss requirements
• High level design written – timings / costs agreed
– may involve free code survey at this point
• Contract negotiated – fixed price – includes detailed
workplan based on design
• Project handed to IS – staff scheduled according to skills
• All projects have
– Project Leader, Applications Consultant, Technical Reviewer
– Regular meetings between IS and CG
– CG act as account manager to company / funder
OGSA-DAI
• Data Access and Integration for database resources on the Grid
• Aim to deliver application mechanisms that:
– Meet the data requirements of Grid applications
– Functionality, performance and reliability
– Reduce development cost of data centric Grid applications
– Provide consistent interfaces to data resources
– Acceptable and supportable by database providers
– Trustable, imposed demand is acceptable, etc.
– Provide a standard framework that satisfies standard requirements
• A base for developing higher-level services
– Data federation / Distributed query processing
– Data mining
– Data visualisation
OGSA-DAI team
• EPCC Team, Edinburgh
• NeSC, Edinburgh
• NEReSC, Newcastle
• ESNW, Manchester
• IBM Development Team, Hursley
• IBM Dissemination Team
Software Process and Teams
[Diagram, not reproduced. Recoverable labels follow.]
• REVIEW: Programme Board, Technical Review Board, Peer Review and Inspection, Technical Reviewer, Users’ Group
• Continual process →: Reqs., Design, Implement, QA, Ingest, Release, Dissem., Support, Training
• DEVELOPERS and USERS, with annotations: Requests, Contribs, Deep track features, Prototype, System tests based on reqs, Test Cases, Use Cases, Nightly unit + system tests, Testing, Additional test cases, Fix Bugs, Prioritisation
Working together
• No more heroes any more
– the lone researcher can get into trouble
– so don’t do it!
– use teams even for small projects
– a task leader to keep the bigger picture in mind
– a “reviewer” as a technical foil for the developer
– distributed extreme programming doesn’t work
– be sensible!
• Code needs owners
– and joint ownership doesn’t work
– Java packages and CVS modules provide useful boundaries
– “buddy” system worked well for a team of 10-12, not as well for 5
– we now have 80,000 lines of Java code + 30,000 lines of documentation
An agile approach to development
• Agility is all
– Grid/HPC environments and problems = complex systems
– complex systems = big, complex projects
– big, complex projects = high risk of failure
– adopting incremental approaches to requirements, design, and implementation helps minimise risk
– delivering small increments regularly is good
– good for quality, for visibility, for morale
• Keep your eyes on the road
– keep an active eye on project risks
– think about what happens if this goes wrong
– just thinking about it reduces the likelihood it’ll happen!
Releasing software
• No release schedule = no releases
– don’t timebox research, but do timebox development
– HPC is fun and exciting - beware feature creep!
– “how’s the project?”
– “oh, we’re 95% there” (and always will be…)
– frequent release milestones focus developers
– but don’t overspecify what will be released
• OGSA-DAI had the opposite problem
– three months too short
– six months about right
– major/minor/patch/“special brew”
– set your testing timetable in stone
Know your requirements
• Requirements, requirements, requirements
– write ‘em down! Give ‘em numbers!
– remember, requirements aren’t just functional!
– whatever they are, they are always testable
– tests on HPC systems may be tricky, but that
makes it fun!
– MoSCoW notation is good
– Must, Should, Could, Won't
– “how important are Priority 3 requirements again..?”
• OGSA-DAI had lots of requirements
– but make sure you can understand their worth
– real users are often better than good ideas
– a user group helps to focus development as software matures
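A minimal sketch of the requirements advice above: each requirement gets a number, a MoSCoW priority and links to tests. The requirement texts, the names `Requirement`, `untested` and `release_plan`, and the test ids are all hypothetical illustrations, not OGSA-DAI's actual requirements or tooling.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    """MoSCoW priorities; a lower value means more important."""
    MUST = 1
    SHOULD = 2
    COULD = 3
    WONT = 4


@dataclass(frozen=True)
class Requirement:
    req_id: str            # "give 'em numbers"
    text: str              # "write 'em down"
    priority: Priority
    functional: bool       # requirements aren't just functional
    test_ids: tuple = ()   # every requirement should be testable


def untested(reqs):
    """Ids of in-scope requirements that have no test linked yet."""
    return [r.req_id for r in reqs
            if r.priority is not Priority.WONT and not r.test_ids]


def release_plan(reqs):
    """In-scope requirement ids, most important first (stable within a priority)."""
    in_scope = [r for r in reqs if r.priority is not Priority.WONT]
    return [r.req_id for r in sorted(in_scope, key=lambda r: r.priority)]


# Hypothetical example requirements, for illustration only.
REQS = [
    Requirement("R1", "Query a relational database through a service",
                Priority.MUST, True, ("T1",)),
    Requirement("R2", "Installs in under 30 minutes", Priority.SHOULD, False),
    Requirement("R3", "Graphical query builder", Priority.COULD, True),
    Requirement("R4", "Support every database vendor", Priority.WONT, True),
    Requirement("R5", "Return results as XML", Priority.MUST, True, ("T2", "T3")),
]

if __name__ == "__main__":
    print("Plan:", release_plan(REQS))
    print("Still need tests:", untested(REQS))
```

Keeping "Won't" items in the list, rather than deleting them, records that they were considered and deliberately left out of scope.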
Return, recycle, reuse
• Throwaway prototypes never are
– “once I’ve proved this, I’ll junk the code”
– no, you won’t (or your grad student won’t)
– apply some basic process even to trivial codes
– even reuse of “good” code is sometimes wrong
• OGSA-DAI started with high ideals
– beware the big ball of mud
– patterns in architecture
– “Shantytown”
– enables quick exploration of feature territory
– must be built on a strong central foundation
– must include council legislation aka testing
OGSA-DAI Dashboard
Can I see your documents please?
• Document! Document! Document!
– Imagine trying to program without a language reference
– structure and stability are good
– Get people who like writing documents to do them
– but get everyone to doc their code
– a single editor can provide guidance
– Good code documentation can be used by the tooling
– Good human documentation will win your users’ support
• Make sure you don’t underestimate the cost
– code maintenance and documentation takes longer than code
development
– make it part of the process
People power
• Social engineering is the key
– Push decisions down to the developers
– “Too many chiefs”
– make sure you know what are the key battles to win
– Have a process for change
– or one person will become very unpopular
– developers and managers both think they know what’s best
– Understand your teams
– different people like working in different ways
– no one style for management in OGSA-DAI
– Competition is good
– go one better
The big picture
• Balance the hype
– software engineering is about vision vs effort vs requests
– expectation management is important
– researchers, developers, users and funders are all different
– and all want different things
– the larger the project, the harder it falls
• Listen to your users
– usability is good
– it has to install easily
– don’t change your interface
– client tooling helps
– support helpdesk is better
– user groups are interesting
Software development summary
• Agile methods are very sympathetic
– the Agile founders disliked Rigid Inflexible Processes too!
• Adopt a simple process and toolset
– even lightweight process really pays off
– scoping, requirements and risk analysis up front
– incremental approach to design, develop, test
– learn some basic tools (they’re even free!)
– distributed teams are hard to manage strictly
– distributed management is even harder
• Listen to your customers – they always know best
Project Management
• All technical staff have a line manager and at least one
project leader
• Procedures are well documented and have grown up over
time
• Recently we have moved to PRINCE2 project management
methodology for commercial projects
– seems to work well but is a bit of a culture shock
• We employ staff specifically for project management
• All staff time is logged – planned and actual
• A working day has two blocks of 3 hours
• Staff can bid for time to do research / proposal writing
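A minimal sketch of the time logging described above, assuming time is booked in the 3-hour blocks mentioned (two per working day) and that each log entry records planned and actual blocks against a project. The entry format, project names and function names are hypothetical, not EPCC's actual system.

```python
from collections import defaultdict

BLOCK_HOURS = 3        # from the slide: a block is 3 hours
BLOCKS_PER_DAY = 2     # from the slide: two blocks per working day


def blocks_to_days(blocks):
    """Convert a number of 3-hour blocks into working days."""
    return blocks / BLOCKS_PER_DAY


def variance_by_project(entries):
    """Sum (actual - planned) effort per project, in working days.

    entries: iterable of (project, planned_blocks, actual_blocks).
    A positive result means the project has used more time than planned.
    """
    totals = defaultdict(float)
    for project, planned, actual in entries:
        totals[project] += blocks_to_days(actual - planned)
    return dict(totals)


# Hypothetical log entries, for illustration only.
LOG = [
    ("alpha", 10, 12),
    ("alpha", 4, 4),
    ("beta", 6, 5),
]

if __name__ == "__main__":
    for project, days in variance_by_project(LOG).items():
        print(f"{project}: {days:+.1f} days against plan")
```

On fixed-cost contracts, a per-project variance like this is one simple way to spot an overrun early.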
What is PRINCE2?
• PRojects IN Controlled Environments version 2
• A project management standard produced by UK’s Office of
Government Commerce (part of HM Treasury)
– “PRINCE2 is a process-based approach for project management
providing an easily tailored, and scalable method for the management
of all types of projects”
• PRINCE2 is a de facto UK PM standard
– becoming mandatory in the public sector (Gov, NHS, Police)
– becoming PM method of choice in business
– Unilever, GlaxoWellcome, Tesco, BT, Sun, TSB, NatWest, Norwich
Union, Centrica, Cable & Wireless…
– becoming widespread in Europe too
• PRINCE2 is internationally recognised and respected
What is PRINCE2 not?
• PRINCE2 is not a software engineering method
– but it grew out of an IT environment
– and it fits well with traditional or agile development methods alike
• PRINCE2 will not help you code better
– but it will help you deliver better quality products, on time
– and will stop you falling out with your boss/staff
• PRINCE2 will not tell you how to write software
– but it will leave you alone to write software your way
• PRINCE2 is not a silver bullet
– but it’s general, flexible and tailorable and – most importantly – it’s
based on common sense
PRINCE2 in a nutshell
• Projects have a clear Business Case or they don’t happen
– “remind me again why we’re doing this project?”
• Projects have a beginning, a middle and an end
– clearly defined – they start and they stop – they don’t weeble on forever
• Projects run in stages with clearly defined boundaries
– get a clear picture of how we’re all doing
• Product-based planning focuses on deliverables not tasks
– think “what do we have to make?”
• Layered management: corporate, board, project, team
– each level has a clearly defined interface with the others
• Management by exception
– if there are no problems, just carry on – management don’t meddle
• Change is fundamental: change management is intrinsic
– assume things will change and plan accordingly
The PRINCE2 process diagram
[Diagram, not reproduced. Recoverable labels follow.]
• Corporate or Programme Management
• Project Mandate
• Directing a Project
• Starting up a Project
• Initiating a Project
• Controlling a Stage
• Managing Stage Boundaries
• Closing a Project
• Managing Product Delivery
• Planning
PRINCE2 Components
• As well as the processes there are several complementary
components…
• The Business Case
– a key driver – the Why? for the project
– either a genuine (commercial) business case or at least a set of
compelling reasons
– owned by the Executive
– monitored throughout the project
– if the BC goes away, the project should be stopped
• The Project Organisation
– describes the four management layers
– corporate, board, project, team
– everyone should have a job description
– make roles and responsibilities clear
PRINCE2 Components (2)
• Plans
– product-based, as discussed above
– write product descriptions for key products
• Controls
– divide the project into Management Stages
– a Stage is “as far ahead as you can plan in reasonable detail”
– typically a few months
– define reports, meetings etc.
• Tolerances
– allowed variations in time, budget, scope before escalation triggered
– “you have six months, +/- 1 month”
– “you must satisfy these requirements; those are optional this stage”
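A minimal sketch of how tolerances support management by exception, using the slide's "six months, +/- 1 month" as the time tolerance. The budget figures, the measure names and the function `needs_escalation` are hypothetical; PRINCE2 itself does not prescribe any code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    """An agreed target with the allowed variation either side of it."""
    target: float
    minus: float
    plus: float

    def within(self, forecast):
        """True if the forecast is inside the tolerance band (inclusive)."""
        return self.target - self.minus <= forecast <= self.target + self.plus


def needs_escalation(tolerances, forecasts):
    """Names of measures whose forecast falls outside tolerance.

    An empty list means: no exception, just carry on.
    """
    return sorted(name for name, tol in tolerances.items()
                  if not tol.within(forecasts[name]))


# Time tolerance is the slide's example; the budget band is made up.
STAGE = {
    "time_months": Tolerance(target=6, minus=1, plus=1),
    "budget_gbp": Tolerance(target=50_000, minus=50_000, plus=5_000),
}

if __name__ == "__main__":
    forecast = {"time_months": 7.5, "budget_gbp": 52_000}
    breached = needs_escalation(STAGE, forecast)
    print("Escalate to the board:" if breached else "Carry on", breached)
```

The project manager only reports upwards when the returned list is non-empty, which is the "management don't meddle" rule in practice.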
PRINCE2 Components (3)
• Quality
– the project must define methods for QC and test
– quality checks should be built in to the MP (Managing Product Delivery) process
• Risk
– think about it, monitor it
– one of the best management tools is to ask “what might go wrong?”
– and create plans to handle it if it does
• Configuration Management
– keep track of product versions and histories
– software version control tools are a good way of implementing this
PRINCE2 Summary
• PRINCE2 is a powerful, flexible, scalable PM approach
• It’s based on industry best practice
– rooted in software development projects
• Provides good, intelligent layers of management control
• Formalises, in a positive way, customer relations
• Can fit easily with agile software development
• It’s the only PM approach with internationally recognised
qualifications
Final comments on working with industry
• Wear a tie!
• Remember that the person you’re meeting is just as nervous
of meeting a mad academic as you are of meeting a rapacious
capitalist
• The managing director of the company may be drunk
• Always apply Denis Healey’s law of holes: “When in one, stop
digging”
• If you’re going to deliver late – tell the customer straightaway
• Listen listen listen!!! It’s the only way to get business
Questions / discussion
?