Spreadsheets: functional programming for the masses Simon Peyton Jones Margaret Burnett Alan Blackwell Q1: What should a functional programmer in Microsoft Research do? Q1: What should a functional programmer in Microsoft Research do? A1: Persuade developers to implement stuff in Haskell. No more C#! Haskell is better! Q1: What should a functional programmer in Microsoft Research do? A1: Ask Q2 Q2: What is the worlds most widely used functional language, by far? Q1: What should a functional programmer in MSR do? Q2: What is the worlds most widely used functional language, by far? Violent exothermic reaction Spreadsheets are functional programs B1 = A1*A1 C1 = A2*A2 D1 = B1-C1 B2 = A1+A2 C2 = A1-A2 D2 = B2*C2 Just a big bunch of equations No side effects Order of evaluation controlled by data dependencies Q3: What chance does a pointy-headed researcher have of influencing the direction of a Microsoft cash cow? Q3: What chance does a pointy-headed researcher have of influencing the direction of a Microsoft cash cow? Excel’s market is tall 50m 50m end-user programmers End users Can use Excel Market size 2m Programmers • Real job is engineering, teaching, financial; NOT programming • Use Excel formulae to build "models" • No need to "sell" functional programming: they are already doing it! 2m “classic” programmers (write VB,C++, C#) Can use VB, C++ Not much 50m End users Can use Excel Market size 2m Programmers Can use VB, C++ 2m “classic” programmers (write VB,C++, C#) Research effort expended 50m Tall, but narrow When the task... becomes large or complex changes over time rewards re-use is mission-critical End users Market size ... cells and formulas are not enough. Current solution: shift programming paradigm Use Excel + VB, C# 2m Programmers Application requirements 50m End users Our vision New territory to colonise Market size 2m Programmers Application requirements Increase Excel’s “reach” by empowering end users to write “programs” without hiring programmers Excel Simon Peyton Jones Functional programming Excel Plus Margaret Burnett End user & Alan Blackwell visual software engineering Psychology of programming Ruthless design-time focus on usability, based on empiricallygrounded research. Our target end users Of all Excel users Some just use Excel for lists Some can type very simple formulae e.g. =SUM(A1:A10) Some use formulae, and understand copy-and-paste of formulae (absolute and relative cell references) Some can use Visual Basic Our target end users Of all Excel users Some just use Excel for lists Some can type very simple formulae e.g. =SUM(A1:A10) Some use formulae, and understand copy-and-paste of formulae (absolute and relative cell references) Some can use Visual Basic (professional programmers) This is our target audience. A minority of Excel users, but still extremely numerous Market size 50m How? 1. Functions as ordinary spreadsheets End users 2. First class array values 2m Programmers Application requirements Two complementary ideas Functions as ordinary spreadsheets What's missing? B1 = A1*A1 C1 = A2*A2 D1 = B1-C1 B2 = A1+A2 C2 = A1-A2 D2 = B2*C2 Functions as ordinary spreadsheets Scenario teacher types formula to compute student grade copies and pastes down a column (much later) wants to change the formula Problem must alter many cells to implement a single change impacts re-use, error-proneness, modularity Obvious solution (to a programmer) Make a named function to encapsulate the formula User is working on a formula User brings up the right click menu Cut Copy Paste … Make a function … A new function is automatically created in a sheet and called Formula replaced by call to function New function worksheet Now, fill down does not lose sharing Regular fill down User can see/modify the function definition as desired Functions as worksheets Creating a function is fast Understanding a function requires no new skills: no paradigm shift Using a function improves quality Named abstraction is our primary weapon in the war against complexity. Imagine conventional programming with no procedures, only smart copy/paste! Creating a function from scratch Build a worksheet to calculate the distance a ball will travel, when at a particular angle and velocity Turn it into a function by identifying the input cells (a bit like “scenarios”, only callable) Call the function many times to see the distance the ball goes for different throwing angles Debugging The “call tree” becomes a tree of linked worksheets, laid out in space, not in time. So debugging is particularly easy. Need new mechanisms for navigating the plethora of worksheets, via the tree structure. First year programming courses will be taught this way! Main program Calls Function CylVol Calls Function CircArea Domain-specific libraries Every domain (physics, electronics, statistics, financial, marketing...) has domain-specific abstractions. Excel’s function libraries are an ideal way of packaging those abstractions for Excel users. Hence, we want to make it easy for end users to build, encapsulate, and share their own function libraries, without help from professional programmers. First class data values First class data values User-defined functions need array arguments. e.g. SUM( A1:B9 ) Simple but powerful idea: anything a scalar can do, an array can do: be the value of a formula be the value of a cell be the argument or result of a function Make Excel’s existing “array formulae” simpler and more powerful. First class values Currency; units in general (unit-aware arithmetic) Hyperlink Matrix (index, add, multiply…) Relation (filter, select, join…) XML blob (query, combine) Picture (generate picture from numbers, combine pictures) Each value type comes complete with a repertoire of functions over it Bulk data operations A1 = …connect to a database relation… A2 = EXTEND( A1, [First Name], GetFirst( [Name] ) ) A3 = EXTEND( A1, [Last Name], GetLast( [Name] ) ) A4 = FILTER( A3, AND( [Age] > 30, [Age] < 50 ) ) A5 = SELECT( A4, [First Name], [Last Name], [Age] ) This stuff can be done today, by hand (e.g. Data/AutoFilter), but it can’t be automated robustly Extensible types It should be easy for a VB or C# programmer to add a new data type. All Excel needs to know about it is: How to display it How to “drill into” it to display its full value Perhaps, how to downcast it to a number/string The recalc chain and dependency analysis are completely unaffected Back to the supertanker Small crew, high-value payload, many customer requests, so systemic changes are not easy Excel 2003 is out -- the next version is being designed We’re talking to the Excel team regularly (weekly) Back to the supertanker Small crew, high-value payload, many customer requests, so systemic changes are not easy Excel 2003 is out -- the next version is being designed We’re talking to the Excel team weekly Next: higher order functions assertions, test generation static type system? Summary Functional programming End user & visual software engineering Psychology of programming Empower non-programmer end users (accountants, engineers, salesmen...) to do things they could not do before • Control complexity through building re-usable abstractions • Succeed in more ambitious applications • Encapsulate domain-specific expertise in function libraries • Crush more errors earlier Multi-disciplinary inputs http://research.microsoft.com/~simonpj/papers/excel