SNSF CONSOLIDATOR GRANT RESEARCH PROPOSAL
ENGINEERING DISCOVERY IN MATHEMATICS
PAUL-OLIVIER DEHAYE
Co-written with Reda Chhaibi, Patrick Kühn, Helen Riedtmann and Huan Xiong

Part 1. State of the art and objectives

1. Previous insights: the rise of social machines

1.1. Nielsen's philosophy. In his book Reinventing Discovery [44], Nielsen argues that a radical change needs to take place in the sciences, where too often discovery is left to chance. The challenge is to create an open environment that enables one curious observer (out of many) to connect the dots, reshare their insight, and iterate on that process. As explained by Gowers and Nielsen in their Nature paper [30], the polymath efforts have achieved huge success using these techniques, and the mathematical community is slowly warming up to these collaborative ideas. We propose to go beyond merely observing such systems, and to actually create new ones to engineer discovery.

A prerequisite in Nielsen's vision is to make as much previous knowledge as possible accessible. This is achieved by building information pools that concentrate and organize science. Already, numerous successful examples of such information commons have changed society, but also affected the way research is done: the arXiv, for instance, makes publications quickly available before the long yet necessary process of peer-reviewing. At the same time, large and effective collaborations need to sustain their size by encouraging newcomers to compensate for dropouts, hence onboarding costs need to be reduced as much as possible. Only in that way can one create modular collaborations, where actors can jump in and out easily without jeopardizing the entire project. This brings special challenges in mathematics, due to the scarcity of experts and the complexity of the subject matter. Despite this complexity, Nielsen insists that microcontributions are always important (a blog comment in polymath, for instance): individually they contribute little to the project, but they give it the momentum needed to make the significant discoveries.

1.2. Social machine of mathematics. These trends have been studied more quantitatively by Martin and Pease [41]. The concept of a social machine, first introduced by Berners-Lee [16], allows us to view a network of humans and computers as a single problem-solving entity. The polymath projects are one example of a social machine, since they harness new technologies to provide a platform for massive collaborative mathematics. Unlike in traditional collaboration through published work, participants share their thoughts and ideas, and not only statements accompanied by a rigorous proof. Frontstage mathematics denotes the polished final product, while backstage mathematics consists of the informal conversations that might eventually lead to a formal statement. Projects such as polymath explicitly encourage researchers to share the backstage activities and to collaborate at a much earlier stage in their research. In addition, the traces of this shared process can then also be studied to analyse how mathematical research works, as done by Martin and Pease [41]. They found that MathOverflow, another example of a widely used social machine with approximately 23,000 users, is a very effective tool, given that 90 percent of the questions analysed were answered to the satisfaction of the questioner. In one third of the answers, concrete examples were given to support or refute a conjecture, and over half of the responses cited literature, which indicates that the mathematical community responds positively to such efforts.
Beyond polymath and MathOverflow, one can find more and more examples of such social machines in mathematics: the sage computer algebra system or the LMFDB collaboration [52], for instance. There is also a very recent EPSRC grant in the field of collaborative mathematics formalisation, called ProofPeer [11].

1.3. Citizen science. This evolution towards collaborative research goes beyond mathematics alone. Indeed, since 2009, Zooniverse.org [17] has offered anyone the possibility to contribute to scientific research by performing tasks that are difficult to automate. These tasks tend to be relatively simple: they consist mostly of transcription (of nineteenth-century ship logs to assist modelling in climate science, for instance) or of feature recognition and annotation tasks (pictures of galaxies, of the surface of Mars, of the sea floor, etc.). They are explicitly geared towards distributing large data sets to humans. This established a new area called citizen science. Since the participants are given minimal training (a couple of minutes), the tasks are basic and very repetitive. Some extra features (e.g. a forum) allow for a sense of community among the participants, and help discover unanticipated facts, such as the existence of green pea galaxies or Hanny's Voorwerp [17]. Even the journal Nature offers a platform welcoming contributions from the general public on problems that baffle expert scientists [9].

1.4. Crowdsourcing and collaborative intelligence. This citizen science movement fits into yet another, even larger movement: crowdsourcing. Crowdsourcing occurs when tasks are distributed to the general public. Those tasks might be extremely simple, yet very hard to automate on a computer. This is exploited by the CAPTCHA system [56], for instance, where computer-generated swirling text is used to assess whether a potential new user of a website is really a human or a robot. When mixed with scanned words, this can be used to digitize books, as done by Google (reCAPTCHA [57]). In effect, Google is thus picking the brains of its users, and with clever cross-checking it can protect that system against malicious participants. In fact, it is so successful that Google has now transitioned to showing pictures of street numbers, in order to improve its pinpointing of particular houses in Google Maps. An example of a crowdsourcing platform for more generic tasks would be the Amazon Mechanical Turk service. In all these examples, "workers" never interact with each other and operate within tightly defined constraints. The system distributes very simple tasks massively and performs cross-checks to test for quality.

Wikipedia uses another approach. Anybody is welcome to edit any given article and, because this requires high levels of expertise, little is automated. On the other hand, Wikipedia recognises many user access levels, from mere user to administrator, ombudsman, steward, etc. Each level comes with rights and responsibilities, such as protecting pages where edit wars occur. This helps manage a sometimes unwieldy pool of contributors.

Only when both aspects are present (automation and high-level tasks) can one properly speak of a social machine. Facebook is the canonical example (in the sense of Berners-Lee, beyond the mere presence of the word social in social network).
Indeed, it solves the problem of collecting personal data for marketing purposes by crowdsourcing to its users the tasks of tagging interesting content on the internet, writing wall posts, identifying people in pictures, etc.

As a consequence of the diversity of possible workflows, a new field has emerged, called collaborative intelligence. Experts there aim to identify the features that make particular crowdsourcing approaches successful, and have identified specific traits [40]. For instance, some projects decide to exploit hierarchies, while others use an egalitarian approach. The mechanism used to motivate the participants varies as well (fun, recognition, money, ...). The main challenge in collaborative intelligence is to design systems that make it possible to attack problems that usually require higher cognitive skills and that are impossible to solve single-handedly.

1.5. MOOCs. In 2012, with the founding of Udacity, Coursera, and the edX consortium, MOOCs formally took off [45]. They originated from the distribution of course material for classes taught using the flipped teaching model, in particular videos, lecture notes and machine-graded homework. The demand for such courses was unanticipated, and they have now brought higher education to millions of students all over the world. The massive levels of investment that they have obtained (tens of millions of USD each) have helped them sustain very rapid growth, both in terms of content provided and number of students.

It is hard to overemphasise the degree of dedication that some students display in those courses. Naturally, beyond simply doing their homework, students start helping each other, produce extremely detailed notes, contribute to the course material, and produce stellar projects. In other words, students do real work, some of it with commercial value, mostly because they are grateful for the education they receive.

The emphasis in those MOOCs is scalability, and some have taken to calling them xMOOCs, to contrast them with an earlier experiment: cMOOCs, where the c stands for connectivist. This adjective refers to the means of communication used by the students. More interesting to underscore is the contrast in the relation between students and instructor: in xMOOCs, instructors will actively push content to students, while in cMOOCs instructors will try to pull ideas originating from the class forums and highlight them (the connectivist label underscores that students will in fact communicate with tools other than just the provided class forums). This insight came to the author after organising, in the Fall of 2013, a course called MAT101, still viewable online [23]. We will give more details about this course in the methodology.

Figure 1. Mathopoly, a game written by MAT101 students Markus Neumann, Xuan Hong Nguyen, Andrin Kalchbrenner and Silvan Biffiger. The complexity of this final project highlights the mix of skills employed: pure coding skills, GUI design and creativity (the Monopoly board matches the layout of the math institute, professors' offices represent Monopoly properties, students can buy blackboards, etc.).

An additional insight is that each MOOC is ultimately a carefully crafted social machine, one that easily attracts thousands of users who are there to learn, and that this learning might happen precisely by doing tasks that are hard to automate. MOOC platforms are ultimately only software.
However, they are very useful software: they scale well, their development is rapid because of their large financial backing, and they can be repurposed or adapted easily.

1.6. Flexiformalisation. Mathematical literature, from textbooks to research material, exists at different levels of formalisation, ranging from the fully formal level to the level of mathematical natural language. It has long been realized that this is a problem, and indeed the recent project of a Digital Mathematics Library aims to integrate all the mathematical literature together [22]. Ultimately, the goal would be to build tools that answer questions such as "What are the most frequent ways to prove that a continuous function is, in fact, Lipschitz?". This could provide assistance to mathematicians trying to prove something by hand, but also to any system trying to handle formal mathematics, for instance an automated theorem prover, by narrowing down more quickly on tried and tested strategies.

The fully formal level refers to efforts using formal proof checkers or automated theorem provers such as Mizar [47], HOL (Light) [33] or Coq [14]. These have made significant progress over the past years: the Four Color Theorem in 2005 and the Odd Order Theorem in 2012 by Gonthier and his team [28, 29], the almost-complete proof of the Kepler conjecture by Hales and the Flyspeck team [31], and the efforts around the univalence axiom and homotopy type theory [54] by Voevodsky et al. However, it still seems unlikely that these efforts would manage to attract a significant part of the mathematical community, and there is an incompatibility problem, as each is based on a slightly different formal logic.

Instead of aiming for the most formal human-produced content, an alternative is to try to automate the formalisation of content, starting from the sources of papers on the arXiv and extracting formal knowledge. This is the approach followed by Kohlhase et al. with their ArXMLiv project [50]. Their work attempts to put semantic meaning on LaTeX macros, and for this they crowdsource a lot of the work to the students at Jacobs University. One of the main results is that they are able to produce an intelligent search engine for arXiv formulas [39].

Spanning all levels between full formalisation and this thin semantic layer above the arXiv stands flexiformalisation, and in particular the MMT language, the next logical step after the OpenMath [20] and OMDoc [38] presentation formats. MMT is a knowledge representation format focused on modularity, logic-independence and flexibility. For instance, one could formalise in MMT the high-level definition of Ring (i.e. at the level at which most mathematicians communicate) without entirely specifying the underlying logic used, and individuals could still redefine Ring according to their own preference (if the existence of a multiplicative unit is assumed, for instance). MMT theories can be edited relatively simply and directly, and are not expected to be full formalisations. MMT aims to replicate as closely as possible the level of formality used by mathematicians in their everyday work. In other sciences, one uses ontologies for this, but they will not work for mathematics, due simply to the complexity of the subject matter [46].
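To make the modularity and logic-independence concrete, here is a toy Python sketch of the Ring example above. It is emphatically not MMT syntax, and all names in it are invented for illustration: a flexiform theory is recorded as a named bundle of (possibly informal) components, with no commitment to an underlying logic, and a user can refine it to require a multiplicative unit without touching the original.

```python
# Toy illustration only (not MMT syntax): a "flexiform" theory is a named
# collection of components, some given merely as informal text, with no
# commitment to an underlying logic.

from dataclasses import dataclass, field

@dataclass
class Theory:
    name: str
    includes: list = field(default_factory=list)    # modularity: theories build on theories
    components: dict = field(default_factory=dict)  # symbol -> informal or formal description

    def extend(self, name, **extra):
        """Refine a theory by adding components, without touching the original."""
        return Theory(name, includes=[self], components={**self.components, **extra})

# The high-level definition of Ring, at the level at which mathematicians communicate:
ring = Theory("Ring", components={
    "carrier": "a set R",
    "addition": "an abelian group law + on R",
    "multiplication": "an associative law * on R, distributive over +",
})

# One user's personal refinement: rings must have a multiplicative unit.
unital_ring = ring.extend("UnitalRing", unit="an element 1 with 1*x = x*1 = x for all x")
print(sorted(unital_ring.components))
```

In actual MMT, the underlying logic itself is just another theory that can be included as a meta-theory, which is precisely what makes the format logic-independent.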
While very few mathematicians will directly contribute to an MMT flexiformalisation effort, due to a lack of time, the key insight is that many mathematicians already have to devote incompressible time to teaching when it is done offline, and that they will want to use a platform that helps them save some of that time.

1.7. Mediation through computers offers new opportunities to collect and analyze data. This insight is of course the basis of Google's and Facebook's business models of collecting information. MOOCs offer specific opportunities in that respect, since the content has to be didactically presented, and hence broken down and progressively built up. Simultaneously, several feedback loops can be added to assess what is being learned.

2. Potential impact

We want to enable a new scientific workflow for mathematical research and teaching, tailored to large collaborations and open to contributions from the general public. This will offer a new way to propel mathematics forward. This will affect:
• students, who are provided better learning tools and start contributing to real research as part of their studies;
• large collaborations of mathematicians, who will be offered better tools to work together;
• the general crowdsourcing community, which will have means to raise the cognitive level of the microtasks offered;
• the whole mathematical community, which will have rich content-aware mathematical text to work with.

3. Innovative aspects

The most innovative aspect of this proposal is to realise that each individual MOOC is a social machine in itself, and that the software allows custom engineering of each machine towards research. This enables individual professors to create a new social machine, use it to push science problems to participants, pull new scientific results from the same community, and iterate on this process. By engineering these smart systems, we enable not only individual learning but also a global one, i.e. discovery. The cognitive level of those tasks can be made much higher than in existing citizen science projects, thanks to the teaching architecture around the tasks and the numerous features of MOOC platforms, which simultaneously include peer feedback, complex input solutions, and high-level automated grading. Different social machines could even be combined if too many different skills were required from participants.

The innovation here is to marry all those insights together. Notice that they have been combined pairwise before:
• Duolingo [55] uses crowdsourcing to translate websites, and teaches languages at the same time;
• MOOCs (or even classical courses) use gamification to encourage students;
• polymath crowdsources a scientific problem but does not use what would be considered standard in a MOOC, such as a forum system with categories;
• a couple of innovative MOOCs have been used for crowdsourcing [59, 35], but not exactly with a research goal.

Part 2. Methodology

4. Studies or experiments envisaged

We will now describe several tracks. Each track consists of successive experiments to perform. The tracks are independent of each other, so they could and should be approached simultaneously.

Each of these tracks presents the risk that not enough people would want to participate. To alleviate this, from the start we need strong outreach efforts to advertise those experiments and their goals. At the same time, if needed, the experiments can simply be rerun at very low cost.
The last track, Track Z, is the most complicated but will bring the highest gain. By engaging in the other tracks as well, we make Track Z more realistic: the other tracks will provide us with more experience, more financing options, more visibility and more data.

4.1. Preliminary track: commonality of needs. Before introducing our methodology for the mathematical tracks, we want to outline an example highlighting interdisciplinary needs. Imagine a course in human geography entitled Migrations. Instead of the current MOOC format of pushing videos and homework to the students, we want to explore a more collaborative model: there are around 200 countries in the world, making for 200² possible international migrations. As a first task in such a MOOC, students could be given the opportunity to vote on the hundred most interesting such migration paths, and each of those would be studied in the next homework by a much smaller group. The rest of the course would follow the students along and help them treat this homework as a research topic, exploring implications along different dimensions: consequences for the cultures and economies of the inbound and outbound countries, etc. Among their lectures, the students would be taught how to properly document their arguments and how to be critical of each other's work, both within their working groups and later via peer feedback. Ultimately, not everything created by the students would be useful to the professor, but this would certainly provide useful preliminary material for the book that the professor always wanted to write¹.

In a course entitled Financial Mathematics, one could imagine simulating a stock market, asking separate groups of students to assume different roles. Some groups would be market makers, others hedgers, while a third group would be made of pure speculators. From all those interactions, a market price would emerge and give a primary market. There, the teacher can easily illustrate the concepts of market equilibrium and portfolio diversification. In parallel, in a more advanced course, students could be taught the theory of financial pricing and offered the opportunity to trade derivative products. The class then witnesses the rise of a secondary market. A different course could provide different asset classes for the first class to monitor, interest rates for instance.

All these examples highlight that these approaches, which are sometimes used in classrooms on a small scale, will definitely be tried on a larger scale soon. We are thus not only convinced that the ideas below are technically feasible, but also that they will be greatly facilitated by software contributions from the other sciences.
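As an illustration, here is a minimal Python sketch of the first task of the hypothetical Migrations course above: simulated votes over the roughly 200² ordered (origin, destination) pairs are tallied, the hundred most popular pairs are kept, and the class is split into one small working group per pair. All names and numbers below are placeholders.

```python
import random
from collections import Counter
from itertools import permutations

countries = ["country_%03d" % i for i in range(200)]
pairs = list(permutations(countries, 2))      # 200 * 199 ordered migration paths
students = ["student_%04d" % i for i in range(2000)]

# Simulated voting: each student votes for the 5 pairs they find most interesting.
votes = Counter(pair for _ in students for pair in random.sample(pairs, 5))
top_pairs = [pair for pair, _ in votes.most_common(100)]

# Assign students round-robin to the 100 winning pairs, one working group per pair.
groups = {pair: students[i::100] for i, pair in enumerate(top_pairs)}
print(len(groups), "groups of about", len(students) // 100, "students each")
```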
4.2. Track A: Easy initiatives for engineering knowledge discovery and archiving. The traditional notion that all mathematical discoveries are made through solitary endeavours no longer holds true. Indeed, massive collaborative projects, such as MathOverflow and the polymath projects, have already begun to change how some leading mathematicians do part of their research. In this track, we outline how to make this method of collaboration even more effective and prevalent.

The polymath projects [30] are global collaborations to solve open conjectures. In 2009, Tim Gowers initiated the first polymath project – now referred to as polymath 1 – by asking the followers of his blog to post ideas on how to find a combinatorial proof of the density Hales-Jewett theorem. This social experiment turned out to be an unexpected success, which has led to further polymath projects.

MathOverflow is an interactive mathematics website [8], which is both a collaborative knowledge base and an online community of mathematicians. It allows users to ask questions and submit answers, which in turn are both rated by other users, leading to reputation ratings for the contributors. According to Martin and Pease, the possibility of building a reputation motivates the users to submit questions and answers [41]. MathOverflow is primarily for asking questions on mathematical research, especially related to unsolved problems and the extension of knowledge. Similar sites exist for other subjects (cf. StackOverflow for programming, physics.StackExchange, chemistry.StackExchange, tex.StackExchange), as well as for students in mathematics (cf. math.StackExchange). Since it contains a large collection of tagged questions, MathOverflow is a valuable tool for engineering serendipity: another mathematician will come along later, find a question useful and realise others have been thinking about those issues. This serendipity could even be enhanced if combined with other external services, such as one matching academics.

¹ There is plenty of evidence that professors whose research has a geographical component have started to use MOOCs in this way, even without the divide-and-conquer-followed-by-peer-feedback approach. The books are currently being written...

We envision a social machine that combines the polymath and MathOverflow ideas with MOOCs: owing to the high technicality of the subject and the lack of structure in its presentation (e.g. with comments inside a blog), the onboarding cost can be quite high after a polymath project has started. In this system, a leading member, such as a Fields medalist, would suggest a question. Immediately, different avenues and ideas would be explored. So far this is the same workflow as in polymath. However, with a proper forum system, subcommunities could form around given ideas. Obviously this breaks down the community of solvers, but every so often, when needed, individual leaders would step into each of those subcommunities to summarize (in text or video) the approaches that have been tried thus far. This maintains a form of coherence and helps anyone, at any time, to understand approaches that have been tried elsewhere. It encourages and helps solvers to explore a diversity of ideas, and maybe recombine them into successful ones. It is also very inclusive of latecomers. It even allows amateurs to get a glimpse of collaborative mathematical research, and potentially to help, for instance by finding papers that might be relevant or performing quick computer experiments if needed. This is reminiscent of the Kasparov vs. the World chess match described by Nielsen [44, Chapter 2].

Going in the opposite direction, forums that emulate MathOverflow can be used in parallel to MOOCs. The reputation system could be integrated into the peer-grading system and forums. It would eventually allow good citizens to edit previously posted questions, which simultaneously improves the quality of the questions and hones the students' skills in mathematical writing. The teacher's role in policing forums could then be reduced. MOOC forums have been studied with encouraging results [36].

It is interesting to observe that the polymath approach, done completely in the open, is quite unique to mathematics.
In other areas, either scientists hoard their knowledge (with some exceptions) or financial incentives are quicker to come into the picture (cf. Innocentive [3] and other crowdsolving platforms). At a technical level, nothing in Track A is speculative: it is only a matter of integrating different tools together and repurposing them.

Figure 2. The Catalan numbers course in preparation on the coursera.org platform.

4.3. Track B: Systematic combinatorics. In this track, we want to explain how first-year undergraduates and the general public can start to contribute to mathematical research. It consists of a succession of three experiments: Catalan, FindStat and sage. This track also exploits our preliminary experiment with MAT101, teaching one hundred mathematics students how to use the Python programming language. Python is an extremely versatile, high-level, free and open programming language. More and more branches of science use Python, via many area-specific modules (for instance biopython, astropython or scikit-learn).

In combinatorial mathematics, the Catalan numbers form a sequence of natural numbers that occur in various counting problems, often involving recursively defined objects. There are 207 combinatorial interpretations of Catalan numbers at the moment [51]. In the future, we plan to offer a course, called Catalan, for first year mathematics students (but open to the world, at a scale of tens of thousands), meant to teach them advanced programming concepts. As such it is just an extension of the existing MAT101, based around the not so complicated Catalan numbers. However, by cleverly designing the homework problems, one could get the totality of students to produce something useful: on a large scale, we could look at the 207² bijections between classes. With a gamification element, we can encourage students to perform literature searches, implement known bijections and find new ones, and the incentives can be tweaked to encourage each. If the quality of the code produced is a concern, then feedback mechanisms involving peers are possible and will lead to better code. If the speed of the code is a concern, then the rules can be altered to open up the code submissions and encourage improvements through microcontributions (cf. the MathWorks competitions [44]). The output of such a course would be modest, but would nevertheless be useful: it would establish a pattern of contributions at a higher cognitive level for citizen science. In addition, it could provide a good data set for interesting economic studies, based on the response of students to the scoring rules used.

Figure 3. A self-hosted instance of open edX, with mockups for a few of the courses described here.

The FindStat project provides an online platform for mathematicians, particularly combinatorialists, to gather information about combinatorial statistics and their relations. As of January 2014, the FindStat database contains 173 statistics on 17 combinatorial collections. One can use it to test whether some given data is a known combinatorial statistic in the database, or whether this data can be obtained from known combinatorial statistics in the database by applying combinatorial maps. It is clear that the FindStat project would benefit from the Catalan course, and that members of the FindStat project could expand that course to a FindStat course, covering more classes and bijections.
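To fix ideas, here is a minimal Python sketch of the kind of autogradable homework the Catalan course could assign; the particular bijection and checks below are chosen by us purely for illustration. It implements the classical bijection between balanced parenthesis strings and binary trees, and verifies on small cases that the map round-trips and that the counts are the Catalan numbers.

```python
from math import comb

def dyck_words(n):
    """All balanced parenthesis strings with n pairs."""
    if n == 0:
        return [""]
    words = []
    for k in range(n):  # k pairs inside the first "(...)" block
        for inside in dyck_words(k):
            for rest in dyck_words(n - 1 - k):
                words.append("(" + inside + ")" + rest)
    return words

def to_tree(word):
    """Classical bijection: Dyck word -> binary tree, encoded as nested pairs."""
    if word == "":
        return None
    depth = 0
    for i, c in enumerate(word):            # split word as "(" + inside + ")" + rest
        depth += 1 if c == "(" else -1
        if depth == 0:
            return (to_tree(word[1:i]), to_tree(word[i + 1:]))

def to_word(tree):
    """Inverse map: binary tree -> Dyck word."""
    if tree is None:
        return ""
    left, right = tree
    return "(" + to_word(left) + ")" + to_word(right)

# The kind of automated check a grader could run against a submission:
for n in range(6):
    words = dyck_words(n)
    assert len(words) == comb(2 * n, n) // (n + 1)         # Catalan number C_n
    assert all(to_word(to_tree(w)) == w for w in words)    # the map round-trips
    assert len({to_tree(w) for w in words}) == len(words)  # and is injective
print("bijection verified for n = 0..5")
```

Scaled up to the 207² pairs of interpretations, homework of this shape produces code and verified bijections that are immediately reusable by FindStat or sage.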
Sage [53] is an open-source computer algebra system written in Python. It is made of more than 100 packages and aims to support teaching, experimentation and research in mathematics. Its tremendous success is largely due to the fact that a very large pool of leading mathematicians contribute and regularly write code, with the help of a very active community. The code written in the FindStat course could then be integrated into sage, with the process for code submission and review explained in a sage course. That course, with careful engineering, could in fact aim for much more general, but not basic, mathematics that still needs to be implemented in sage.

4.4. Track C: Large bodies of knowledge. Serendipity plays a huge role in mathematics. The reader will remember that John McKay's observation linking dimensions of representations of the Monster group and coefficients in the Fourier expansion of the modular function j eventually led to Monstrous Moonshine and Borcherds' Fields Medal. Thirty-five years later, we have still not quite automated such observations. While one can argue that the On-Line Encyclopedia of Integer Sequences [49] provides this service, we can do much better. The only obstacle is human: mathematicians do not realize the value of a common effort in this direction.

On the same note, but even more seriously, finding relevant mathematical papers is hard. We rely on imperfect search engines and incomplete tagging systems to retrieve our content. Most of the time we attribute unusual finds to luck.

The aim of this track is to help mathematicians build more successful information commons, places where they can collect and organize their data. Even more interestingly, this process could be standardized so that these different information commons become interoperable, sustainable and more beneficial. There are two sides to the issue. First the data needs to be put in common, then it needs to be analyzed. We summarize the current situation for a few mathematical databases, and we then outline a solution.

4.4.1. Gathering data. The On-Line Encyclopedia of Integer Sequences (OEIS) [49] is a database that collects integer sequences of mathematical interest. Each sequence is labeled with possible mathematical descriptions from the people who discovered them. It has been successful at forming a community of contributors, also outside of the mathematical community, but less successful at improving the services offered beyond simple search features.

The Knot Atlas [13] lists non-equivalent knots by crossing number. It is a particularly useful tool for researchers in knot theory. Anyone can contribute, but in Nielsen's judgement it is a failure [2]: it has not managed to attract a large pool of contributors, hence its limited or non-existent growth. Similarly, the KnotInfo database of knot invariants is produced centrally, and while it contains very interesting data its interface is substandard.

The ATLAS of Finite Groups and Groupprops gather information about groups. The first concerns finite simple groups [12], while the second concerns all kinds of group properties and the relationships between those properties. The two are not linked, and in fact not linked to other databases either. Clearly cross-searches between the ATLAS and Groupprops would be useful, as well as with the LMFDB database that we describe next.

The LMFDB Collaboration² is a growing community of more than fifty mathematicians and computer scientists that aims to classify L-functions, modular forms and other related mathematical objects.
The current size of the database is around 3-4 TB, with the data publicly accessible [52]. While there is a large amount of data, there is also a large disparity in the formats used. This is unfortunate, as the main appeal of the project is supposed to be to unify all L-functions, despite the diversity of their constructions. A very good effort was made by the LMFDB to provide context to what can be extremely complex objects. While the first focus is to present data on the site, it is also possible to recall definitions of concepts in a non-obtrusive way, through so-called knowls. These definitions are meant to be "bits of knowledge", i.e. more focused than an encyclopedic entry.

Like more and more scientific collaborations nowadays, the LMFDB project uses git and the online platform GitHub. These allow everyone to create their own separate branch of the project and edit it independently of the master branch. Once a relevant change has been made on an individual branch and the community has approved it, it is pushed to the upstream repository. People from different backgrounds work as a community to expand the database and build the whole platform.

4.4.2. Analyzing data. Kaggle [4] is a platform that hosts competitions for data scientists and statisticians. Organisations can put forward data and statistical problems; data scientists then work as individuals or in teams, competing for cash prizes. The competitors develop predictive algorithms using a subset of the data, called training data. Algorithms are run immediately on separate test data and ranked. The test data used for ranking is not revealed to the competitors. Kaggle promotes gamification through monetary prizes and leaderboards. Competitors are able to collaborate using the forum: sharing thoughts, helping each other and suggesting new ideas. The forums also enable experts to team up in future competitions. Even after competitions have concluded, the forum posts remain publicly available. Kaggle charges the client to host the contest, and then after 6 months starts leasing the solution for a monthly fee [5]. Invite-only competitions help alleviate the problem of data privacy [19]. Instructors from institutions such as universities can use Kaggle in class for assignments and projects free of charge. The aspects of Kaggle that will be reused in this proposal are instant grade feedback, an updated leaderboard, and collaboration through team effort and the forums. Unlike on Kaggle, any solution will be released to the public.

² LMFDB stands for L-function and Modular Forms DataBase.
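A minimal sketch of the Kaggle-style mechanics we intend to reuse: submissions are scored instantly against test data that never leaves the server, and a public leaderboard is updated. The dataset, scoring rule and names below are placeholders, not an actual Kaggle or LMFDB interface.

```python
hidden_test = [(x, x % 3 == 0) for x in range(1000)]  # (input, secret label): never shown
leaderboard = {}                                       # team name -> best score so far

def score(predict):
    """Fraction of hidden test cases predicted correctly."""
    return sum(predict(x) == label for x, label in hidden_test) / len(hidden_test)

def submit(team, predict):
    """Grade a submission instantly and update the public leaderboard."""
    accuracy = score(predict)
    leaderboard[team] = max(accuracy, leaderboard.get(team, 0.0))
    return accuracy, sorted(leaderboard.items(), key=lambda item: -item[1])

# Example submission: a (weak) baseline predictor.
print(submit("team_baseline", lambda x: False))
```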
4.4.3. The problem. The collaborations collecting data face serious obstacles to growth. Some of those listed before have never grown beyond a handful of contributors, and the LMFDB has stalled in some ways: onboarding new members takes too much time, due to the complexity of the tools used, the mathematical background necessary, and the diversity of the code used. At every LMFDB workshop, introductory courses on how to use git and GitHub, as well as on the mathematical background and the whole architecture of the project, are given. This translates into a huge waste of workshop time, and is already an area where online courses could help. Graduate students or newcomers to the collaboration could come better prepared. Of course, a lot of this course material (git, GitHub) exists already and would simply need to be augmented with what is particular to the LMFDB workflow.

Beyond that, another obstacle to growth is that very few mathematicians worldwide know about these objects, with perhaps only a couple of experts on each object invited to any workshop. Again, the result is that we spend a lot of time teaching each other about the objects. While this is not purely wasted time, it is not efficient either.

4.4.4. Our solution. Our solution is to put effort into outreach earlier in the process. An online course (open, but not massive) explaining the mathematical background behind the objects and/or how to actively contribute to the expansion of the database would help welcome new members, and even allow recruiting outside of the mathematical community. We envision a system where research collaborations form around teaching first, so that information is easily available to everyone. We feel this would be beneficial for new people joining research teams. A flexiformalisation into MMT would help make the problems of handling and analysing data uniform across mathematics, and would enable vast reuse of the techniques. For the analysis part, these datasets could be combined with data science courses, and people could be encouraged to crowdstorm the datasets [1].

4.5. Track Z: From semantic annotation to formalisation. If we control a web platform that sees enough activity in the teaching of mathematics, then we could collect a lot of data that would be useful for a formalisation effort. In fact, we could develop a series of tools that would help with each part of the formalisation effort. This track thus consists of two main experiments, to be tried concurrently, each spanning multiple courses:

4.5.1. Formalisation. The first step in the direction of full formalisation might be to simply teach a course on one of the formalisation tools available (HOL, Mizar, Coq, ...). Due to the nature of the topic, the homework there could certainly be automated! In a second stage, this course could be equipped with a more elaborate collaborative element, replicating the workflow of the recent large proofs that have been produced: for the Flyspeck project, for instance, Hales wrote a textbook [32] that outlined the general formalised proof he was hoping to get to. Instead, this proof could have been given online, with the participants in the course breaking down the results as needed. A gamification element could definitely be added. This approach has been used at another level of complexity in the DeduceIt system [26]. On top of this, a very recent research project proposed to use this Flyspeck almost-formal text to learn how to automatically turn it into a HOL Light formalisation [37]. Part of that project is to combine statistical methods, based on learning from existing annotations and knowledge, with automated reasoning in large theories (ARLT). More concretely, semantic ARLT methods would be used to confirm or reject the statistically predicted formalisations. That project relies on the growth of formal corpora, a trend that could be aided by our project.

4.5.2. Flexiformalisation - MMT. The more radical contribution towards formalisation would come from the MMT language. As described earlier, MMT is a knowledge representation format focused on modularity, logic-independence and flexibility. It does not "trap" its user into using a specific logic, and does not require a full-fledged (and unrealistic) formalisation right away.
Teaching a course on MMT would be helpful: with proper tools in the rest of the courses offered, users could annotate and tag content in the MMT syntax across courses hosted on the platform. Conversely, an MMT formalisation of content would be useful in several ways:
• as described earlier, to generate the architecture for databases of examples;
• to progressively build a tutor [27], offering assistance to students by serving relevant content from other sources;
• to encourage reuse of course content among instructors, by offering them the option to precisely tag the course content.

5. Methods

This project involves a lot of technology, some of it existing and a little bit still to be created.

Figure 4. The MAT101-MAT086 system: circles represent students, boxes represent the professor(s) and the pentagons assistants. Content is actively pushed to the students via online material, collected from the professor and the internet. Students have homework to submit via the platform, for which they receive instant feedback (loop on the left). They also do group projects, with changing groups, and the output of their work is uploaded back to the platform, for the benefit of other students. This whole system was engineered by the participants in MAT086, a seminar on online education that was convened offline.

5.1. Example courses: MAT101-MAT086 system. In the Fall of 2013, the author taught two courses jointly: MAT101 was a Python for mathematicians course, and MAT086 was a seminar on online education. MAT101 was taught online, while MAT086 was taught offline. In Figure 4, we describe the complex social machine used for the course. Similar setups would be replicated for a course like Catalan.

5.2. Technological issues.

5.2.1. Course platform. The MOOC area evolves very fast. Therefore, we do not wish to restrict ourselves by focusing on only one platform, such as coursera.org: this might be limiting in the future. A particularly interesting alternative for us is the open source Open edX, first started by MIT and Harvard and now joined by more and more universities. It is based on the component architecture XBlocks, which offers the possibility of creating simple independent courseware units (in Python) and integrating them in many different ways. We started developing new XBlocks in January 2014, together with students. Since the platform is licensed under the AGPL license, any contribution to the platform has to be released openly, and vice versa we benefit from any other contribution. Once we are ready to host our own instance more sustainably (beyond the current small-scale install), we will offer the option to anyone in the scientific community to host there a course with a citizen science component (we crowdsource that too!). Courses could already be hosted at scale on that instance, but could be shifted later to other instances using the same software (such as edx.org or France Université Numérique's fun.org), or hosted at several of these in parallel. Beyond merely hosting, we will help professors develop teaching apps that contribute to science. Open edX's XBlocks architecture [25] makes this easy for anyone with a limited programming background, and we would help others get started.
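As an indication of the effort involved, here is a minimal sketch of a custom XBlock, written in the style of the examples in the XBlock documentation [25]; the block name, fields and handler below are ours and purely illustrative, and the JavaScript that would call the handler through the runtime is omitted. It collects one short annotation per student for the current course unit, the kind of micro-contribution discussed throughout this proposal.

```python
from xblock.core import XBlock
from xblock.fields import List, Scope, String
from xblock.fragment import Fragment


class AnnotateBlock(XBlock):
    """Attach informal student annotations to a unit of course content."""

    my_annotation = String(default="", scope=Scope.user_state)           # per student
    all_annotations = List(default=[], scope=Scope.user_state_summary)   # shared pool

    def student_view(self, context=None):
        """Render a text area and a submit button for the student."""
        html = (
            "<div class='annotate_block'>"
            "<textarea class='annotation'>{}</textarea>"
            "<button class='submit'>Submit annotation</button>"
            "</div>"
        ).format(self.my_annotation)
        return Fragment(html)  # client-side JS wiring omitted in this sketch

    @XBlock.json_handler
    def submit_annotation(self, data, suffix=''):
        """Store the student's annotation and add it to the shared pool."""
        self.my_annotation = data.get('text', '')
        self.all_annotations = self.all_annotations + [self.my_annotation]
        return {'count': len(self.all_annotations)}
```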
The platform is meant to be hosted in the cloud, so those costs will have to be taken into account, and we will need to devise proper mechanisms for covering them. We discuss this in the resources section.

5.3. Pedagogical tools.

5.3.1. A simple tutor. After a student watches a video, they are asked if they need help. If they do, the system would ask them some questions: "What are the main concepts?", "Which did you not understand?", "What background are you missing?", etc. This serves to tag material in the course (and relations between different parts), if the professor has not provided that information ahead of time. This information can then be used to serve back to the student a video from another course covering the same material. Such an effort could also be tied to gamification, with the system asking more and more precise questions of the student to disambiguate their understanding of the material. We expect to be able to implement this system by integrating the work of [27] via XBlocks.

5.3.2. Shared preparation of material. This project can only gain by expanding the number of courses it hosts. To enable this, we need to offer a tangible advantage compared to other platforms. For this, we will offer professors the option to tag their content (or for assistants and students to help in that effort). Since many of us have to teach essentially the same canon (at least for the first few years of higher education), this offers the possibility to quickly build alternative explanations of the same concepts. This can even be expanded to other languages, and help disambiguate between terms. See [10] for a similar effort. A simplification of this workflow can be enabled via the open source edx-presenter [42], originally developed by one of my students.

5.3.3. Note taking / glossary. Students could be offered the option of taking collective notes or maintaining a glossary, to help them keep track of definitions. The shared nature would encourage the production of (annotable) high quality content, which would be of high value for annotation and formalisation.

5.3.4. Group projects. Students need to be divided into subgroups (potentially very large), so their attention can be focused on smaller problems. This is clearly a direction that other Open edX developers are going in.

5.3.5. Motivators. Different motivators could be used in the course. If the contributions relate well to the course content, then grades or reputation could be at stake. Otherwise, cash prizes are possible. If a contest is set up, then there is also the possibility to offer travel grants and internships for the best contributors.

5.4. Math-specific tools. In general, the goal here is to formalise the content of a mathematics course, by building tools that students and professors want to use. We give in Figure 5 a graphical representation of the process. This is simply an adaptation to MMT of the concept of semantic games, usually applied to build or populate ontologies.

Figure 5. The process used to bring content towards a more formal description (informal, human readable → quality control → formal, machine readable). This picture applies at every step up the flexiformalisation ladder. At each stage, the system can take its current understanding of the content, and reformulate it for human consumption (via the mechanism of proof sketches [58], for instance).
This understanding is used to ask questions (Q) of the students; their answers, after cross-checking and quality control, ensure that the deductions are correct.

5.4.1. MMT integration. For course content involving scientific, particularly mathematical, documents, our infrastructure will integrate the OMDoc/MMT framework. This is a knowledge representation framework that focuses on formal mathematical content but also degrades gracefully to partially formal or completely informal content. The language is logic-independent, so that formal content can be written in any logic, and the system is application-independent, so that it can be flexibly integrated with our infrastructure. OMDoc/MMT permits enriching course content (such as an open-source textbook [15]) with added-value services that are aware of the semantics of the displayed mathematical objects. Such services include navigation, definition lookup, dynamic interaction with mathematical objects (e.g., folding, type inference), change management, interactive examples, searching for mathematical objects, or change of notation.

Based on OMDoc/MMT, we will develop additional MOOC-specific services. For example, we will permit students to write their own examples and have the system check them with respect to the course content. Moreover, we will develop interfaces that permit students to improve the formalisation status of the content. In practice, instructors can only partially formalise the content due to the effort involved. Here our system helps, because students can – as a part of their learning experience – improve the course materials. Specifically, students will be able to
• annotate content with meta-information (e.g., this is a proof, this is what is being defined, ...);
• annotate content with cross-references;
• formalise informal objects;
• add examples;
• ask questions about, and cross-referenced with, individual (sub-)formulas.

5.4.2. Understanding proofs. To help understand a proof, one can start from an existing proof and ask students a variety of questions: "Does this proof work by induction? By contradiction?", "Write in your own words the contradiction that is reached", "What are we inducing over?", etc. In fact, assistants could tag the proofs, which would automatically create appropriate quizzes (to be graded with peer-grading if the output is written in natural language). Of course this also benefits any machine observing the interaction with the student.

5.4.3. Understanding how people write proofs. One could simply assign as homework the task of writing a proof of a particular statement (this is already done in Devlin's "Introduction to mathematical thinking" class on coursera.org, and graded using peer-grading). One can then analyse the different stylistic features of the text written, or even try to recover the fundamentally different proofs of the same statement (this is already done for programming assignments in some coursera classes, see [43]).

The interactions described so far are very basic, and meant mostly to help collect data that can be refined into a flexiformalisation. Depending on success, we hope to attract the computer formalisation community to integrate their tools with ours, and to enlarge their pool of contributors by several orders of magnitude.
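As an illustration of the tagging-to-quiz idea of 5.4.2, here is a hedged Python sketch: assistants tag a proof, and quiz questions are generated automatically from the tags, with free-text answers routed to peer grading rather than automatic checking. The tag names and question templates are invented here purely for illustration.

```python
QUESTION_TEMPLATES = {
    "induction": [
        "Does this proof work by induction? What are we inducing over?",
        "State the base case of the induction in your own words.",
    ],
    "contradiction": [
        "Write in your own words the contradiction that is reached.",
        "Which assumption is eventually refuted?",
    ],
    "uses_lemma": [
        "Which earlier lemma is invoked, and where exactly is it needed?",
    ],
}

def quiz_for(proof_tags):
    """Turn the assistants' tags for one proof into a list of quiz questions."""
    questions = []
    for tag in proof_tags:
        questions.extend(QUESTION_TEMPLATES.get(tag, []))
    return questions

# Example: a proof tagged as an induction that uses an earlier lemma.
for question in quiz_for(["induction", "uses_lemma"]):
    print("-", question)
```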
5.5. Feedback mechanisms / redundancy in the approach. This project could benefit from many feedback mechanisms: once a course on Python for science is built, some participating students can contribute back to the project by building their own XBlocks and extensions to the platform. In general, any process we control and "master" would then be taught, to increase its momentum. Similarly, a completely open data approach (accounting for privacy issues) would help analyse all the data produced most effectively, by encouraging further contributions from participants to the analysis. In addition, there is a substantial amount of redundancy in the approach: different courses can be taught in parallel, and they do not all need to succeed at once. They can simply be tweaked and re-run if needed.

5.6. Ethical issues. It is to be expected that ethical issues will be raised when collecting data in MOOCs or crowdsourcing platforms. The thought that an institution can track every individual's action within a software application will be threatening to some potential users. In fact, this is one concern that needs to be addressed before applying learning analytics [21]: does an individual need to provide formal consent before data can be collected? Does an individual have the choice not to participate in a learning analytics project but still use the software? However, unlike learning analytics, the crowdsourcing projects we envision for MOOCs do not require the analysis of any one person's actions, but rather of the collection of all users taken together, which might alleviate some concerns. We plan to follow the stronger guidelines used for studies in education [7], in addition to the Swiss guidelines on the collection of user data.

5.7. Community outreach and management. A significant and conscious effort will also be dedicated to managing the community of participants, communicating the results obtained, etc.

Part 3. Resources

6. Budget

The accounting of the budget is summarised in a dedicated table elsewhere. We see three main expenses for this project: salaries for scientific collaborators (one postdoc and one PhD student), salaries for technical contributors (IT staff, software developers, possibly via consultancy) and hosting costs. An additional source of expenses would be travel: small travel grants could also be used to motivate remote students to participate in the experiments. The best students would then be invited to come and work with us for a few weeks.

An essential need is for software engineers. Based on discussions with the engineers in charge of edx.org and StanfordOnline, we estimate that we would need someone at 20 percent simply to host the software, perform upgrades, backups, etc. Beyond that, we need another part-time engineer with different skills to extend the platform in the desired directions, for instance by programming XBlocks.

We also require funding for some scientists (one PhD student, one postdoc), preferably with a strong computer science background. They would implement the experiments outlined here (under my supervision), design new ones, and progressively help others implement theirs. In the spirit of making teaching useful for research, they would of course dedicate their teaching time to this project as well. Together with Huan Xiong, one of my current PhD students, we are starting to work on the Catalan course. Another PhD student, Patrick Kühn, is likely to work on an LMFDB-related online course.
I would supervise and coordinate the work of all these project members, on top of setting up my own experiments and teaching within this project. This work would be done in parallel to supervising other students and postdocs in the context of other research proposals. This dedicated staff will help me get started, so I can start tapping into other sources of funding as soon as possible.

6.1. Crowdfunding. Kickstarter [6] is an online platform where people who are interested in a promising project can contribute to its realisation through donations. This has been used in the context of MOOCs and citizen science before [34], and would obviously be a very relevant source of funding.

6.2. External scientific projects. If we develop the infrastructure to support citizen science projects through MOOCs, then our expertise, both technical and pedagogical, could be valuable for other projects that might not necessarily want to build up an infrastructure themselves. Another source of funding would then be to collaborate on those projects, a bit like the Zooniverse project does [17].

6.3. Interdisciplinary scientific projects. The interaction of students with this platform offers the opportunity for two additional types of research projects.

6.3.1. Education. Learning or educational analytics is concerned with collecting and analysing data about learners in order to understand and optimise their learning [48]. This also allows the teaching staff to make evidence-based and accountable decisions about admissions or at-risk students [21]. In this context MOOCs offer new opportunities: all learners produce data trails through their online activities, the analysis of which might offer valuable insights into the learning process [48]. However, whether any institution should be allowed to track the learners' movements will raise ethical issues [21], different from the ones we face for our main research goal. Research into edX's first MOOC [18] is very indicative of the possibilities of this line of research, and more and more conferences are organised on the topic, for instance [7].

6.3.2. Economic and social studies. Since ultimately students are producing work for our research project, they need to be properly incentivised. Interesting studies can certainly be done on the strategies adopted by the students when facing the different motivators. The insight can be very helpful in order to implement efficient carrot-and-stick tactics.

6.4. Development for companies. Companies such as Innocentive [3] offer cash prizes to their problem solvers, and simply host the problem descriptions. We could do something similar with our more evolved products, and try to offer a basis for other paying services helping academics (such as services matching academics by skills).

Part 4. Additional comment

7. Dedication to this problem

Problems like these require a lot of dedication. A significant effort is needed to build a community and convince colleagues to participate. In some way, I have been vexed by the problem of interactive proof assistants for quite a while: I remember reading, a long time ago, in Logique, informatique et paradoxes by J.-P. Delahaye, about the promise of these proof assistants. As I have learned more mathematics, I have realized this was still a long way away. I have also understood that working on such a problem could be dangerous for a career, but I have kept myself informed of progress in that area.
With my SNF-Förderungsprofessur grant, I have been more willing to take careful steps in this direction. For instance, I have organized a workshop on the management of mathematical data [24], which was not entirely a success. To this workshop I had also invited an astronomer, Françoise Genova, who had been very successful in leading a large group towards federating different astronomy databases. Also invited were two software engineers from the Logilab company (Nicolas Chauvat and Florent Cayre), who have developed software for the semantic web. All three had the same recommendation: to use ontologies to structure our data. On the other hand, the mathematicians invited did not see the interest of any structured effort, because its rewards were too distant. After looking more carefully at the suggestion of ontologies, it was clear that they would not cut it: mathematics is too complex for them, as they are designed to deal mostly with the real world. Fortunately, Florian Rabe's MMT language is the perfect substitute for ontologies in a mathematical context (this is in fact why it was invented). I was still left with the problem of finding a way to convince fellow mathematicians to participate. When MOOCs came out, I understood very quickly their potential for this problem: mathematicians could contribute, via their teaching time, towards research as well.

References

[1] Crowdstorming of datasets. https://osf.io/gvm2z/.
[2] @Google presents Michael Nielsen: Reinventing Discovery. https://www.youtube.com/watch?v=Kf2qO0plUKs#t=18m.
[3] Innocentive. https://www.innocentive.com/.
[4] Kaggle. http://www.kaggle.com.
[5] Kaggle: big data. http://www.inc.com/magazine201403/darren-dahl/big-data-crowdsourcing-kaggle.html.
[6] Kickstarter. http://www.kickstarter.com/.
[7] Learning at Scale conference. http://learningatscale.acm.org.
[8] MathOverflow. http://mathoverflow.net/.
[9] Nature Open Innovation Pavilion. http://www.nature.com/openinnovation/index.html.
[10] Semantic Data Web lecture series. http://slidewiki.org/deck/750#tree-0-deck-750-1-view.
[11] ProofPeer project. http://proofpeer.net/, 2014.
[12] R. Abbott, J. Bray, S. Linton, S. Nickerson, S. Norton, R. Parker, I. Suleiman, J. Tripp, P. Walsh, and R. Wilson. Atlas of Finite Group Representations. http://brauer.maths.qmul.ac.uk/Atlas/v3/.
[13] D. Bar-Natan and S. Morrison. The Knot Atlas. http://katlas.org.
[14] B. Barras, S. Boutin, C. Cornes, J. Courant, J.-C. Filliatre, E. Gimenez, H. Herbelin, G. Huet, C. Munoz, C. Murthy, et al. The Coq proof assistant reference manual: Version 6.1. 1997.
[15] R. Beezer. A first course in linear algebra.
[16] T. Berners-Lee and M. Fischetti. Weaving the Web: the original design and ultimate destiny of the World Wide Web by its inventor. HarperBusiness, 1999.
[17] K. Borne and the Zooniverse Team. The Zooniverse: A framework for knowledge discovery from citizen science data. In AGU Fall Meeting Abstracts, volume 1, page 0650, 2011.
[18] L. Breslow, D. E. Pritchard, J. DeBoer, G. S. Stump, A. D. Ho, and D. T. Seaton. Studying learning in the worldwide classroom: Research into edX's first MOOC, 2013.
[19] J. Brustein. Kaggle's William Cukierski on Data Sharing Competitions. Business Week, March 6, 2014.
[20] S. Buswell, O. Caprotti, D. P. Carlisle, M. C. Dewar, M. Gaetano, and M. Kohlhase. The OpenMath standard. 2004.
[21] J. Campbell, P. DeBlois, and D. Oblinger. Academic analytics: A new tool for a new era, 2007.
[22] Committee on Planning a Global Library of the Mathematical Sciences; Board on Mathematical Sciences and Their Applications; Division on Engineering and Physical Sciences; National Research Council. Developing a 21st Century Global Library for Mathematics Research. The National Academies Press, 2014.
[23] P.-O. Dehaye. MAT101: python programming for mathematicians. http://edx.math.uzh.ch, 2013.
[24] P.-O. Dehaye and N. Thiéry. Online databases: from L-functions to combinatorics. http://aimath.org/pastworkshops/onlinedata.html, 2013.
[25] edX Consortium. XBlock documentation. http://xblock.readthedocs.org/, 2014.
[26] E. Fast, C. Lee, A. Aiken, M. S. Bernstein, D. Koller, and E. Smith. Crowd-scale interactive formal reasoning and analytics. In Proceedings of the 26th Annual ACM Symposium on User Interface Software and Technology, UIST ’13, pages 363–372, New York, NY, USA, 2013. ACM.
[27] M. Floryan. Evolving expert knowledge bases: applications of crowdsourcing and serious gaming to advance knowledge development for intelligent tutoring systems. PhD thesis, University of Massachusetts Amherst, 2013.
[28] G. Gonthier. Formal proof–the four-color theorem. Notices of the AMS, 55(11):1382–1393, 2008.
[29] G. Gonthier, A. Asperti, J. Avigad, Y. Bertot, C. Cohen, F. Garillot, S. Le Roux, A. Mahboubi, R. O’Connor, S. O. Biha, et al. A machine-checked proof of the odd order theorem. Interactive Theorem Proving, pages 163–179, 2013.
[30] T. Gowers and M. Nielsen. Massively collaborative mathematics. Nature, 461(7266):879–881, 2009.
[31] T. C. Hales. Introduction to the Flyspeck project. 2006.
[32] T. C. Hales. Dense sphere packings, volume 400 of London Mathematical Society Lecture Note Series. Cambridge University Press, Cambridge, 2012. A blueprint for formal proofs.
[33] J. Harrison. HOL Light: An overview. In Theorem Proving in Higher Order Logics, pages 60–66. Springer, 2009.
[34] L. Hockenson. DIY science MOOC seeks funding on Kickstarter to conduct brain experiments at home. http://gigaom.com/2013/09/11/diy-science-mooc-seeks-funding-on-kickstarter-to-conduct-brain-experiments-at-home/.
[35] B. Howe. Introduction to Data Science MOOC. https://www.coursolve.org/courseproject/2.
[36] J. Huang, A. Dasgupta, A. Ghosh, J. Manning, and M. Sanders. Superposter behavior in MOOC forums. In Proceedings of the First ACM Conference on Learning @ Scale Conference, L@S ’14, pages 117–126, New York, NY, USA, 2014. ACM.
[37] C. Kaliszyk, J. Urban, J. Vyskocil, and H. Geuvers. Developing corpus-based translation methods between informal and formal mathematics: Project description. arXiv:1405.3451, 2014.
[38] M. Kohlhase. OMDoc: An Open Markup Format for Mathematical Documents (Version 1.2). Number 4180 in Lecture Notes in Artificial Intelligence. Springer, 2006.
[39] M. Kohlhase and I. Sucan. A search engine for mathematical formulae. In Artificial Intelligence and Symbolic Computation, pages 241–253. Springer, 2006.
[40] T. W. Malone, R. Laubacher, and C. Dellarocas. The collective intelligence genome. IEEE Engineering Management Review, 38(3):38, 2010.
[41] U. Martin and A. Pease. Mathematical practice, crowdsourcing, and social machines. In Intelligent Computer Mathematics, pages 98–119. Springer, 2013.
[42] K. Mösinger. Edx-presenter, a tool for shared course preparation. https://github.com/mokaspar/edx-presenter.
[43] A. Nguyen, C. Piech, J. Huang, and L. Guibas. Codewebs: Scalable homework search for massive open online programming courses. In Proceedings of the 23rd International Conference on World Wide Web, WWW ’14, pages 491–502, Republic and Canton of Geneva, Switzerland, 2014. International World Wide Web Conferences Steering Committee.
[44] M. Nielsen. Reinventing Discovery: The New Era of Networked Science. Princeton University Press, 2011.
[45] L. Pappano. The year of the MOOC. The New York Times, November 2, 2012.
[46] F. Rabe. The MMT language. PhD thesis, Jacobs University, 2009.
[47] P. Rudnicki. An overview of the Mizar project. In Proceedings of the 1992 Workshop on Types for Proofs and Programs, pages 311–330, 1992.
[48] G. Siemens and P. Long. Penetrating the fog: Analytics in learning and education, 2011.
[49] N. Sloane et al. The On-Line Encyclopedia of Integer Sequences (OEIS). http://www.oeis.org, 2013.
[50] H. Stamerjohanns, M. Kohlhase, D. Ginev, C. David, and B. Miller. Transforming large collections of scientific publications to XML. Mathematics in Computer Science, 3(3):299–307, 2010.
[51] R. P. Stanley. Catalan addendum. http://www-math.mit.edu/~rstan/ec/, 2013.
[52] The LMFDB Collaboration. The L-functions and Modular Forms Database. http://www.lmfdb.org/, 2013.
[53] The Sage Development Team. The Sage-Combinat community, Sage-Combinat: enhancing Sage as a toolbox for computer exploration in algebraic combinatorics. http://combinat.sagemath.org, 2011.
[54] V. Voevodsky et al. The Univalent Foundations Program. Homotopy type theory: Univalent foundations of mathematics. Technical report, Institute for Advanced Study, 2013.
[55] L. von Ahn. Duolingo: learn a language for free while helping to translate the web. In Proceedings of the 2013 international conference on Intelligent user interfaces, pages 1–2. ACM, 2013.
[56] L. von Ahn, M. Blum, N. J. Hopper, and J. Langford. CAPTCHA: Using hard AI problems for security. In Advances in Cryptology, EUROCRYPT 2003, pages 294–311. Springer, 2003.
[57] L. von Ahn, B. Maurer, C. McMillen, D. Abraham, and M. Blum. reCAPTCHA: Human-based character recognition via web security measures. Science, 321(5895):1465–1468, 2008.
[58] F. Wiedijk. Formal proof sketches. In Types for Proofs and Programs, pages 378–393. Springer, 2004.
[59] N. Zafrin, N. Gillani, and M. Lenox. A New Use for MOOCs: Real-World Problem Solving. Harvard Business Review, July 2013.