SNSF CONSOLIDATOR GRANT
RESEARCH PROPOSAL
ENGINEERING DISCOVERY IN MATHEMATICS
PAUL-OLIVIER DEHAYE
Part 1. State of the art and objectives
1. Previous insights: the rise of social machines
1.1. Nielsen’s philosophy. In his book Reinventing Discovery [44], Nielsen argues that a radical change needs
to take place in the sciences, where too often discovery is left to chance. The challenge is to create an open
environment that enables one curious observer (out of many) to connect the dots, reshare their insight, and
iterate on that process. As explained by Gowers and Nielsen in their Nature paper [9], the polymath efforts
have achieved huge success [30] using these techniques, and the mathematical community is slowly warming
up to these collaborative ideas. We propose to go beyond merely observing such systems, and to actually create
new ones to engineer discovery.
A prerequisite in Nielsen’s vision is to make accessible as much previous knowledge as possible. This is
achieved by building information pools that concentrate and organize science. Already, numerous successful
examples of such information commons have changed society, but also affected the way research is done:
the ArXiv, for instance, makes publications quickly available before the long yet necessary process of peer-reviewing.
At the same time, large and effective collaborations need to sustain their size by encouraging newcomers
to compensate for dropouts, hence onboarding costs need to be reduced as much as possible. Only in that
way can one create modular collaborations, where actors can jump in and out easily without jeopardizing the
entire project. This brings special challenges in mathematics due to the scarcity of experts and the complexity
of the subject matter.
Despite this complexity, Nielsen insists that microcontributions are always important (a blog comment in
polymath, for instance): individually they contribute little to the project, but they give it the momentum needed
to make the significant discoveries.
1.2. Social machine of mathematics. These trends have been studied more quantitatively by Martin and Pease
[41]. The concept of a social machine, first introduced by Berners-Lee [16], allows us to view a network of
humans and computers as a single problem-solving entity. The polymath projects are one example of a social
machine, since they harness new technologies to provide a platform for massive collaborative mathematics.
Unlike traditional collaboration through published work, participants share their thoughts/ideas and not only
statements accompanied by a rigorous proof. Frontstage mathematics denotes the polished final product, while
backstage mathematics consists of the informal conversations that might eventually lead to a formal statement.
Projects such as polymath explicitly encourage researchers to share the backstage activities and collaborate at
a much earlier stage in their research. In addition, the traces of this shared process can then also be studied to
analyse how mathematical research works, as done by Martin and Pease [41]. They found that MathOverflow,
another example of a widely used social machine with approximately 23,000 users, is a very effective tool given
that 90 percent of the questions analysed were answered to the satisfaction of the questioner. In one third of
the answers concrete examples were given to support or refute a conjecture and over half of the responses cited
literature, which indicates that the mathematical community responds positively to such efforts.
Beyond polymath and MathOverflow, one can find more and more examples of such social machines in
mathematics: the sage computer algebra system or the LMFDB collaboration [52], for instance. There is also
a very recent EPSRC grant in the field of collaborative mathematics formalisation, called ProofPeer [11].
CO-WRITTEN WITH REDA CHHAIBI, PATRICK KÜHN, HELEN RIEDTMANN AND HUAN XIONG
Paul-Olivier Dehaye
Research Proposal
SNSF
1.3. Citizen science. This evolution towards collaborative research goes beyond merely mathematics.
Indeed, since 2009, Zooniverse.org [17] has offered anyone the possibility to contribute to scientific research
by performing tasks that are difficult to automate. These tasks tend to be relatively simple: they consist mostly
of transcription (of XIXth-century ship logs to assist modelling in climate science, for instance) or of feature
recognition/annotation tasks (pictures of galaxies, of the surface of Mars, of the sea floor, etc.). They are explicitly
geared towards distributing large data sets to humans. This established a new area called citizen science. Since
the participants are given minimal training (a couple of minutes), the tasks are basic and very repetitive. Some extra
features (e.g. a forum) allow for a sense of community among the participants, and help discover unanticipated
facts, such as the existence of green pea galaxies or Hanny's Voorwerp [17]. Even the journal Nature offers a
platform welcoming contributions from the general public on problems that baffle expert scientists [9].
1.4. Crowdsourcing and collaborative intelligence. This citizen science movement fits into yet another even
larger movement, that of crowdsourcing. Crowdsourcing occurs when tasks are distributed to the general public.
Those tasks might be extremely simple, yet very hard to automate on a computer.
This is exploited by the CAPTCHA system [56], for instance, where computer-generated swirling text is
used to assess whether a potential new user of a website is really a human or a robot. When mixed with
scanned words, this can be used to digitize books, as done by Google (reCAPTCHA [57]). In effect, Google
is thus picking the brains of its users, and with clever cross-checking it can protect that system against malicious
participants. In fact, the approach is so successful that Google has now transitioned to showing pictures of street numbers,
in order to improve its pinpointing of particular houses in Google Maps. An example of a crowdsourcing platform
for more generic tasks would be the Amazon Mechanical Turk service.
In all these examples, "workers" never interact with each other and operate within very defined constraints.
The system distributes very simple tasks massively and performs cross-checks to test for quality.
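Such a cross-checking mechanism can be sketched in a few lines (a hypothetical simplification; reCAPTCHA's actual pipeline, thresholds and data are far more elaborate, and all names below are ours): each task pairs a control item with a known answer against an unknown item, a worker is trusted only if the control answer is correct, and an unknown transcription is accepted once enough trusted workers agree.

```python
# Hypothetical sketch of reCAPTCHA-style quality control, not the real system.
from collections import Counter

def accept_transcription(responses, control_answer, quorum=2):
    """responses: list of (control_guess, unknown_guess) pairs from workers.
    Trust a worker only if their control guess is correct; accept the
    majority guess for the unknown item once `quorum` trusted workers agree."""
    trusted = [unknown for control, unknown in responses if control == control_answer]
    if not trusted:
        return None
    guess, votes = Counter(trusted).most_common(1)[0]
    return guess if votes >= quorum else None

# Two workers pass the control word and agree; a third fails the control.
responses = [("river", "Lombard"), ("river", "Lombard"), ("rivet", "Lombord")]
print(accept_transcription(responses, control_answer="river"))  # Lombard
```

The malicious or careless worker who misread the control word is simply ignored, which is the essence of the cross-check.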
Wikipedia uses another approach. Anybody is welcome to edit any given article and, because this requires
high levels of expertise, little is automated. On the other hand, it recognises many user access levels, from mere
user to administrator, ombudsman, steward, etc. Each level comes with rights and responsibilities, such as
protecting pages where edit wars occur. This helps manage a sometimes unwieldy pool of contributors.
Only when both aspects are present (automation and high-level tasks) can one properly speak of a social
machine. Facebook is the canonical example (in the sense of Berners-Lee, beyond the mere presence of the
word social in social network). Indeed, it solves the problem of collecting personal data for marketing purposes,
by crowdsourcing to its users the option of tagging interesting content on the internet, writing wall posts,
identifying people in pictures, etc.
As a consequence of the diversity of possible workflows, a new field has emerged, called collaborative intelligence. Experts there aim to identify the features that make particular crowdsourcing approaches successful,
and have narrowed down on specific traits [40]. For instance, some projects decide to exploit hierarchies, while
others use an egalitarian approach. The mechanism used to motivate the participants varies as well (fun, recognition, money, ...). The main challenge in collaborative intelligence is to design systems that allow attacking
problems that usually require higher cognitive skills, and that are impossible to solve single-handedly.
1.5. MOOCs. In 2012, with the founding of Udacity, Coursera, and the edX consortium, MOOCs formally
took off [45]. They originated from the distribution of course material for classes taught using the flipped teaching model, in particular videos, lecture notes and machine-graded homework. The demand for such courses
had been unanticipated, and they have now brought higher education to millions of students all over the world.
The massive levels of investment that they have obtained (tens of millions of USD each) have helped them to sustain a
very rapid growth, both in terms of content provided and number of students. It is hard to overemphasise the
degree of dedication that some students display in those courses. Naturally, beyond simply doing their homework, students start helping each other, produce extremely detailed notes, contribute to the course material,
and produce stellar projects. In other words, students do real work, some of it with commercial value, mostly
because they are grateful for the education they receive.
The emphasis in those MOOCs is scalability, and some have taken to call them xMOOCs, to contrast them
with an earlier experiment: cMOOCs, where the c stands for connectivist. This adjective is more concerned with
the communication means used by the students. More interesting to underscore is the contrast in the relation
between students and instructor: in xMOOCs, instructors actively push content to students, while in
cMOOCs instructors try to pull ideas originating from the class forums and highlight them (the connectivist
label highlights that in fact students will communicate with tools other than just the provided class forums).
This insight came to the author after organising, in the Fall of 2013, a course called MAT101, still viewable
online [23]. We will give more details about this course in the methodology.
Figure 1. Mathopoly, a game written by MAT101 students Markus Neumann, Xuan Hong
Nguyen, Andrin Kalchbrenner and Silvan Biffiger. The complexity of this final project highlights the mix of skills employed: pure coding skills, GUI design and creativity (the Monopoly board
matches the layout of the math institute, professors’ offices represent Monopoly properties,
students can buy blackboards, etc).
An additional insight is that each MOOC is ultimately a carefully crafted social machine, one that easily
attracts thousands of users who are there to learn, and that this learning might happen precisely by doing tasks
that are hard to automate. MOOC platforms are ultimately only software. However, they are very useful
software: they scale well, their development is rapid because of their large financial backing and they can be
repurposed or adapted easily.
1.6. Flexiformalisation. Mathematical literature, from textbooks to research material, exhibits different levels of
formalisation, ranging from the fully formal level to the level of mathematical natural language.
It has long been realized that this is a problem, and indeed the recent project of a Digital Mathematics Library
aims to integrate all the mathematical literature together [22]. Ultimately, the goal would be to build tools that
answer questions such as "What are the most frequent ways to prove that a continuous function is, in fact,
Lipschitz?". This could provide assistance to mathematicians trying to prove something by hand but also to any
system trying to handle formal mathematics, for instance an automated theorem prover, by narrowing down
more quickly on tried and tested strategies.
The formalisation level refers to efforts using formal proof checkers or automated theorem provers such
as Mizar [47], HOL (Light) [33] or Coq [14]. These have made significant progress over the past years: Four
Color Theorem in 2005 and the Odd Order Theorem in 2012 by Gonthier and his team [28, 29], almost-complete
proof of the Kepler conjecture by Hales and the Flyspeck team [31], and efforts around the univalence axiom
and homotopy type theory [54] by Voevodsky et al. However, it still seems unlikely that these efforts would
manage to attract a significant part of the mathematical community, and there is an incompatibility problem as
each is based on slightly different formal logics.
Instead of aiming for the most formal human-produced content, an alternative is to try to automate
formalisation, starting from the sources of papers on the ArXiv and extracting formal knowledge. This is the
approach followed by Kohlhase et al. with their ArXMLiv project [50]. Their work attempts to put semantic
meaning on LaTeX macros, and for this they crowdsource a lot of the work to the students at Jacobs University.
One of the main results is that they are able to produce an intelligent search engine for ArXiv formulas [39].
Spanning all levels between full formalisation and this thin semantic layer above the ArXiv stands flexiformalisation, and in particular the MMT language, the next logical step after the OpenMath [20] and OMDoc [38]
presentation formats. MMT is a knowledge representation format focused on modularity, logic-independence
and flexibility. For instance, one could formalise in MMT the high level definition of Ring (i.e. at the level most
mathematicians communicate) without specifying entirely the underlying logic used, and individuals could still
redefine Ring according to their own preference (if the existence of a multiplicative unit is assumed, for instance). MMT theories can be edited relatively simply and directly, and are not expected to be full formalisations. MMT aims to replicate as closely as possible the level of formality used by mathematicians in their everyday
work. In other sciences, one uses ontologies for this, but this will not work for mathematics, simply due to the
complexity of the subject matter [46].
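To illustrate the kind of flexibility at stake, one can write down two inequivalent formal refinements of the informal notion of "ring" that a high-level MMT definition would deliberately leave open. The following is a schematic sketch in Lean 4 style, not actual MMT syntax, and the names are ours:

```lean
-- Schematic illustration only (Lean 4 style, not MMT syntax):
-- two formal refinements of the informal notion of "ring",
-- differing on whether a multiplicative unit is assumed.
class RngLike (R : Type) where
  add  : R → R → R
  mul  : R → R → R
  zero : R
  neg  : R → R

-- The unital variant extends the non-unital one with `one`.
class RingLike (R : Type) extends RngLike R where
  one : R
```

A flexiformal definition would state Ring at the level of the shared informal concept, letting individual users commit to one refinement (or none) as needed.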
While very few mathematicians will directly contribute to an MMT flexiformalisation effort, due to a lack of
time, the key insight is that many mathematicians already have to devote incompressible amounts of time to teaching
when done offline, and that they will want to use a platform that helps them save time teaching.
1.7. Mediation through computers offers new opportunities to collect and analyze data. This insight is of
course the basis of Google’s and Facebook’s business models of collecting information.
MOOCs offer specific opportunities in that respect, since the content has to be didactically presented and
hence broken down and progressively built up. Simultaneously, several feedback loops can be added to assess
what is deduced.
2. Potential impact
We want to enable a new scientific workflow for mathematical research and teaching, tailored for large
collaborations and that enables the general public to contribute. This will offer a new way to propel mathematics
forward.
This will affect:
• students, who are provided better learning tools and start contributing to real research as part of their
studies;
• large collaborations of mathematicians, who will be offered better tools to work together;
• the general crowdsourcing community, which will gain the means to raise the cognitive level of the microtasks offered;
• the whole mathematical community, which will have rich content-aware mathematical text to work
with.
3. Innovative aspects
The most innovative aspect of this proposal is to realise that each individual MOOC is a social machine
in itself, and that the software allows custom engineering of each machine towards research. This enables
individual professors to create a new social machine, use it to push science problems to participants, to pull
new scientific results from the same community, and to iterate on this process. By engineering these smart
systems, we enable not only individual learning but also global learning, i.e. discovery.
The cognitive level of those tasks can be made much higher than with existing citizen science projects,
thanks to the teaching architecture around the tasks and the numerous features of MOOC platforms, which
simultaneously include peer feedback, complex input solutions, and high-level automated grading.
Different social machines could even be combined if too many different skills were required
from participants.
The innovation here is to marry all those insights together. Notice that they have been combined pairwise
before:
• Duolingo [55] uses crowdsourcing to translate websites, and teaches languages at the same time;
• MOOCs (or even classical courses) use gamification to encourage students;
• polymath crowdsources a scientific problem but does not use what would be considered standard in a
MOOC, such as a forum system with categories;
• a couple of innovative MOOCs have been used for crowdsourcing [59, 35], but not exactly with a research
goal.
Part 2. Methodology
4. Studies or experiments envisaged
We will now describe several tracks. Each track consists of successive experiments to perform. The tracks
are independent of each other so they could and should be approached simultaneously.
Each of these tracks presents the risk that not enough people would want to participate. To alleviate this,
from the start, we need strong outreach efforts to advertise those experiments and their goals. At the same time,
if needed, the experiments can simply be rerun at very low cost.
The last track, Track Z, is the most complicated but will bring the highest gain. By engaging in the other
tracks as well, we make Track Z more realistic: other tracks will provide us with more experience, more
financing options, more visibility and more data.
4.1. Preliminary track: commonality of needs. Before introducing our methodology for the mathematical
tracks, we want to outline an example highlighting interdisciplinary needs.
Imagine a course in human geography entitled Migrations. Instead of the actual format of MOOCs pushing
videos and homework on the students, we want to explore a more collaborative model: there are around 200
countries in the world, making for 200² possible international migrations. As a first task in such a MOOC,
students could be given the opportunity to vote on the hundred most interesting such migration paths, and each
of those would be studied in the next homework by a much smaller group. The rest of the course would follow
the students along and help them treat this homework as a research topic, exploring implications along different
dimensions: consequences for the cultures and economies of the inbound and outbound countries, etc. Among
their lectures, the students would be taught how to properly document their arguments, and to be critical of
each other’s work, both within their working groups and later via peer feedback. Ultimately,
not everything that would be created by the students would be useful to the professor, but this would certainly
provide useful preliminary material for the book that the professor always wanted to write¹.
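The combinatorics of this hypothetical course, and its first voting task, can be sketched directly (the ballot format is our assumption):

```python
# Sketch: counting migration paths and selecting the most voted ones.
from collections import Counter

COUNTRIES = 200
ordered_pairs = COUNTRIES ** 2                   # 200^2 ordered (origin, destination) pairs
proper_migrations = COUNTRIES * (COUNTRIES - 1)  # excluding origin == destination
print(ordered_pairs, proper_migrations)  # 40000 39800

def top_paths(ballots, k=100):
    """ballots: iterable of (origin, destination) votes; keep the k most popular."""
    return [path for path, _ in Counter(ballots).most_common(k)]

ballots = [("SY", "DE"), ("SY", "DE"), ("MX", "US"), ("PL", "UK")]
print(top_paths(ballots, k=2))  # [('SY', 'DE'), ('MX', 'US')]
```

Even with a hundred selected paths, each working group would study only a handful, which is what makes the divide-and-conquer format tractable.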
In a course entitled Financial Mathematics, one could imagine simulating a stock market, asking separate
groups of students to assume different roles. Some groups would be market makers, others hedgers, while a third group
would be made of pure speculators. From all those interactions, a market price would emerge and give a primary
market. There, the teacher can easily illustrate the concepts of market equilibrium and portfolio diversification.
In parallel, in a more advanced course, students could be taught about the theory of financial pricing and
offered the chance to trade derivative products. The class then witnesses the rise of a secondary market. A different
course could provide different asset classes for the first class to monitor, interest rates for instance.
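As a toy illustration of how a primary market price could emerge from student orders (a deliberately simplified one-shot call auction; none of this mechanism is prescribed by the proposal), all limit orders can be cleared at the single price that maximises matched volume:

```python
# Toy one-shot call auction: pick the price maximising tradable volume.
def clearing_price(bids, asks):
    """bids/asks: lists of (limit_price, quantity). Returns the clearing price."""
    candidates = sorted({p for p, _ in bids + asks})
    def volume(p):
        demand = sum(q for bp, q in bids if bp >= p)  # buyers willing at p
        supply = sum(q for ap, q in asks if ap <= p)  # sellers willing at p
        return min(demand, supply)
    return max(candidates, key=volume)

bids = [(101, 10), (100, 5), (99, 20)]   # (price, quantity) buy orders
asks = [(98, 10), (100, 10), (102, 5)]   # sell orders
print(clearing_price(bids, asks))  # 100
```

This already suffices to illustrate market equilibrium in class: shifting the order book shifts the clearing price.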
All these examples highlight that these approaches, which are sometimes used in classrooms on a small
scale, will definitely be tried on a larger scale soon. We are thus not only convinced that the ideas below are
technically feasible, but that they will be highly facilitated by software contributions from the other sciences as
well.
4.2. Track A: Easy initiatives for engineering knowledge discovery and archiving. The traditional notion
that all mathematical discoveries are made through solitary endeavours no longer holds true. Indeed, massive
collaborative projects, such as MathOverflow and the polymath projects, have already begun to change how
some leading mathematicians do part of their research. In this track, we outline how to make this method of
collaboration even more effective and prevalent.
The polymath projects [30] are global collaborations to solve open conjectures. In 2009, Tim Gowers
initiated the first polymath project – now referred to as polymath 1 – by asking the followers of his blog to post
ideas on how to find a combinatorial proof of the density Hales–Jewett theorem. This social experiment turned
out to be an unexpected success, which has led to further polymath projects.
MathOverflow is an interactive mathematics website [8], which is both a collaborative knowledge base
and an online community of mathematicians. It allows users to ask questions and submit answers, which in
turn are both rated by other users, leading to reputation ratings for the contributors. According to Martin, the
possibility of building a reputation motivates the users to submit questions and answers [41]. MathOverflow
is primarily for asking questions on mathematical research, especially related to unsolved problems and the
extension of knowledge. Similar sites exist for other subjects, such as programming, physics or chemistry (cf. StackOverflow,
physics.StackExchange, chemistry.StackExchange, tex.StackExchange), as well as for students in mathematics (cf. math.StackExchange).
Since MathOverflow contains a large collection of tagged questions, it is a valuable tool for engineering serendipity:
another mathematician will come along later, find a question useful and realise others have been thinking about
those issues. This serendipity could even be enhanced, if combined with other external services such as one
matching academics.
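In first approximation, such a matching service could rank candidates by the overlap of the question tags they interact with. The sketch below is hypothetical (the tag names are illustrative MathOverflow tags, and a real service would use much richer signals):

```python
# Sketch of a tag-overlap matcher for engineering serendipitous contacts.
def suggest_matches(my_tags, others, k=3):
    """my_tags: set of tags; others: dict name -> set of tags.
    Rank candidates by Jaccard similarity of their tag sets."""
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0
    return sorted(others, key=lambda name: jaccard(my_tags, others[name]),
                  reverse=True)[:k]

me = {"nt.number-theory", "l-functions", "rt.representation-theory"}
peers = {
    "A": {"l-functions", "modular-forms"},
    "B": {"ag.algebraic-geometry"},
    "C": {"nt.number-theory", "l-functions"},
}
print(suggest_matches(me, peers, k=1))  # ['C']
```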
¹There is plenty of evidence that professors whose research has a geographical component have started to use MOOCs in this way,
even without the divide-and-conquer-followed-by-peer-feedback approach. The books are currently being written...
We envision a social machine that combines the polymath and MathOverflow ideas with MOOCs: owing
to the high technicality of the subject and the lack of structure in its presentation (e.g. comments inside a
blog), the onboarding cost can be quite high once a polymath project has started.
In this system, a leading member, such as a Fields medalist, would suggest a question. Immediately different
avenues and ideas would be explored. So far this is the same workflow as in polymath. However, with a proper
forum system, subcommunities could form around given ideas. Obviously this splits up the community of
solvers, but every so often, when needed, individual leaders would step in each of those subcommunities to
summarize (text or video) the approaches that have been tried thus far. This maintains a form of coherence and
helps anyone, at any time, to understand approaches that have been tried elsewhere. It encourages and helps
solvers to explore a diversity of ideas, and maybe recombine them into successful ones. It is also very inclusive
of latecomers. It even allows amateurs to get a glimpse of collaborative mathematical research, and potentially
to help, for instance by finding papers that might be relevant or performing quick computer experiments if
needed. This is reminiscent of the Kasparov vs. the World chess match described by Nielsen [44, Chapter 2].
Going in the opposite direction, forums that emulate MathOverflow can be used in parallel to MOOCs. The
reputation system could be integrated into the peer-grading system and forums. It would eventually allow good
citizens to edit previously posted questions, which simultaneously improves the quality of the questions and
hones the students’ skills in mathematical writing. The teacher’s role in policing forums could then be reduced.
MOOC forums have been studied with encouraging results [36].
It is interesting to observe that the polymath approach, done completely in the open, is quite unique to mathematics. In other areas, either scientists hoard their knowledge (with some exceptions) or financial incentives
are quicker to come into the picture (cf. Innocentive [3] and other crowdsolving platforms).
At a technical level, nothing in Track A is speculative: it is only a matter of integrating different tools together
and repurposing them.
Figure 2. The Catalan numbers course in preparation on the coursera.org platform.
4.3. Track B: Systematic combinatorics. In this track, we want to explain how first year undergraduates
and the general public can start to contribute to mathematical research. It consists of a succession of three
experiments: Catalan, FindStat and sage.
This track also exploits our preliminary experiment with MAT101, teaching one hundred mathematics students how to use the python programming language. Python is an extremely versatile, high-level, free and open
programming language. More and more branches of science use python, via lots of area-specific modules (for
instance biopython, astropython or scikit-learn).
In combinatorial mathematics, the Catalan numbers form a sequence of natural numbers that occur in various
counting problems, often involving recursively defined objects. There are 207 combinatorial interpretations
of Catalan numbers at the moment [51]. In the future, we plan to offer a course, called Catalan, for first
Figure 3. A self-hosted instance of Open edX, with mockups for a few of the courses described here.
year mathematics students (but open to the world, at a scale of tens of thousands), and meant to teach them
advanced programming concepts. As such, it is just an extension of the existing MAT101, based around the
not-so-complicated Catalan numbers. However, by cleverly designing the homework problems, one could get the
totality of students to produce something useful: on a large scale, we could look at the 207² bijections between
classes. With a gamification element, we can encourage students to perform literature searches, implement
known bijections and find new ones, and the incentives can be tweaked to encourage each. If quality of the
code produced is a concern, then feedback mechanisms involving peers are possible and will lead to better code.
If speed of the code is a concern, then the rules can be altered to open up the code submissions and encourage
improvements through microcontributions (cf. the MathWorks competitions [44]). The output of such a course
would be modest, but would nevertheless be useful: it would establish a pattern of contributions at a higher
cognitive level for citizen science. In addition, it could provide a good data set for interesting economic studies,
based on the response of students to the scoring rules used.
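The mathematical core is elementary enough for a first-year course. A minimal sketch (the bracket-sequence interpretation is one of the 207) computes the Catalan numbers by the closed formula and cross-checks one interpretation by brute force:

```python
# Sketch: Catalan numbers by the closed formula, cross-checked against
# one combinatorial interpretation (valid bracket sequences).
from math import comb

def catalan(n):
    """C_n = binom(2n, n) / (n + 1), always an integer."""
    return comb(2 * n, n) // (n + 1)

def bracket_sequences(opens, closes):
    """Count sequences of '(' and ')' that never close an unopened bracket."""
    if opens == 0:
        return 1  # remaining closing brackets are forced
    total = bracket_sequences(opens - 1, closes)       # place a '('
    if closes > opens:
        total += bracket_sequences(opens, closes - 1)  # place a ')'
    return total

print([catalan(n) for n in range(6)])  # [1, 1, 2, 5, 14, 42]
print(all(catalan(n) == bracket_sequences(n, n) for n in range(8)))  # True
```

Homework for the course could then ask for an explicit bijection between two interpretations, with automated grading simply checking that the submitted map is injective and lands in the right class.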
The FindStat project provides an online platform for mathematicians, particularly for combinatorialists, to
gather information about combinatorial statistics and their relations. As of January 2014, the FindStat database
contains 173 statistics on 17 combinatorial collections. One can use it to test if some given data is a known
combinatorial statistic in the database, or test if this data can be obtained from known combinatorial statistics
in the database by applying combinatorial maps. It is clear that the FindStat project would benefit from the
Catalan course, and that members of the FindStat project could expand that course to a FindStat course,
covering more classes and bijections.
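The kind of query FindStat answers can be sketched with toy data (the statistic values and maps below are hypothetical placeholders, not actual FindStat entries):

```python
# Sketch of a FindStat-style lookup: does the user's data match a known
# statistic, possibly after applying a known combinatorial map?
def match_statistic(data, statistics, maps):
    """data: dict mapping objects to values; returns the (map, statistic) names
    for which statistic(map(object)) == value on every supplied object."""
    return [
        (mname, sname)
        for mname, m in maps.items()
        for sname, stat in statistics.items()
        if all(stat.get(m(obj)) == val for obj, val in data.items())
    ]

# Toy database: one statistic on permutations of {1, 2}, two maps.
statistics = {"number of inversions": {(1, 2): 0, (2, 1): 1}}
maps = {"identity": lambda p: p, "reverse": lambda p: p[::-1]}

# The user's values agree with the inversion count after reversing.
data = {(1, 2): 1, (2, 1): 0}
print(match_statistic(data, statistics, maps))  # [('reverse', 'number of inversions')]
```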
Sage [53] is an open-source computer algebra system written in python. It is made of more than 100
packages and aims at teaching, experimenting and producing research in mathematics. Its tremendous success
is largely due to the fact that a very large pool of leading mathematicians contribute and regularly
write code, with the help of a very active community. The code written in the FindStat course could then be
integrated into sage, with the process for code submission and review explained in a sage course. That course,
with careful engineering, could in fact aim for much more general, but not basic, mathematics that needs to be
implemented in sage.
4.4. Track C: Large bodies of knowledge. Serendipity plays a huge role in mathematics. The reader will
remember that John McKay’s observation linking dimensions of representations of the Monster group and coefficients in the
Fourier expansion of the modular function j eventually led to Monstrous Moonshine and Borcherds’ Fields
medal. Thirty-five years later we have still not quite automated such observations. While one can argue the
Online Encyclopedia of Integer Sequences [49] provides this service, we can do much better. The only obstacle
is human: mathematicians do not realize the value of a common effort in this direction.
On the same note but even more serious, finding relevant mathematical papers is hard. We rely on imperfect
search engines and incomplete tagging systems to retrieve our content. Most of the time we refer to unusual
finds as luck.
The aim of this track is to help mathematicians build more successful information commons, places where
they can collect and organize their data. Even more interestingly, this process could be standardized so that
these different information commons could be interoperable, sustainable and more beneficial. There are two
sides to the issue. First the data needs to be put in common, then it needs to be analyzed. We summarize the
current situation for a few mathematical databases, and we then outline a solution.
4.4.1. Gathering data. The On-Line Encyclopedia of Integer Sequences (OEIS) [49] is a database that collects integer sequences of mathematical interest. Each sequence is labeled with possible mathematical descriptions from the people who discovered them. It has been successful at forming a community of contributors, also
outside of the mathematical community, but less successful at improving the services offered beyond simple
search features.
The Knot Atlas [13] lists non-equivalent knots by crossing number. It is a particularly useful tool for
researchers in knot theory. Anyone can contribute, but in Nielsen’s judgement it is a failure [2]: it has not
managed to attract a large pool of contributors, hence its limited or non-existent growth. Similarly, the KnotInfo
database of knot invariants is produced centrally, and while it contains very interesting data, its interface is
substandard.
The ATLAS of Finite Groups and Groupprops gather information about groups. The first concerns finite
simple groups [12], while the second concerns all kinds of group properties and relationships between those
properties. The two are not linked, and in fact not linked to other databases either. Clearly cross-searches
between the ATLAS and Groupprops would be useful, as well as with the LMFDB database that we describe
next.
The LMFDB Collaboration² is a growing community of more than fifty mathematicians and computer
scientists that aims to classify L-functions, modular forms and other related mathematical objects. The current
size of the database is around 3–4 TB, with the data publicly accessible [52]. While there is a large amount of
data, there is also a large disparity in the formats used. This is unfortunate, as the main appeal of the project is
supposed to be to unify all L-functions, despite the diversity of their constructions.
A very good effort was made by the LMFDB to provide context to what can be extremely complex objects.
While the first focus is to present data on the site, it is also possible to recall, in a non-obtrusive way, definitions
of concepts via so-called knowls. These definitions are meant to be "bits of knowledge", i.e. more focused than
an encyclopedic entry.
Like more and more scientific collaborations nowadays, the LMFDB project uses git and the online platform GitHub. These allow everyone to create their own separate branch of the project and edit it independently of the master branch. Once a relevant change has been made on an individual branch and the community has approved it, it is merged into the upstream repository. People from different backgrounds work as a community to expand the database and build the whole platform.
4.4.2. Analyzing data. Kaggle [4] is a platform that hosts competitions for data scientists and statisticians. Organisations put forward data and statistical problems, and data scientists then compete, as individuals or teams, for cash prizes. The competitors develop predictive algorithms using a released portion of the data called the training data. Submissions are run immediately on separate test data and ranked; the test data used for ranking is not revealed to the competitors.
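The scoring mechanism just described can be sketched as follows (a minimal, stdlib-only illustration with made-up teams and labels, not Kaggle's actual implementation):

```python
# Kaggle-style scoring sketch: competitors only ever see the training split;
# submissions are evaluated against a held-out test split whose labels
# remain hidden, and a leaderboard keeps each team's best score.

def accuracy(predictions, hidden_labels):
    """Fraction of correct predictions on the hidden test set."""
    correct = sum(p == y for p, y in zip(predictions, hidden_labels))
    return correct / len(hidden_labels)

def update_leaderboard(leaderboard, team, predictions, hidden_labels):
    """Instant feedback: score a submission and keep each team's best."""
    score = accuracy(predictions, hidden_labels)
    leaderboard[team] = max(leaderboard.get(team, 0.0), score)
    return score

hidden_labels = [0, 1, 1, 0, 1]          # never revealed to competitors
leaderboard = {}
update_leaderboard(leaderboard, "team_a", [0, 1, 1, 1, 1], hidden_labels)
update_leaderboard(leaderboard, "team_b", [0, 1, 1, 0, 1], hidden_labels)
ranking = sorted(leaderboard.items(), key=lambda kv: -kv[1])
print(ranking)  # team_b (1.0) ranks above team_a (0.8)
```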
Kaggle promotes gamification through monetary prizes and leaderboards. Competitors can collaborate in the forums: sharing thoughts, helping each other, and suggesting new ideas. The forums also enable experts to team up for future competitions, and posts remain publicly available even after a competition has concluded. Kaggle charges the client to host the contest, and after 6 months starts leasing the solution for a monthly fee [5]. Invite-only competitions help alleviate the problem of data privacy [19]. Instructors at institutions such as universities can use Kaggle in class for assignments and projects free of charge.
The aspects of Kaggle that will be used in this proposal are instant grading feedback, an updated leaderboard, and collaboration through team effort and the forums. Unlike on Kaggle, any solution will be released to the public.
2LMFDB stands for L-function and Modular Forms DataBase.
4.4.3. The problem. The collaborations collecting data face serious obstacles to growth. Some of those listed above have never grown beyond a handful of contributors, and the LMFDB has stalled in some ways: onboarding new members takes too much time, due to the complexity of the tools, the mathematical background necessary, and the diversity of the code.
At every LMFDB workshop, introductory courses are given on how to use git and GitHub, on the mathematical background, and on the overall architecture of the project. This translates into a huge waste of workshop time, and is already an area where online courses could help: graduate students or newcomers to the collaboration could arrive better prepared. Of course, much of this course material (git, GitHub) exists already and would simply need to be augmented with what is particular to the LMFDB workflow.
Beyond that, another obstacle to growth is that very few mathematicians worldwide know about these objects, with perhaps only a couple of experts per object invited to any workshop. Again, the result is that we spend a lot of time teaching each other about the objects. While this is not purely wasted time, it is not efficient either.
4.4.4. Our solution. Our solution is to invest in outreach earlier in the process. An online course (open, but not massive) explaining the mathematical background behind the objects and/or how to actively contribute to the expansion of the database would help welcome new members, and even allow recruiting outside the mathematical community.
We envision a system where research collaborations form around teaching first, so that information is readily available to all members. We feel this would be beneficial for people joining research teams.
A flexiformalisation into MMT would help make the problems of handling and analysing data uniform across
mathematics, and would enable vast reuse of the techniques.
For the analysis part, these datasets could be combined with data science courses, and people could be
encouraged to crowdstorm the datasets [1].
4.5. Track Z: From semantic annotation to formalisation. If we control a web platform that sees enough activity in teaching mathematics, then we could collect a lot of data that would be useful for a formalisation effort. In fact, we could develop a series of tools to help with each part of the formalisation effort. This track thus consists of two main experiments, to be tried concurrently, each spanning multiple courses:
4.5.1. Formalisation. The first step in the direction of full formalisation might be to simply teach a course on one of the formalisation tools available (HOL, Mizar, Coq, ...). Due to the nature of the topic, the homework there could certainly be automated! In a second stage, this course could be equipped with a more elaborate collaborative element, replicating the workflow of the recent large proofs that have been produced: for the Flyspeck project, for instance, Hales wrote a textbook [32] outlining the general formalised proof he was hoping to obtain. Instead, this proof could have been given online, with the participants in the course breaking down the results as needed. A gamification element could definitely be added. This approach has been used at another level of complexity in the DeduceIt system [26].
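To illustrate why such homework can be graded automatically: once a proof is written in a machine-checkable format, verifying it is purely mechanical. The following toy checker (our own illustration, not the format of HOL, Mizar or Coq) handles proofs that use only modus ponens:

```python
# Toy proof checker: formulas are atoms (strings) or implications
# ('->', A, B); a proof step is either ('premise', formula) or
# ('mp', i, j), applying modus ponens to earlier lines i and j.

def check_proof(premises, goal, steps):
    """Return True iff every step is justified and the last line is the goal."""
    derived = []
    for step in steps:
        if step[0] == 'premise':
            if step[1] not in premises:
                return False
            derived.append(step[1])
        elif step[0] == 'mp':
            _, i, j = step
            if not (0 <= i < len(derived) and 0 <= j < len(derived)):
                return False
            imp, ant = derived[i], derived[j]
            if not (isinstance(imp, tuple) and imp[0] == '->' and imp[1] == ant):
                return False
            derived.append(imp[2])
        else:
            return False
    return bool(derived) and derived[-1] == goal

premises = ['p', ('->', 'p', 'q'), ('->', 'q', 'r')]
proof = [('premise', 'p'),
         ('premise', ('->', 'p', 'q')),
         ('mp', 1, 0),                    # derives q
         ('premise', ('->', 'q', 'r')),
         ('mp', 3, 2)]                    # derives r
print(check_proof(premises, 'r', proof))  # True
```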
On top of that, a very recent research project proposed using this almost-formal Flyspeck text to learn how to automatically turn it into a HOL Light formalisation [37]. Part of that project is to combine statistical methods, based on learning from existing annotations and knowledge, with automated reasoning in large theories (ARLT). More concretely, semantic ARLT methods would be used to confirm or reject the statistically predicted formalisations. This project relies on the growth of formal corpora, a trend that our project could aid.
4.5.2. Flexiformalisation - MMT. The more radical contribution towards formalisation would come from the MMT language. As described earlier, MMT is a knowledge representation format focused on modularity, logic-independence and flexibility. It does not "trap" its users into a specific logic, and does not require a full-fledged (and unrealistic) formalisation right away.
Teaching a course on MMT would be helpful: with proper tools in the rest of the courses offered, users could
annotate and tag content in the MMT syntax across courses hosted on the platform.
Conversely, a MMT formalisation of content would be useful in several ways:
• as described earlier, to generate the architecture for databases of examples;
• to progressively build a tutor [27], offering assistance to students by serving relevant content from other
sources;
• to encourage reuse of course content among instructors, by offering them the option to tag it precisely.
5. Methods
This project involves a lot of technology, most of it existing and some to be created.
Figure 4. The MAT101-MAT086 system: circles represent students, boxes the professor(s), and pentagons the assistants. Content is actively pushed to the students via online material, collected from the professor and the internet. Students submit homework via the platform, for which they receive instant feedback (loop on the left). They also do group projects, with changing groups, and the output of their work is uploaded back to the platform for the benefit of other students. This whole system was engineered by the participants in MAT086, a seminar on online education that was convened offline.
5.1. Example courses: the MAT101-MAT086 system. In Fall 2013, the author taught two courses jointly: MAT101 was a Python for mathematicians course, and MAT086 a seminar on online education. MAT101 was taught online, while MAT086 was taught offline. In Figure 4, we describe the complex social machine used for the courses. Similar setups would be replicated for a course like the Catalan one.
5.2. Technological issues.
5.2.1. Course platform. The MOOC area evolves very fast. Therefore, we do not wish to restrict ourselves to only one platform, such as coursera.org: this might prove limiting in the future.
A particularly interesting alternative for us is the open-source Open edX, first started by MIT and Harvard and now joined by more and more universities. It is based on the XBlocks component architecture, which offers the possibility of creating simple independent courseware units (in Python) and integrating them in many different ways. We started developing new XBlocks in January 2014, together with students. Since the platform is licensed under the AGPL, any contribution to the platform has to be released openly, and vice versa we benefit from any other contributions.
Once we are ready to host our own instance more sustainably (beyond the current small-scale install), we will offer anyone in the scientific community the option to host a course there with a citizen-science component (we crowdsource that too!). Courses could already be hosted at scale on that instance, but could later be shifted to other instances running the same software (such as edx.org or France Université Numérique's fun.org), or hosted on several of these in parallel. Beyond merely hosting, we will help professors develop teaching apps that contribute to science. Open edX's XBlocks architecture [25] makes this easy for anyone with a limited programming background, and we would help others get started.
The platform is meant to be hosted in the cloud, so those costs will have to be taken into account, and we
will need to devise proper mechanisms for covering them. We discuss this in the resources section.
5.3. Pedagogical tools.
5.3.1. A simple tutor. After a student watches a video, they are asked if they need help. If they do, the system would ask them some questions: "What are the main concepts?", "Which did you not understand?", "What background are you missing?", etc. This serves to tag material in the course (and relations between its different parts) when the professor has not offered that information ahead of time. This information can then be used to serve back to the student a video from another course covering the same material.
Such an effort could also be tied to gamification, with the system asking the student more and more precise questions to disambiguate their understanding of the material. We expect to be able to implement this system by integrating the work of [27] via XBlocks.
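A minimal sketch of this tutor loop (with hypothetical course and tag data; the actual implementation would build on [27] and XBlocks):

```python
# Tutor sketch: a student's answers tag the video they struggled with, and
# the tags are then used to serve a video on the same concept from another
# course, preferring the candidate with the most overlapping tags.

course_tags = {
    ("MAT101", "video-3"): {"recursion", "base case"},
    ("MAT202", "video-7"): {"recursion", "induction"},
    ("MAT303", "video-1"): {"group", "homomorphism"},
}

def tag_from_answers(video, concepts_reported):
    """Record the concepts the student says the video is about."""
    course_tags.setdefault(video, set()).update(concepts_reported)

def recommend(video, missing_concept):
    """Serve a video from another course covering the missing concept."""
    candidates = [v for v, tags in course_tags.items()
                  if missing_concept in tags and v[0] != video[0]]
    overlap = lambda v: len(course_tags[v] & course_tags.get(video, set()))
    return max(candidates, key=overlap, default=None)

tag_from_answers(("MAT101", "video-3"), {"recursion"})
print(recommend(("MAT101", "video-3"), "recursion"))  # ('MAT202', 'video-7')
```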
5.3.2. Shared preparation of material. This project can only gain by expanding the number of courses it hosts. To enable this, we need to offer a tangible advantage over other platforms. For this, we will offer professors the option to tag their content (or for assistants and students to help in that effort). Since many of us have to teach essentially the same canon (at least for the first few years of higher education), this offers the possibility of quickly building alternative explanations of the same concepts. This can even be expanded to other languages, and help disambiguate between terms. See [10] for a similar effort.
A simplification of this workflow can be enabled via the open source edx-presenter [42], originally developed by one of my students.
5.3.3. Note taking / glossary. Students could be offered the option of taking collective notes or maintaining a glossary, to help them keep track of definitions. The shared nature would encourage the production of high-quality (annotatable) content, which would be of high value for annotation and formalisation.
5.3.4. Group projects. Students need to be divided into subgroups (potentially very large), so their attention
can be focused on smaller problems. This is clearly a direction that other open edX developers are going in.
5.3.5. Motivators. Different motivators could be used in the course. If the contributions relate well to the
course content, then grades or reputation could be at stake. Otherwise cash prizes are possible. If a contest is
set up, then there is also the possibility to offer travel grants and internships for the best contributors.
5.4. Math-specific tools. In general, the goal here is to formalise the content of a mathematics course, by
building tools that students and professors want to use. We give in Figure 5 a graphical representation of the
process. This is simply an adaptation to MMT of the concept of semantic games, usually applied to build or
populate ontologies.
Figure 5. The process used to bring content towards a more formal description, from informal, human-readable content through quality control to formal, machine-readable content. This picture applies at every step up the flexiformalisation ladder. At each stage, the system can take its current understanding of the content and reformulate it for human consumption (via the mechanism of proof sketches [58], for instance). This is used to ask the students questions (Q) whose answers, after crosschecking and quality control, ensure that the deductions are correct.
5.4.1. MMT integration. For course content involving scientific, particularly mathematical documents, our
infrastructure will integrate the OMDoc/MMT framework. This is a knowledge representation framework that
focuses on formal mathematical content but also degrades gracefully to partially formal or completely informal
content.
The language is logic-independent so that formal content can be written in any logic, and the system is
application-independent so that it can be flexibly integrated with our infrastructure.
OMDoc/MMT permits enriching course content (such as an open-source textbook [15]) with added-value
services that are aware of the semantics of the displayed mathematical objects. Such services include navigation, definition lookup, dynamic interaction with mathematical objects (e.g., folding, type inference), change
management, interactive examples, searching for mathematical objects, or change of notation.
Based on OMDoc/MMT, we will develop additional MOOC-specific services. For example, we will permit
students to write their own examples and have the system check them with respect to the course content.
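As an illustration of such a check (a toy sketch, not the OMDoc/MMT mechanism itself), a student's claimed initial values of the Catalan numbers — the topic of one of our planned courses — can be verified against the defining recurrence C_0 = 1, C_{m+1} = sum_{i=0}^{m} C_i C_{m-i}:

```python
# Checking a student-written example against course content: the claimed
# values must match the Catalan recurrence computed from scratch.

def catalan(n):
    """Return [C_0, ..., C_n] via the defining recurrence."""
    c = [1]
    for m in range(n):
        c.append(sum(c[i] * c[m - i] for i in range(m + 1)))
    return c

def check_example(claimed):
    """True iff the student's claimed initial values satisfy the recurrence."""
    return claimed == catalan(len(claimed) - 1)

print(check_example([1, 1, 2, 5, 14, 42]))  # True
print(check_example([1, 1, 2, 5, 15]))      # False
```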
Moreover, we will develop interfaces that permit students to improve the formalisation status of the content. In practice, instructors can only partially formalise the content, due to the effort involved. Here our system helps, because students can – as part of their learning experience – improve the course materials. Specifically, students will be able to
• annotate content with meta-information (e.g., this is a proof, this is what is being defined,...);
• annotate content with cross-references;
• formalise informal objects;
• add examples;
• ask questions tied to, and cross-referenced with, individual (sub-)formulas.
5.4.2. Understanding proofs. To help understand a proof, one can start from an existing proof, and ask students
a variety of questions: "Does this proof work by induction? By contradiction?", "Write in your own words the
contradiction that is reached", "What are we inducing over?", etc. In fact, assistants could tag the proofs, which
would automatically create appropriate quizzes (to be graded with peer-grading if the output is written in natural
language). Of course this also benefits any machine observing the interaction with the student.
5.4.3. Understanding how people write proofs. One could simply assign as homework writing a proof of a particular statement (this is already done in Devlin's "Introduction to mathematical thinking" class on coursera.org, graded using peer-grading). One can then analyse the different stylistic features of the texts written, or even try to recover the fundamentally different proofs of the same statement (this is already done for programming assignments in some Coursera classes, see [43]).
The interactions described so far are very basic, and meant mostly to help collect data that can be refined
into a flexiformalisation. Depending on success, we hope to attract the computer formalisation community to
integrate their tools with ours, and enlarge their pool of contributors by several orders of magnitude.
5.5. Feedback mechanisms / redundancy in the approach. This project could benefit from many feedback
mechanisms: once a course on python for science is built, some participating students can contribute back to
the project by building their own XBlocks and extensions to the platform. In general, any process we control
and "master" would then be taught, to increase its momentum. Similarly, a completely open data approach
(accounting for privacy issues) would help analyse most effectively all the data produced, by encouraging
further contributions from participants in the analysis.
In addition, there is a substantial amount of redundancy in the approach: different courses can be taught in
parallel, and they do not all need to succeed at once. They can just be tweaked and re-run if needed.
5.6. Ethical issues. It is to be expected that ethical issues will be raised when collecting data in MOOCs or
crowdsourcing platforms. The thought that an institution can track every individual’s action within a software
application will be threatening to some potential users. In fact, this is one concern that needs to be addressed
before applying learning analytics [21]: does an individual need to provide formal consent before data can be
collected? Does an individual have the choice not to participate in a learning analytics project but still use
the software? However, unlike learning analytics, the crowdsourcing projects we envision for MOOCs do not
require the analysis of any one person’s action, but rather of the collection of all users taken together, which
might alleviate some concerns. We plan to follow the stronger guidelines used for studies in education [7], in
addition to the Swiss guidelines on the collection of user data.
5.7. Community outreach and management. A significant and conscious effort will also be dedicated to managing the community of participants, communicating the results obtained, etc.
Part 3. Resources
6. Budget
The accounting of the budget is summarised in a dedicated table elsewhere.
We see three main expenses for this project: salaries for scientific collaborators (one postdoc and one PhD
student), salaries for technical contributors (IT staff, software developers possibly via consultancy) and hosting
costs.
An additional source of expenses would be travel expenses: small travel grants could also be used to motivate
remote students to participate in the experiments. The best students would then be invited to come and work
with us for a few weeks.
An essential need is for software engineers. Based on discussions with the engineers in charge of edx.org and StanfordOnline, we estimate that we would need someone at 20 percent simply to host the software, perform upgrades, backups, etc. Beyond that, we need another part-time engineer with different skills to extend the platform in the desired directions, for instance by programming XBlocks.
We also require funding for some scientists (one PhD, one postdoc), preferably with a strong computer
science background. They would implement the experiments outlined here (under my supervision), design new
ones, and progressively help others implement theirs. In the spirit of making teaching useful for research, they
would of course dedicate their teaching time to this project as well. Huan Xiong, one of my current PhD students, and I are starting to work on the Catalan course. Another PhD student, Patrick Kühn, is likely to work on an LMFDB-related online course.
I would supervise and coordinate the work of all these project members, on top of setting up my own
experiments and teaching within this project. This work would be done in parallel to supervising other students
and postdocs in the context of other research proposals.
This dedicated staff will help me get started, so I can start tapping into other sources of funding as soon as
possible.
6.1. Crowdfunding. Kickstarter [6] is an online platform where people interested in a promising project can contribute to its realisation through donations.
This has been used in the context of MOOCs and citizen science before [34], and would obviously be a very
relevant source of funding.
6.2. External scientific projects. If we develop the infrastructure to support citizen science projects through
MOOCs, then our expertise, both technical and pedagogical, could be valuable for other projects that might
not necessarily want to build up an infrastructure themselves. Another source of funding would then be to
collaborate on those projects, a bit like the Zooniverse project does [17].
6.3. Interdisciplinary scientific projects. The interaction of students with this platform offers the opportunity
for two additional types of research projects.
6.3.1. Education. Learning or educational analytics are concerned with collecting and analysing data about
learners in order to understand and optimise their learning [48]. This also allows the teaching staff to make
evidence-based and accountable decisions about admissions or at-risk students [21]. In this context MOOCs
offer new opportunities: all learners produce data trails through their online activities, the analysis of which
might offer valuable insights into the learning process [48]. However, whether any institution should be allowed to track the learners' movements will raise ethical issues [21], different from the ones we face for our main research goal.
Research into edX’s first MOOC [18] is very indicative of the possibilities of this line of research, and more
and more conferences are organised on the topic, for instance [7].
6.3.2. Economic and social studies. Since the students are ultimately producing work for our research project, they need to be properly incentivised. Interesting studies can certainly be done on the strategies adopted by the students when facing the different motivators. The insight can be very helpful for implementing efficient carrot-and-stick tactics.
6.4. Development for companies. Companies such as Innocentive [3] offer cash prizes to their problem
solvers, and simply host the problem descriptions. We could do something similar with our more evolved
products, and try to offer a basis for other paying services helping academics (such as services matching academics by skills).
Part 4. Additional comment
7. Dedication to this problem
Problems like these require a lot of dedication. A significant effort is needed to build a community and
convince colleagues to participate.
In some way, I have been vexed by the problem of interactive proof assistants for quite a while: I remember reading, a long time ago, of the promise of these proof assistants in Logique, informatique et paradoxes by J.-P. Delahaye. As I learned more mathematics, I realized this promise was still a long way from being fulfilled. I also understood that working on such a problem could be dangerous for a career, but have kept myself informed of progress in the area.
With my SNF-Förderungsprofessur grant, I have been more willing to take careful steps in this direction.
For instance, I organized a workshop on the management of mathematical data [24], which was not entirely a success. To this workshop I had also invited an astronomer, Françoise Genova, who had been very successful in leading a large group towards federating different astronomy databases. Also invited were two software engineers from the company Logilab (Nicolas Chauvat and Florent Cayre), who have developed software for the semantic web. All three had the same recommendation: to use ontologies to structure our data. On the other hand, none of the mathematicians invited saw the interest of any structured effort, because its rewards were too distant. After looking more carefully at the suggestion of ontologies, it was clear that they would not cut it: mathematics is too complex for them, as they are designed to deal mostly with the real world. Fortunately,
Florian Rabe’s MMT language is the perfect substitute for ontologies in a mathematical context (this is in fact
why it was invented). I was still left with the problem of finding a way to convince fellow mathematicians to
participate. When MOOCs came out, I understood very quickly their potential for this problem: mathematicians
could contribute via their teaching time towards research as well.
References
[1] Crowdstorming of datasets. https://osf.io/gvm2z/.
[2] @Google presents Michael Nielsen: Reinventing Discovery. https://www.youtube.com/watch?v=Kf2qO0plUKs#t=18m.
[3] Innocentive. https://www.innocentive.com/.
[4] Kaggle. http://www.kaggle.com.
[5] Kaggle: big data. http://www.inc.com/magazine201403/darren-dahl/big-data-crowdsourcing-kaggle.html.
[6] Kickstarter. http://www.kickstarter.com/.
[7] Learning at Scale conference. http://learningatscale.acm.org.
[8] MathOverflow. http://mathoverflow.net/.
[9] Nature Open Innovation Pavilion. http://www.nature.com/openinnovation/index.html.
[10] Semantic Data Web lecture series. http://slidewiki.org/deck/750#tree-0-deck-750-1-view.
[11] ProofPeer project. http://proofpeer.net/, 2014.
[12] R. Abbott, J. Bray, S. Linton, S. Nickerson, S. Norton, R. Parker, I. Suleiman, J. Tripp, P. Walsh, and R. Wilson. Atlas of Finite Group Representations. http://brauer.maths.qmul.ac.uk/Atlas/v3/.
[13] D. Bar-Natan and S. Morrison. The Knot Atlas. http://katlas.org.
[14] B. Barras, S. Boutin, C. Cornes, J. Courant, J.-C. Filliatre, E. Gimenez, H. Herbelin, G. Huet, C. Munoz, C. Murthy, et al. The Coq proof assistant reference manual: Version 6.1. 1997.
[15] R. Beezer. A first course in linear algebra.
[16] T. Berners-Lee and M. Fischetti. Weaving the Web: the original design and ultimate destiny of the World Wide Web by its inventor. HarperBusiness, 1999.
[17] K. Borne and the Zooniverse Team. The Zooniverse: A framework for knowledge discovery from citizen science data. In AGU Fall Meeting Abstracts, volume 1, page 0650, 2011.
[18] L. Breslow, D. E. Pritchard, J. DeBoer, G. S. Stump, A. D. Ho, and D. T. Seaton. Studying learning in the worldwide classroom: Research into edX's first MOOC, 2013.
[19] J. Brustein. Kaggle's William Cukierski on Data Sharing Competitions. Business Week, March 6, 2014.
[20] S. Buswell, O. Caprotti, D. P. Carlisle, M. C. Dewar, M. Gaetano, and M. Kohlhase. The OpenMath standard. 2004.
[21] J. Campbell, P. DeBlois, and D. Oblinger. Academic analytics: A new tool for a new era, 2007.
[22] Committee on Planning a Global Library of the Mathematical Sciences; Board on Mathematical Sciences and Their Applications; Division on Engineering and Physical Sciences; National Research Council. Developing a 21st Century Global Library for Mathematics Research. The National Academies Press, 2014.
[23] P.-O. Dehaye. MAT101: python programming for mathematicians. http://edx.math.uzh.ch, 2013.
[24] P.-O. Dehaye and N. Thiéry. Online databases: from L-functions to combinatorics. http://aimath.org/pastworkshops/onlinedata.html, 2013.
[25] edX Consortium. XBlock documentation. http://xblock.readthedocs.org/, 2014.
[26] E. Fast, C. Lee, A. Aiken, M. S. Bernstein, D. Koller, and E. Smith. Crowd-scale interactive formal reasoning and analytics. In Proceedings of the 26th Annual ACM Symposium on User Interface Software and Technology, UIST '13, pages 363–372, New York, NY, USA, 2013. ACM.
[27] M. Floryan. Evolving expert knowledge bases: applications of crowdsourcing and serious gaming to advance knowledge development for intelligent tutoring systems. PhD thesis, University of Massachusetts Amherst, 2013.
[28] G. Gonthier. Formal proof–the four-color theorem. Notices of the AMS, 55(11):1382–1393, 2008.
[29] G. Gonthier, A. Asperti, J. Avigad, Y. Bertot, C. Cohen, F. Garillot, S. Le Roux, A. Mahboubi, R. O’Connor, S. O. Biha, et al. A
machine-checked proof of the odd order theorem. Interactive Theorem Proving, pages 163–179, 2013.
[30] T. Gowers and M. Nielsen. Massively collaborative mathematics. Nature, 461(7266):879–881, 2009.
[31] T. C. Hales. Introduction to the Flyspeck project. 2006.
[32] T. C. Hales. Dense sphere packings, volume 400 of London Mathematical Society Lecture Note Series. Cambridge University
Press, Cambridge, 2012. A blueprint for formal proofs.
[33] J. Harrison. HOL light: An overview. In Theorem Proving in Higher Order Logics, pages 60–66. Springer, 2009.
[34] L. Hockenson. DIY science MOOC seeks funding on Kickstarter to conduct brain experiments at home. http://gigaom.com/
2013/09/11/diy-science-mooc-seeks-funding-on-kickstarter-to-conduct-brain-experiments-at-home/.
[35] B. Howe. Introduction to Data Science MOOC. https://www.coursolve.org/courseproject/2.
[36] J. Huang, A. Dasgupta, A. Ghosh, J. Manning, and M. Sanders. Superposter behavior in MOOC forums. In Proceedings of the First ACM Conference on Learning @ Scale Conference, L@S '14, pages 117–126, New York, NY, USA, 2014. ACM.
[37] C. Kaliszyk, J. Urban, J. Vyskocil, and H. Geuvers. Developing corpus-based translation methods between informal and formal mathematics: Project description. arXiv:1405.3451, 2014.
[38] M. Kohlhase. OMDoc: An Open Markup Format for Mathematical Documents (Version 1.2). Number 4180 in Lecture Notes in
Artificial Intelligence. Springer, 2006.
[39] M. Kohlhase and I. Sucan. A search engine for mathematical formulae. In Artificial Intelligence and Symbolic Computation, pages
241–253. Springer, 2006.
[40] T. W. Malone, R. Laubacher, and C. Dellarocas. The collective intelligence genome. IEEE Engineering Management Review,
38(3):38, 2010.
[41] U. Martin and A. Pease. Mathematical practice, crowdsourcing, and social machines. In Intelligent Computer Mathematics, pages
98–119. Springer, 2013.
[42] K. Mösinger. Edx-presenter, a tool for shared course preparation. https://github.com/mokaspar/edx-presenter.
[43] A. Nguyen, C. Piech, J. Huang, and L. Guibas. Codewebs: Scalable homework search for massive open online programming
courses. In Proceedings of the 23rd International Conference on World Wide Web, WWW ’14, pages 491–502, Republic and
Canton of Geneva, Switzerland, 2014. International World Wide Web Conferences Steering Committee.
[44] M. Nielsen. Reinventing Discovery: The New Era of Networked Science. Princeton University Press, 2011.
[45] L. Pappano. The year of the MOOC. The New York Times, November 2, 2012.
[46] F. Rabe. The MMT language. PhD thesis, Jacobs University, 2009.
[47] P. Rudnicki. An overview of the Mizar project. In Proceedings of the 1992 Workshop on Types for Proofs and Programs, pages
311–330, 1992.
[48] G. Siemens and P. Long. Penetrating the fog: Analytics in learning and education, 2011.
[49] N. Sloane et al. The On-Line Encyclopedia of Integer Sequences (OEIS). http://www.oeis.org, 2013.
[50] H. Stamerjohanns, M. Kohlhase, D. Ginev, C. David, and B. Miller. Transforming large collections of scientific publications to
XML. Mathematics in Computer Science, 3(3):299–307, 2010.
[51] R. P. Stanley. Catalan addendum. http://www-math.mit.edu/~rstan/ec/, 2013.
[52] The LMFDB Collaboration. The L-functions and Modular Forms Database. http://www.lmfdb.org/, 2013.
[53] The Sage Development Team. The Sage-Combinat community, Sage-Combinat: enhancing Sage as a toolbox for computer exploration in algebraic combinatorics. http://combinat.sagemath.org, 2011.
[54] V. Voevodsky et al. The Univalent Foundations Program. Homotopy type theory: Univalent foundations of mathematics. Technical
report, Institute for Advanced Study, 2013.
[55] L. von Ahn. Duolingo: learn a language for free while helping to translate the web. In Proceedings of the 2013 international
conference on Intelligent user interfaces, pages 1–2. ACM, 2013.
[56] L. Von Ahn, M. Blum, N. J. Hopper, and J. Langford. CAPTCHA: Using hard AI problems for security. Advances in Cryptology,
EUROCRYPT 2003, pages 294–311.
[57] L. Von Ahn, B. Maurer, C. McMillen, D. Abraham, and M. Blum. reCAPTCHA: Human-based character recognition via web
security measures. Science, 321(5895):1465–1468, 2008.
[58] F. Wiedijk. Formal proof sketches. In Types for Proofs and Programs, pages 378–393. Springer, 2004.
[59] N. Zafrin, N. Gillani, and M. Lenox. A New Use for MOOCs: Real-World Problem Solving. Harvard Business Review, July 2013.