Justify: A Web-Based Tool for Consensus Decision Making

Christopher Fry
MIT Media Lab
20 Ames St., Cambridge, MA, USA
cfry@media.mit.edu

Henry Lieberman
MIT Media Lab
20 Ames St., Cambridge, MA, USA
lieber@media.mit.edu

ABSTRACT

Traditional unstructured-debate-then-vote democracy leaves a lot to be desired. Consensus decision-making processes may encourage better decisions but suffer from real-time constraints. Online discussions can help, but large-scale unstructured online discussions are unwieldy, redundant, and ambiguous. The Justify system helps participants clearly state concepts and organize them in a meaningful structure. It hides detail until desired, and automatically performs summarization at all levels. These characteristics facilitate the emergence of the best ideas along with the rationale for each, which is crucial for buy-in. Justify is a language for expressing rationale and a development environment for analyzing discussion. There are 150 kinds of "points", which enable users to express questions, answers, background information, support, opposition, votes, math and more. Points share three essential characteristics:

1. Each point must contain exactly one idea, providing an unambiguous target for critiques.
2. Each point must live in a particular spot in the hierarchy of points, clarifying context.
3. Each point declares its intent as a type, benefiting both humans and conclusion-generating algorithms.

Author Keywords
Argumentation, decision support, consensus, voting.

ACM Classification Keywords
H5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous.

General Terms
Human factors, Languages.

IUI 2012, February 14-17, Lisbon, Portugal.
Copyright 2012 ACM xxx-x-xxxx-xxxx-x/xx/xx...$10.00.

THE PROBLEM

There is widespread discontent in the world today. Countries in the Middle East and Africa, both rich and poor, are engaged in civil war. Protests from both the right (e.g., the Tea Party in the USA) and the left (e.g., Occupy your-city-here) are widespread. These movements are characterized by bottom-up decision-making and an aversion to formal organization with elected leaders. There are few rational plans for how to fix what's wrong. Rather than letting decisions be driven by whichever protesters are loudest, we need to encourage processes for rational debate.

NONLINEAR ARGUMENTATION

The fundamental problem with debate is this: arguments are tree-structured, but debate is sequential. Any position in an argument can have its pros and cons, which in turn can be argued for or against; any position can be defended or refuted in a variety of ways, thus generating a tree structure. But people arguing verbally or online are constrained by a linear structure. Any participant must choose to address only a single point at any moment in time. Short sequences where one person replies to another are fine, but as the argument grows, the limitations of human short-term memory and attention often derail the debate. People can't remember what the speaker is referring to, what all the arguments for and against the speaker's point are, whether the speaker really answered the question he or she was asked, or what the speaker neglected (intentionally or not) to say.

A traditional way of linearizing a tree structure of text is the outliner. Each node of the tree is summarized as a line of text. Nodes at greater depths are indented. Each node can be expanded from a single-line summary to the full text of that node. Justify takes the outliner as its core interface metaphor.
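As a rough illustration of the outliner idea, the following Python sketch linearizes a tree by pre-order traversal, emitting one indented summary line per node. The `Node` class and `outline` function are hypothetical names chosen for this example, not part of Justify; the sketch assumes that collapsing a node hides its whole subtree and that expanding a node shows its full text just beneath its summary line.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of an argument tree (hypothetical structure, for illustration only)."""
    summary: str                                   # single-line summary shown in the outline
    full_text: str = ""                            # longer text revealed when expanded
    children: list[Node] = field(default_factory=list)
    expanded: bool = False                         # show full_text under the summary?
    show_children: bool = True                     # a collapsed node hides its subtree


def outline(node: Node, depth: int = 0, indent: str = "    ") -> list[str]:
    """Linearize a tree into outline lines via pre-order traversal.

    Each node contributes one summary line indented by its depth; an expanded
    node also contributes its full text, marked with '| ', one level deeper.
    """
    pad = indent * depth
    lines = [pad + node.summary]
    if node.expanded and node.full_text:
        lines.append(pad + indent + "| " + node.full_text)
    if node.show_children:
        for child in node.children:
            lines.extend(outline(child, depth + 1, indent))
    return lines


if __name__ == "__main__":
    tree = Node("Does democracy work?", children=[Node("not by itself")])
    print("\n".join(outline(tree)))
```

Because every node maps to exactly one line, hiding or revealing a level is just a matter of skipping or visiting a subtree during the traversal.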
Justify adds two things to the basic outliner interface. First, it has a very rich ontology of types of points raised in the argument and their relationships to other points. This encourages good behavior by the participants: declaring the intent behind each point helps guard against hidden agendas and "debate tactics" intended to mislead. It keeps the argument structured at all times. Second, it automatically generates summarizations of the argument at every level, using diverse techniques such as logical deduction or voting.

Figure 1. The Justify user interface, displayed in a browser window. The points hierarchy is shown in the large lower left pane, where each point is represented by a single line. The top point is the justify_repository; its first child is Justify Help. The lower right pane is used for documentation or detail and currently shows the initial instructions. The buttons in the upper right display documentation in that pane.

PROBLEM ANALYSIS THROUGH THE UI LENS

We can apply the discipline of application design, including its user interface, to analyze decision making. We need to collect enough of the relevant information and be confident that we have done so. We need to filter out wrong or useless information so that it doesn’t distract us. We need to organize this information in the right presentation, while realizing that different views are optimal for different tasks. We need to summarize at numerous levels so as not to miss the forest for the trees.

CONCEPT GRANULARITY

In nearly every traditional ‘view’ of an argument, the granularity of information is too coarse. Take what is called a “debate” on US national TV. One candidate is given a few minutes to talk, another a few minutes to respond. Or candidates may be asked a question about a complex topic and allowed to give a lengthy answer.
It is not simply that candidates rarely answer the question they are asked. It’s that their response contains so many different ideas that a challenger is faced with picking which ones to address. The challenger cannot remember all the points, and selects even fewer of them to respond to. Listeners of the debate fare no better at recalling which ideas were strategically omitted. Dangling questions never get resolved. Incomplete rebuttals distract listeners from more substantial issues, and we’re left with a miasma of disconnected information.

Consensus meetings don’t address this problem either. Usually too many people want to speak at a given time, so a “stack” of pending speakers is formed. Once an individual “gets the floor”, they cram as many ideas as possible into their “air time”, since it will be a while before they’re allowed to speak again. Participants wishing to respond to just one of a speaker’s ideas are told to wait their turn. That might take 10 minutes or more, with several intervening speakers, after which the context of their response is largely forgotten as the discussion meanders to new areas.

THE ADVANTAGE OF ASYNCHRONOUS COMMUNICATION

Real-time constraints make the above problems nearly unavoidable. When our discussion goes online, we are freed from the tyranny of the clock. We can take as long as we want to compose our utterance, allowing the conscientious author to perfect meaning and concision. Even more important, numerous users can speak (by which I mean “type”) at the same time without colliding at the listeners’ ears (by which I mean “readers’ eyes”). There is no need for the frustration of waiting your turn in “the stack”.

Unfortunately, traditional online forums solve only part of the problem. Comments typically fall within one of two extremes. Either they are too terse to be useful or unambiguous, or they contain numerous points, only a small percentage of which are addressed by subsequent comments.
Those subsequent comments are usually “out of place” because, by the time a thoughtful responder submits a relevant reflection on the latest comment, several less thoughtful responders have often interjected comments that separate the thoughtful comment from its intended context.

JUSTIFY POINTS

Justify encourages a much finer granularity of utterances than real-time discussions or the lengthy comments in online forums. Each Justify point contains just ONE idea. This permits comments on it to unambiguously address just that one idea. Justify encourages points to be so constrained by:

1. Making a point’s primary representation one line of text (i.e., too small to contain more than one idea).
2. Requiring the point’s author to declare a specific type for the point, making it impossible for one point to hold two ideas of different types.
3. Providing other users a direct way to critique a point for containing more than one idea.

To avoid the “too terse to be unambiguous” condition that short online forum comments often exhibit, a Justify point cannot be created without assigning it a highly specific type that captures its core semantics (pro, con, clarifying question, etc.).

THE PROBLEM OF MYRIAD NIT-PICKING LITTLE POINTS

We might structure a paragraph or an utterance as a list of ideas with a certain coherence. If we interject comments about each idea in-line, we break the coherence of that list: our rich annotations obscure the basic list. So we really need two views, one with all the detail and another that is simple and straightforward. Justify’s outline UI is ideally suited to this task. As in any decent hierarchy browser, you can display or hide any level. One problem with most outline editors is that when you view a list of items, its context, i.e., its true position in the hierarchy, has scrolled off the top of the screen. Justify’s focus operation hides the uncles, granduncles, etc. of a point, showing only its direct-line ancestors.
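A minimal sketch of two of the constraints above (a single-line title and a mandatory declared type) and of the focus view follows. The five-type `POINT_TYPES` set, the `Point` class and the `focus` function are hypothetical illustrations, not Justify’s implementation; in particular, we assume the focus view shows the point’s direct-line ancestors followed by the point and its own subtree, and hides everything else.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# A tiny, hypothetical subset of point types; Justify's ontology has about 150.
POINT_TYPES = {"question", "pro", "con", "idea", "information"}


@dataclass(eq=False)  # identity equality: two points with the same text stay distinct
class Point:
    point_type: str
    title: str
    parent: Optional[Point] = field(default=None, repr=False)
    children: list[Point] = field(default_factory=list, repr=False)

    def __post_init__(self) -> None:
        # Every point must declare one type from the ontology...
        if self.point_type not in POINT_TYPES:
            raise ValueError(f"unknown point type: {self.point_type!r}")
        # ...and its primary representation is a single line of text.
        if "\n" in self.title:
            raise ValueError("a point's title must fit on one line")

    def add(self, child: Point) -> Point:
        child.parent = self
        self.children.append(child)
        return child


def focus(point: Point) -> list[tuple[int, Point]]:
    """Rows of (depth, point) for a focus view: the direct-line ancestors,
    then the point and its subtree. Uncles, granduncles, etc. are omitted."""
    chain: list[Point] = []
    p = point.parent
    while p is not None:
        chain.append(p)
        p = p.parent
    chain.reverse()                      # root first
    rows = list(enumerate(chain))

    def walk(node: Point, depth: int) -> None:
        rows.append((depth, node))
        for child in node.children:
            walk(child, depth + 1)

    walk(point, len(chain))
    return rows
```

Rejecting an untyped or multi-line point at construction time is one simple way to make the “one idea, one declared type” rule hard to violate.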
For deeply nested lists, even that can be too much, particularly as the ancestors’ indentation eats up whitespace on the left, leaving too narrow a column of real content. Justify’s set top operation makes the current point display as if it were the root, at the top left of the screen, reclaiming all that whitespace on the left. Paste lets you insert a cut or copied point anywhere in the hierarchy. Pastewrap lets you paste a point on top of another point, effectively replacing a point with the pasted point but making the replaced point a child of the pasted point, pushing it down one level. Siblings to subpoints makes all the siblings of a point into subpoints of that point, whereas subpoints to siblings does the opposite. These novel editing utilities make reconfiguring a complex argument easy.

JUSTIFY INTERFACE

See the illustration of the Justify interface in Figure 1. The Justify UI includes documentation not just as prose documents but as points, so that you can browse it hierarchically. Justify Documentation contains children that describe each of the point types. Justify Comments lets users give feedback about Justify itself, permitting other users to support or oppose those comments, all using regular Justify points. We have expanded the Justify Playground discussion to display an area where users can try out the UI.

Let’s examine the point Does democracy work? from left to right. First is a button that allows us to show or hide the point’s children. Next is a pull-down menu of operations for editing and displaying the point. The brown rectangle to its right is the conclusion of the point. Next is question, the type of the point, followed by pro_or_con, the subtype of this point. Does democracy work? is the title of the point. Following that is admin, the author of the point. Lastly we have id=2, an identifier that other points can use to refer to this point.
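The restructuring operations pastewrap, siblings to subpoints and subpoints to siblings are small tree rewrites. The sketch below assumes a simple parent/children representation; the function names are hypothetical, and details the text leaves open (where the pushed-down point lands among the pasted point’s children, and where lifted subpoints are placed) are our own assumptions, stated in the docstrings.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare points by identity, not by title
class Point:
    title: str
    parent: Optional[Point] = field(default=None, repr=False)
    children: list[Point] = field(default_factory=list, repr=False)

    def add(self, child: Point) -> Point:
        child.parent = self
        self.children.append(child)
        return child


def pastewrap(target: Point, pasted: Point) -> None:
    """Put a detached `pasted` point where `target` was and push `target`
    down one level (assumption: it becomes the last child of `pasted`)."""
    parent = target.parent
    if parent is None:
        raise ValueError("cannot pastewrap the root")
    i = parent.children.index(target)
    parent.children[i] = pasted
    pasted.parent = parent
    pasted.add(target)


def siblings_to_subpoints(point: Point) -> None:
    """Move every sibling of `point` underneath it, keeping their order
    (assumption: appended after any existing children)."""
    parent = point.parent
    if parent is None:
        return  # the root has no siblings
    siblings = [c for c in parent.children if c is not point]
    parent.children = [point]
    for s in siblings:
        point.add(s)


def subpoints_to_siblings(point: Point) -> None:
    """The inverse: lift the children of `point` to its own level
    (assumption: inserted directly after `point`, in order)."""
    parent = point.parent
    if parent is None:
        raise ValueError("the root cannot have siblings")
    i = parent.children.index(point)
    kids, point.children = point.children, []
    for offset, kid in enumerate(kids, start=1):
        kid.parent = parent
        parent.children.insert(i + offset, kid)
```

Keeping the parent pointer and the children list in sync inside each operation is what lets a whole argument be reorganized without losing any point.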
Most of these point characteristics can be clicked to reveal additional detail in the lower right pane. In the top pane, the Preferences button lets us selectively hide each of these characteristics for all points, decreasing screen clutter.

AUTOMATIC SUMMARIZATION AT ALL LEVELS

As discussions grow deep and wide, it becomes increasingly difficult to ascertain the gist of the discussion. Are there unanswered questions? Has a point been refuted? What’s the vote tally? Each point type has an algorithm for computing its conclusion based on its subpoints and attributes.

Examine the point Does democracy work? in Figure 1. This is a ‘yes or no’ question. For consistency with rationale and to maximize point interoperability, we give it the subtype pro_or_con and expect its conclusion to be a pro or con point.

There are many ways to summarize an argument or to draw conclusions from it. Some of these may themselves be controversial. They range from formal techniques like proof in first-order logic to various voting schemes. Justify allows for multiple summarization techniques, so that the convenience of automatic inference can be obtained while its methods can still be examined or debated if necessary.

Here is one simple algorithm: if all the subpoints have pro conclusions, the conclusion is pro; if all the subpoints have con conclusions, the conclusion is con; if there’s a mix, the conclusion is undecided. Does democracy work? has just one subpoint, not by itself. Since that subpoint has no subpoints and is of type con, its conclusion is con, and that forces the conclusion for Does democracy work? to be an unfortunately accurate con [Susskind and Erdman 2008]. However, participants in this discussion are likely to add additional rationale, which could well change the question’s conclusion to a green pro or a yellow undecided.

One way to browse a hierarchy is to expand only those points whose conclusions you disagree with.
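The simple conclusion algorithm stated above can be written as a short recursive function. This is a sketch, not Justify’s code: it assumes only three point types, applies the same rule to every point that has subpoints, and treats a leaf pro or con point as concluding its own type, as in the Does democracy work? example.

```python
from __future__ import annotations

from dataclasses import dataclass, field

PRO, CON, UNDECIDED = "pro", "con", "undecided"


@dataclass
class Point:
    point_type: str                                  # "question", "pro" or "con" in this sketch
    title: str
    subpoints: list[Point] = field(default_factory=list)


def conclusion(point: Point) -> str:
    """The all-pro / all-con / otherwise-undecided rule, applied recursively."""
    if not point.subpoints:
        # Assumption: a leaf pro or con point concludes its own type,
        # and a leaf of any other type (e.g. an unanswered question) is undecided.
        return point.point_type if point.point_type in (PRO, CON) else UNDECIDED
    results = {conclusion(sub) for sub in point.subpoints}
    if results == {PRO}:
        return PRO
    if results == {CON}:
        return CON
    return UNDECIDED  # a mix of conclusions, or any undecided subpoint


if __name__ == "__main__":
    question = Point("question", "Does democracy work?", [Point("con", "not by itself")])
    print(conclusion(question))  # con
```

Adding a single pro subpoint to the question would turn this conclusion from con to undecided, which is exactly the kind of change new rationale can cause.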
You may learn new rationale that causes you to change your mind, or detect an omission. By adding subpoints to a point, you can perhaps change the conclusion to what you originally expected, or even to a conclusion that neither you nor the previous authors expected. Thus Justify can help users discover new solutions to problems that were hidden by complexity.

A more sophisticated tack is to employ algorithms such as Truth Maintenance [Doyle 79] (also known as Reason Maintenance or Belief Revision). Particularly relevant might be the multiagent distributed truth maintenance of [Mason and Johnson 89]. These algorithms compute dependency relations among conclusions with multiple supports and propagate the effects of changes to beliefs.

Since each reasoning step that Justify takes is immediately viewable, users can easily verify the validity of a conclusion. If you decide a conclusion is not valid, you can’t simply edit it. You must add points that provide additional rationale for Justify’s conclusion algorithm to consider.

Figure 2. An argument evaluating candidates for a Best Paper Award. Our top-level prioritize point represents each criterion as another prioritize subpoint. Each prioritize subpoint has one subpoint, a box that packages up the evaluations from the reviewers as numbers. Under Writing Style Evaluations we get a score of 75 for Touch and a score of 25 for Smell. For this criterion, Touch is the clear winner and is represented as the conclusion for Clearest Writing Style. For Most Innovative, the scores are nearly reversed, so Smell wins, as indicated in its conclusion. Our top-level prioritize adds up the scores from the criteria for each paper, giving Touch an overall score of 75 + 30 = 105 and Smell an overall score of 25 + 70 = 95. Thus Touch is the conclusion of Best User Interface Paper.

At times you may realize that it is not an omission of rationale that’s causing a sub-optimal conclusion, but that the algorithm itself is at fault.
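The arithmetic in Figure 2 can be checked with a few lines of Python. The helper names `criterion_winners` and `prioritize` are hypothetical; the sketch assumes a prioritize point sums each paper’s scores across its criteria and that a Scale point multiplies the scores of the criterion it wraps, which is how the Best Paper scenario uses a multiplier of 2 on Most Innovative.

```python
from __future__ import annotations

# Reviewer scores per criterion, taken from Figure 2.
CRITERIA: dict[str, dict[str, float]] = {
    "Clearest Writing Style": {"Touch": 75, "Smell": 25},
    "Most Innovative": {"Touch": 30, "Smell": 70},
}


def criterion_winners(criteria: dict[str, dict[str, float]]) -> dict[str, str]:
    """Conclusion of each criterion-level prioritize point: its top-scoring paper."""
    return {name: max(scores, key=scores.get) for name, scores in criteria.items()}


def prioritize(
    criteria: dict[str, dict[str, float]],
    scale: dict[str, float] | None = None,
) -> tuple[str, dict[str, float]]:
    """Sum each paper's scores across criteria; an optional Scale factor
    multiplies the scores of one criterion. Returns (winner, totals)."""
    scale = scale or {}
    totals: dict[str, float] = {}
    for name, scores in criteria.items():
        factor = scale.get(name, 1)
        for paper, score in scores.items():
            totals[paper] = totals.get(paper, 0) + factor * score
    return max(totals, key=totals.get), totals


if __name__ == "__main__":
    print(criterion_winners(CRITERIA))  # {'Clearest Writing Style': 'Touch', 'Most Innovative': 'Smell'}
    print(prioritize(CRITERIA))  # ('Touch', {'Touch': 105, 'Smell': 95})
    print(prioritize(CRITERIA, {"Most Innovative": 2}))  # ('Smell', {'Touch': 135, 'Smell': 165})
```

With the Scale factor applied, the totals become 135 for Touch and 165 for Smell, so the conclusion flips to Smell.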
For instance, changing from consensus to majority rule could be warranted. Such modifications should be principled and must be definitive. They can only be made by a point’s author or the discussion moderator, will be easily viewable by all, and, like all other changes, may be critiqued using additional Justify subpoints.

SCENARIO

Imagine you are the judge deciding the best paper at a user interface conference. You have narrowed down the candidates to a paper on Touch and one on Smell. You have two criteria, Writing Style and Innovation. The reviewers have given you scores for each paper on each criterion. We can represent this information in Justify as shown in Figure 2.

The conference chair complains to the judge that the theme of this year’s conference is innovation, so the Smell paper should win. We can represent the preference for innovation by wrapping a Scale point around Most Innovative and giving it a multiplier of 2. This effectively doubles the scores under Innovative Evaluations, increasing Touch from 30 to 60 and Smell from 70 to 140. Since Smell’s initial score is higher, its increment is greater than Touch’s. Smell’s overall score (25 + 140 = 165) now exceeds that of Touch (75 + 60 = 135), and the conference chair is happy.

This stylized discussion can be expanded with many other point types. We could use the math.sum point to add up review scores, for instance. We might use a con point to nullify the effect of a lousy reviewer. We can even critique the scale factor of 2, since perhaps innovation should be given a greater or lesser weight. More significantly, we can break down any criterion into sub-criteria by adding prioritize subpoints, giving us a higher-resolution evaluation.

CARRYING OUT DECISIONS

Making the best decision is crucial but meaningless if it isn’t acted upon. Justify allows decisions to be grouped together under agenda points. We can also add action points that bubble up from deeply nested discussions to tell actors how to accomplish the decisions in an agenda.
That makes clear what should be done. Harder, though, is providing the motivation for doing an action. People are frequently unmotivated to perform an action if they don’t understand why it’s important. A complex decision (whether Justify is used or not) might entail many points. We see summaries of why a course of action is appropriate in Supreme Court opinions or referendum question descriptions. These are often too long for us to read or too short to capture the detail we’re interested in. They are impossible to write optimally for a broad audience, as different readers will want different levels of detail for specific aspects of the decision.

This is where Justify’s dynamic presentation of a decision shines. Conclusions at each level are obvious and concise. More detail for a particular point is just a click away. Whether a decision was commissioned by a busy CEO or created by a legislature, those who must act on it, or are at least affected by it, can benefit from Justify’s multi-level summaries.

VOTING

Voting is considered to be the decision-making process of first resort in modern democratic societies. Justify considers voting the process of last resort, to be used only when reasoning fails. Justify provides several different voting strategies. The crudest are single-answer voting, like the common ‘pick one candidate, majority wins’, and pro-or-con voting, as is common on referenda. Better than either of these for most situations is preferential voting, which allows voters to order candidates. Justify represents this as a prioritize point.

A more novel kind of voting is how_much. Voters specify a number within a predetermined range. The numbers from each ballot point are combined to form the conclusion of the vote. How the numbers are combined is indicated in an attribute of the vote point. Possibilities are: average, median, highest, lowest, sum and product. Here we use the how_much combiner attribute of sum, as is typical in the pork-barrel legislative process of decision making. Decision-making processes where the participants don’t have to make trade-offs frequently lead to poor long-term decisions (as in the above example employing vote.how_much).

A novel voting point in Justify that forces voters to make trade-offs is apportion, where each voter must allocate pieces of a fixed budget among a set number of candidates. The sum of the pieces on each ballot must not exceed the budget. The amounts for each piece from each ballot are averaged together, which keeps the result within the budget. If a legislator’s ballot were public, his constituents would have an easy way to evaluate the legislator’s performance, which would likely make such a voting technique unpopular among legislators.

THE POWER OF LANGUAGE

Justify does not prescribe a set methodology for making decisions. Rather, it is a toolbox with 150 different point types useful in a wide variety of processes. Fundamentally it is a language where, as in a good programming language, the pieces are optimized to be combinable in a maximum number of configurations, tailored to the problem at hand. Points represent both instances of their type and methods that are called to return a conclusion. As with a programming language, you need a development environment to facilitate creating and debugging your programs/discussions. Justify also provides the user with multiple views of their discussion to ease understanding.

Figure 3. A question with two subpoints.

Figure 4. A different view of the question in Figure 3 that uses natural language generation to present the points and stepping to navigate the hierarchy.

FUTURE WORK

Justify has not yet been tried on a large group of users. This is necessary to find out what works best in the visual user interface and to extend and refine the point set to improve the conceptual user interface. The tool has a modular design to ease such improvements.
Automatic classification of points

The potential exists to incorporate AI techniques for automatically classifying point types and argument types from natural language. Justify’s ontology of points is complex, and users may have trouble initially learning the ontology or correctly classifying points. Automatically suggesting point types could go a long way towards reducing the cognitive burden of point identification for users. We look toward natural language processing techniques based on Commonsense Knowledge, such as [] and [], to help in this regard.

RELATED WORK

Argumentation systems have a long history. [Conklin, et al 2003] is a survey of landmark systems, from Doug Engelbart’s work on Augmentation and Hypertext in 1963, through NoteCards, gIBIS [Conklin 1988] and QuestMap, to Compendium [Conklin 2003]. Conklin’s work on Compendium incorporates the best ideas of the previous systems, so I will concentrate my analysis on it.

Like Justify, Compendium uses various question and answer types, including templates for filling out such nodes in a network. A crucial lesson learned here is the value of being able to quickly record informal statements and formalize them as needed. Justify permits this, since you can create a “generic” point such as idea or information and change its type later, say to a pro or con. Pros and cons can be further refined to more specific types should the author like.

The display in Compendium is a 2-D graph of “icons on strings”, where the strings represent typed links between the icon nodes. This notation is semantically flexible, but requires more work in graphical arrangement and in declaring link types than Justify’s outline hierarchy. The expand/contract nature of a good outliner makes hiding detail particularly easy.

Although education has not been discussed in the body of this paper, the authors believe that structuring rationale can be a high-leverage learning tool.
We would like to acknowledge Buckingham Shum’s work on Cohere and the conceptual framework described in [Buckingham Shum 2010]. Conklin, Buckingham Shum and other researchers have done groundbreaking work in many aspects of knowledge representation. To their credit, they have tackled difficult issues in real-time meeting knowledge capture, a use case that I hope Justify can some day support.

We would also like to credit a project named SIBYL [Lee 1991] done by Jintae Lee at the Center for Coordination Science directed by Thomas Malone. Fry worked at this center in the early 1990s. The SIBYL project was instrumental in introducing Fry to the field of formal representation of argumentation. Malone’s work of planetwide importance continues at MIT’s Center for Collective Intelligence.

Iyad Rahwan [Rahwan et al.] tackles representing argumentation in the Semantic Web technologies of XML, RDF and OWL. This highly structured work promises an ontology that can be standardized and shared across the web. Although Justify is implemented using a programming language based on XML and is deployed on the web, I have not attempted to make a shared ontology, nor used a more traditional OWL-based reasoning engine.

CONCLUSION

Clearly we need better ways to select government representatives than the process employed in the United States of “incumbent + large campaign donors = victory” [Lessig 11]. But even with the most altruistic representatives, traditional debate and deliberation are poor at synthesizing and selecting the best solutions. Our inability to choose wisely is by no means limited to government. Businesses, non-profits, universities, households and individuals could all use help in decision making. In fact, one way to characterize a human being is “a complex decision-making animal”.
The distinction between an individual and a group may seem stark given all the problems we have cooperating, yet if you accept Marvin Minsky’s “Society of Mind” thesis [Minsky 88], intra-head competing agents may, in many ways, mimic inter-head agents.

To recap, the contributions of Justify are:

1. Extensions to outline editors that facilitate navigation, viewing and editing, described in the body of the paper.
2. A greatly expanded set of point types, including not just questions, answers and rationale but conditional nodes, math nodes and various error/status nodes, plus an enlarged set of voting, question and rationale nodes, most of which are not described in this paper due to space constraints.
3. Automatic computation and propagation of summarizations, which facilitate determination of completeness and error checking.

REFERENCES

1. Buckingham Shum, S. and De Liddo, A. (2010). Collective intelligence for OER sustainability. In: OpenED2010: Seventh Annual Open Education Conference, 2-4 Nov 2010, Barcelona, Spain.
2. Conklin, J., Selvin, A., Buckingham Shum, S. and Sierhuis, M. (2003). Facilitated Hypertext for Collective Sensemaking: 15 Years on from gIBIS. Keynote Address, Proceedings LAP'03: 8th International Working Conference on the Language-Action Perspective on Communication Modelling, (Eds.) H. Weigand, G. Goldkuhl and A. de Moor. Tilburg, The Netherlands, July 1-2, 2003. [www.uvt.nl/lap2003]
3. Conklin, J. and Begeman, M. L. (1988). gIBIS: a hypertext tool for exploratory policy discussion. In Proceedings of the 1988 ACM Conference on Computer-Supported Cooperative Work (CSCW '88). ACM, New York, NY, USA, 140-152.
4. Doyle, J. (1979). A Truth Maintenance System. Artificial Intelligence, Vol. 12, No. 3, pp. 251–272.
5. [reference deleted for blind review]
6. Lee, J. (1991). SIBYL: A qualitative decision management system. In Artificial Intelligence at MIT: Expanding Frontiers, P. H. Winston and S. A. Shellard (Eds.). MIT Press, Cambridge, MA, USA, 104-133.
7. Lessig, L. (2011). Republic Lost: How Money Corrupts Congress and a Plan to Stop It. Grand Central Publishing.
8. Malone, T. W., Lai, K. Y. and Fry, C. (1995). Experiments with Oval: A radically tailorable tool for cooperative work. ACM Transactions on Information Systems, 13, 2 (April), 177-205.
9. Mason, C. and Johnson, R. (1989). DATMS: A Framework for Assumption Based Reasoning. In Distributed Artificial Intelligence, Vol. 2, Morgan Kaufmann Publishers, Inc.
10. Malone, T. W. and Klein, M. (2007). Harnessing Collective Intelligence to Address Global Climate Change. Innovations, Vol. 2, No. 3 (Summer), 15-26.
11. Minsky, M. (1988). The Society of Mind. Simon & Schuster, New York.
12. Rahwan, I., Banihashemi, B., Reed, C., Walton, D. and Abdallah, S. (in press). Representing and Classifying Arguments on the Semantic Web. The Knowledge Engineering Review.
13. Susskind, L. with Erdman, S. (2008). The Cure for Our Broken Political Process: How We Can Get Our Politicians to Stop Fighting and Start Resolving the Issues that Truly Matter. Potomac Publishers.