BAA 03-06-FH AQUAINT Phase II SUNY Albany/Rutgers/CUNY

VOLUME 1 - Technical / Management Details (40 pages max excluding covers)

Organization / Company: University at Albany, SUNY
CAGE Code:
DUNS / CEC Number:
TIN Number:
Type of Business: OTHER EDUCATIONAL
Proposal Title:

System Design Perspective Category (Check ONLY ONE Box) (See Section 5.1)
_X_ 1. End-to-End System
___ 2. Component Elements
___ 3. Cross Cutting / Enabling Technologies

If "Component Elements" Category Selected Above (Check ALL that Apply) (See Section 5.1.2)
___ Question Understanding and Interpretation
___ Determining the Answer
___ Formulating and Presenting the Answer
___ Other (Identify):

Data Strategy (See Section 5.2.3)
___ Focused Data Strategy
_X_ Diverse Data Strategy
___ Other (Identify):

HITIQA-2 Page 1 3/9/2016

BAA 03-06-FH - VOLUME 1 - Technical / Management Details (CONTINUED)

Team Members / Type of Business
University at Albany, State University of New York / Other Educational
Rutgers, State University of New Jersey / Other Educational
City University of New York, Lehman College / Other Educational

Principal Investigator(s)
Name(s): Professor Tomek Strzalkowski
Mail Address:
Phone Number: 518-442-2608
Fax Number: 518-442-2606
E-mail Address: tomek@csc.albany.edu

Administrative Contact
Name: Ms. Linda Donovan
Mail Address: University at Albany, SUNY, 1400 Washington Avenue, SS-262, Albany, NY 1222
Phone Number: 518-437-4555
Fax Number:
E-mail Address: ldonovan@uamail.albany.edu

Proposal Duration: 24 months
Cost - Year 1: $
Cost - Year 2: $
Total Cost: $

Part I: Summary of Proposal. (Tomek + Paul)

National security depends more than ever upon accurate, high-quality information being available at the right time to support important policy decisions.
A key element of this process is the work of the intelligence analyst, who must quickly and efficiently produce the right information from a potentially enormous number of sources, reports, documents and databases. Today's information retrieval and factoid question answering technology provides some help, but it is also a source of growing frustration and missed opportunities. There is little doubt that a more powerful technology is needed to address the question answering needs of professional intelligence analysts. The first phase of ARDA’s AQUAINT Program has pushed QA technology out of its infancy and into a realm where its future benefits are starting to become visible.

The HITIQA team has made significant progress in Phase 1 of AQUAINT. We have developed a prototype analytical QA system in which we demonstrated preliminary solutions to several key goals of the AQUAINT program:
• Accepting complex, analytical questions in a form natural to the analyst.
• “Understanding” the questions in the context of available unstructured data.
• Negotiating this understanding with the analyst through a multimodal dialogue.
• Providing access to related adjunct information uncovered through the search and framing process.
• Giving the analyst the means to explore the answer space via interactive visualization.
• Generating a preliminary answer out of fused “headlines” and fragments of original source material.

In addition, we have completed extensive studies of several aspects of the usability and efficacy of the emerging QA technology and its use in the intelligence environment:
• A series of usability studies, including hands-on sessions with USNR and USAF analysts, confirmed the strengths of the overall HITIQA approach while also exposing limitations of the current prototype.
• A Web-based implementation made the technology available to a variety of users, who provided feedback and helped us capture their expectations.
• Two series of information quality experiments produced a large volume of data collected from dozens of users over many sessions and across different backgrounds, leading to the preliminary conclusion that the perception of information quality is an acquired and highly individualized phenomenon.
• Similarly, experiments with information fusion revealed that

HITIQA technology thus represents a radical departure from the “factoid” question answering that has dominated the research landscape until now. While factoid systems have made significant strides in accuracy and efficiency, as demonstrated by the most recent TREC QA evaluations, their utility has always been limited in the world of professional analysts. Therefore, based on our successes with the HITIQA-1 system, our growing experience with and understanding of the analytical process, and the goals of AQUAINT Phase 2, we propose another radical leap forward: turning HITIQA from a helper tool into an indispensable, highly adaptable and personalizable “analyst’s assistant”. In doing so, we will address the following Phase 2 goals, schematically illustrated in Figure 1:
• Question Answering as Part of a Larger Information Gathering Process: HITIQA-2 will support the analyst throughout the entire analytical process associated with an “analytical scenario”. This means not only accepting a series of interrelated questions but also interpreting them in the context of the overall information task.
• Accessing, Retrieving and Integrating Diverse Data Sources: HITIQA-2 will exploit structured data as a source of pre-processed information of direct interest to the analyst, as well as a source of knowledge that can be adapted to provide a better understanding of unstructured, unprocessed, novel information.
• Interact with the system, using questioning strategies natural to the analyst: We will advance the current triaging dialogue and visual browsing in HITIQA-1 to a full problem-solving dialogue and exploratory navigation, providing a cooperative environment in which the system actively assists the analyst in his/her work.
• Explore boundaries of statistical and linguistic approaches to QA: HITIQA is already a hybrid system encompassing a variety of statistical and linguistic methods for information processing. This will be significantly expanded by adding knowledge acquisition methods that use structured databases to learn how to process unstructured data with accuracy comparable to manually built knowledge-based methods, while remaining scalable to new and diverse domains.
• Adapt to analyst’s preferred problem solving style: We will build into HITIQA an automated mechanism for adapting the system’s performance to closely match the analyst’s personal preferences and style. This will be achieved over time through an adaptation process that tracks the analyst’s information selections and interaction patterns and adjusts the system’s behavior accordingly.
• Maintain analyst’s confidence in the QA process: HITIQA-2 will create and maintain a persistent network of successive models reflecting the analyst’s information exploration strategy and a changing peripheral context. This will include a working space of the currently active answer model, as well as a backdrop of secondary information that can be explored to guarantee completeness.
• Evaluating, Validating and Presenting the Answer: In HITIQA-2, the answer, in the form of a preliminary analytical report, will be assembled from structured knowledge sources and unstructured data items. This will be accomplished through the adoption of frame-based semantics shared among multiple data sources.

Figure 1. HITIQA-2 Concept and Components

A. Summary of Innovative Claims

HITIQA has been conceived as a long-term research project to address the challenges for the intelligence community identified in the AQUAINT Program as a whole. In Phase 1 we attacked a number of these challenges, finding solutions to some and making inroads into others. We have also discovered additional challenges that must be solved before QA technology can have a visible impact on the work of the intelligence analyst. What we propose for Phase 2 is therefore not an incremental addition to our Phase 1 work; rather, the challenges before us require an entirely new set of innovations. These innovations are summarized in Table 1 below, which lays out Phase 1 advances and proposed Phase 2 goals against the overall objectives of the HITIQA project.

Table 1: HITIQA-2 Advances compared to HITIQA-1 base

HITIQA INNOVATIONS | PHASE 1 | PHASE 2
Questions | Single analytical | Scenarios involving series of questions and strategy
Dialogue | Clarification triage | Clarification, navigation and problem-solving
Answers | Fused headlines and text passages | Fused passages and generated reports
Semantics | Data-driven in general domain; manual fit over specialized domain | Data and knowledge-driven, domain adaptable; knowledge acquisition from structured sources
Task-level persistence & adaptability | Not adaptable; one model per interaction | Feedback with source fusion; successive models following analyst’s task strategy; model backdrop context
User-level persistence & adaptability | None; no personalized features | Adapts to user information selection and judgments; keeps memory of interactions; always on if needed
Visualization | Answer space topology; interaction alternative to dialogue | Event and relationship map; navigation/exploration; coordinated multimodal interaction
Information Quality & Usability | Measured per source; 9 empirical quality criteria | Measured per source & topic; individualized criteria based on the analyst’s pattern of use
Evaluations & Usability Studies | Program-level pilots; short sessions with users; USNR sessions | Program-level metric-based evaluations; sustained usability testing with USNR, USAF, other analysts
Data sources | Unstructured text | Unstructured text; structured databases; Web-based sources

B. Summary of Technical Rationale

Brief summary of the technical rationale, technical approach, and constructive plans for accomplishment of technical goals.

The key technical challenge in developing a practical QA system for the intelligence analyst is to equip it with the capacity to substantially assist the analytical process. This means being able to augment the analyst’s capabilities for locating and correlating information, detecting information of a certain kind, etc., and doing so without prior detailed knowledge of the topic. Achieving this requires an imaginative combination of existing knowledge and its projection over the new data.
--- structured data can be converted into knowledge
--- this knowledge can be projected over novel information to achieve an initial, partial structuring
--- this structuring can be used to derive extraction tools to locate further items of interest in unstructured data
--- the partial understanding of the data can be refined through the dialogue with the analyst so that (a) a model of the answer space is created that can be navigated, (b) the system’s grasp of the task domain semantics is refined, and (c) a revised model is built.
--- the system’s growing grasp of the analyst’s goal and strategy is projected from the refined model into the larger data context in the form of data fusion (based on the analyst’s perceived usefulness of information rather than any specific “objective” metric such as relevance).
--- all these observed data are rendered into a 3-D interactive visualization that provides an orthogonal interaction mode to the language-based dialogue

• Scenario-based QA Interaction
  – Complete problem solving support
  – Builds successively more accurate answer space models
  – Detects when full answer space seen
• Adaptable Analyst’s Personal Assistant
  – Adapts to analyst’s background and style
  – Keeps memory of interactions, tasks and solutions
  – Optimizes source selection for relevance & quality
• Instant Domain “Expertise” and Adaptation
  – Adapt and absorb structured data sources as knowledge
  – Optimize through bootstrap learning
  – Project knowledge structure over unstructured sources
• Tryouts and Evaluation
  – Metrics evaluations on answer completeness and compactness
  – Focus on usability testing (e.g. USNR and USAF analysts)
  – Integrated multi-modal interaction and navigation
Box 1. HITIQA-2 Key Research Objectives

C. Schedule and Milestones

Schedule and milestones for the proposed research, including overall estimates of cost for each task. A one-page graphic illustration that depicts major milestones of the proposed effort arrayed against the proposed time and cost estimates must be included.

D. Summary of Deliverables

A summary of the deliverables associated with the proposed research.

E. Key Personnel

A clearly defined organizational chart of all anticipated program participants with brief biographical sketches of key personnel and significant contributors, their roles (including role of Principal Investigator) and their level of effort in each year (calendar year or academic / summer year) of the program. A chart, such as the following, is suggested.

Participants | Organization | Role | Year 1 | Year 2
Prof. Tomek Strzalkowski | University at Albany | Key Personnel / PI, PM | 25% | 25%
Prof. Deborah Andersen | University at Albany | Significant Contributor | 25% | 25%
Ms. Sharon Small | University at Albany | Significant Contributor | 100% | 100%
Doctoral Candidate 1 | University at Albany | Contributor | 50% | 50%
Doctoral Candidate 2 | University at Albany | Contributor | 50% | 50%
Graduate Assistant 1 | University at Albany | Contributor | 50% | 50%
Graduate Assistant 2 | University at Albany | Contributor | 50% | 50%
Graduate Assistant 3 | University at Albany | Contributor | 50% | 50%
Prof. Paul Kantor | Rutgers University | Key Personnel / co-PI | 25% | 25%
Prof. Nina Wacholder | Rutgers University | Significant Contributor | 25% | 25%
Prof. K.B. Ng | Rutgers University | Significant Contributor | 25% | 25%
Graduate Assistant 1 | Rutgers University | Contributor | 50% | 50%
Graduate Assistant 2 | Rutgers University | Contributor | 50% | 50%
Prof. Boris Yamrom | City University of New York | Key Personnel | 25% | 25%
Graduate Assistant 1 | City University of New York | Contributor | 50% | 50%

Professor Tomek Strzalkowski – University at Albany, SUNY
Education: Simon Fraser University, PhD Computer Science, 1986.
Experience: Dr. Strzalkowski is an Associate Professor of Computer Science at SUNY Albany. Prior to joining SUNY, he was a Natural Language Group Leader and a Principal Computer Scientist at GE CRD. Prior to GE, he was an Assistant Professor of Computer Science at New York University. He received his PhD in Computer Science from Simon Fraser University in 1986 for work on the formal semantics of discourse. He has done research in a wide variety of areas in computational linguistics, including database query systems, formal semantics, and reversible grammars. He has directed research projects in natural language processing and information retrieval sponsored by ARDA, DARPA and NSF, including work under several TIPSTER contracts. While at GE, he developed advanced text summarization systems for the Government. Dr. Strzalkowski has published over a hundred scientific papers on computational linguistics and information retrieval.
He is the editor of two books: Reversible Grammar in Natural Language Processing, and Natural Language Information Retrieval. Current sources of support include the DARPA-funded AMITIES project (2001-04; 20% commitment) and an ARDA-funded cross-document summarization project (2000-02; 20%). Pending proposals: DARPA EELD (2001-06; 20%) and NSF ITR (2001-04; 10%).

Professor Paul B. Kantor (http://scils.rutgers.edu/~kantor) – Rutgers University
Education: Ph.D. Theoretical Physics, Princeton University (1963)
Experience: Dr. Kantor is Professor of Information Systems in the School of Communication, Information and Library Studies at Rutgers, the State University of New Jersey. Previously he served as a faculty member at Case Western Reserve University, in the departments of Physics, Library Science, System Engineering, and Operations Research. At Rutgers since 1991, he has directed numerous research projects on the development and evaluation of library and information systems, most notably the ANLI system for augmenting a library online catalog with hyperlinks, and the AntWorld project. Prof. Kantor is also a Member of the internationally renowned Rutgers Center for Operations Research (RUTCOR), director of the Alexandria Project Laboratory, and director of the Rutgers Distributed Laboratory for Digital Libraries. He is the author of more than 160 journal articles, book chapters, conference papers and technical reports, and his research has been supported by the ONR, the Institute for Defense Analyses, NSF, DARPA, and other organizations. He is a regular participant in the NSF Information and Data Management planning conferences, serves as a reviewer for numerous scientific and scholarly journals, and is a Fellow of the American Association for the Advancement of Science and the founding Editor in Chief of the journal Information Retrieval. Current projects include the DARPA-funded Novel Approach to Information Finding (AntWorld) N66001-97-C-8537 (15%).
Pending projects include Dynamic Indexing and Archiving of Brain Images (NSF/ITR, 11%) and Disruption of Quantum Coded Messages (NSF ITR/SY, 15%).

<ADD ALL key personnel and significant contributors>

Part II: Detailed Proposal Information.

This part shall provide the detailed, in-depth discussion of the proposed research. Specific attention must be given to addressing both the risks and payoffs of the proposed research making it desirable to pursue. This Part shall provide:

A. Innovative Claims (Tomek, Paul, Boris)

This is the centerpiece of the proposal and should succinctly describe the unique proposed contribution.
• Question Answering as Dialogue with Heterogeneous Data
• Instant Domain Expertise: Knowledge acquisition from structured data
• Knowledge bootstrapping and induction
• Scenario-based extended interaction support – creation of multiple models, model evolution, model revision – session-based persistence
• Personalization of interaction style and personal preferences – user-based persistence
• Learning from choices made: optimizing all aspects of the system – retrieval, fusion, quality, framing, answer
• Self-adaptation: create a good-enough initial system that improves with usage
• Answer generation: from headlines to reports
• Data integration across sources – frame based
• Visualization: exploring navigation dimension – 3D, event-relationship map, time; tightly coupled with dialogue
• Information quality – aspects? Attitudes? – Quality Model derived for an analyst – multiple models can be used to provide various viewpoints
• Information fusion: view outside of the model window, zoom in on out parts means more retrieval and new model

B. Detailed Technical Rationale

The technical rationale should clearly show why the proposed technical approach is expected to achieve the stated purpose within the proposed cost and time schedule.
The rationale shall also describe the rationale for the claims and deliverable products outlined elsewhere in the proposal and show how past / current performance justifies an award in this technical area.

Background on HITIQA-1 advances and accomplishments, data-driven semantics of questions, clarification dialogue (triage) etc. All this is assumed “solved” and available as building blocks for Phase 2.

B.1. Question Answering as Multimodal Dialogue with Heterogeneous Data Sources (Tomek)
• Adaptation of structured data into knowledge
• Interactive QA from structured data
• Projecting structures over unstructured data

B.2. Scenario level interaction using evolving successive models (Tomek)
• Follow-up questions, continuations, changes of direction, detours, drill-downs
• Model structure – frames and links

B.3. Acquiring domain knowledge from structured data (Tomek)
• Instant frames from structured data
• Validation against current data and task
• Bootstrapping and tuning from the task data

B.4. Self-adaptation and Personalization (Tomek, Paul, Nina)
• Adaptation to task and approach – capturing a strategy
• Adaptation to the analyst interaction style
• Selecting and loading other styles and strategies
• Adaptable source fusion

B.5. Advanced Answer Generation (Tomek)
• Generation of reports from frames, models, and structured data

B.6. Information Quality Models (Paul, Nina, KB)
• Personalized quality models
• Quality model selection and exchange

B.7. Navigating Answer Space through 3-D interactive visualization (Boris)
• Model navigation, model background, moving lens
• Controlling background processes

C. Statement of Work

SOW describing the effort’s scope, the specific tasks to be performed and their associated schedules. At a minimum, the statement of work shall consist of the following sections:

C.1. Scope of the Proposed Work

A statement as to what the SOW covers: objectives and goals and major milestones for the effort. Key elements are task development and deliverables.

TABLE 2. SCHEDULE OF MILESTONES AND DELIVERABLES

TIMING | MILESTONES / DELIVERABLES
monthly | Status report
3 month | Quarterly Report #1
6 month | Quarterly Report #2
8 month |
9 month | Quarterly Report #3
10 month |
12 months | HITIQA version 1.0; Year 1 Report & Review
15 months | Quarterly Report #4
18 months | Quarterly Report #5
21 months | Quarterly Report #6
24 months | Final system assembly; Components API’s; HITIQA version 2.0; Final Report & Review

C.2. Technical and Task Requirements

A description of tasks, representing the work to be performed, developed in an orderly progression and in enough detail to establish the feasibility of accomplishing the overall program goals. The overall effort should be grouped into major tasks and identified in a work breakdown structure (WBS)-like numbering system. Proposed costs shall have a one-to-one correlation to this reporting structure, which shall be depicted in the cost volume.

Task 1
Task 2

D. Description of Results

A description of the results, products, transferable technology and an expected technology transfer path must be included.

E. Comparison with On-going Research

Highlight the uniqueness of the proposed effort / approach, with the differences between the proposed effort and the current state of the art clearly stated. Identify the advantages and disadvantages of the proposed work with respect to potential alternative approaches.
• Compare with other AQUAINT work (LCC, IBM, NMSU, ISI…)
• Compare with other QA works

F. Previous Accomplishments

Discussion of Offeror's previous accomplishments / work in this or closely related research areas.

G. Description of Facilities

Description of facilities that would be used for the proposed effort.

H. Statement about Government-furnished Property

If any portion of the research is based on the use of Government-owned resources of any type, the Offeror shall specifically identify the property or other resource required, the date the property or resource is required, the duration of the requirement, the source from which the resource will be obtained, if known, and the impact on the research if the resource cannot be provided. If no Government-furnished property is required for conduct of the proposed research, this section shall consist of a statement to that effect.

I. Proposed Research Team

Detailed description of the support, including formal teaming agreements, required to execute the Offeror's proposal. Discussion of teaming relationships should include the programmatic relationship of team members; the unique capabilities and relevant accomplishments and concise summary of qualifications of all team members (key personnel and significant contributors), with information about their major sources of support and commitments of their time; the task responsibilities of team members; the teaming strategy among the team members; and the management approach for the team. Full resumes / curriculum vitae of key personnel and significant contributors should be included in Volume 2 (Additional Reference Information) of the proposal.

J. Proprietary Claims

A summary of any proprietary claims to results, prototypes, or systems. The Offeror shall submit a separate list of all technical data or computer software that will be furnished to the Government with other than unlimited rights in accordance with DFARS 252.227-7017, Identification and Assertion of Use, Release or Disclosure Restrictions.
All AQUAINT contractors will be required to provide deliverables (software and documentation) for integration with other AQUAINT Program contractors’ products for use in testbed evaluations and demonstrations in an end-to-end simulated operational environment. (See Section 5.4 for more information about the testbed.)

K. Proposed Evaluations

Description of how progress toward completion of their research goals will be measured, including a description of the evaluations to be performed, a schedule of implementation and type of report to be prepared.

L. Data Sources

Identification and description of anticipated data sources to be utilized in pursuit of the project research goals.
• AQUAINT, CNS, WMD
• Additional structured databases from CNS, USGS, …

M. Participation in AQUAINT Testbed

Summary of a plan, schedule and process for participation in the AQUAINT testbed.

The HITIQA-1 system is currently being deployed at the MITRE testbed. We plan for the HITIQA-2 prototype to be inserted at the end of the first year of Phase 2. We assume that HITIQA-2 will be backward compatible with HITIQA-1, thus allowing insertion of selected components as soon as they are developed.