Big Data, Big Records (Department of Interior)

Big Data, Big Records
NOVA ARMA NCC-AIIM
U.S. Department of the Interior
Office of the Chief Information Officer
John Montel
Office of the Chief Information Officer
Policy Planning and Management
February 27, 2013
Carrie Mallen
IQ Business Group
eDiscovery Practice
Department of the Interior
• Cabinet-level agency
• 14 Bureaus/Offices
• Employs ~70,000 people, with 280,000 volunteers
• Manages a $16.8B operating budget
• Manages 500 million acres of surface land
• Manages 479 dams and 348 reservoirs
• Supplies 30% of the nation's energy production
• Produces 55,000 different maps each year
• Protects ~500 million recreational and cultural visitors
U.S. Department of the Interior
http://www.doi.gov/facts.html
IT Transformation
• Unified Messaging (BisonConnect)
– Google Apps for Government
• Enterprise Information (eERDMS)
– Enterprise eArchive System
– Enterprise Content System
– Enterprise Forms System
– Enterprise Dashboard System
eERDMS Program Vision
Provide the Department of the Interior with a single,
cohesive, integrated information management program
designed to support and manage departmental records
related to email, documents, and content in the Cloud.
eERDMS Program Objectives
• Capture all unified messaging journaled email records
• Capture all mobile content records
• Capture all lines of business records
• Capture all business system records
• Develop a super bucket records schedule
• Develop an online automated litigation hold process
• Support Freedom of Information Act requests
• Support litigation early case assessment needs
• Support Congressional and Department inquiries
Program Capabilities
• Records Management DoD 5015 v3
• Records, Document and Email Archiving/Journaling
• Records and Document Auto Classification
• Records and Document Content Management
• Records and Document Imaging
• Records and Document Management
• Records and Document Scanning
• Records and Document Workflow
• Records and Document Collaborating Workspaces
• Records and Document Auditing
• Records and Document Advanced Early Case Assessment & Review
• Records and Document Mobility Content Management
• Section 508 Compliance out of the box
• Optional: Advanced Legal Review, Social Media Capture, Email Management, National Shredding Program & National Digitization Program, Migration Services and Support Staff Services
OMB Directive M-12-18
Requires agencies, to the fullest extent possible, to eliminate paper
and use electronic recordkeeping.
Expected benefits:
• improved performance and promotion of openness and accountability
• further identification and transfer to the National Archives and Records Administration (NARA) of permanently valuable historical records
• minimized costs and more efficient operations
eERDMS Environment
[Diagram labels: Enterprise Records System, Enterprise Content System, Enterprise Forms System, Enterprise Fax System, Enterprise Dashboard System, Enterprise Social System; ERA; NIEM; XML; Human Resources, Contracts, Security, Personnel, Finance, Programs, Operations, Administration, Logistics]
Big Data, Big Business
• 600+ million emails a year
– 70 Million in Jan 2013
– 100 Million Estimated for February 2013
– 1.2B emails received
– 15.5M records produced a day
• 22 Billion data points generated
• 5,500+ FOIA cases a year
• 200+ ongoing litigation cases
• 100+ million printed pages a year
• 4,100+ mobile devices
• 15,000 Fax devices
• Exabyte / Zettabyte of electronic content
Records Management Objectives
Provide the Department with:
• a single, simplified, integrated Records Retention Schedule
for managing Bureau/Office records
• a Retention Schedule based on Lines of Business shared
across Bureaus/Offices
• a Retention Schedule which reduces the complexity of the
existing Schedules to allow for the use of auto-classification
tools for assigning retention periods to Department records
We are integrating knowledge for tomorrow's workforce
Starting Point
• 14 Bureaus/Offices in DOI
• 200 existing Retention Schedules
• 2,330 retention instructions
• Some Big Bucket Schedules
• Some Traditional Schedules
• Some schedules in draft
• Some schedules at NARA awaiting approval
[Figure labels: Traditional, Big Bucket, Simplified Schedule]
Department Records Schedule (DRS) Strategy
• Started with the Existing DOI Retention Schedules
• Identified the Department’s Lines of Business
• Created Crosswalks
• Created Summary Worksheets
• Drafted Super Bucket Retention Schedules, Ver 1
• Entered Super Bucket Retention Schedules, Ver 1 in eERDMS
• And then: Auto-Classification
Policy Bucket
• Controls and Oversight
• Planning and Budgeting
• Litigation and Judicial Activities
• Regulatory Development
Mission Bucket
• Biological Resources
• Culture & Heritage
• Disaster Management
• Education
• Energy
• Environmental Management
• Financial Management
• Geospatial Services
• Grants & Cooperative Agreements
• Intelligence Operations
• Land & Marine Conservation
• Land Management Planning
• Land Use
• Minerals
• Public Health & Safety
• Water
• Water Quality
• Wildland Fire
Administrative Bucket
• Accounting
• Administration/Housekeeping
– Ultra Transitory?
– Transitory: out-of-office replies, Amazon, eBay, Twitter, early dismissal, marketplace, Credit Union, advisory notices, holiday notices, Department-wide notices
• Human Resources
• Information and Technology
Crosswalks
• Mapped each schedule item in every schedule to the Department’s Lines of Business
• Developed crosswalks
• Vetted crosswalks with Bureaus/Offices Records Officers
• Some Bureaus/Offices were very involved with the process
[Figure labels: Schedules, Lines of Business, Vetted Crosswalks]
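A crosswalk of this kind is essentially a lookup table from legacy schedule items to lines of business. The following is a minimal sketch in Python; the schedule names, item identifiers, and mappings are hypothetical and do not come from the actual DOI crosswalks.

```python
from collections import defaultdict

# Hypothetical crosswalk rows: (legacy schedule, schedule item, line of business).
# The schedule names and item numbers are illustrative only.
CROSSWALK = [
    ("Bureau A Schedule", "item 101", "Wildland Fire"),
    ("Bureau A Schedule", "item 102", "Water Quality"),
    ("Bureau B Schedule", "item 7", "Wildland Fire"),
    ("Bureau B Schedule", "item 8", "Accounting"),
]


def group_by_line_of_business(rows):
    """Group legacy schedule items under the line of business they map to."""
    grouped = defaultdict(list)
    for schedule, item, lob in rows:
        grouped[lob].append((schedule, item))
    return dict(grouped)


def unmapped_items(all_items, rows):
    """Return legacy items that no crosswalk row covers yet (candidates for vetting)."""
    mapped = {(schedule, item) for schedule, item, _ in rows}
    return [entry for entry in all_items if entry not in mapped]


by_lob = group_by_line_of_business(CROSSWALK)
```

Grouping by line of business shows which legacy items would collapse into one bucket, and `unmapped_items` gives Records Officers a list of items still to be vetted.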
Results
Former: 200 schedules / 2,330 retention periods
SuperBucket: 1 schedule / 207 retention periods over 47 LOBs
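The size of the simplification can be worked out directly from the figures on this slide. The short Python calculation below uses only those three numbers; the variable names are illustrative.

```python
# Figures from the Results slide.
former_retention_periods = 2330
super_bucket_retention_periods = 207
lines_of_business = 47

# Share of retention periods eliminated by moving to the single schedule.
reduction = 1 - super_bucket_retention_periods / former_retention_periods

# Average number of retention periods remaining per line of business.
average_per_lob = super_bucket_retention_periods / lines_of_business

print(f"Retention periods reduced by {reduction:.1%}")
print(f"Average retention periods per line of business: {average_per_lob:.1f}")
```

This works out to a reduction of roughly 91 percent, leaving between four and five retention periods per line of business on average.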
Auto-Classification
• Definition/How it Works
• Exemplars/Why
• Testing and Refinement
• Training
• Implementation
• Legal Defensibility
Auto-Classification
Definition of auto-classification:
• Tool that provides automatic identification,
classification, retrieval, and archival and disposal
capabilities for electronic records
• Tool that uses a hybrid approach that combines
machine learning, rules, and content analytics
• Tool that uses a rules engine and scans content
for words, phrases, tone, etc. to identify semantic
relationships to assign records classification and
retention periods to content (Open Text)
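To illustrate only the rules part of the hybrid approach described above, the sketch below scans text for trigger phrases and assigns a file node and retention period. The trigger phrases and retention periods are hypothetical, and this is not Open Text's engine, which also applies machine learning and content analytics.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    file_node: str   # target classification
    retention: str   # hypothetical retention period
    phrases: tuple   # trigger words or phrases


# Illustrative rules only; real retention periods come from the approved schedule.
RULES = [
    Rule("Wildland Fire", "hypothetical: 7 years", ("fire suppression", "burn permit")),
    Rule("Transitory", "hypothetical: 90 days", ("out of office", "holiday notice")),
]


def classify(text, rules=RULES):
    """Return the rule with the most phrase hits, or None if nothing matches."""
    lowered = text.lower()
    best, best_hits = None, 0
    for rule in rules:
        hits = sum(len(re.findall(re.escape(p), lowered)) for p in rule.phrases)
        if hits > best_hits:
            best, best_hits = rule, hits
    return best
```

Returning `None` for unmatched content leaves room for a second stage, such as a trained model or manual review, to handle what the rules miss.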
Auto-Classification
Auto-Classification Process
• The system is trained on exemplars of each file node so that it learns to recognize patterns, tone, etc.
• The Find “like” (similar) feature is used to gather additional exemplars
• Exemplars are used to create a model
• Precision and recall numbers need to be 75% or better
• The model is refined with additional exemplars over time
• Auto-classification is run on incoming email content to assign retention periods
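The 75% precision and recall requirement can be checked against a labeled test set. The sketch below is a minimal Python illustration that assumes the threshold is applied to each file node separately; the function names and that per-node interpretation are assumptions, not details from the slides.

```python
def precision_recall(predicted, actual, label):
    """Precision and recall for one file node (label).

    predicted and actual are equal-length lists of labels, one per document.
    """
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p == label and a == label)
    false_pos = sum(1 for p, a in pairs if p == label and a != label)
    false_neg = sum(1 for p, a in pairs if p != label and a == label)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


THRESHOLD = 0.75  # from the slide: precision and recall need to be 75% or better


def model_is_acceptable(predicted, actual, labels, threshold=THRESHOLD):
    """True only if every label meets the threshold on both measures."""
    return all(
        min(precision_recall(predicted, actual, label)) >= threshold
        for label in labels
    )
```

Precision is the share of documents assigned to a node that truly belong there; recall is the share of a node's true documents that the model found. A node that falls short on either measure is a candidate for more exemplars.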
Hold Options
• Search-Based Holds
• User-Based Holds
• Location-Based Holds
• Classification-Based Holds
Other Considerations
– Journaling
– “Live” Content
– Content at Risk
[Screenshot callouts: Select Users to be on Hold, per Matter; option for selecting entire results set]
Copyright © 2010 Open Text Corporation. All rights reserved.
User Based Holds
Date ranges can be applied.
Applies a hold to all items Created By, Owned By, or with a version added by the selected users in the specified date range.
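The user-based hold rule above can be sketched as a filter over items. This Python example is illustrative only: the `Item` fields and the `apply_user_hold` function are hypothetical, and it assumes the date range is tested against the creation date for creators and owners and against the version date for version authors.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Item:
    item_id: str
    created_by: str
    owned_by: str
    created_on: date
    version_authors: list = field(default_factory=list)  # (user, date) pairs
    holds: set = field(default_factory=set)               # matters holding this item


def apply_user_hold(items, matter, users, start, end):
    """Place a hold for `matter` on items tied to `users` within [start, end]."""
    held = []
    for item in items:
        in_range = start <= item.created_on <= end
        by_creator_or_owner = in_range and (
            item.created_by in users or item.owned_by in users
        )
        by_version = any(
            user in users and start <= when <= end
            for user, when in item.version_authors
        )
        if by_creator_or_owner or by_version:
            item.holds.add(matter)
            held.append(item.item_id)
    return held
```

Storing holds as a set of matters per item means an item can be held for several matters at once, and releasing one matter does not release the others.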
Users Can be Removed
More Advanced Search
eDiscovery Early Case Assessment
• Live exploration
– Search and explore data before collection and preservation
• Reduce involvement of IT in collection
• Only relevant ESI required for hold is automatically collected to central hold repository
• Further cull and deduplicate prior to export of fully processed ESI
• Remote collection from disparate enterprise data sources, including ECM Suite
[Diagram labels: SharePoint, Desktops, Email, File Servers, EES Suite, ECA, Any Review Platform]
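The "cull and deduplicate" step can be illustrated with exact-content hashing, a common deduplication technique. The slides do not describe how the product deduplicates, so the approach and names in this Python sketch are assumptions.

```python
import hashlib


def deduplicate(documents):
    """Keep the first copy of each distinct content.

    documents is an iterable of (doc_id, content_bytes) pairs.
    Returns (unique, dropped), where unique is a list of (doc_id, sha256_hex)
    and dropped is the number of duplicate copies skipped.
    """
    seen = set()
    unique = []
    dropped = 0
    for doc_id, content in documents:
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            dropped += 1
            continue
        seen.add(digest)
        unique.append((doc_id, digest))
    return unique, dropped
```

Exact hashing only catches byte-identical copies; the same email collected from two mailboxes with different headers would need normalization or near-duplicate detection before hashing.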
Communication and Outreach
• Shared vision and goals up, down, and across the organization
• Bureau/Office Records Officers Work Group
• Records Officer Task Force with leadership role
• Staff dedicated to supporting the effort with the client
Thank you
John Montel
eRecords Service Manager
Service Planning and Management
Department of the Interior
Office of the Chief Information Officer
1849 C Street, N.W.
Room 7444
Washington, DC 20040
T. (202) 208-3939
C. (202) 604-1149
F. (202) 501-2360
E. john_montel@ios.doi.gov
Carrie Mallen
eDiscovery SME
IQ Business Group
Prime for eERDMS
Department of the Interior
Room 2012
Washington, DC 20040
C. 415 577-3982
E. cmallen@iqgroup.com