Content Management:
The Puzzle, The Challenge, and
The Opportunity
Shu-Shang Sam Wei, Ph.D.
Software Architect
EMC Documentum Content Management Offerings
© 2005 EMC Corporation. All rights reserved.
Google as an example
Enterprise Content Management
2
Yahoo! as another example
Splits: 02-Sep-97 [3:2], 03-Aug-98 [2:1], 08-Feb-99 [2:1], 14-Feb-00 [2:1], 12-May-04 [2:1]
Baidu as yet another example
What Does it Tell Us
• There is a strong desire/demand to search on the Web
• We are in an age of information explosion
- The number of emails (spam excluded) sent every day in North America has tripled to 11.9 billion since 1999 (Wall Street Journal, 8/26/2004)
- Google is doing 2 billion searches a month
- Yahoo! generates 10 terabytes of data a day (The Library of Congress)
- eBay hosts 1.4 billion auctions, with 16 million active auctions at any given moment
• The Internet has made searching significantly easier and more efficient
- Scott McNealy (CEO, Sun Microsystems) joked:
“Google has become one of the most important tools IT has ever deployed on the corporate system”
What Does it Tell Us (Cont.)
• Information exists in many different forms (and places):
email/IM, video, audio, databases, blogs, Web pages, etc.
• Unstructured (content-based) data is becoming more important than structured (number-based) data
- 70 ~ 90% of corporate data is unstructured
• Unstructured data poses greater management challenges
• Enterprise content management (ECM)
- is not confined to organizing data,
- involves exploiting business know-how
• to avoid critical failures,
• to operate more efficiently and
• to become more productive and profitable
The Puzzle of ECM
• Search
• Knowledge Management
• Document Management
• Lifecycle Management
• Web Content Management
• Collaboration
• Portals
• Digital Asset Management
• Email Management
• …
• The list is still growing
Search
• More than half of professionals spend more than 2 hours a day searching for information for their jobs
• Software created in the late 1970s and early 1980s could search millions of documents, primarily for education, medical research, and large legal cases
• From the late 1980s, search extended to the Internet and then the Web, which became a popular place for sharing info
• A search tool can be confusing if it returns tons of pages to choose from
• Basic search features: full-text search, Boolean expressions, wildcards, proximity, parametric search, thesauri, synonyms, relevance ordering
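The basic features above rest on an inverted index that maps each term to the documents containing it. The sketch below is a toy, not any vendor's engine: it assumes whitespace-separated queries, a trailing `*` as the wildcard syntax, and summed term frequency as a stand-in for relevance ordering. Proximity, parametric search, thesauri and synonyms are omitted.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy full-text index mapping term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def _expand(self, word):
        # Assumed syntax: a trailing '*' is a prefix wildcard.
        if word.endswith("*"):
            prefix = word[:-1]
            return [t for t in self.postings if t.startswith(prefix)]
        return [word] if word in self.postings else []

    def search_all(self, query):
        """Boolean AND of all query words, best-scoring documents first."""
        scores = None
        for word in query.lower().split():
            hits = defaultdict(int)
            for term in self._expand(word):
                for doc_id, tf in self.postings[term].items():
                    hits[doc_id] += tf
            if scores is None:
                scores = dict(hits)
            else:
                # Keep only documents matching every word so far (AND).
                scores = {d: s + hits[d] for d, s in scores.items() if d in hits}
        if not scores:
            return []
        # Relevance ordering: higher summed term frequency first, ties by id.
        return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, after indexing "Loan contract for the mortgage" as `d1` and "Contracts and contract templates" as `d2`, the query `contract*` matches both documents and ranks `d2` first because it contains two matching terms.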
Search (cont)
• Advanced search features
- Adjustable ranking
- Hyperlink ranking (Google’s engine)
- Hit highlighting
- Auto summary
- User behavior learning
- Natural language queries
- Dynamic clustering of results
- Concept mining and extraction
- Federated search
- Auto classification based on taxonomies
- Taxonomy navigation
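Hyperlink ranking in Google's engine is based on PageRank, in which a page's score depends on the scores of the pages linking to it. A minimal power-iteration sketch follows, assuming the commonly cited damping factor of 0.85 and a fixed number of iterations instead of a convergence test; production ranking combines many more signals than this.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank equally along its out-links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page with no out-links: spread its rank evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank
```

In the three-page graph a → {b, c}, b → {c}, c → {a}, page c collects links from both other pages and ends up with the highest score.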
Knowledge Management
• Poorly managed knowledge costs Fortune 500 companies about $12 billion a year (IDC, Business 2.0, February 2002)
• Knowledge is information applied to resolve a problem
- Information must be organized and filtered
- A layer of intelligence gathers info about info
- Knowledge is context-aware
- Authoritative, hierarchical taxonomies and thesauri greatly improve info access for decision making and innovation
• Knowledge management is about the application of knowledge
• An effective KM system should minimize disruption to established routines and extend existing enterprise applications
Knowledge Management (cont)
• A knowledge management system provides a community of practice where people share their knowledge
• The cycle of knowledge management: Find/Create → Organize → Share → Reuse
Document Management
• Emerged in the 1980s to help the airline, pharmaceutical and financial industries handle the paper-based processes that drive their business
• Needed to comply with stringent government regulations (FDA for pharmaceuticals, FAA for airlines)
• Document capturing/imaging, dissemination, and annotation
• Version control
• Compound documents
• Document renditions
• COLD (Computer Output to Laser Disk) and archiving
• Security and permissions control
• Audit trails
• Library services
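Two of the services above, version control and audit trails, can be pictured together: every check-in creates a new immutable version, and every action is appended to a log. The sketch below is illustrative only; the `Document` class and its methods are hypothetical names, not a Documentum API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Document:
    """Hypothetical versioned document with an append-only audit trail."""
    doc_id: str
    versions: list = field(default_factory=list)  # content of each version
    audit: list = field(default_factory=list)     # (time, user, action, version)

    def check_in(self, user, content):
        # Each check-in creates a new version; old versions are never overwritten.
        self.versions.append(content)
        self._log(user, "check_in", len(self.versions))
        return len(self.versions)  # versions are numbered from 1

    def read(self, user, version=None):
        number = len(self.versions) if version is None else version
        if not 1 <= number <= len(self.versions):
            raise ValueError(f"no version {number} of {self.doc_id}")
        self._log(user, "read", number)
        return self.versions[number - 1]

    def _log(self, user, action, version):
        self.audit.append((datetime.now(timezone.utc), user, action, version))
```

Because the audit list is only ever appended to, it records who checked in or read which version and when, which is the information regulators such as the FDA typically ask for.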
Lifecycle Management
• Content carries different meaning over time
• Typical cycle
- Creation
- Processing
- Retention and archiving
- Disposition
• Active processing
- Redaction, review and markup
- Electronic (password based) and digital (PKI + encryption) signing
- Classification and taxonomies
- Compound document assembly
- Publication
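The typical cycle can be modeled as a small state machine that only allows forward moves. This is a sketch under an assumed transition table (processing may repeat; disposition is terminal). Real lifecycles are configurable and may permit other moves.

```python
from enum import Enum


class Stage(Enum):
    CREATION = "creation"
    PROCESSING = "processing"
    RETENTION = "retention and archiving"
    DISPOSITION = "disposition"


# Assumed transition table for the typical cycle. PROCESSING may repeat
# (review, signing, publication, ...); DISPOSITION is terminal.
TRANSITIONS = {
    Stage.CREATION: {Stage.PROCESSING},
    Stage.PROCESSING: {Stage.PROCESSING, Stage.RETENTION},
    Stage.RETENTION: {Stage.DISPOSITION},
    Stage.DISPOSITION: set(),
}


def advance(current, target):
    """Return the new stage, or raise if the move skips or reverses the cycle."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Keeping the allowed moves in a table rather than in code means that a different lifecycle only requires a different table.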
Lifecycle Management (cont)
• Retention, archiving and disposition
- Storage management
• Migrating inactive content to low-cost systems
- Archiving
• In an indexed and accessible manner, or
• Secured and easily restored upon request
- Records management
• Based on the U.S. DoD 5015.2 certification standard
• E-mail included
• Manage retention policies
• Create “holds” on content
• Keep an audit trail of all actions
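Retention policies and holds interact in a simple way: a record may be disposed of only after its retention period has expired *and* no hold remains on it. The sketch below illustrates that rule with a hypothetical `Record` class. It does not encode the actual DoD 5015.2 requirements.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Record:
    """Hypothetical record under a simple retention policy."""
    record_id: str
    declared: date        # date the record was declared
    retention_days: int   # retention period set by the policy
    holds: set = field(default_factory=set)  # active holds, e.g. litigation
    log: list = field(default_factory=list)  # audit trail of hold actions

    def add_hold(self, hold_id):
        self.holds.add(hold_id)
        self.log.append(("add_hold", hold_id))

    def release_hold(self, hold_id):
        self.holds.discard(hold_id)
        self.log.append(("release_hold", hold_id))

    def can_dispose(self, today):
        # Disposal needs an expired retention period AND no active hold.
        expired = today >= self.declared + timedelta(days=self.retention_days)
        return expired and not self.holds
```

A record declared on 1 January 2005 with a 365-day policy becomes disposable on 1 January 2006, unless someone has placed a hold on it in the meantime.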
Web Content Management
• The Internet has become an important place for business
• Information posted on the Web needs to be current up to the minute
• Automation is essential due to the complexity
• Web content: static or dynamic, structured or unstructured
• Web content editing
• Use templates and style sheets to separate content from layout
• Support distributed, team-based collaboration
• Internationalization support
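Separating content from layout means that authors supply only the data, while a shared template and style sheet control the presentation. A minimal sketch using Python's standard `string.Template` follows. The page layout and the `site.css` file name are assumptions made for illustration.

```python
import html
from string import Template

# Layout: one template shared by every page and every language.
PAGE = Template(
    "<html><head><title>$title</title>"
    '<link rel="stylesheet" href="$stylesheet"></head>'
    "<body><h1>$title</h1><p>$body</p></body></html>"
)


def render(content, stylesheet="site.css"):
    """Fill the layout with content; values are HTML-escaped."""
    safe = {key: html.escape(value) for key, value in content.items()}
    return PAGE.substitute(safe, stylesheet=stylesheet)
```

Internationalization then reduces to swapping in translated content, such as `{"title": "Taux", ...}`, while leaving the template untouched. Likewise, a site redesign changes the template or style sheet without touching any content.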
Collaboration
• Link processes and people to create a combined work environment where ideas and knowledge are shared to accomplish a project
• Tools used
- E-mail/IM
- Application sharing
- Web conferencing (meeting, whiteboard, poll, chat)
- Intranets/extranets
- Groupware (eRoom)
- Repositories
• Future tools will seamlessly connect content, people and processes between the back office and the front office
Portals
• Provide Web browser users a single point of access to corporate info
• Portlets (widgets, gadgets) are connector programs to present
info from another application or information source
• Allow personalization
• Support customizable search, navigation and access to contents
• Hosting services
- ASPs (application service providers) rent out the software and charge by use
- Backup and maintenance are done by the ASP
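A portlet can be seen as a connector with a common rendering interface, and the portal assembles the portlets that each user has chosen. The sketch below uses hypothetical class names, and a plain dict stands in for a back-end workflow system. It is not modeled on any particular portal product.

```python
from abc import ABC, abstractmethod


class Portlet(ABC):
    """Hypothetical connector that presents info from one source."""
    name: str

    @abstractmethod
    def render(self, user):
        ...


class NewsPortlet(Portlet):
    name = "news"

    def render(self, user):
        return f"Company news for {user}"


class TasksPortlet(Portlet):
    name = "tasks"

    def __init__(self, tasks_by_user):
        self.tasks_by_user = tasks_by_user  # stand-in for a workflow system

    def render(self, user):
        tasks = self.tasks_by_user.get(user, [])
        return f"{len(tasks)} open task(s)"


def render_portal(user, portlets, preferences):
    # Personalization: show only the portlets this user selected, in their order.
    by_name = {p.name: p for p in portlets}
    return [by_name[n].render(user) for n in preferences.get(user, []) if n in by_name]
```

Because the portal depends only on the `render` interface, connecting a new information source requires a new portlet class and no changes to the portal itself.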
Digital Asset Management
• Rich media is defined as images, audio, video and other
visually oriented unstructured content (like animation and
presentations)
• Managing rich media has become crucial with broadband support and technology advances
• Moving large digital media files is a challenge
• Rights and licensing permissions need to be considered
• Metadata is used extensively to manage the content
• Online education is a good example
Email Management
• Email has become a pervasive communication tool in corporations
• An employee receives around 70 emails a day on average
• Messaging technology includes fax, voice, IM and virtual meeting services
• The messaging system is the largest content repository
• It can store terabytes of data, which is a challenge to manage:
- Support audit trails
- Integrate with Records Management
- Provide legal compliance
Business Process Management
• A shorter business process cycle can reduce operational cost, increase
profits and meet customer demands
• BPM describes how people interact with processes, information, the technology added to automate them, and each other to get jobs done
• BPM enables organizations to leverage and extend their existing
technologies to support the processes driving the success of business
• Workflow is the combination of tasks that define a process
• Web-based open standards (XML, SOAP, WSDL) brought into process management enable a new level of application integration and the sharing of the real-time info that drives daily operations
• Organizations can use BPM to build processes that adapt to new market conditions
• BPM allows processes to be modeled, refined and modified as needed
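A workflow as "the combination of tasks that define a process" can be represented as a dependency graph, so that tasks run as soon as their prerequisites are done. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The loan-approval task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical loan-approval workflow: each task maps to the tasks it depends on.
workflow = {
    "receive_application": set(),
    "credit_check": {"receive_application"},
    "appraisal": {"receive_application"},
    "underwriting": {"credit_check", "appraisal"},
    "notify_customer": {"underwriting"},
}


def run(workflow):
    sorter = TopologicalSorter(workflow)
    sorter.prepare()
    completed = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are done
        for task in ready:                  # these could run in parallel
            completed.append(task)          # stand-in for doing the work
            sorter.done(task)
    return completed
```

Modifying the process, for example adding a fraud check before underwriting, then only requires editing the dictionary. This is one way a model-driven approach lets processes be refined without rewriting code.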
How They Work Together
[Diagram: ECM components arranged along two axes, Structured ↔ Unstructured and People to People ↔ People to Information. Components shown: BPM, Workflow, Document Management, DAM, Imaging, Archive, Web Content Management, Records Management, Portals, Projects, Groupware, Web Conferencing, IM, Email, Classifications, Knowledge Management, Search]
Collaboration and Content
• Link processes and individuals across the enterprise
• Create a work environment where teams can share and
circulate ideas, experience and knowledge
• All the information created as a by-product of collaborative work is securely captured, managed, and transformed into invaluable corporate knowledge
• These knowledge assets are preserved in a repository as content to be shared and reused throughout the organization
• Collaboration and content are interconnected by process
The Role of Collaboration
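[Diagram: ECM components on the axes Structured ↔ Unstructured and People to People ↔ People to Information, with Collaboration among them. Components shown: Document Management, DAM, Imaging, Archive, Web Content Management, Records Management, Collaboration, Portals, Classifications, Knowledge Management, Search]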
Collaboration, Content and Process
[Diagram: Collaboration, Content and Process on the axes Structured ↔ Unstructured and People to People ↔ People to Information]
ECM Services Architecture
[Diagram: ECM services architecture built on a Service-Oriented Architecture, spanning users, clients, servers and storage. Labels shown: Users, Exec, Sales, Research, Solutions, ERP, Mobile, Email, Web, Desktop, Portal, Intranet, Client, Server, Services, Collaboration, Repositories, Admin, Dedicated, Embedded, Production, ECM, Content, Storage Device]
A Loan Management Example
The Challenge
• Additional Enterprise Requirements
- Near-constant response time regardless of the amount of information
• Ingestion rate of 25M files per day
• Classification with content analysis: 0.25M files per day
• Classification without content analysis: 2.5M files per day
- The system must be available 99.999% of the time
• Less than 5.256 minutes of downtime in a 365-day year
• Automatic crash/disaster recovery
- Real-time info, even for decision support systems
- Allow easy customization
- Easy administration
- Provide a unified client interface
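The requirements above translate into concrete budgets. The arithmetic below assumes a 365-day year, as the slide does, and assumes that ingestion is spread evenly over 24 hours, so real peak rates would be higher.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 in a 365-day year
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400


def max_downtime_minutes(availability):
    """Yearly downtime budget for a given availability fraction."""
    return MINUTES_PER_YEAR * (1 - availability)


def per_second(files_per_day):
    """Average rate, assuming an even spread over the whole day."""
    return files_per_day / SECONDS_PER_DAY


print(round(max_downtime_minutes(0.99999), 3))  # 5.256
print(round(per_second(25_000_000)))            # 289 (ingestion)
print(round(per_second(2_500_000), 1))          # 28.9 (classification, no analysis)
print(round(per_second(250_000), 1))            # 2.9 (classification with analysis)
```

Sustained ingestion of roughly 290 files every second, with only about 5 minutes of downtime a year, is what makes these requirements demanding.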
Response from Software Vendors
• Database and Content Management Companies
- Data Partition
- Real Application Clusters (Oracle)
- Cache Fusion (Oracle)
- Grid Computing (Oracle)
- Pluggable Components
- Self-tuning/healing
- Data warehouse
• Traditional offline databases don’t work well
• Materialized views, In-memory database, Bitmap Indexes,
Bitmap Join Indexes, clustering, multi-table inserts
- Online Backup and Recovery
- Distributed databases and (hot) replication
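Data partitioning spreads rows or documents over several nodes, so each node stores and searches only a fraction of the total. Databases offer several schemes, including range, list and hash partitioning. The sketch below shows hash partitioning on a document key only; the key format and partition count are assumptions.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a key to a partition using a stable hash.

    Python's built-in hash() of a str is salted per process, so a
    cryptographic digest is used to get the same answer on every node.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition(keys, num_partitions):
    buckets = [[] for _ in range(num_partitions)]
    for key in keys:
        buckets[partition_for(key, num_partitions)].append(key)
    return buckets
```

Hashing gives an even spread and lets any node compute where a key lives. The trade-off is that range queries must visit every partition, and that changing `num_partitions` moves most keys.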
Response from Software Vendors (cont)
• Fulltext Companies
- Collections Partition
- Better indexing mechanism for meta-data and content
- Better taxonomy support
• Language Support
- Object-Oriented Programming (C++, Java, C#)
- Agile/Aspect Programming
- Dynamic Class Loaders
- Service Oriented Architecture
Response from Hardware Vendors
• AMD, Intel and Apple
- Dual processor
- 64-bit PC
- Dual-core (Athlon 64 X2, Pentium D, PowerPC G5)
- Quad-core (Opteron 2006, Power Mac G5 Quad)
• Sun offers an 8-core chip, the UltraSPARC T1, at the end of 2005
- Each core runs up to four instruction threads
- Addresses the energy consumption issue by using only 70 watts
- Cheaper and faster than an IBM mainframe
The Opportunity
By 2009, worldwide new ECM software license revenue will reach $2.0B, up from $1.2B in 2004, a 10.6% CAGR
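The growth figure can be checked with the compound annual growth rate, CAGR = (end / start)^(1 / years) - 1. Using the rounded endpoints, $1.2B and $2.0B over five years, the formula gives about 10.8%. The quoted 10.6% is consistent with this once endpoint rounding is taken into account, because 10.6% compounded over five years takes $1.2B to about $1.99B.

```python
def cagr(start, end, years):
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


def project(start, rate, years):
    """Value after compounding `rate` for `years` years."""
    return start * (1 + rate) ** years


print(f"{cagr(1.2, 2.0, 5):.1%}")       # 10.8%
print(f"{project(1.2, 0.106, 5):.2f}")  # 1.99 ($B in 2009)
```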
[Chart: worldwide new ECM software license revenue by year, 2002 to 2009]
Big Players Attracted to the Market
• Oracle
- 10g advertising completely aimed at EMC/Documentum
- Large developer community established
• Could turn into RSI strategy
- Focusing on search
Yet Another Big Player
• Microsoft
- Strategically, still thinking
about mindshare for
applications
• Office 12 aimed at EMC,
but will lack infrastructure
support services (à la CSS)
• Integrated interface and
server offerings will mean
increased ubiquity of
deployment (land grab)
- Still missing the ILM
aspects, however
Microsoft Stakes Out the Middle
By Carolyn A. April, VARBusiness
Tue. Sep. 27, 2005
From the October 03, 2005 VARBusiness
…Microsoft has been methodically crafting
its answer to midmarket IT challenges. Its
approach? To create single products that
combine the company's ERP and CRM
applications, Microsoft Office – as a front-end interface – and server
offerings into integrated, out-of-the-box
solutions.
Competition from Open Source
• Somewhere, someone is developing an open source CMS
• Analysts are telling VCs and customers to:
- “… demand that even proprietary vendors have strategies to
compete with open source”
• Documentum should have a field response to open source
- Options:
• Prepare a standard response for sales reps
• Acquire a standalone CMS system and open it up; sell service /
support
• Migrate parts of Content Server to open source
Where Is EMC Positioned?
• Acquired Documentum in 2003
- The leader in ECM
• Q3 revenue was $2.37 billion ($1 billion in software)
- Up 17% from a year ago
- 9th consecutive quarter of double-digit growth
- 12th consecutive quarter of meeting or exceeding its own targets
- Net income was up
• 93% on a year-to-year basis including a tax-related benefit
• 45% without including that benefit.
- The best performance of any IT company in the world.
Gartner 2005 Report on ECM
American Cherokee Strip Land Run, September 16, 1893
The Trend of Computing
[Diagram: computing stack of Users/Clients, Networks, Servers, Databases and Storage Devices]
The Trend on Storage Device
• Storage Area Network (SAN)
- High-speed special-purpose network
- Interconnects different kinds of data storage devices with associated data servers on behalf of a larger network of users
- Supports
• Disk mirroring, backup and restore, archival and retrieval of archived data
• Data migration from one storage device to another
• Sharing of data among different servers in a network
The Trend on Storage Device (cont)
• Network Attached Storage (NAS)
- Hard disk storage that is set up with its own network
address
- Not attached to the department computer that is serving
applications to a network's workstation users.
- By removing storage access and its management from the
department server, both application programming and files
can be served faster because they are not competing for the
same processor resources.
Research on NAS and SAN
• Active Storage
- Provide a mechanism for service migration
• Focuses on limited applications such as image processing, data mining and other database-related tasks
- Exploit the processing power in storage devices
• Acharya et al. proposed a stream-based programming model (1998)
• Xiaonan et al. proposed a Multi-View Storage System (MVSS) with a flexible interface (2001 ~ 2003)
• Evan et al. proposed a parallel file system (2005)
• Sivathanu et al. introduced an RPC-based framework (2002)
• Amiri et al. dynamically partition applications and change function placement within a cluster based on load characteristics (2000)
• Object-based Storage
- Object-based Storage Device (OSD) T10 protocol
- Make use of an intelligent object interface
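The core idea behind active storage is to ship the computation to the data instead of the data to the computation. The sketch below contrasts the two paths with a hypothetical `StorageNode` class that counts bytes sent to the host. It illustrates the general idea only and is not the programming model of any of the systems cited above.

```python
from typing import Callable, Iterator


class StorageNode:
    """Hypothetical storage device that can run a filter next to the data."""

    def __init__(self, records):
        self.records = list(records)
        self.bytes_sent = 0  # traffic shipped to the host

    def read_all(self) -> Iterator[str]:
        # Conventional path: every record crosses the network.
        for record in self.records:
            self.bytes_sent += len(record)
            yield record

    def run_filter(self, predicate: Callable[[str], bool]) -> Iterator[str]:
        # Active-storage path: the predicate runs on the device and only
        # matching records are shipped.
        for record in self.records:
            if predicate(record):
                self.bytes_sent += len(record)
                yield record
```

Both paths return the same records, but filtering on the device sends only the matches. The savings grow with the selectivity of the filter, which is why data mining and database scans are the typical target applications.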
Conclusion
• Many opportunities remain for academia and industry
- Better Algorithms
• Performance
• Scalability
• Reliability
• Automatic Failover
- Better Programming Models
- Better Problem Modeling Mechanism
- Parallelism Needs Finer Granularity
• Changes are a must for survival and success
• Big players have a better chance to win