The evolution of data capture in census projects

Emerging technologies
2010 Censuses Challenges
Shoshani Eli
Managing Director Asia Pacific
UN Workshop Thailand 2008
Agenda
Introduction
Who we are
Data capture methods
eFLOW platform
Summary
“Counted” by eFLOW worldwide
1,374,026,304
TIS’s Experience in Census Projects
India 2001
Turkey 1997
Brazil 2000
South Africa 2001
Ireland 2001
Germany (DP) 1999
Cyprus 2002
Turkey 2000
Kenya 2001
Slovak Republic 2001
Hong Kong 2001
Italy 2002
Slovak Republic 2006
Hong Kong 2006
South Africa Pilot 2007
Ireland 2006
Largest market share worldwide in census data capture projects
Projects won in 2008
Belarus
Argentina
Thailand
Overview - Top Image Systems
• Founded 1991
• Data extraction and workflow solutions; specialized in census projects
• Since 1996, traded on NASDAQ (TISA)
• ~250 employees
• Local offices by region:
– Asia: Shanghai, Japan, Singapore, Hong Kong, Guangzhou (R&D) and Australia
– Europe: United Kingdom, Germany, Italy, Spain, France, Benelux
– Americas: Boston, Rio de Janeiro
• Present in approx. 40 countries
• Strong partner network worldwide
• Around 800 installed systems worldwide
The evolution of data capture in census projects
From OCR to an IDR solution
(Diagram: Key From Paper, Key From Image, OMR, Automated Data Capture, eFLOW Intelligent Data Capture)
Manual data entry (key from paper)
– Slow
– High error rate in the data entry process
– Recruitment, training and management of personnel
Key from image
– Archive
– Approx. 30-40% faster than key from paper
OMR (hardware readers for checkboxes)
– Requires specially printed forms and special scanners
– Cannot handle handwritten/printed data
– Forms are not user-friendly
– OMR requires more answers => more space => increased paper expenditures => more handling and printing costs
– Not flexible; difficult to adjust to other applications once the census is over
– No possibility to add business rules: computation, validations, coding
Automated data capture
– Requires less human intervention and enables the census data capture to be completed much faster (less space, fewer salaries, less hardware)
– Ensures data integrity: enables both automatic and manual online validations, exception handling and coding
– The most advanced and proven technology for censuses, recommended by the UN and used by all modern countries for census projects
– Full flexibility in the type of data gathered (checkbox, handwritten, alpha and numeric, barcode…)
– Provides all the capabilities of OMR and much more
– Creates a correlation between the image and the actual form
– Remote capabilities enable all forms to be scanned locally and then sent to a central site for processing
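To illustrate the combination of automatic validations and manual exception handling mentioned above, here is a minimal Python sketch. The record fields, the rules and the function names are hypothetical and chosen only for illustration; the slides do not show how eFLOW actually defines its validations.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CapturedRecord:
    """One captured person record; the field names are hypothetical."""
    age: str
    sex: str
    errors: List[str] = field(default_factory=list)


def validate(record: CapturedRecord) -> bool:
    """Apply the automatic rules; return True if the record passes all of them."""
    record.errors.clear()
    if not record.age.isdecimal() or not 0 <= int(record.age) <= 120:
        record.errors.append("age is not a number between 0 and 120")
    if record.sex not in {"M", "F"}:
        record.errors.append("sex must be M or F")
    return not record.errors


def route(records):
    """Separate records accepted automatically from those needing manual review."""
    accepted, exceptions = [], []
    for rec in records:
        (accepted if validate(rec) else exceptions).append(rec)
    return accepted, exceptions
```

In this sketch, records that fail a rule keep their error messages so that an operator at a manual station can see why the record was flagged.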
Intelligent data capture platform
using OCR/ICR/barcode/PDA/Web/email:
– Automated data capture +
– Smart: automatic classification of documents
• Understands and differentiates between various types of documents and languages, based on state-of-the-art machine learning algorithms
– Freedom
• Artificial intelligence algorithms that give the system enough information to find the location of the fields on its own
Unified content platform
Census database
Suggest a single platform for all enterprise content
Lessons learned
The customer says it best…
Saving of 25%
Saving of 12%
Benefits of the eFLOW technology
(Source: CSO – Central Statistics Office, Ireland)
First, several general lessons…
• Invest in creating the right application for the project
– System design
  • High-level business process
  • Functional design
  • Technical/detailed design
  • Code guidelines and conventions
  • Technical DR, with the R&D
– Development
  • Project DR
  • Code review
– Budget control
– Bi-weekly reports
– …
• Spend time on getting the form right
– Meet organization standards
– Form design
• Prepare and optimize with a pilot
• Training & support
Indian Census 2001
TIS partnered with CMC, an Indian governmental agency with years of experience and offices all over India.
Form processing technology:
– Around 500 million A3 images
– More than 2 million enumerators
– The technology was implemented at 15 processing centers in major state capitals
– Data was captured using only 25 high-end Kodak 7520DS scanners
– 16 languages
– The advanced technology of 2001: eFLOW ver. 1.0
– Two phases
Presenting new advanced technologies to meet the 2010 census challenges
eFLOW 5.0 – Next Generation…
Main improvements in eFLOW to meet census challenges
• Architectural changes
• Core changes
• Recognition technologies
• Modules
• Features
eFLOW Architectural Improvements
• Core redesigned and built on .NET technology
– Microsoft .NET is the Microsoft strategy for connecting systems, information, and devices through Web services so people can collaborate and communicate more effectively
• Customization by embedded .NET code
– Speeds up runtime: 200x faster
• Custom code is now part of the CAB
– No need to manage DLLs separately
• Debug inside eFLOW
– No need to install a development environment
.NET allows an object-oriented design approach
(Diagram: Batch, House, Person)
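The Batch, House and Person objects named in the diagram could be modelled along the following lines. This is only a sketch in Python for illustration: the real eFLOW object model is built on .NET and is not shown in the slides, so every attribute and method name here is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    """A person enumerated on a form (attributes are hypothetical)."""
    name: str
    age: int


@dataclass
class House:
    """A household, holding the persons captured for it."""
    house_id: str
    persons: List[Person] = field(default_factory=list)


@dataclass
class Batch:
    """A scanned batch, holding the households it contains."""
    batch_id: str
    houses: List[House] = field(default_factory=list)

    def person_count(self) -> int:
        """Total number of persons across all households in the batch."""
        return sum(len(house.persons) for house in self.houses)
```

Working with nested objects rather than flat field lists lets custom code express rules at the level they belong to, for example a check on a household that looks at all of its persons.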
eFLOW Architectural Improvements
• Improved flexibility
– Multiple active applications on the same server (run phases in parallel)
  • Balances workload and personnel
  • Ensures ongoing work for all team members
– Multiple sites
– Support for multiple servers and clusters
New eFLOW Architecture - Sites
(Diagram labels: FormID, Export)
Monitoring and Management
Architectural Improvements (cont.)
• Easier management of the application:
– Control all stations from any location
  • Automatic stations run similarly to Windows Services
– Remote activation of stations; no need to physically access the server room
– Restart/start/control stations remotely from a centralized place using the eFLOW Controller and Enterprise Manager
Controller
Architectural Improvements
• Handling huge batches:
– Ability to handle huge batches of 300-3000 pages each
– Ability to process many batches in parallel
– A stable, robust platform
(Picture from eFLOW’s performance test)
Architectural Changes (cont.)
• Load balancing
– Load balancing between stations (automatic notifications and better allocation of employees)
– Automatic load balancing according to the number of batches in a queue
– Priority handling: using the eFLOW capabilities for automatic prioritization by code (for example, according to county, region, etc.)
Architectural Changes (cont.)
• Improved security mechanism
Advanced approaches
• Automatic EFI matching
– Improves the speed of the template recognition station via the “Force EFI” mechanism: a unique barcode placed on each page
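The speed-up from a per-page barcode comes from replacing image-based template recognition with a direct lookup. A minimal Python sketch of that idea follows; the barcode values, template names and function names are hypothetical, and the image-based fallback is deliberately left as a placeholder because the slides do not describe it.

```python
# Hypothetical mapping from the barcode on a page to its form template (EFI).
BARCODE_TO_TEMPLATE = {
    "HH01": "household_page_1",
    "HH02": "household_page_2",
}


def slow_template_recognition(page_image):
    """Placeholder for image-based template matching; not part of this sketch."""
    raise NotImplementedError("image-based matching is outside this sketch")


def identify_template(page_image, barcode=None):
    """Use the barcode when it is known; otherwise fall back to image matching."""
    if barcode in BARCODE_TO_TEMPLATE:
        return BARCODE_TO_TEMPLATE[barcode]
    return slow_template_recognition(page_image)
```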
Advanced approaches (cont.)
• Auto coding
– Coding tasks and data validations are performed on the data capture platform: a cost-effective solution
– Use one of the statistical software packages on the market, such as ACTR (Canadian statistical software for coding some fields)
– Use approximate search tools to improve results via a DB (Exorbyte)
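Approximate search helps auto coding because recognized handwriting often contains small errors. The Python sketch below uses the standard library's `difflib` as a stand-in for a dedicated tool such as ACTR or Exorbyte, whose interfaces are not shown in the slides. The code list and the numeric codes are invented for illustration.

```python
import difflib

# Hypothetical code list: occupation text -> classification code.
OCCUPATION_CODES = {"teacher": "2330", "farmer": "6111", "nurse": "2221"}


def auto_code(text, cutoff=0.8):
    """Return the code for an exact or close match, or None for manual coding."""
    key = text.strip().lower()
    if key in OCCUPATION_CODES:
        return OCCUPATION_CODES[key]
    # Fall back to the single closest entry above the similarity cutoff.
    matches = difflib.get_close_matches(key, list(OCCUPATION_CODES), n=1, cutoff=cutoff)
    return OCCUPATION_CODES[matches[0]] if matches else None
```

Returning `None` rather than the best weak guess keeps doubtful values for a manual coding station, which matches the mix of automatic and manual steps described earlier.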
Advanced approaches (cont.)
• Dynamic dictionary update
– Lookups and dictionaries via a DB (rather than text files)
• Export
– Reconstruct the original form according to the template
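Keeping dictionaries in a database rather than in text files means a new entry is visible to every lookup as soon as it is committed. The sketch below shows the idea in Python with an in-memory SQLite database; the table layout and function names are assumptions, not the eFLOW schema.

```python
import sqlite3

# In-memory database as a stand-in for a shared lookup database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dictionary (field TEXT, value TEXT, PRIMARY KEY (field, value))"
)


def add_entry(field, value):
    """Add a dictionary value; duplicates are ignored."""
    with conn:  # commits on success
        conn.execute("INSERT OR IGNORE INTO dictionary VALUES (?, ?)", (field, value))


def is_known(field, value):
    """Check whether a captured value is in the dictionary for that field."""
    row = conn.execute(
        "SELECT 1 FROM dictionary WHERE field = ? AND value = ?", (field, value)
    ).fetchone()
    return row is not None
```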
Advanced features (cont.)
• Splitting & merging: using the built-in eFLOW4 splitting/merging mechanism
• Handling problematic batches with improved split/merge abilities
– Physically take out bad pages (or a bad household) and continue working with the rest of the batch
– Split/merge automatically, without the need to build a specific station for merging data
• Additional powerful interfaces exposed in the CSM for faster development
– Priority (for example, according to county, region, etc.)
– Load balancing between stations (automatic notifications and better allocation of employees)
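The split/merge idea, setting bad pages aside so the rest of the batch can continue and then merging them back once repaired, can be sketched in Python as follows. Pages are represented as plain dictionaries with hypothetical keys (`page_no`, `readable`); the actual eFLOW mechanism is not described in the slides.

```python
def split_batch(pages, is_bad):
    """Split a batch into pages that can continue and pages needing attention."""
    good = [page for page in pages if not is_bad(page)]
    bad = [page for page in pages if is_bad(page)]
    return good, bad


def merge_batches(good, repaired):
    """Merge repaired pages back in, restoring the original page order."""
    return sorted(good + repaired, key=lambda page: page["page_no"])


# Usage with hypothetical page records.
pages = [
    {"page_no": 1, "readable": True},
    {"page_no": 2, "readable": False},
    {"page_no": 3, "readable": True},
]
good, bad = split_batch(pages, is_bad=lambda page: not page["readable"])
rescanned = [dict(page, readable=True) for page in bad]
merged = merge_batches(good, rescanned)
```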
Modules
• Statistical report
– Statistical report to monitor the daily, weekly and monthly rate per user/station
– Quality checking using
• Licenses
– Flexible license policy
  • Per station
  • Per number of pages processed
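A per-user or per-station rate report boils down to grouping processing events by operator and period. The Python sketch below shows a daily aggregation; the event layout (operator, day, pages) and the sample values are invented for illustration and do not come from eFLOW.

```python
from collections import Counter
from datetime import date


def daily_rate(events):
    """Sum pages processed per (operator, day) from (operator, day, pages) events."""
    totals = Counter()
    for operator, day, pages in events:
        totals[(operator, day)] += pages
    return dict(totals)


# Usage with hypothetical events.
events = [
    ("station-1", date(2008, 1, 7), 200),
    ("station-1", date(2008, 1, 7), 300),
    ("station-2", date(2008, 1, 7), 150),
    ("station-1", date(2008, 1, 8), 400),
]
report = daily_rate(events)
```

Weekly or monthly rates follow the same pattern with a different grouping key, for example `day.isocalendar()[:2]` for the ISO year and week.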
Statistics reporter (e.g. Crystal Reports)
Recognition technologies: OCR/ICR engines
RICOH (Japanese)
LIGATURE
PENPOWER (Chinese)
JUSTICR
OCE
ABBYY
KADMOS
EXPERVISION
INLITE
A2IA
OMNIPAGE
TIS
NESTOR
Custom stations approach
eFLOW Receives Everything
(Diagram labels: Mobile Devices, MNIC, Web Completion, Remote scanning)
eFLOW 4.x Web Completion
(Diagram: employees on the LAN with Active Directory, the eFLOW server and eFLOW thick clients; a DMZ network segment hosting the eFLOW Web Completion Server; and, across the Internet backbone, web browsers used by outsourced and home-based workers and by external parties involved in the business process, such as vendors, customers and partners)
Summary
• A data capture and IDR platform (paper, electronic, mobile), and not a recognition product
• A proven solution in census data capture: no need to invest time and money in a new technology and vendor, which minimizes risk
• Extensive experience in the design, development and implementation of real census and other high-volume form processing projects; largest market share worldwide in the processing of census projects
• Extensive experience based on long research into the special needs of the Indian Census
• Maximum flexibility, redundancy and a robust platform, ensuring you meet the project timetable for releasing census results
Thank you