Optical Character Recognition for Logistics Reporting

Contributors: Joy Kamunyori, Mike Frost, Ashraf Islam
A recording of the WebEx session can be found here:
https://jsi.webex.com/jsi/lsr.php?AT=pb&SP=MC&rID=75382732&rKey=f3bc9ca3232b8b42
Testing Methodology
Select Tools
Collect Forms
Perform & Document
OCR Tools
• OmniPage Professional 18 (desktop-based, licensed)
• Abbyy FineReader 11 (desktop-based, licensed)
• Tesseract-OCR (desktop-based, open-source)
• Evernote (mobile phone–based, free)
• Captricity (web-based, paid)
Testing Protocol
1. Pass a field-filled logistics management information system (LMIS) form through the application
2. Fill out a blank LMIS form carefully and pass it through the application
3. Record the number of correctly vs. incorrectly identified fields (numeric)
4. Calculate character recognition accuracy rates.
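The accuracy calculation in steps 3 and 4 can be sketched as below. This is a minimal illustration, assuming the rate is simply correctly identified numeric fields divided by all scored fields; the class name, function name, and the example counts are hypothetical and are not taken from the test data.

```python
from dataclasses import dataclass


@dataclass
class FormResult:
    """Tally of numeric fields for one form passed through one OCR tool."""
    form_name: str
    correct_fields: int
    incorrect_fields: int

    @property
    def total_fields(self) -> int:
        return self.correct_fields + self.incorrect_fields


def accuracy_rate(result: FormResult) -> float:
    """Return the percentage of scored fields that were identified correctly."""
    if result.total_fields == 0:
        raise ValueError("form has no scored fields")
    return 100.0 * result.correct_fields / result.total_fields


# Hypothetical counts, for illustration only
example = FormResult("example form", correct_fields=13, incorrect_fields=87)
print(f"{example.form_name}: {accuracy_rate(example):.0f}%")
```

Keeping the raw counts per form, rather than only the percentage, makes it possible to recompute rates later if the scoring rule changes.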
Form 1: Tanzania Essential Medicines R&R
Form 2: Tanzania Essential Medicines Supplementary Form
Form 3: Zimbabwe ARV R&R
OmniPage Professional 18
• Licensed tool—$499.99
• General impressions:
– Easy to use after initial orientation
– Fast processing (less than 1 minute)
– Can verify/validate recognized text
Interface
Output
Wizara ya Afya na Ustawi wa Jam ii Integrated Logistics System
FOMU 2C: FOMU TUPU YA TAAR I FA NA MAOMBI (R&R) TA DAWA/VIFA A VYA TIBA VYA ZIA DA N° 388151
[Remainder of the OmniPage sample output omitted: printed column headings are partly legible, while handwritten entries are recognized as unreadable character strings.]
OmniPage Professional 18
Accuracy rates (numerical fields):
• Forms filled out in the field:
– TZ essential medicines: 13%
– TZ supplementary form: 21%
• Forms filled out by tester:
– TZ essential medicines: 53%
– TZ supplementary form: 76%
Abbyy FineReader 11
• Licensed tool—$169.99
• General impressions:
– Easy to use after initial orientation, but harder to learn than OmniPage
– Fast processing (1–3 mins)
– Can verify/validate recognized text
Interface
Output
Wizara ya Afya na Ustawi wa JamiiIntegrated Logistics System
FOMU 2C: FOMU TUPU YATAARIFA NA MAOMBI (R&R) YA DAWA/VlFAA VYA TlBA VYA ZlADAN° 088151
[Remainder of the FineReader sample output omitted: printed column headings are partly legible, while handwritten entries are recognized as unreadable character strings.]
Abbyy FineReader 11
Accuracy rates (numerical fields):
• Forms filled out in the field:
– TZ essential medicines: 10%
– TZ supplementary form: 10%
• Forms filled out by tester:
– TZ essential medicines: 39%
– TZ supplementary form: 43%.
Tesseract-OCR
• Open-source tool
• General impressions:
– Does not have a graphical user interface
– Is a command-line tool; must be run from the command line
– Difficult for users who are unfamiliar with the command line
– Requires input file in image format (e.g., .png, .jpg)
Tesseract-OCR
• In the example below, we ran Tesseract with a
scanned image file and an output file to hold the
recognized text:
Interface
[Command-line screenshot with callouts: program install location, program name, scanned image, output text file name]
Source File
Output
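The command-line invocation described on this slide can be approximated as follows. This is a sketch, assuming Tesseract's basic usage of `tesseract <image> <output base>`, where Tesseract writes the recognized text to `<output base>.txt`. The file names are hypothetical, and the sketch assumes the `tesseract` executable is on the PATH rather than being run from its install location.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_tesseract_command(image_path: str, output_base: str) -> List[str]:
    """Build the basic command: program name, scanned image, output file base."""
    return ["tesseract", image_path, output_base]


def run_ocr(image_path: str, output_base: str) -> str:
    """Run Tesseract on a scanned image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base), check=True)
    # Tesseract appends ".txt" to the output base name
    return Path(output_base + ".txt").read_text(encoding="utf-8")


# Hypothetical file names; this prints the command without running it
print(" ".join(build_tesseract_command("scanned_form.png", "recognized_text")))
```

Wrapping the command in a small script like this is one way to spare users who are not comfortable with the command line from typing it by hand.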
Evernote
• Can send pictures of documents
• Not useful for character recognition or data entry
• Allows tagging on the image, e.g., district/facility
Captricity
• Web-based, paid service
• Offers several tiers of pricing:
– “Pay as you go”: $0.01 per field
– Discounts as the number of fields increases
– “Premier” tier: $335/month for 50,000 fields
• $0.0067 per field
– “Enterprise” tier: custom pricing, depending on volume
• provides dedicated account manager and support
• volume discounts.
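The per-field figures above can be checked with a short calculation. This is a rough sketch using only the prices quoted on this slide. It assumes the Premier fee covers up to 50,000 fields per month and it ignores the pay-as-you-go volume discounts, whose schedule is not given here. The function names are illustrative.

```python
PAY_AS_YOU_GO_PER_FIELD = 0.01    # dollars per field, as quoted on the slide
PREMIER_MONTHLY_FEE = 335.00      # dollars per month, as quoted on the slide
PREMIER_FIELDS_INCLUDED = 50_000  # fields per month, as quoted on the slide


def premier_cost_per_field() -> float:
    """Effective per-field cost if all included Premier fields are used."""
    return PREMIER_MONTHLY_FEE / PREMIER_FIELDS_INCLUDED


def pay_as_you_go_cost(fields: int) -> float:
    """Monthly pay-as-you-go cost, ignoring any volume discounts (assumption)."""
    return fields * PAY_AS_YOU_GO_PER_FIELD


def break_even_fields() -> float:
    """Monthly volume at which undiscounted pay-as-you-go equals the Premier fee."""
    return PREMIER_MONTHLY_FEE / PAY_AS_YOU_GO_PER_FIELD


print(f"Premier cost per field: ${premier_cost_per_field():.4f}")
print(f"Break-even volume: {break_even_fields():,.0f} fields per month")
```

Under these assumptions, the Premier tier becomes the cheaper option at roughly 33,500 fields per month; actual discounts on the pay-as-you-go tier would shift that point.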
Captricity
Process:
1. User creates template for form
2. System creates digital fingerprint from template
3. System compares uploaded form to digital fingerprint
– Fixes skews, or flips form, if needed
4. Performs human validation field by field
– validators never see the entire form
– preserves privacy
5. Output is delivered as a .csv file.
Captricity
General impressions:
• Initially time-intensive
– must separate forms into single files, one per page
– must set up a template for each page, e.g., a one-page form took 10 minutes to set up
• Requires Internet connection
• Approximately 24-hour turnaround the first time a form is processed
– turnaround time is reduced after first processing.
Interface
Output
Captricity:
Accuracy rates (numerical fields)
• Forms filled out in the field:
– TZ essential medicines: 65%
– TZ supplementary form: 99%
– Zim antiretrovirals: 52%
• Forms filled out by tester:
– TZ essential medicines: 98%
– TZ supplementary form: 100%
– Zim antiretrovirals: 98%
Research conclusion: Captricity looks most promising
Digging deeper…
Captricity Positives
• Shows best results
– Validation of output is critical
• Fast turnaround time
• Digitization is accurate
– data entry staff did not introduce new errors
• Cloud storage can store data indefinitely
• Output in .csv format (readable by a database).
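To illustrate what "readable by a database" can mean in practice, the sketch below loads a small CSV export into an in-memory SQLite table. The column names and sample rows are hypothetical and are not taken from an actual Captricity export; a real export would follow the fields defined in the form template.

```python
import csv
import io
import sqlite3

# Hypothetical export; column names and values are illustrative only
SAMPLE_CSV = """facility,product,quantity_requested
Facility A,Product X,120
Facility B,Product Y,45
"""


def load_csv_into_db(csv_text: str, conn: sqlite3.Connection) -> int:
    """Insert each CSV row into an lmis_report table; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lmis_report ("
        "facility TEXT, product TEXT, quantity_requested INTEGER)"
    )
    rows = [
        (r["facility"], r["product"], int(r["quantity_requested"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO lmis_report VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
inserted = load_csv_into_db(SAMPLE_CSV, conn)
total = conn.execute(
    "SELECT SUM(quantity_requested) FROM lmis_report"
).fetchone()[0]
print(inserted, total)
```

Converting numeric fields with `int()` at load time means a misrecognized value fails loudly, which is one simple place to hook in the validation step.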
Captricity Negatives
• Requires Internet connection; must be used at higher levels of the supply chain
• Setup is time-intensive; must:
– split up forms
– create templates
– rotate to landscape
• Validation/reconciliation can be time-consuming
• Cost can be high, but discounts are available for high volume
– Cheaper than hiring data entry clerks?
Use Cases for LMIS Reporting Using Captricity
Use Case 1
Central database
District: Upload and verify
SDP/CHW: Send paper report
Use Case 2
Central database
District: Upload and verify
SDP/CHW: Take photo of form
Use Case 3
Central: Upload and verify
District: Aggregate reports
SDP/CHW: Send paper report
Central database
Thank You! Questions?