Million Books to the Web

An Example of Indo-US Collaboration
Lessons Learnt & The Road Ahead
Prof N. Balakrishnan
Supercomputer Education and Research Centre
Indian Institute of Science
Bangalore India
School of Computer Science
Carnegie Mellon University
Pittsburgh USA
Indo-US Workshop on Open Digital Libraries & Interoperability
Washington, DC
June 23, 2003
Lessons from the past
• fires of Alexandria
– irrevocably severed our access to any of the works of the ancients.
• introduction of printing technology
– much Indian and Chinese knowledge disseminated by word of mouth
and on palm leaves virtually disappeared or became inaccessible
• New cultural revolutions
– edifices built by destroying the past irrevocably
– later revolutions seek solace in attempting to preserve what was
destroyed
– we need to preserve our heritage independent of the political and
social ups and downs
A single wanton act of destruction
can destroy an entire line of heritage
Lessons from Reality
In a thousand years:
only a few of the paper documents we have today will survive
the ravages of deterioration, loss, and outright destruction.
Existing paper archives and many other works still in existence
today are rare
- only accessible to a small population of scholars and
collectors at specific geographic locations
Contrary to popular belief, libraries, museums, and
publishers do not routinely maintain broadly comprehensive
archives of the considered works of man
No one can afford to do this, unless the archive is digital
The Approach
• Technology Driven Vision
• Decide on the stakeholders
– Never make it exclusive
• Pilot Projects to perfect technology
• Bring in advanced management concepts
– like People Maturity Models
– Quality assurance
– automate wherever possible
Continued…
The Approach
• Lessons from the past
– Too many Digital Library Projects
– with half-life of less than 2 years from the date of
“Launch” or a long incubation time
– Follow Nike – JUST DO IT
• Digital Library must have two ingredients
– A knowledge Amplifier
– Free-access, giving avenues for every one to make
economic benefit
• still contribute to multiplication of knowledge by circulation
• In India, it should be a test bed for our
Language Technology Research
– a show case for our heritage
Elements of Technology
• Microprocessors
• Memory
• Connectivity
• Software
All these technologies are growing
exponentially
Communication Revolution
If you are amazed at the drop in cost of computing,
wait till you see what is going to happen to bandwidth.
Network technology will increase 10-100 times faster
than processor technology
-Andy Grove, Titan of Intel
Bandwidth will double every year
Network speeds become comparable to
interconnect speeds
Together, the technology of Computers
and Communications Revolutions aim at
Death of Time and Distance
Anytime, Anyplace and Anyone
The World of Computers &
Communication
• Small fish eat the Big Fish
• Microprocessors offer performance comparable to
supercomputers; Paradigm Shift from Dinosaurs to
mammals, from performance to functionality
• NETWORK is everywhere
• Web is a preferred medium of communication for
everyone, including the military & the terrorists
• Companies that make more and more Software Free
– capitalize more (Open archives)
Processor of Tomorrow
• Carbon Nano Tubes
– 5 to 10 atoms wide
– promise to replace silicon soon
• Flexible Transistors
– made from plastic, organic materials
• Silicon will live for 15 years
• Moore’s law will live longer
• 1000 times growth in 10 years
The winner will be decided by:
Material Convergence + Human Like interactions
Processor of Tomorrow
• A billion Transistors at 10 to 20 GHz Clock
rates by 2010
• 128 G Bytes of Main Memory
• Terabyte of Disk Storage, maybe
Holographic
• Speech input/output (ASR)
• Multilingual
• Terabit connectivity at PC
• The DL plans of today must be sensitive to
this
The Road Ahead
[Figure: evolution of computing plotted against knowledge content]
• Poor knowledge content: Scientific Calculations
• Medium: Data Analysis
• Rich: Expert Systems
• Emulating Human Performance: See, Hear, Talk, and “Think”
• Brilliant SuperHumans: Bill Joy’s Nightmare
The future trends:
• Browser will be the only medium of
communication.
• It will be active- with voice and video, language
independent.
• Mobility will be the key.
• Small form factor devices such as Palms, PDAs
and Tablets would be the future.
• We would soon see TVPCT at the cost of a TV
• We will witness major convergence between ICT,
Nano Technologies and Biological Sciences
Electronic Resources and the
Library of the Future
E-mags; E-books; E-music;
E-Movies
Dedicated E-book Readers
• Dedicated readers – about
20,000
• Palm devices – 6,000,000
• PC’s – hundreds of millions
• “For people accustomed to
reading text on a computer
for hours at a time, e-book
screen clarity is a nonissue.”
• A low-cost E-Book reader design is under way
in India
http://www.eink.com/technology/index.htm
• E Ink is made up of millions of microcapsules
– each the diameter of a human hair
• Each microcapsule contains
– positively charged white particles &
– negatively charged black particles
• that float in a clear fluid
• A film of transistors supplies the voltage to the
capsules
• A negative charge makes the white particles
move to the top of the microcapsule
– an opposite electric field pulls the black particles to
the bottom of the microcapsules, mimicking the
effect of print.
• Electronic ink is a real power miser
E-ink/e-paper (Lucent)
The technology has been identified and
development is well under way
By the year 2003, we envision electronic
books
• that can display volumes of information
as easily as flipping a page,
• permanent newspapers that update
themselves daily via wireless broadcast
• Just as today's books give people easy
access to everyday information,
tomorrow's books will provide the
same easy access to the dynamic data of
the information age
The world of publishing will never be the same
Indian Institute of Science’s
Simputer
• A hand held Linux Box at around US$ 200
• Has the state of the art browser
• Color screen
• Very good speech synthesizer
– In English and many Indian Languages
• A very powerful tool for access with wireless
• Soon to be modified as an E-book
www.simputer.org
www.picopeta.com
www.ncoretech.com
The Challenges in Computing
Tomorrow’s computing needs
are not in mflops and Gflops
The computer to process
Information, recognition and
DM like a Human
Small inexpensive
Robots, swarms will
be a reality
Ray Kurzweil:
The Age of Spiritual Machines
“A $1,000 PC (in 1999-dollars)…
– 2009 = trillion calculations/second
– 2019 = 20 million billion
calculations/second
(the human brain)
– 2029 = 2 × 10^19 calculations/second
(1,000 human brains)
Ray Kurzweil:
The Age of Spiritual Machines
• 2009: “Computer displays have all the
display qualities of paper- high
resolution, high contrast, large viewing
angle, and no flicker. Books, magazines,
and newspapers are now routinely read
on displays that are the size of small
books.”
• 2009: “At least half of all (business)
transactions are conducted online.”
Ray Kurzweil:
The Age of Spiritual Machines
• 2009: “There is effective convergence of
all media, which exist as digital objects
(that is, files) distributed by the ever-present high-bandwidth, wireless
information web. Users can instantly
download books, magazines,
newspapers, television, radio, movies,
and other forms of software to their
highly portable personal communication
devices.”
2009
• A $1,000 PC delivers Terahertz speeds
• PCs with high resolution visual displays come in a
range of sizes
– from those small enough to be embedded in clothing and
jewelry
– to the size of a thin book
• Cables are disappearing
– Communication between components uses wireless
technology, as does access to the Web
• The majority of text is created using continuous
speech recognition
– Also ubiquitous are language user interfaces.
• Most routine business transactions (purchases, travel,
etc.) take place between a human and a virtual
personality
– Often the virtual personality includes an animated visual
presence that looks like a human face
Ray Kurzweil:
The Age of Spiritual Machines
• 2019: “Reading books, magazines, newspapers,
and other Web documents; listening to music;
watching three-dimensional moving images
(for example, television, movies); engaging in
three-dimensional visual phone calls; entering
virtual environments (by yourself, or with
others who may be geographically remote);
and various combinations of these activities
are all done through the ever-present
communications Web and do not require any
equipment, devices, or objects that are not
worn or implanted.”
Ray Kurzweil:
The Age of Spiritual Machines
2029: “The ever learning Society”
• Learning now constitutes the primary focus of the
human species.
• Human learning is accomplished using virtual
teachers (and virtual libraries?).
• Learning is enhanced by widely available neural
implants, which improve memory and perception but
cannot yet download knowledge directly.
• Automated agents are learning, on their own without
human assistance. Machines can now create
significant new knowledge with little or no human
intervention; unlike humans, machines easily share
knowledge structures with one another.
And Then There Was Music
• RealJukeBox
• Win Amp
• MP3
• Napster
The Growth rates
• The processor performance doubles every 18
Months
• The Network bandwidth doubles every year
• The storage capacity doubles every nine
months
• Soon you will have processor bottleneck
• 1000 times growth in storage in 10 years – I
already have 250 GB on a single disk-
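A rough sketch of what the doubling periods above imply over ten years, assuming smooth exponential growth (an idealisation, not a figure from the slides). The function name `growth_factor` is illustrative.

```python
def growth_factor(doubling_months: float, years: float) -> float:
    """Multiplicative growth after `years` if capacity doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)


# Doubling periods as stated on the slide.
DOUBLING_MONTHS = {"processor": 18, "network": 12, "storage": 9}

if __name__ == "__main__":
    for name, months in DOUBLING_MONTHS.items():
        print(f"{name}: about {growth_factor(months, 10):,.0f}x in 10 years")
```

Under these assumptions a 12-month doubling compounds to 1024x in ten years, which is the rate that corresponds to "1000 times in 10 years"; a 9-month doubling compounds to a larger factor and an 18-month doubling to a smaller one, which is why the processor becomes the bottleneck.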
Recognition versus Recall
• Recognition is like seeing your friend’s face
in a sea of faces
– even if he has changed since you last saw him
– storage intensive and fast
• Recall is like figuring out how to repair your
car’s carburetor using a manual and you
have never done that before- applying
knowledge to a new situation- processor
intensive and less storage
• Brain works on recognition
• Present day computers prefer recall –
remember the Y2K
• Future computers would work like the
brain- recognition
Recognition versus Recall: what
it does to our DL
• We will move away from quantitative search
(key word match) to “aboutness” and content
based retrieval
• In Future the documents will be read more by
computers than by humans – will it change
the way we write ? Would we think in html
or in xml ?
• From mere Text data to 3d Objects, voice and
video
• Multilingual
• Every conceivable form of knowledge
expression
Technology Driven vision for The
Digital Library
• We can store everything
– all the knowledge of the human race
– in all forms
– that is the Universal Digital Library
• Cost of Selection is stationary but
storage cost is plummeting
It is not about contents alone
It is about networking of people
Education
Universities
Colleges
Schools
Real-time
Engineering
Science
Business
3 Ls of Learning
1. Face-to-Face Lectures
2. Virtual Labs
3. Universal Digital Library
Universal Library Vision
All recorded information online
• instantly available
– To Anyone
– Anywhere in the world
– In any language
– searchable, browsable, navigable by humans
and machines
Digital Library Contents
• Books
• Periodicals (journals, newspapers)
• Art, photographs
• Databases, software
• Movies, video
• Music, opera, dance
Suppose all of this were on the Web
Digital Library of the future
• Digital library
• Digital museum
• Digital tour guide
• Research assistant
• Knowledge amplifier
Can we store all the human
knowledge in a Digital form
There are about 100 Million books written by the human
race
Multiply by 10 for all other forms of knowledge
1 book = 500 pp. = 1 MB uncompressed
– 10^9 books = 10^15 bytes = 1 petabyte
140 million computers on the Internet
– At 20 GB free space each → 2.8 exabytes now
1 GB of disk costs ~$1
– 1 petabyte < $1 million
– Our Peta Byte server Initiative
– Storage is not the limitation, but creation and
coordination are
– So are avoiding duplication and connectivity
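The arithmetic behind this slide can be checked with a short sketch. All inputs are the slide's own round numbers; the function names are illustrative, and decimal units (1 GB = 10^9 bytes) are an assumption.

```python
BOOK_BYTES = 10**6                 # 1 book ~ 500 pages ~ 1 MB uncompressed
BOOKS = 100 * 10**6                # ~100 million books
OTHER_FORMS_MULTIPLIER = 10        # "multiply by 10" for other forms of knowledge
GB = 10**9
PB = 10**15


def corpus_bytes() -> int:
    """Total size of all recorded knowledge under the slide's assumptions."""
    return BOOKS * OTHER_FORMS_MULTIPLIER * BOOK_BYTES


def disk_cost_usd(total_bytes: int, usd_per_gb: float = 1.0) -> float:
    """Raw disk cost at a flat price per gigabyte."""
    return total_bytes / GB * usd_per_gb


def free_space_bytes(computers: int = 140 * 10**6, free_gb: int = 20) -> int:
    """Aggregate free disk space across Internet-connected computers."""
    return computers * free_gb * GB


if __name__ == "__main__":
    print(corpus_bytes() / PB, "PB")
    print(disk_cost_usd(corpus_bytes()), "USD")
    print(free_space_bytes() / 10**18, "x 10^18 bytes")
```

With these inputs the corpus is 1 petabyte, about $1 million of disk at $1 per GB, and the free space on 140 million machines comes to 2.8 × 10^18 bytes, thousands of times the corpus size.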
Universal Digital Library
• More than 120 million PCs on the net
• Each having at least 20 GB of free space
• Peer to peer Communication
• Can we store all the Human Knowledge
in the computers
This is today
The time consuming process is taking the
printed books to the web- The technology is
not an impediment
Technology Driven Vision for the
Universal Digital Library
• A vision to store everything that the
human race ever produced
• A mission to digitize 1 Million Books and
make them freely available
The Strategy for Scanning of
books
• A planetary Scanner like the Minolta PS 7000
• Takes about two hours to scan a 500 page book,
crop, OCR and convert it to TIFF, HTML and
XML files
• About 10,000 pages to the web in a day
• Storage per book is around 60 MB
• 100 terabytes is not an issue
• Our partner, the Internet Archive, has 370 TB,
adding 30 TB a day
• Distributed data bases
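A back-of-the-envelope sketch of what these figures mean for a million books. It assumes the 10,000 pages per day is an aggregate daily rate and reuses the 500-page, 60 MB book from the slides; the function names are illustrative.

```python
def storage_tb(books: int, mb_per_book: float = 60) -> float:
    """Storage needed in terabytes (decimal units: 1 TB = 10^6 MB)."""
    return books * mb_per_book / 1_000_000


def scanning_days(books: int, pages_per_book: int = 500,
                  pages_per_day: int = 10_000) -> float:
    """Days needed to scan `books` at a fixed daily page rate."""
    return books * pages_per_book / pages_per_day


if __name__ == "__main__":
    print(storage_tb(1_000_000), "TB for a million books")
    print(scanning_days(1_000_000), "days at 10,000 pages/day")
```

A million books need about 60 TB, comfortably under the 100 TB mentioned above, while a single 10,000-pages-a-day stream would take 50,000 days. The scanning rate, not storage, is what has to scale.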
Process
Process Involved
Identification of
Books
Pre-Scanning
process
Scanning
Process
Image
Processing
Conversion
Process
Scanning
• 2 pages at a time
• Stored in TIFF format
Post scanning operations
• Skew Correction
• Document Registration
• Dot Shading and Speck Removal
• Image centering
• Image Cropping
• Smoothing and Completion
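Two of the operations listed above, speck removal and cropping, can be illustrated on a tiny binary bitmap. This is a minimal sketch in pure Python, not the project's actual image-processing software; the 8-neighbour speck rule and the function names are assumptions made for illustration.

```python
from typing import List

Bitmap = List[List[int]]  # 1 = ink, 0 = background


def remove_specks(img: Bitmap) -> Bitmap:
    """Clear ink pixels that have no ink among their 8 neighbours."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1:
                continue
            has_neighbour = any(
                img[ny][nx] == 1
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbour:
                out[y][x] = 0
    return out


def crop_to_content(img: Bitmap) -> Bitmap:
    """Crop to the bounding box of the ink; return [] for a blank page."""
    rows = [y for y, row in enumerate(img) if any(row)]
    if not rows:
        return []
    cols = [x for x in range(len(img[0])) if any(row[x] for row in img)]
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]
```

Removing specks before cropping matters: a single stray pixel in a corner would otherwise stretch the bounding box to the full page.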
Image comparison
Original Image
Processed Image
SW 1
OCR CONVERSION
Performance evaluation for various fonts in
Kannada language OCR
Series1: Average performance efficiency before using the cropping software.
Series2: Average performance efficiency after using cropping software.
The Digitized book
• Average book size ~ 500 Pages
• Size of Page as Image ~ 50-150 KB
• Size of Page as text file
(rtf /htm) ~ 8 – 15 KB
• Average size of Digitized book ~ 60MB
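The per-page sizes above bracket the quoted average. A small sketch, assuming each page is stored both as an image and as text and using decimal units (1 MB = 1000 KB); `book_size_mb` is an illustrative name.

```python
def book_size_mb(pages: int, image_kb_per_page: float,
                 text_kb_per_page: float) -> float:
    """Size of one digitized book holding an image and a text file per page."""
    return pages * (image_kb_per_page + text_kb_per_page) / 1000


if __name__ == "__main__":
    low = book_size_mb(500, 50, 8)     # smallest page sizes on the slide
    high = book_size_mb(500, 150, 15)  # largest page sizes on the slide
    print(f"{low} MB to {high} MB per 500-page book")
```

The range works out to 29 MB to 82.5 MB, so an average of about 60 MB per book is consistent with the per-page figures.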
Brightness – Dark(1 in scale) and contrast – 9(in scale)
Original image
Cropped image
Million Books to the Web: Stakeholders
as Partners
• Academia- CS, IS and users
• Researchers and Language
Technologists
• Cultural and Religious Organizations
• Public Libraries
• Government Agencies
• None too exclusive
Background and Status
• Collaborative Project between India and US
• Lead roles by CMU and IISc
• Initiated by CMU sending scanners free of cost to
India. NSF supported
• Initiated by the Office of the Principal Scientific
Advisor to GOI by a Seed funding to IISc
• Fuelled by MCIT’s whole hearted support
• More than 16 centres in academic, religious and
government institutions spread across the country
• 69 scanners in place
• China, Egypt (Alexandria Library), Sri Lanka,
Australia joining in
• There is light on the other side of the tunnel
Hubs of DL Activities in India
Anna University, Chennai, Tamil Nadu
Arulmigu Kalasalingam College of Engineering, Srivilliputhur, Madurai,
Tamil Nadu
Goa University, Goa
Indian Institute of Information Technology, Allahabad, Uttar Pradesh
International Institute of Information Technology, Hyderabad, Andhra
Pradesh
City and State Central Library, Andhra Pradesh
Shanmugha Arts, Science, Technology & Research Academy,
Thanjavur, Tamil Nadu
Sringeri Mutt, Sringeri, Karnataka
Tirumala Tirupathi Devasthanams, Tirupathi, Andhra Pradesh
Maharashtra Industrial Development Corporation, Maharashtra
University of Pune, Pune
Kanchi University, Kanchi, Tamil Nadu
Indian Institute of Astrophysics, Karnataka
Scanner Operation at Hubs
[Bar chart: number of scanners operating at each hub; vertical axis 0 to 45. Hub labels and bar values are not recoverable from the extracted text.]
Progress of Various Centres in Scanning: No. of Books
[Bar chart by centre: SCL, CCL, Kanchi, AU, PUNE, MIDC, TTD, SASTRA, AKCE, IISc. Bar values shown: 6276, 3042, 2000, 1704, 1097, 1031, 504, 465, 273, 158 (total 16,550). The centre-to-value mapping is not recoverable from the extracted text.]
Number of Pages Scanned
[Bar chart by centre: Kanchi, AU, CCL, SCL, AKCE, SASTRA, TTD, MIDC, PUNE, IISc. Bar values shown: 1319001, 1080759, 837708, 500000, 451452, 158933, 152502, 134100, 97334, 39395 (total 4,771,184). The centre-to-value mapping is not recoverable from the extracted text.]
Category of Books
[Bar chart by language: English, Telugu, Tamil, Sanskrit, Kannada, Others, Urdu. Bar values shown: 5596, 2962, 836, 430, 384, 176, 168. The language-to-value mapping is not recoverable from the extracted text.]
Cumulative Status
Books: 16,550
Pages: 4,771,184
More Centres and Initiatives
Already 61 scanners in operation
+ 39 in the pipeline
• Rashtrapathi Bhavan
• Punjab Technical University
• IIIT Hyderabad and University of Hyderabad
MCIT’s Initiatives
• Mobile Van with VSAT for the Book Mobile
• ERNET providing connectivity to all centres
• Many Centres supported with funds for
computers and for scanning operations
• Total spending from Government support and
from Scanning Centre’s resources is ten times
more than the Scanning equipment cost and
effectively 100 times more
• Support from all quarters of the government,
religious leaders, academia and private agencies
• Universal Digital Library of India to be launched
Some Observations
and the Road ahead
• More than 5 million pages have been scanned
• The highest average rate of sustained scanning
was about 4,000 pages per day at Hyderabad
during February.
• Our goal is to establish best practices to reach
6000 pages a day
• 3 years – 1 M Books
• By 2020 – 20 Million Books, 2 Million Songs,
200,000 Movies
• The most enviable content creation
Road Ahead
• Establishing the Digital Library of India
on the same lines as the E-Governance
Initiative
• Under the MCIT
• Head Quartered in AP
• A think tank for content selection,
delivery, technology and policy
directions for the country
• Creation of special funds for 4C
Criteria for Selecting Mega
Centres- 5 of them planned
• Geographical Distribution
• Availability of contents of interest to larger
user base
• Local enthusiasm to support and sustain this
activity
• Budget of US$ 200,000 Initially and around 0.5
cent per page of output
• One single scanner can produce 2 Million
pages a year
• We will have 300 scanners – a Million books a
year
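The scanner and cost figures above can be combined in a short sketch. It assumes the 500-page average book used elsewhere in the slides and reads "0.5 cent per page" as US$0.005; the function names are illustrative.

```python
PAGES_PER_SCANNER_YEAR = 2_000_000   # slide figure
PAGES_PER_BOOK = 500                 # average book size used in the slides
CENTS_PER_PAGE = 0.5                 # slide figure, read as half a US cent


def books_per_year(scanners: int) -> float:
    """Books digitized per year by a fleet of scanners."""
    return scanners * PAGES_PER_SCANNER_YEAR / PAGES_PER_BOOK


def running_cost_usd(pages: int) -> float:
    """Per-page output cost in US dollars."""
    return pages * CENTS_PER_PAGE / 100


if __name__ == "__main__":
    pages = 300 * PAGES_PER_SCANNER_YEAR
    print(books_per_year(300), "books per year")
    print(running_cost_usd(pages), "USD per year in per-page costs")
```

Three hundred scanners at full rate give 600 million pages, or 1.2 million 500-page books a year, which leaves some headroom over the million-books-a-year target. The per-page cost for that volume is about US$3 million.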
Road Ahead
• Mega Content Creation Centres
• New Delhi, Varanasi, Allahabad, Hyderabad,
Far East (Tawang or Guwahati), Kolkata and
Chennai
• Each Centre having around 40 scanners and 5
mobile scanners
• Content Creation Centres with up to 5 scanners
in Gujarat, Rajasthan so as to cover the entire
country
• Spearheading Language Technology
Initiatives
• Adding voice and video of our heritage
Universal Digital Library
• Goal — To have all public knowledge online,
available for free to all, everywhere
• An achievable goal
– There are only some 100,000,000 books in the world
– A few billion dollars could bring these online
• Limitations
– Copyright and licensing issues
– Different language books and character recognition
technologies
• We must ensure that English is not necessarily the de facto
language
• Universal Library
TECHNOLOGICAL CHALLENGES
• Input (scanning, digitizing, OCR)
• Data representation
– text, notations, images, web pages
• Navigation and Search
• Multilingual Issues
• Output (voice, pictures, virtual reality)
• Synthetic Documents
SEARCH ENGINE of UDL
• Very powerful light weight and scalable
CMU search engine
• Greenstone
• Both are working and are being evaluated
for the choice
• Both have been modified for use as Indian
Language search engines- language
independent search
• Future- Semantic web and content based
retrieval – Speech input and speech output
COMPARATIVE ANALYSIS – GREENSTONE Vs
UDL SEARCH ENGINES
Search Engine | Time Taken | Boolean | Proximity | Case | Stemming
Greenstone | Not depending on the number of hits | OR & NOT; Default: AND | Phrase searching | User can select the option | Stemming allowed
UDL | Highly depending on the number of hits | OR; Default: AND | No | No Case Sensitivity | Not available
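To make the compared features concrete, here is a toy inverted index with a default AND, an optional OR, and no case sensitivity. It is an illustrative sketch only and does not represent the internals of Greenstone or the CMU/UDL engine; all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lower-cased word to the set of document ids containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str, mode: str = "and") -> Set[str]:
    """Case-insensitive search; terms are ANDed unless mode is 'or'."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    if mode == "or":
        return set().union(*postings)
    return set.intersection(*postings)
```

Because whitespace splitting is language independent, the same sketch works on any script that separates words with spaces; stemming and phrase (proximity) search would need language-specific additions.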
Choice of Collection
• Use books from libraries that are beyond
copyright
• Administrative metadata from OCLC, ISBN,
and other sources
• Dublin Core for Indian Books
• A Copy Right Metadata – aggressive attempts
to obtain copy right- Free Copyright from
many agencies including GoI
• Source Library Metadata
• Converge towards focussed collection
Funding – Road Ahead
• Funding effort must be an organized activity
• Commercial funding unlikely for “public
good” activity
– Must go to governments, NGOs
• World Bank
• Qatar (if CMU deal succeeds)
• Benefits of UDL:
– Digital Opportunity
– Use in distance education
– International involvement – cultural diversity
– Technology dissemination
– Low cost v. conventional libraries
• Funding is tied to Outreach (next slide)
Outreach
• The UDL message must be disseminated
• Present at World Summit (WSIS) in Geneva
(12/03)
• Pre-WSIS meeting at CERN (12/03)
• Establish liaison with UN Decade of
Literacy (2003-2013)
• Points:
– Terabyte servers
– “Free to read” policy
– Universal Dictionary (applicability to other
domains)
Access by Public
• All content free to read, print one page at a
time
• Restrictions imposed by donors will be
respected
• Categories of use will be recognized, e.g.
cannot print entire document
• Buttons, links to fulfillment houses and
publishers are allowed- to take in “born
Digital” copyrighted material
Partner Relations- Future
• All material scanned or input as part of the
UDL will be shared by all partners
• Preference for national umbrella
organizations to simplify international
partner relations
• Relationships between partners and their
national DLs encouraged
• Online communication and collaboration
tools needed to facilitate partner questions
and interchanges
• Written partnership agreement will be made
Standards
• Published standards within the UDL
• Quality control and testing standard
• Funding to be sought to support standards
development
• Logo to be developed (graphic device
without words). Must appear on all sites, all
pages
• Logo should have a hot link to a gateway
site that links all UDL sites
• Local variability in look and feel of sites is
permitted so long as the logo is displayed
Scanning/OCR Policy
• We scan what gives greatest impetus to
continued funding
• Language: majority of content in English;
otherwise no restriction
• Scans will be previewed for minimum
quality; OCR will not be corrected unless
local site desires
Metadata
• All entries MUST have metadata according
to MARC or Dublin Core
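As an illustration of the Dublin Core option, the sketch below serialises a record using the 15 Dublin Core elements and their standard namespace. The wrapping `record` element, the function name, and the sample values in the test are assumptions made for illustration, not the project's actual schema.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields: dict) -> str:
    """Serialise a dict of Dublin Core fields as a small XML record."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")
```

Rejecting unknown element names at intake is one simple way to enforce the "MUST have metadata" rule consistently across scanning centres.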
Copyright
• Public domain materials: no restrictions,
tools for printing entire document provided
• Works of uncertain copyright status:
– Good faith effort to determine status, locate owner
– Scan and index work
– After a waiting period (at least one month), make
work viewable
• Archival material (old but unique)
– Allow resolution restriction to avoid devaluation
of original
• Out-of-print in-copyright (OPIC)
– Seek blanket permissions from publishers
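The copyright categories above can be read as a simple visibility rule. The sketch below is one possible encoding, not the project's policy engine: the 30-day value for "at least one month", the status labels, and gating out-of-print in-copyright works on a granted permission are all assumptions.

```python
from datetime import date, timedelta

WAITING_PERIOD = timedelta(days=30)  # assumption: "at least one month"


def is_viewable(status: str, indexed_on: date, today: date,
                permission_granted: bool = False) -> bool:
    """Decide whether a scanned work may be shown to the public."""
    if status == "public_domain":
        return True                      # no restrictions
    if status == "uncertain":
        # Viewable only after the waiting period following scan and index.
        return today - indexed_on >= WAITING_PERIOD
    if status == "archival":
        return True                      # possibly at restricted resolution
    if status == "opic":
        return permission_granted        # assumed: wait for publisher consent
    raise ValueError(f"unknown copyright status: {status}")
```

For example, a work of uncertain status indexed on 1 June 2003 would stay hidden on 15 June and become viewable on 1 July under the 30-day assumption.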
Possible Intake Model
HINDI
INTAKE
LOCAL
MATERIALS
SCANNING
CENTER
ENGLISH
INTAKE
SCANNING
CENTER
TAMIL
INTAKE
LOCAL
MATERIALS
SCANNING
CENTER
INDIA
CENTRAL
MIRROR SITE
GUJARATI
INTAKE
SCANNING
CENTER
SCANNING
CENTER
INDIA
OUTSIDE
INDIA
AUSTRALIAN
MIRROR SITE
CMU
UL SERVER
LOCAL
MATERIALS
CHINESE
MIRROR SITE
ART
INTAKE
The Digital Library a Test Bed for
language research
• Rich data in many languages from the Million
Books to the Web Project - at least 10,000 books
in any language
• Translations in many languages- Gita, NBT,
NCERT etc- an excellent tool for language
translation
• Training data for the OCR
• The case insensitive ITRANS standard
The Digital Library a Test Bed for
language research
• Rich data makes the creation of OCRs in
Indian languages easy- In Tamil, Kannada
and Malayalam – A rapid prototyping
• Speech synthesis and recognition
• Indian Language Search Engines
• Example Based Machine Translation
• Universal Dictionary
The Universal Dictionary
Columns on the slide: Word, Lang, English, POS, Pron, Use

Word                 Lang  English
danúbia              HUN   linen tape
danum                PMP   water
danun                PMP   early
danup                PMP   hunger
danup                PMP   hunger, starvation
danupan              PMP   hungry, starving
daný                 SLO   existent
daný                 SLO   existing
daný                 SLO   given
daný číslom          SLO   numerical
daný na pospas       SLO   obnoxious
danyag               HIL   landscape
daog                 CEB   overturn
daog                 CEB   prevail
daogdaog             CEB   manhandle
daong                TAG   boat with a covered cabin, ark
daong                TAG   bring the ship to shore
daot                 CEB   harm
daot                 CEB   mar
daotan               CEB   bad
daotan'g buut        CEB   dislike
daotan'g hitabo      CEB   mishap
daotan'g tinguha     CEB   malice
daotan'g tuyo        CEB   malice
dapa                 CEB   granary
dapa                 PMP   lie flat on stomach or face down
dapa                 TAG   lie flat on stomach or face down
dapače               BOS   on the contrary
dapadnúť (na nohy)   SLO   to land
d'apaiser            FRE   to appease

POS values shown on the slide (row alignment lost in extraction): n, v, v, v, v, v, adj, n, n, n, n, n, adv, v. The Pron and Use columns are empty.

Language codes: HUN = HUNGARIAN, PMP = KAMPAMPANGAN, SLO = SLOVAK, HIL = HILIGAYNON, CEB = CEBUANO, TAG = TAGALOG, BOS = BOSNIAN, FRE = FRENCH
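Aboutness Hierarchy- Dr Shamos
[Figure: hierarchy of levels, listed from the top: Universe, Collection, Newspaper / Book / 3D Artifact, Chapter / Article, Section, Paragraph, Sentence, Word, Glyph, with Photograph and Object as non-text items. "SUBJECT SEARCHING OCCURS HERE" marks the upper levels; "KEYWORD SEARCHING OCCURS HERE" marks the Word level.]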
Legal and Business Challenges
• Use of copyrighted material
• Economics (Who pays? Who gets?)
• Privacy
• Reliability of information
• Change in the nature of teaching
• Change in the nature of Information
creation and use
Philosophy of Copy Right Laws
• Protect the Inventor so that private
investments in R & D would flow
• Disseminate the information so that
society grows
• Protect fair use
• Ensure you get what you paid for
What can be copyrighted ?
• Must be tangible, e.g. a lecture can’t be
copyrighted, a transcript of it can
• Work must be original
• Work must be creative - even minimal
efforts usually count as creative
Fair use doctrine
Authorizes any person to make fair use of a
published or unpublished copyrighted work
(including the making of unauthorized
copies) in these contexts:
• In connection with criticism of or comment
on the work
• In the course of news reporting
• For teaching purposes or
• As part of scholarship or research activity
Four basic Factors:
1. The purpose and character of the use,
including whether such use is of a
commercial nature or is for nonprofit
educational purposes
2. The nature of the copyrighted work
3. The amount and substantiality of the portion
used in relation to the copyrighted work as a
whole; and
4. The effect of the use upon the potential
market for or value of the copyrighted work
www.library.org principles
1. Scholarly and government information
and knowledge is a public good
• that should be available, maintaining the
balance of the rights of the individual creator
vs. the needs of the public
2. The Library is the intellectual crossroads
of the community.
3. Librarians will conceptualize and ensure
• implementation of innovative new systems
• for the creation and dissemination of information
for succeeding generations.
“This rule provides that the first sale of a
copy of a work to a member of the public
‘exhausts’ the rights holder’s ability to
control further distribution of that copy. A
library is thus free to lend, or even rent or
sell, its copies of books to patrons”
How does this work in the Digital
World ?
Music, Movie and Entertainment
Industry
• Much larger part of most of the economies
• Large production costs
• Need to protect business interest
• Need for technology to protect
• NAPSTER – peer to peer communication
• DeCSS
• NAPSTER for video ??
• Consumer is different from the creator
New paradigms in the Digital
Library
• Should the laws used for protecting
commercially attractive enterprise such
as patents, music, entertainment be
applied to DL
• The dissemination of information
creates multiplication unlike in music
etc
• Shorter life cycles for the information
Copyright Conflicting
requirements
• Need to protect the financial interests
of creators in order to encourage
private investments in the economy
• Need to create a framework for every
human being to create
The 2nd principle should dominate in DL
The 1st principle should dominate the
others
The Concept of FourC
• The scientific community is the only
one that is creator and consumer of
information
• It pays for both
• The SW Industry had shown the way
for freeware
• Can we do it in Scholarly
communication, text books etc.
The Concept of FourC
• In the 20th Century, in the interest of public
good, the Governments created BBC, PBS,
AIR and also the Public Library System, which
provided compensation for artists and
writers while providing free access to the public
Total Global Expenditure on public
broadcasting and public libraries exceeds 100
B$
• Look at our kings who supported all the
poets and scholars
• We need to find the 21st Century equivalent
of BBC, AIR and PBS.
The Concept of FourC
• Learn from NAPSTER: will we have a
video equivalent of NAPSTER?
• It is impossible to police and protect IP
Rights at gigabit rate connections
• Some countries and WIPO, under pressure
from lobbying groups, frame draconian
Copy Right Laws
• Remember the FAIR USE Doctrine, and
what the creators want: recognition and
compensation
The Solution -FourC
Consortium for Compensation of Creative
Contents- FourC
Set aside 25% of the current national
expenditure on public broadcasting and PLs
Authors are encouraged to put the work on
the web after a few years of commercial
exploitation- many models- in return get tax
exemptions etc.
India showing the way IASc and INSA
Books out of print
Titanic effect
Authors Can take back the Copy right
The Solution -FourC
Authors compensation based on the hits
Future versions of text books may be FAQs
and XMLised
Many economic models
Can work for Courseware as well
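One of the simplest economic models for hit-based compensation is a proportional split of a fixed fund. The sketch below is an illustration under that assumption, not the FourC proposal itself; the function name and the idea of a single pooled fund are hypothetical.

```python
from typing import Dict


def allocate_by_hits(fund: float, hits: Dict[str, int]) -> Dict[str, float]:
    """Split `fund` among authors in proportion to their hit counts."""
    total = sum(hits.values())
    if total == 0:
        return {author: 0.0 for author in hits}
    return {author: fund * count / total for author, count in hits.items()}
```

For instance, a fund of 1000 split between works with 3 and 1 hits pays 750 and 250 respectively. Other models (flat fees, caps, tiered rates) would change only the formula inside the function.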
The Solution -FourC
The changing trend in publications- we want
the documents to be readable by the machines
as well as humans
Born digital documents
Can we compensate those for creating contents
for the web
Can we compensate those who create music
and movies for the web- really small form
factor – small screens
Conclusion
• Knowledge multiplies whenever bits are circulated on
the web
• Technology has a habit of creating a problem (by
knowledge explosion) and spending the rest of its time
in trying to solve it- through Digital Library
• The Universal Digital Library with 20 Million Books by
2020, the year by which our President dreams India will become a
developed nation
• A FourC Policy and a Digital Library Act are on the anvil
in India to meet this mission
• If a billion people sneeze- together we can create a
Hurricane
• With the technology of the two nations we will convert
this hurricane into useful energy and light up the world
of knowledge
• If you are creating a digital library, it should be for
access by anyone, anytime and from any place
• If Your Digital Library Is For Exclusive Use, Let Us Talk
About Weather
• There Is Nothing Called, Your DL, My DL
– It Is Our DL
– The Universal Digital Library