COPA notes
Gio Wiederhold
Computer Science Dept. and Medicine, Stanford University
www-db.stanford.edu/people/gio.html
Technology demo
http://jxw.stanford.edu/cgi-bin/zwang/wipe2_show.cgi
3 August 2000, updated 6 August 2000
The purpose of the Child Online Protection Act is to prohibit online sites
from knowingly making available to minors material that is "harmful to
minors" (sexually explicit material meeting definitions set forth in the Act).
Commercial providers of "harmful to minors" material may defend themselves
against prosecution by restricting the access of minors to such material.
August 2000
Gio Wiederhold for COPA
1
Dealing with the system chain
• We have to consider the entire chain of flow from Producers to Consumers
– The flow can be hindered at any link, but focused controls encourage workarounds
• Producers can be, and are likely to be, anywhere, not only where US law applies
– Many producers of pornography are financially motivated
– There is also a significant remainder for whom it is a hobby
• The costs of setting up a pornographic website are small
– Poverty and exhibitionism provide low-cost material
– The cost is in marketing and collecting income
• The Internet’s capability to transmit images and video is increasing
– Today (2000), 1.4M homes have broadband access (ISDN, DSL, cable, T1, …)
– By 2004 more than 20M homes will have it [Yankee group per Internet World, 1 August 2000]
– Driven by the entertainment industry, this growth is unlikely to be stopped
• Consumers are not willing to pay for and install restrictions
– Libraries are reasonably concerned about limiting speech
– Parents are not motivated to impose restrictions: trust, awkwardness, self-interest
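The two broadband figures above imply a steep growth rate. Treat 2000 to 2004 as four years of steady compound growth (a simplifying assumption). The implied annual growth rate r is then at least:

$$
r \;\ge\; \left(\frac{20\,\mathrm{M}}{1.4\,\mathrm{M}}\right)^{1/4} - 1 \;\approx\; 1.94 - 1 \;=\; 0.94
$$

In other words, home broadband access would roughly double every year.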
System issues
• If consumers, the final link, do not cooperate, the chain will not be broken
– Illegal copying of copyrighted music
– Non-payment of sales taxes
– Drug use condoned by well-off urban and suburban households
– Watching pornography (at one point ~50% of Internet use)
• Some novel methods exist and could be applied at all 3 links
– WIPE can identify objectionable websites, and publish their IP addresses
– Objectionable sites, identified by IP addresses, could be blocked by filters
– WIPE can dynamically analyze web content and with high probability identify
objectionable content, and be used to filter out such content
– Parents and libraries can identify under-age consumers
• These technologies are not perfect, but their application is feasible
– Filters may be installed at ISPs, in browsers, or at home, under parental control
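The slide suggests that WIPE could publish the IP addresses of objectionable sites, and filters could then block them. The filter side of that idea can be sketched with Python's standard ipaddress module. The sketch assumes a hypothetical published list of CIDR ranges. PUBLISHED_BLOCKLIST and is_blocked are illustrative names, not part of WIPE. The addresses are reserved documentation ranges, not real sites.

```python
import ipaddress

# Hypothetical published list of objectionable sites, as CIDR networks.
# These are reserved documentation ranges (RFC 5737), not real sites.
PUBLISHED_BLOCKLIST = ["192.0.2.0/24", "198.51.100.17/32"]

def load_blocklist(entries):
    """Parse the CIDR strings into network objects once, up front."""
    return [ipaddress.ip_network(e) for e in entries]

def is_blocked(ip, networks):
    """Return True if the address falls inside any listed network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

networks = load_blocklist(PUBLISHED_BLOCKLIST)
print(is_blocked("192.0.2.45", networks))     # True
print(is_blocked("198.51.100.18", networks))  # False
```

The same check could run at an ISP, in a browser, or at home. Because IP addresses change rapidly, the published list would need frequent refreshing.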
Next, we show some images where our WIPE technology failed
• Showing failures provides understanding and also prevents embarrassment
– In practice, our purely image-analysis-based technology would be
1. Applied to many images at a provider site, creating very high confidence in the result
2. Combined with other information, such as text, recognition of museums, etc.
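A simple binomial model can sketch why applying the image classifier to many images at one provider site raises confidence. The sketch makes three assumptions:

- Per-image errors are independent, which is optimistic.
- The ~7% and ~3% figures on the WIPE example slides are per-image false-hit and miss rates.
- A site is flagged once at least half of its sampled images are flagged.

The threshold and the function names are hypothetical.

```python
import math

def binom_pmf(n: int, i: int, p: float) -> float:
    """Probability of exactly i successes in n independent trials with rate p."""
    return math.comb(n, i) * p**i * (1 - p) ** (n - i)

def site_error_rates(n_images: int, threshold: float = 0.5,
                     fp_rate: float = 0.07, miss_rate: float = 0.03):
    """Site-level error rates when a site is flagged once at least
    `threshold` of its sampled images are flagged.

    ASSUMPTION: per-image errors are independent (optimistic).
    ASSUMPTION: fp_rate / miss_rate are the ~7% / ~3% figures read as per-image rates.
    """
    k = math.ceil(threshold * n_images)  # flagged images needed to flag the site
    # Benign site wrongly flagged: k or more of its images are false hits.
    benign_flagged = sum(binom_pmf(n_images, i, fp_rate)
                         for i in range(k, n_images + 1))
    # Objectionable site missed: fewer than k of its images are caught.
    detect_rate = 1 - miss_rate
    objectionable_missed = sum(binom_pmf(n_images, i, detect_rate)
                               for i in range(k))
    return benign_flagged, objectionable_missed

for n in (1, 5, 20):
    fp, miss = site_error_rates(n)
    print(f"{n:>2} images: benign flagged {fp:.2e}, objectionable missed {miss:.2e}")
```

Real errors are correlated: a museum collection, for example, may be misread image after image. That correlation is one reason the slide also calls for combining image analysis with other information.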
WIPE: examples of benign images classified as objectionable
Using only image features: ~7% over an image library
[Figure: example images, captioned "Art from a museum collection", "Color mix too similar", "Much skin", "Common composition of objectionable images", "Color and composition too similar"]
WIPE: failures to categorize offending pictures correctly
Using only image features: ~3% over prurient websites (WIPE learns using images from offensive sites)
[Figure: example images, captioned "Much surrounding material", "Partially covered", "Little flesh", "Too dark"; the blocking of image content is ours]
Problem: NO single approach is perfect
• Rating & labeling websites depends on provider cooperation
– Commercial providers may well comply
– Amateurs, clubs, foreign sites would not comply
• Firewall and similar technologies, depend on IP addresses
– IP addresses are large in number, change rapidly and cannot be controlled
• Checking for text is easily bypassed
– indicative terms are many and varied, and lists of them will create false constraints
– suggestive terms can be hidden in images
– however, commercial sites will want to be found
• Checking of image content [WIPE] also creates false hits
• Websites using pirate processes can be recognized and included in bad-lists
– disabling exit means, hijacking browser BACK buttons, excessive stickiness
Combining multiple technologies can improve the performance
It is easier to identify objectionable websites than to filter individual pages
– Broadband speeds are difficult to match with processing technology
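The slide's point that combining technologies improves performance can be sketched as a site-level decision rule. This is a minimal illustration, not a described system. The following are all assumptions chosen for clarity:

- the SiteEvidence fields;
- the equal weighting of the text and image scores;
- the 0.6 threshold;
- the "review" outcome when the techniques disagree.

```python
from dataclasses import dataclass

@dataclass
class SiteEvidence:
    """Hypothetical per-site signals, one per technique listed on the slide."""
    self_labeled_adult: bool   # provider rating/label (needs provider cooperation)
    on_ip_blocklist: bool      # firewall-style IP list (goes stale quickly)
    text_score: float          # 0..1 from indicative-term checks (assumed scale)
    image_score: float         # 0..1 fraction of images flagged, e.g. by WIPE
    pirate_behaviors: int      # count of exit-disabling / BACK-hijacking tricks

def classify_site(ev: SiteEvidence, threshold: float = 0.6) -> str:
    """Return 'block', 'review', or 'allow' for a child-protection filter."""
    # Self-labeled or already-listed sites are decided directly.
    if ev.self_labeled_adult or ev.on_ip_blocklist:
        return "block"
    # Sites using pirate processes can be put on bad-lists on that basis alone.
    if ev.pirate_behaviors > 0:
        return "block"
    # Average the two content scores so a false hit from one technique
    # (e.g. a museum image with much skin) is diluted by the other.
    combined = (ev.text_score + ev.image_score) / 2
    if combined >= threshold:
        return "block"
    # One technique is confident but the other is not: ask a human.
    if max(ev.text_score, ev.image_score) >= threshold:
        return "review"
    return "allow"

museum = SiteEvidence(False, False, text_score=0.1, image_score=0.8, pirate_behaviors=0)
print(classify_site(museum))  # review
```

Sending disagreements to a human reviewer, rather than blocking silently, keeps single-technique false hits from becoming blocks.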
Suggestion
1. Recognize both Greenfield (www. … .kid) areas, where kid-friendly parties
can reside, perhaps monitored by a voluntary watchdog organization, and
Redlight (www. … .xxx) areas, where adult, explicit purveyors can reside.
Most commercial adult sites might voluntarily enter that district, since they
would be easy for their customers to find. Predatory sites, not in the .xxx
domain, should be blacklisted.
Note that these two areas will occupy only a few percent of web content,
since most commercial and most scientific sites will want to label
themselves neither green nor red. Not green is not red, and not red is not green!
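The Greenfield/Redlight proposal amounts to a three-way classification by top-level domain, with everything outside the two zones left gray. The sketch below assumes the zones are the .kid and .xxx suffixes named above. zone_of and the TLD sets are hypothetical names. The check looks only at the host name, so it says nothing about the content of gray sites.

```python
from urllib.parse import urlsplit

# Hypothetical zone suffixes taken from the proposal above.
GREEN_TLDS = {"kid"}
RED_TLDS = {"xxx"}

def zone_of(url: str) -> str:
    """Classify a URL as 'green', 'red', or 'gray' from its top-level domain only."""
    host = urlsplit(url).hostname or ""        # urlsplit lower-cases the hostname
    tld = host.rstrip(".").rsplit(".", 1)[-1]  # last label, ignoring a trailing dot
    if tld in GREEN_TLDS:
        return "green"
    if tld in RED_TLDS:
        return "red"
    # Not green is not red, and not red is not green: everything else is gray.
    return "gray"

for u in ("http://www.toys.kid/", "https://www.example.xxx/", "https://www.stanford.edu/"):
    print(u, zone_of(u))
```

A child-mode filter might admit only green sites. Alternatively, it might block red sites and pass gray ones on to other checks. That policy choice is left to the filtering product.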
2. Establish a consortium, supported by the vendors of filtering tools and
major portals, that would survey the non-.xxx portion of the Internet and
locate, classify, and list objectionable, predatory sites for all. It would share
the otherwise redundant work of identifying objectionable sites, make filtering
more consistent, and provide a forum for appeals when sites have been
misclassified. The filtering products would still compete on the basis of
ease of use, compatibility, methods of parental control, and price.
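The consortium's shared list, with its forum for appeals, can be sketched as a small data structure that competing filter products would all consult. Everything here is hypothetical: the SharedList class, its methods, and the rule that an upheld appeal delists a site are illustrative assumptions, not a described design.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class SharedList:
    """Hypothetical consortium-maintained list of objectionable sites."""
    listed: dict[str, str] = field(default_factory=dict)  # site -> reason listed
    pending_appeals: set[str] = field(default_factory=set)

    def add(self, site: str, reason: str) -> None:
        """Record a site found by the consortium's survey, with its reason."""
        self.listed[site] = reason

    def is_listed(self, site: str) -> bool:
        """The single check every participating filter product would share."""
        return site in self.listed

    def appeal(self, site: str) -> bool:
        """A site operator contests its listing; unlisted sites cannot appeal."""
        if site not in self.listed:
            return False
        self.pending_appeals.add(site)
        return True

    def resolve(self, site: str, misclassified: bool) -> None:
        """A human reviewer closes the appeal; misclassified sites are delisted."""
        self.pending_appeals.discard(site)
        if misclassified:
            self.listed.pop(site, None)

shared = SharedList()
shared.add("museum.example", "image analysis: much skin")
shared.appeal("museum.example")
shared.resolve("museum.example", misclassified=True)
print(shared.is_listed("museum.example"))  # False
```

Vendors would differ in how they present and apply is_listed. The list itself, and the appeal outcomes, would be common to all of them.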
Conclusion
To deal with the problem of keeping objectionable material from our
children, cooperation of all participants in www commerce is needed.
No single solution will be adequate, but a combination of technologies
can do much to protect children from undesirable influences.
Filtering tools must be improved, but until a market develops they will not be
able to provide satisfactory services for parents. Filtering should combine
initial automatic text analysis, image analysis, and process analysis with
human validation and review.
We suggest establishment of a domain-based green/gray/red infrastructure,
and a consortium to deal with the massive volume of material on the web.
There is a cost to breaking the chain, and there must be willingness by
parents and other participants to bear that cost, both financially and
socially.
Bio
Gio Wiederhold is a Professor of Computer Science at Stanford University, with appointments in
Medicine and Electrical Engineering. He has been active for many years in the application
and development of knowledge-based techniques to database management, information
systems, protection of information, and software construction and maintenance. His current
research projects address resolving problems due to Heterogeneity of Information and
Privacy Protection.
Gio Wiederhold received a degree in aeronautical engineering in Holland in 1957 and started
programming early computers at the NATO Air Defense Technical Center there. In 1958 he
emigrated to the United States. After gaining 16 years of industrial experience, he returned
to school and in 1976 earned a PhD in Medical Information Science from the University of
California, San Francisco. He has been on the Stanford faculty since that time. From 1991
to 1994 he was on leave as a Program Manager for Knowledge-Based Systems at DARPA,
initiating programs in Software Composition, Intelligent Integration of Information, and
Digital Libraries.
Professor Wiederhold has consulted for the US Department of Health and Human Services,
various US defense agencies, and Silicon Valley innovators. He is, and has been, a member
of many academic and governmental panels and boards, as well as editor of several
professional publications. He is a fellow of the ACMI, the IEEE, and the ACM. He has
published 4 books and more than 350 publications in computing and medicine.
Professor Wiederhold's address is Department of Computer Science, Gates 4A, Stanford
University, Stanford, CA 94305-9040.
His home page at http://www-.stanford.edu/people/gio.html provides details.