COPA notes
Gio Wiederhold
Computer Science Dept. and Medicine, Stanford University
www-db.stanford.edu/people/gio.html
Technology demo: http://jxw.stanford.edu/cgi-bin/zwang/wipe2_show.cgi
3 August 2000, updated 6 August 2000

The purpose of the Child Online Protection Act is to prohibit online sites from knowingly making available to minors material that is "harmful to minors" (sexually explicit material meeting definitions set forth in the Act). Commercial providers of "harmful to minors" material may defend themselves against prosecution by restricting the access of minors to such material.

August 2000 Gio Wiederhold for COPA 1

Dealing with the system chain
• We have to consider the entire chain of flow from producers to consumers
  – The flow can be hindered at any link, but focused controls encourage workarounds
• The producers can be, and are likely to be, anywhere, not only where US law applies
  – Many producers of pornography are financially motivated
  – There is also a significant remainder for whom it is a hobby
• The costs of setting up a pornographic website are small
  – Poverty and exhibitionism provide low-cost material
  – The cost is in marketing and collecting income
• The Internet’s capability to transmit images and video is increasing
  – Today (2000), 1.4M homes have broadband access (ISDN, DSL, cable, T1, …)
  – By 2004 more than 20M homes will have it [Yankee Group, per Internet World, 1 August 2000]
  – The growth is driven by the entertainment industry and is unlikely to be stopped
• Consumers are not willing to pay for and install restrictions
  – Libraries are reasonably concerned about limiting speech
  – Parents are not motivated to impose restrictions: trust, awkwardness, self-interest

System issues
• If consumers, the final link, do not cooperate, the chain will not be broken, as with:
  – Illegal copying of copyrighted music
  – Non-payment of sales taxes
  – Drug use condoned by well-off urban and suburban households
  – Watching
    pornography, at one point ~50% of Internet use
• Some novel methods exist and could be applied at all three links
  – WIPE can identify objectionable websites and publish their IP addresses
  – Objectionable sites, identified by IP address, could be blocked by filters
  – WIPE can dynamically analyze web content, identify objectionable content with high probability, and be used to filter it out
  – Parents and libraries can identify under-age consumers
• These technologies are not perfect, but their application is feasible
  – Filters may be installed at ISPs, in browsers, or at home, under parental control
We will next show some images where our WIPE technology failed
• Showing failures provides understanding and also prevents embarrassment
  – In practice, our purely image-analysis-based technology would be
    1. Applied to many images at a provider site, creating a very high confidence
    2. Combined with other information, such as text, recognition of museums, etc.

WIPE: examples of benign images classified as objectionable
Using only image features: ~7% over an image library
[Images: art from a museum collection; color mix too similar; much skin; common composition of objectionable images; color and composition too similar]

WIPE: failures to categorize offending pictures correctly (WIPE learns using images from offensive sites)
Using only image features: ~3% over prurient websites
[Images: much surrounding material; partially covered; little flesh; too dark (our blocking)]

Problem: NO single approach is perfect
• Rating and labeling websites depends on provider cooperation
  – Commercial providers may well comply
  – Amateurs, clubs, and foreign sites would not comply
• Firewall and similar technologies depend on IP addresses
  – IP addresses are large in number, change rapidly, and cannot be controlled
• Checking for text is easily bypassed
  – The choices for indicative terms are wide, and term lists will create
    false constraints
  – Suggestive terms can be hidden in images
  – However, commercial sites will want to be found
• Checking of image content [WIPE] also creates false hits
• Websites using pirate processes can be recognized and included in bad-lists
  – Disabling exit means, hijacking browser BACK buttons, excessive stickiness
• Combining multiple technologies can improve performance
• It is easier to identify objectionable websites than to filter individual pages
  – Processing technology has difficulty matching broadband speeds

Suggestion
1. Recognize both Greenfield (www.….kid) areas, where kid-friendly parties can reside, perhaps monitored by a voluntary watchdog organization, and Redlight (www.….xxx) areas, where adult, explicit purveyors can reside. Most commercial adult sites might voluntarily enter that district, since their customers could find them easily there. Predatory sites outside the .xxx domain should be blacklisted. Note that these two areas will occupy only a few percent of web content, since most commercial and most scientific sites will not want to label themselves either green or red. Not green is not red, and not red is not green!
2. Establish a consortium, supported by the vendors of filtering tools and major portals, that would survey the non-.xxx portion of the Internet and locate, classify, and list objectionable, predatory sites for all. It would pool the otherwise redundant work of identifying objectionable sites, make filtering more consistent, and provide a forum for appeals when sites have been misclassified. The filtering products would then still compete on ease of use, compatibility, methods of parental control, and price.

Conclusion
To deal with the problem of keeping objectionable material from our children, the cooperation of all participants in Web commerce is needed.
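The System issues slide says that, in practice, WIPE would be applied to many images at a provider site to reach very high confidence, and the slide on why no single approach is perfect notes that combining methods improves performance. The sketch below illustrates why site-level judgment helps, under loudly stated assumptions that are ours and not a description of WIPE: each image is judged independently, the site verdict is a simple majority vote, and the ~3% and ~7% figures from the WIPE failure slides are read as per-image error rates. The function name `site_misclassification_prob` is hypothetical.

```python
from math import comb


def site_misclassification_prob(p: float, n: int) -> float:
    """Hypothetical helper: probability that a strict majority of n images
    from one site is misjudged, when each image is misjudged with
    probability p, independently (a simplifying assumption)."""
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    k_min = n // 2 + 1  # fewest wrong per-image verdicts that flip the majority
    # Binomial upper tail: sum the probabilities of k = k_min..n wrong verdicts.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


if __name__ == "__main__":
    # Assumed per-image rates, read off the WIPE slides: ~3% of offending
    # images missed, ~7% of benign images flagged.
    for n in (1, 5, 21):
        miss = site_misclassification_prob(0.03, n)
        false_flag = site_misclassification_prob(0.07, n)
        print(f"n={n:2d}  offending site missed: {miss:.1e}  benign site flagged: {false_flag:.1e}")
```

Under these assumptions, judging 21 images instead of one lowers both error probabilities by several orders of magnitude. Images on a real site are correlated, and providers may evade deliberately, so the actual gain would be smaller; that is consistent with the slides pairing image analysis with text, process analysis, and human review.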
No single solution will be adequate, but a combination of technologies can do much to protect children from undesirable influences. Filtering tools must be improved, but until a market develops they will not be able to provide satisfactory services for parents. Filtering should be based on a combination of initial automatic text analysis, image analysis, and process analysis, and should include human validation and review. We suggest establishing a domain-based green/gray/red infrastructure and a consortium to deal with the massive volume of material on the Web. There is a cost to breaking the chain, and parents and other participants must be willing to bear that cost, both financially and socially.

Bio
Gio Wiederhold is a Professor of Computer Science at Stanford University, with appointments in Medicine and Electrical Engineering. He has been active for many years in the development and application of knowledge-based techniques for database management, information systems, protection of information, and software construction and maintenance. His current research projects address problems of information heterogeneity and privacy protection.
Gio Wiederhold received a degree in aeronautical engineering in Holland in 1957 and started programming early computers at the NATO Air Defense Technical Center there. In 1958 he emigrated to the United States. After gaining 16 years of industrial experience, he returned to school and in 1976 earned a PhD in Medical Information Science from the University of California, San Francisco. He has been on the Stanford faculty since then. From 1991 to 1994 he was on leave as a Program Manager for Knowledge-Based Systems at DARPA, initiating programs in Software Composition, Intelligent Integration of Information, and Digital Libraries.
Professor Wiederhold has consulted for the US Department of Health and Human Services, various US defense agencies, and Silicon Valley innovators. He is, and has been, a member of many academic and governmental panels and boards, as well as an editor of several professional publications. He is a fellow of the ACMI, the IEEE, and the ACM. He has published 4 books and more than 350 publications in computing and medicine. Professor Wiederhold's address is Department of Computer Science, Gates 4A, Stanford University, Stanford, CA 94305-9040. His home page at http://www-.stanford.edu/people/gio.html provides details.