CISC 3140 (CIS 20.2) Design & Implementation of Software Application II
Instructor: M. Meyer
Email Address: meyer@sci.brooklyn.cuny.edu
Course Page: http://www.sci.brooklyn.cuny.edu/~meyer/
CISC3140-Meyer-lec4B

Contents
• Information systems?
  ▫ What is information?
  ▫ Knowledge representation
  ▫ Information retrieval
• SQL
  ▫ A bit more on relational databases
  ▫ Finishing up SQL labs 1 & 2
  ▫ Meeting with project group to discuss database design

Information Systems
• We spent some time defining:
  ▫ Data Models
  ▫ Database Models
• Both of these areas of study are contained within a larger branch of knowledge called Information Systems.
• Information Systems is a massive field.

Information Systems (IS) comprises:
• Data modeling & organization
• Computer-based information systems
• Objectives (when working with information)
• Risk assessment (when working with information)
• Planning and project management
• IS development life cycle
• Tools, techniques and methodologies (for data)
• Social effects (of data access and control)

Types of Information Systems
• Informal
  ▫ Evolve from human behavior (can be complex)
  ▫ Not formalized (i.e., not designed)
  ▫ Rely on “word of mouth” (“the grapevine”, oral traditions)
• Manual
  ▫ Formalized but not computer based
  ▫ Historical handling of information in organizations, before computers (i.e., human “clerks” did all the work, libraries, filing systems)
  ▫ Some organizations still use aspects of manual IS (e.g., because computer systems are expensive or don’t exist to replace specialized human skills)
• Computer-based
  ▫ Automated, technology-based systems
  ▫ Typically run by an “IT” (information technology) department within a company or organization (e.g., ITS at BC)

Computer Based IS
• Data Processing systems (accounting, production)
• Office Automation systems (e.g., document management, email, scheduling systems, spreadsheets)
• Management Information systems (MIS) (e.g., produce information from data, data analysis and reporting)
• Decision Support systems (DSS) (e.g., extension of MIS, often with some intelligence; allow prediction and posing of “what if” questions)
• Executive Information systems (EIS) (e.g., extension of DSS; contain strategic modeling capabilities and data abstraction, support high-level decision making and reporting, often have fancy graphics)

Why information systems?
• Efficiency
• Effective resource and personnel management
• Competitive advantage
• Reduce risk
• To support an organization’s long-term goals
• To allow replacement/elimination of unnecessary personnel

Alright... so what IS information?
• People disagree. Seriously? Seriously.
• From a human perspective, information is:
  ▫ "Stuff... about... things..."
  ▫ Things told, knowledge, items of knowledge, news.
  ▫ Data that is useful?
• What is the relationship between data, information, knowledge and beliefs?
• Does "information" require a human mind?

What do the experts say?
Intuitive Notion of Information (Losee, 1997)
1. Information must be something, although its exact nature is not clear
2. Information must be about something (something?)
3. Information must be “new” (repeating something old isn’t considered “information”... or is it?)
4. Information must be true (i.e., not “misinformation”)
Note that these are human-centered definitions that emphasize meaning and message.

Information Theory (1940s)
• Claude Shannon, 1940s, Bell Labs
  ▫ Studied communication and found ways to measure/categorize information
  ▫ Communication = producing the same message at its destination as at its source
  ▫ Problem: noise can distort the message
  ▫ Message is encoded between source (transmitter) and destination (receiver)
• Introduces the question of information types and properties (information is something that can be manipulated).

Types of Information
• Types of information can be differentiated by:
  ▫ form
  ▫ content
  ▫ quality
  ▫ associated information
• Properties of information
  ▫ Can be communicated electronically (methods: broadcasting, networking)
  ▫ Can be duplicated and shared (issues: ownership, control, maintenance, correction)

The Library of Babel, by Jorge Luis Borges (1941)
• A story about a universe composed of an indefinite (possibly infinite) number of hexagonal rooms, each containing walls of bookshelves whose books, in turn, contain all possible combinations of letters
• Does the library contain: information? data? knowledge? intelligence?
• Introduces the question of form and meaning with regard to information.

Meaning vs. Form
• Is the form of information the information itself? Or another kind of information?
• Is the meaning of a signal or message the signal or message itself?
• Signal to symbol problem:
  ▫ What does 451 mean?
• Artifacts (symbols with meaning) help us reason
  ▫ Anything not present in a representation can be ignored (do you agree with that?)
  ▫ Things left out of a representation are often those things that are hard to represent, or that we don’t know how to represent

Side Note:
• How is the World Wide Web like (or unlike) the Library of Babel?
• In "Weaving the Web", Sir Tim Berners-Lee notes that one of the principal complaints against his WWW and HTTP protocol was that without a rigid (hierarchical) ordering of pages nobody would ever be able to find anything.
• Were they right? Google?
• What about the effect of content factories on Google?

Information Theory Today
• Total annual information production, including print, film, media, etc., is between 3 and 5 exabytes per year (1 EB is 1,073,741,824 GB)
  ▫ 5 exabytes could store all words ever spoken by every human being that ever lived.
  ▫ Google has already indexed 10 billion web pages.
• How do we organize this?
  ▫ Remember, it accumulates!
• Information hierarchy:
  ▫ data -> information -> knowledge -> intelligence

Information retrieval
• Another sub-category of IS
• Information organization IS NOT information retrieval
• Organization:
  ▫ categorizing and describing information objects in ways that the people who need them can use them
• Retrieval:
  ▫ being able to find the information objects you need when you need them
• Two key concepts:
  ▫ precision: is what I found what I wanted? (what fraction of the retrieved items is relevant)
  ▫ recall: did I find everything I wanted? (what fraction of the relevant items was retrieved)
• Ideally, we want to maximize both precision and recall; this is the primary goal of the field of information retrieval (IR)

IR Assumptions
1. Information remains static
2. Query remains static
3. The value of an IR solution is in how well the retrieved information meets the needs of the retriever
Are these good assumptions?
1. In general, information does not stay static, especially on the internet
2. People learn how to make better queries
• Problems with the standard model on the internet:
  ▫ “answer” is a list of hyperlinks that then need to be searched
  ▫ answer list is apparently disorganized

IR Process
• IR is iterative
  ▫ IR doesn’t end with the first answer (unless you’re “feeling lucky”...)
  ▫ Humans can recognize a partially useful answer; automated systems cannot always do that
  ▫ Human queries change as understanding improves with the results of previous queries
  ▫ Sometimes humans get an answer that is “good enough” to satisfy them, even if the initial goals of IR aren’t met

IR Strategies
• Zone Search
  ▫ Typically free-text search in a clearly defined area.
• Scoring and Ranking
  ▫ How closely the text matched
  ▫ If multiple zones are used, how many zones matched
  ▫ Can weight (prioritize) different zones:
    Example: score = 0.6(Brooklyn ∈ neighborhood) + 0.5(3 ∈ bedrooms) + 0.4(1000 = price)
    (each test in parentheses counts as 1 on a match and 0 otherwise)
  ▫ Term Weight/Density

THERE IS A WHOLE LOT MORE
• There is a whole lot more to Information Systems than these last few slides.
  ▫ But, again, it will be MORE interesting later in your careers.
• We WILL revisit this topic briefly when we examine "agents" and "intelligent agents" at the end of the class.
  ▫ Agents are concerned with beliefs/knowledge that can be derived from information.
  ▫ IR (particularly search weight) is important.

Last lab we created something like this:
tblUser
userID  firstname  lastname     month  day
1       Bob        Barker       12     11
2       Bill       Bixby        10     9
3       Ozzie      Allcomefree  8      7
4       harry      potter       12     11
5       George     Lucas        6      10
• But again, this is inefficient, because users 1 and 4 have the same birthday and that information is stored twice in the table

3 Tables (3rd Normal Form)
tblUser
userID  firstname  lastname
1       Bob        Barker
2       Bill       Bixby
3       Ozzie      Allcomefree
4       harry      potter
5       George     Lucas

rltUserBday
userID  bdayID
1       1
2       2
3       3
4       1
5       4

tblBday
bdayID  month  day
1       12     11
2       10     9
3       8      7
4       6      10
• At first this may not look more efficient, but consider a table with 10 billion name entries.
• There are only 365 days in a year (366 counting February 29), so tblBday never needs more than 366 rows.

Table Integrity
• Another advantage of using 3 tables is that you preserve table integrity by storing the data and the relation in separate tables.
• BUT if you do this you have to be careful when removing entries from the tables:
  ▫ Think of the user table as the primary table and the other tables as auxiliary to that table
  ▫ If you remove an entry from the user table, you must also remove the corresponding entry from the birthday relation table (rltUserBday).
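
The three-table layout above can be tried out without a MySQL server. The following is a minimal sketch that uses Python's built-in sqlite3 module in place of MySQL (an assumption made only so the example is self-contained). It also assumes that "Bob", "Bill", etc. are first names and that the (6, 10) birthday row has bdayID 4, matching the rltUserBday entry for user 5. The helper names build_db and users_with_birthdays are made up for illustration.

```python
import sqlite3


def build_db():
    """Create the three tables from the slide in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tblUser (userID INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
        CREATE TABLE tblBday (bdayID INTEGER PRIMARY KEY, month INTEGER, day INTEGER);
        CREATE TABLE rltUserBday (userID INTEGER, bdayID INTEGER);
    """)
    conn.executemany("INSERT INTO tblUser VALUES (?, ?, ?)", [
        (1, "Bob", "Barker"), (2, "Bill", "Bixby"), (3, "Ozzie", "Allcomefree"),
        (4, "harry", "potter"), (5, "George", "Lucas"),
    ])
    # Only four birthday rows are needed: users 1 and 4 share bdayID 1.
    conn.executemany("INSERT INTO tblBday VALUES (?, ?, ?)",
                     [(1, 12, 11), (2, 10, 9), (3, 8, 7), (4, 6, 10)])
    conn.executemany("INSERT INTO rltUserBday VALUES (?, ?)",
                     [(1, 1), (2, 2), (3, 3), (4, 1), (5, 4)])
    conn.commit()
    return conn


def users_with_birthdays(conn):
    """Join the three tables back into the one-table view from the last lab."""
    return conn.execute("""
        SELECT u.userID, u.firstname, u.lastname, b.month, b.day
        FROM tblUser AS u
        JOIN rltUserBday AS r ON r.userID = u.userID
        JOIN tblBday AS b ON b.bdayID = r.bdayID
        ORDER BY u.userID
    """).fetchall()


conn = build_db()
rows = users_with_birthdays(conn)
for row in rows:
    print(row)
```

The join rebuilds all five rows of the original single table, even though the 12/11 birthday is stored only once in tblBday.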

Simple Deletion 1-Table Model
mysql> DELETE FROM tblUser WHERE lastname='potter' AND firstname='harry';

Simple Deletion 3-Table Model
mysql> DELETE FROM rltUserBday WHERE userID=4;
mysql> DELETE FROM tblUser WHERE lastname='potter' AND firstname='harry';
Or if you don’t know what the userID value is:
mysql> DELETE FROM rltUserBday WHERE userID=(
         SELECT userID FROM tblUser WHERE lastname='potter' AND firstname='harry');
mysql> DELETE FROM tblUser WHERE lastname='potter' AND firstname='harry';

Take a deep breath. You got this!

Continuing with the Lab
• Finishing up SQL labs 1 & 2
• Meeting with project group to discuss database design & review project schematics.
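
To practice the 3-table deletion before the lab, here is a minimal sketch using Python's built-in sqlite3 module instead of MySQL (an assumption, chosen so it runs anywhere). It issues the same two DELETE statements in the same order: relation row first, then the user row. Two things are additions not found in the slides: the deletes are wrapped in one transaction, and the subquery uses IN rather than = so it still works if several users share a name. The function name delete_user is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblUser (userID INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE tblBday (bdayID INTEGER PRIMARY KEY, month INTEGER, day INTEGER);
    CREATE TABLE rltUserBday (userID INTEGER, bdayID INTEGER);
    INSERT INTO tblUser VALUES (1, 'Bob', 'Barker'), (2, 'Bill', 'Bixby'),
        (3, 'Ozzie', 'Allcomefree'), (4, 'harry', 'potter'), (5, 'George', 'Lucas');
    INSERT INTO tblBday VALUES (1, 12, 11), (2, 10, 9), (3, 8, 7), (4, 6, 10);
    INSERT INTO rltUserBday VALUES (1, 1), (2, 2), (3, 3), (4, 1), (5, 4);
""")


def delete_user(conn, firstname, lastname):
    """Remove a user and that user's relation rows; tblBday is left untouched."""
    with conn:  # one transaction: both deletes commit together, or neither does
        # Step 1: remove the relation rows while the userID can still be looked up.
        conn.execute(
            "DELETE FROM rltUserBday WHERE userID IN "
            "(SELECT userID FROM tblUser WHERE lastname = ? AND firstname = ?)",
            (lastname, firstname),
        )
        # Step 2: remove the user row itself.
        cur = conn.execute(
            "DELETE FROM tblUser WHERE lastname = ? AND firstname = ?",
            (lastname, firstname),
        )
    return cur.rowcount  # number of user rows deleted


deleted = delete_user(conn, "harry", "potter")

# Relation rows that point at a user who no longer exists (should be none).
orphans = conn.execute(
    "SELECT COUNT(*) FROM rltUserBday "
    "WHERE userID NOT IN (SELECT userID FROM tblUser)"
).fetchone()[0]
print(deleted, orphans)
```

Reversing the two steps would break the lookup: once the user row is gone, the subquery can no longer find the userID, and the relation row is left behind as an orphan.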