lecture4b - Computer and Information Science

CISC 3140 (CIS 20.2)
Design & Implementation of
Software Application II
Instructor : M. Meyer
Email Address:
meyer@sci.brooklyn.cuny.edu
Course Page:
http://www.sci.brooklyn.cuny.edu/~meyer/
CISC3140-Meyer-lec4B
Contents
• Information systems?
▫ What is information?
▫ Knowledge representation
▫ Information retrieval
• SQL
▫ A bit more on relational databases
▫ Finishing up sql labs 1 & 2
▫ Meeting with project group to discuss database
design
Information Systems
• We spent some time defining:
▫ Data Models
▫ Database Models
• Both of these areas of study are contained within
a larger branch of knowledge called Information
Systems.
• Information Systems is a massive field.
Information Systems (IS) comprises:
• Data modeling & organization
• Computer-based information systems
• Objectives (when working with information)
• Risk assessment (when working with information)
• Planning and project management
• IS development life cycle
• Tools, techniques and methodologies (for data)
• Social effects (of data access and control)
Types of Information Systems
• Informal
▫ Evolve from human behavior (can be complex)
▫ Not formalized (i.e., not designed)
▫ Rely on “word of mouth” (“the grapevine”, oral traditions)
• Manual
▫ Formalized but not computer-based
▫ Historical handling of information in organizations before computers (i.e., human “clerks” did all the work; libraries, filing systems)
▫ Some organizations still use aspects of manual IS (e.g., because computer systems are expensive, or don’t exist to replace specialized human skills)
• Computer-based
▫ Automated, technology-based systems
▫ Typically run by an “IT” (information technology) department within a company or organization (e.g., ITS at BC)
Computer Based IS
• Data Processing systems (accounting, production)
• Office Automation systems (e.g., document
management, email, scheduling systems, spreadsheets)
• Management Information systems (MIS) (e.g., produce
information from data, data analysis and reporting)
• Decision Support systems (DSS) (e.g., extension of MIS,
often with some intelligence, allow prediction, posing of
“what if” questions)
• Executive Information systems (EIS) (e.g., an extension of
DSS): contain strategic modeling capabilities and data
abstraction, support high-level decision making and
reporting, and often have fancy graphics.
Why information systems?
• Efficiency
• Effective resource and personnel management
• Competitive advantage
• Reduce risk
• To support an organization’s long-term goals
• To allow replacement/elimination of unnecessary personnel.
Alright... so what IS information?
• People disagree. Seriously? Seriously.
• From a human perspective, information is:
▫ "Stuff... about... things..."
▫ Things told, knowledge, items of knowledge, news.
▫ Data that is useful?
• What is the relationship between data and
information and knowledge and beliefs?
• Does "information" require a human mind?
What do the experts say?
Intuitive Notion of Information (Losee, 1997)
1. Information must be something, although its exact
nature is not clear
2. Information must be about something (something?)
3. Information must be “new” (repeating something
old isn’t considered “information”... or is it?)
4. Information must be true (i.e., not “misinformation”)
Note that these are human-centered definitions that
emphasize meaning and message.
Information Theory (1940s)
• Claude Shannon, 1940s, Bell Labs (Bell Telephone Laboratories)
▫ Studying communication and found ways to
measure/categorize information
▫ Communication = producing the same message at its
destination as at its source
▫ Problem: noise can distort the message
▫ Message is encoded between source (transmitter) and
destination (receiver)
• Introduces the question of information types and
properties (information is something that can be
manipulated).
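Shannon's measure of information is entropy: the average number of bits per symbol needed to encode messages from a source. A minimal Python sketch, assuming (for illustration only) that a message's own symbol frequencies stand in for the source's probabilities; the function name `entropy_bits` is hypothetical:

```python
from collections import Counter
import math


def entropy_bits(message: str) -> float:
    """Shannon entropy H = sum(p * log2(1/p)), in bits per symbol.

    Assumption: the probabilities p are estimated from the symbol
    frequencies of this one message.
    """
    if not message:
        return 0.0
    counts = Counter(message)
    n = len(message)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


print(entropy_bits("aaaa"))      # 0.0 -> one repeated symbol, no uncertainty
print(entropy_bits("abab"))      # 1.0 -> two equally likely symbols
print(entropy_bits("abcdefgh"))  # 3.0 -> eight equally likely symbols
```

Note that entropy measures how unpredictable the symbols are, not what they mean, which is exactly where Shannon's view departs from the human-centered definitions on the previous slide.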
Types of Information
Types of information can be differentiated by:
▫ form
▫ content
▫ quality
▫ associated information
• Properties of information
▫ Can be communicated electronically (methods:
broadcasting, networking)
▫ Can be duplicated and shared (issues: ownership,
control, maintenance, correction)
The Library of Babel, by Jorge Luis Borges (1941)
• A story about a universe composed of an
indefinite (possibly infinite) number of
hexagonal rooms, each lined with walls of
bookshelves holding books which, in turn,
contain all possible combinations of letters
• Does the library contain: information? data?
knowledge? intelligence?
• Introduces the question of form and meaning
with regard to information.
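The library's size follows from simple counting: with an alphabet of k symbols there are k^n distinct texts of length n. A minimal Python sketch that enumerates a toy "library" (the 3-symbol alphabet, the length-2 books, and the name `toy_library` are assumptions chosen to keep the output small), then estimates the size of Borges's library from the story's description of books with 410 pages, 40 lines per page, and about 80 characters per line, over 25 symbols:

```python
import math
from itertools import product


def toy_library(alphabet: str, book_length: int) -> list[str]:
    """Every possible 'book' of exactly book_length symbols."""
    return ["".join(chars) for chars in product(alphabet, repeat=book_length)]


books = toy_library("ab ", 2)
print(len(books))  # 9, i.e. 3 ** 2

# Borges's books: 410 pages x 40 lines x ~80 characters, 25 symbols.
# Count the decimal digits of 25 ** chars_per_book without building it.
chars_per_book = 410 * 40 * 80  # 1,312,000 characters per book
digits = math.floor(chars_per_book * math.log10(25)) + 1
print(digits)  # 1834098
```

Almost all of those books are gibberish, which is what makes the question of whether the library holds information, or only data, hard to answer.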
Meaning VS. Form
• Is the form of information the information itself? or another kind of
information?
• Is the meaning of a signal or message the signal or message itself?
• Signal to symbol problem:
▫ What does 451 mean?
• Artifacts (symbols with meaning) help us reason
▫ Anything not present in a representation can be ignored (do you
agree with that?)
▫ Things left out of a representation are often those things that are
hard to represent, or we don’t know how to represent them
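The signal-to-symbol problem can be made concrete: the characters "451" carry no single meaning until something interprets them. A small Python illustration. The three interpretations below are examples chosen for this sketch, not taken from the slide:

```python
signal = "451"

as_number = int(signal)                   # a quantity: 451
as_text_bytes = signal.encode("ascii")    # three character codes
as_binary = as_number.to_bytes(2, "big")  # the integer packed into two bytes

print(as_number + 1)        # 452: arithmetic only makes sense for the number
print(list(as_text_bytes))  # [52, 53, 49]: the codes for '4', '5', '1'
print(as_binary.hex())      # 01c3: same value, different form
```

Each line holds the "same" 451, yet each form supports different operations. The meaning comes from the interpretation, not from the symbols alone.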
Side Note:
• How is the World Wide Web like (or unlike) the
library of babel?
• In "Weaving the Web", Sir Tim Berners-Lee
notes that one of the principal complaints
against his WWW and the HTTP protocol was that,
without a rigid (hierarchical) ordering of pages,
nobody would ever be able to find anything.
• Were they right? What about Google?
• What effect do content factories have on Google?
Information Theory Today
• Total annual information production, including
print, film, and other media, is between 3 and 5 exabytes
(1 EB is 1,073,741,824 GB, counting in binary units)
▫ 5 exabytes could store all words ever spoken by
every human being who ever lived.
▫ Google has indexed 10 billion web pages already.
• How do we organize all of this?
▫ Remember, it accumulates!
• Information hierarchy:
▫ data -> information -> knowledge -> intelligence
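The 1,073,741,824 figure uses binary units (1 GB = 2^30 bytes, 1 EB = 2^60 bytes); in decimal (SI) units, 1 EB is 10^9 GB. A quick Python check of both conventions. The unit tables and the `convert` helper are hypothetical names for this sketch:

```python
# Sizes in bytes under each convention
BINARY = {"GB": 2**30, "EB": 2**60}    # binary (IEC names: GiB, EiB)
DECIMAL = {"GB": 10**9, "EB": 10**18}  # decimal (SI)


def convert(amount: int, from_unit: str, to_unit: str, table: dict[str, int]) -> int:
    """Convert whole units; integer division is exact for these cases."""
    return amount * table[from_unit] // table[to_unit]


print(convert(1, "EB", "GB", BINARY))   # 1073741824
print(convert(1, "EB", "GB", DECIMAL))  # 1000000000
print(convert(5, "EB", "GB", BINARY))   # 5368709120
```

The two conventions differ by about 7% at the exabyte scale, so it is worth stating which one a figure uses.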
Information retrieval
• Another sub-category of IS
• Information organization IS NOT information retrieval
• Organization:
▫ categorizing and describing information objects so that the
people who need them can use them
• Retrieval:
▫ being able to find the information objects you need when you
need them
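• Two key concepts:
▫ precision: of the objects I retrieved, what fraction are relevant? (did I find only what I wanted?)
▫ recall: of all the relevant objects, what fraction did I retrieve? (did I find everything I wanted?)
• Ideally, we want to maximize both precision and recall (in practice they
usually trade off against each other); this is the primary goal of the field
of information retrieval (IR)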
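Precision is |retrieved ∩ relevant| / |retrieved| and recall is |retrieved ∩ relevant| / |relevant|. A minimal Python sketch; the document IDs and the `precision_recall` helper are hypothetical:

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    Empty sets yield 0.0 instead of dividing by zero."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents returned, 3 of them relevant,
# out of 6 relevant documents in the whole collection.
retrieved = {"d1", "d2", "d3", "d7"}
relevant = {"d1", "d2", "d3", "d4", "d5", "d6"}
p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.75 0.5
```

The trade-off is easy to see: returning every document in the collection pushes recall to 1.0, but precision then drops to the fraction of the collection that happens to be relevant.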
IR Assumptions
1. Information remains static
2. Query remains static
3. The value of an IR solution is in how well the retrieved
information meets the needs of the retriever
Are these good assumptions?
1. In general, information does not stay static, especially on the internet
2. People learn how to make better queries
• Problems with the standard model on the internet:
▫ the “answer” is a list of hyperlinks that then need to be searched
▫ the answer list is apparently disorganized
IR Process
• IR is iterative
▫ IR doesn’t end with the first answer (unless you’re
“feeling lucky”...)
▫ Humans can recognize a partially useful answer;
automated systems cannot always do that
▫ Human queries change as their understanding
improves through the results of previous queries
▫ Sometimes humans get an answer that is “good
enough” to satisfy them, even if initial goals of IR
aren’t met
IR Strategies
• Zone Search
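▫ Typically free-text search within a clearly defined area (zone), such as one field of a document or record.
• Scoring and Ranking
▫ How closely the text matched
▫ If multiple zones are used, how many zones matched
▫ Can weight (prioritize) different zones:
   Example (each parenthesized term counts 1 if that zone matches, 0 otherwise):
   score = 0.6·(Brooklyn ∈ neighborhood) + 0.5·(3 ∈ bedrooms) + 0.4·(1000 = price)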
▫ Term Weight/Density
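Weighted zone scoring can be sketched in a few lines of Python: each zone test contributes its weight when it matches, and results are ranked by total score. The weights 0.6, 0.5, and 0.4 come from the slide's example. The listing fields, the sample listings, and the name `zone_score` are hypothetical. The price rule is read here as "at most 1000", which is an assumption.

```python
from typing import Any, Callable

Listing = dict[str, Any]

# (weight, zone test) pairs: weights from the slide, tests are illustrative
ZONES: list[tuple[float, Callable[[Listing], bool]]] = [
    (0.6, lambda x: x["neighborhood"] == "Brooklyn"),
    (0.5, lambda x: x["bedrooms"] == 3),
    (0.4, lambda x: x["price"] <= 1000),  # assumed reading: "at most 1000"
]


def zone_score(listing: Listing) -> float:
    """Sum the weights of the zones whose test matches (match = 1, else 0)."""
    return sum(weight for weight, test in ZONES if test(listing))


listings = [
    {"id": "A", "neighborhood": "Brooklyn", "bedrooms": 3, "price": 950},
    {"id": "B", "neighborhood": "Queens", "bedrooms": 3, "price": 900},
    {"id": "C", "neighborhood": "Brooklyn", "bedrooms": 2, "price": 1200},
]

# Rank listings from best to worst match
for listing in sorted(listings, key=zone_score, reverse=True):
    print(listing["id"], round(zone_score(listing), 2))
```

Listing A matches all three zones, B misses the heavily weighted neighborhood zone, and C matches only the neighborhood. Changing the weights changes the ranking, which is how a system prioritizes some zones over others.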
THERE IS A WHOLE LOT MORE
• There is a whole lot more to Information
Systems than these last few slides.
▫ But again, it will be more interesting later in
your careers.
• We WILL revisit this topic briefly when we
examine "agents" and "intelligent agents" at the
end of the class.
▫ Agents are concerned with beliefs/knowledge that
can be derived from information.
▫ IR (particularly search weighting) is important.
In the last lab we created something like this:
tblUser
userID  firstname  lastname     month  day
1       Bob        Barker       12     11
2       Bill       Bixby        10     9
3       Ozzie      Allcomefree  8      7
4       harry      potter       12     11
5       George     Lucas        6      10
• But again, this is inefficient, because users 1 and
4 have the same birthday and that information is
stored twice in the table
3 Tables (3rd Normal Form)
tblUser
userID  firstname  lastname
1       Bob        Barker
2       Bill       Bixby
3       Ozzie      Allcomefree
4       harry      potter
5       George     Lucas

rltUserBday
userID  bdayID
1       1
2       2
3       3
4       1
5       4

tblBday
bdayID  month  day
1       12     11
2       10     9
3       8      7
4       6      10
• At first this may not look more efficient, but
consider a table with 10 billion name entries.
• There are only 366 possible birthdays (month/day pairs, counting February 29).
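A minimal sketch of the three-table layout, using Python's built-in sqlite3 module as a stand-in for MySQL (the CREATE TABLE syntax is close but not identical). It assumes the name values are first name then last name (so "harry" is the first name, matching the DELETE statements later in this lecture), and that the 6/10 birthday has bdayID 4, the value rltUserBday refers to. The final query shows that a join rebuilds the original one-table view:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblUser (
    userID    INTEGER PRIMARY KEY,
    firstname TEXT NOT NULL,
    lastname  TEXT NOT NULL
);
CREATE TABLE tblBday (
    bdayID INTEGER PRIMARY KEY,
    month  INTEGER NOT NULL,
    day    INTEGER NOT NULL,
    UNIQUE (month, day)              -- each calendar date stored once
);
CREATE TABLE rltUserBday (
    userID INTEGER PRIMARY KEY REFERENCES tblUser(userID),  -- one birthday per user
    bdayID INTEGER NOT NULL REFERENCES tblBday(bdayID)
);
""")
conn.executemany("INSERT INTO tblUser VALUES (?, ?, ?)", [
    (1, "Bob", "Barker"), (2, "Bill", "Bixby"), (3, "Ozzie", "Allcomefree"),
    (4, "harry", "potter"), (5, "George", "Lucas"),
])
conn.executemany("INSERT INTO tblBday VALUES (?, ?, ?)",
                 [(1, 12, 11), (2, 10, 9), (3, 8, 7), (4, 6, 10)])
conn.executemany("INSERT INTO rltUserBday VALUES (?, ?)",
                 [(1, 1), (2, 2), (3, 3), (4, 1), (5, 4)])

# Rebuild the original one-table view by joining through the relation table
rows = conn.execute("""
    SELECT u.userID, u.firstname, u.lastname, b.month, b.day
    FROM tblUser u
    JOIN rltUserBday r ON r.userID = u.userID
    JOIN tblBday b ON b.bdayID = r.bdayID
    ORDER BY u.userID
""").fetchall()
for row in rows:
    print(row)
```

Users 1 and 4 both point at bdayID 1, so the 12/11 date is stored once; the UNIQUE constraint keeps a second copy from being added by mistake.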
Table Integrity
• Another advantage of using 3 tables is that you
preserve table integrity by storing the data and the
relation in separate tables.
• BUT if you do this, you have to be careful when
removing entries from the tables:
▫ Think of the user table as the primary table and the
other tables as auxiliary to that table
▫ If you remove an entry from the user table, you must
also remove the corresponding entry from the relation
table (rltUserBday); otherwise it points to a user that no longer exists.
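Instead of deleting the relation row by hand, many databases can do it automatically through a foreign key declared with ON DELETE CASCADE (MySQL supports this for InnoDB tables). This is an alternative technique, not the method the lab uses. A minimal sketch with Python's built-in sqlite3 module, which only enforces foreign keys after `PRAGMA foreign_keys = ON`; the tables are trimmed to a few sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE tblUser (userID INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE tblBday (bdayID INTEGER PRIMARY KEY, month INTEGER, day INTEGER);
CREATE TABLE rltUserBday (
    userID INTEGER PRIMARY KEY
           REFERENCES tblUser(userID) ON DELETE CASCADE,  -- follow user deletions
    bdayID INTEGER NOT NULL REFERENCES tblBday(bdayID)
);
INSERT INTO tblUser VALUES (1, 'Bob', 'Barker'), (4, 'harry', 'potter');
INSERT INTO tblBday VALUES (1, 12, 11);
INSERT INTO rltUserBday VALUES (1, 1), (4, 1);
""")

# Deleting the user also deletes the matching relation row
conn.execute("DELETE FROM tblUser WHERE firstname = 'harry' AND lastname = 'potter'")
print(conn.execute("SELECT * FROM rltUserBday").fetchall())  # [(1, 1)]
print(conn.execute("SELECT * FROM tblBday").fetchall())      # [(1, 12, 11)]
```

The shared birthday row in tblBday is kept, which is the right outcome: other users (here, user 1) may still refer to it.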
Simple Deletion 1-Table Model
mysql> DELETE FROM tblUser
WHERE lastname='potter' AND
firstname='harry';
Simple Deletion 3 Table Model
mysql> DELETE FROM rltUserBday WHERE userID=4;
mysql> DELETE FROM tblUser WHERE lastname='potter' AND
firstname='harry';
Or if you don’t know what the userID value is (IN also works if several users match):
mysql> DELETE FROM rltUserBday WHERE userID IN (SELECT
userID FROM tblUser WHERE lastname='potter' AND
firstname='harry');
mysql> DELETE FROM tblUser WHERE lastname='potter' AND
firstname='harry';
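The two DELETE statements belong together: if the second one fails, the first has already removed the relation row. A minimal sketch that runs them as one transaction, using Python's built-in sqlite3 module, where `with conn:` commits when the block succeeds and rolls back if it raises (in MySQL the equivalent is wrapping the statements in START TRANSACTION ... COMMIT). The `delete_user` helper is a hypothetical name, and the `?` placeholders let the driver handle quoting:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblUser (userID INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE rltUserBday (userID INTEGER PRIMARY KEY, bdayID INTEGER);
INSERT INTO tblUser VALUES (1, 'Bob', 'Barker'), (4, 'harry', 'potter');
INSERT INTO rltUserBday VALUES (1, 1), (4, 1);
""")


def delete_user(conn: sqlite3.Connection, first: str, last: str) -> None:
    """Remove a user and their relation row as one all-or-nothing unit."""
    with conn:  # commit on success, roll back on an exception
        # Relation rows first, located through a subquery on tblUser
        conn.execute(
            "DELETE FROM rltUserBday WHERE userID IN "
            "(SELECT userID FROM tblUser WHERE firstname = ? AND lastname = ?)",
            (first, last),
        )
        # Then the user row itself
        conn.execute(
            "DELETE FROM tblUser WHERE firstname = ? AND lastname = ?",
            (first, last),
        )


delete_user(conn, "harry", "potter")
print(conn.execute("SELECT userID FROM tblUser").fetchall())      # [(1,)]
print(conn.execute("SELECT userID FROM rltUserBday").fetchall())  # [(1,)]
```

The order matters: the subquery needs the user row to still exist, so the relation rows are deleted before the user.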
Take a deep breath.
You got this!
Continuing with the Lab
• Finishing up sql labs 1 & 2
• Meeting with project group to discuss database
design & review project schematics.