Challenges in Natural Language Processing:

Fundamentals/ICY: Databases 2012/13 REVISION WEEK John Barnden Professor of Artificial Intelligence School of Computer Science University of Birmingham, UK Structure of Exam  One and a half hours. Do THREE Questions. Qn 1: on SQL: obligatory (Fundamentals and ICY). Weight: 32%  Fundamentals:    Do two out of remaining three Questions. Weight: each 34%. One is on maths (relations, relational algebra) No maths in the other two questions.  ICY:    Do two out of remaining four Questions. Weight: each 34%. One is on maths (relations, relational algebra). No maths in the other three questions.  Hence Fundamentals and ICY students have same DEGREE OF CHOICE (2/3) over the material they are EXPECTED to know. Fundamentals students have more pressure to do the Maths question. Structure of Exam, contd The Maths question is on the connections of the maths material to  SQL (what SQL operators do, translation to and from relational algebra).  database concepts (nature of tables, functional dependencies, operations on tables, nature of relationships between entity types) Material Needed for Exam  Lecture material except when specified as optional ((incl. by double brackets))  Required textbook reading:  Anything may be useful in the exam, except of course that a detailed memory of specific, data-full examples is not expected, and except for some SQL detail (see next slide).  All Additional Notes, except that:   The exam will NOT rely on the treatment of functional dependencies and normalization there (in the 1st of the three parts in the Week 9 batch). The exam will NOT rely on material on physical design (in the 3rd of the three parts in the Week 9 batch).  All Exercise Answer Notes.  BUT NOT the content of Keerthi’s lecture on his industrial experience (but of course it may help you in overall understanding of some issues). Textbook Parts See my module website (top page). On SQL: the exam doesn’t rely on fine detail beyond what’s in the handouts (and occasional lectures). MODULE REVIEW What We Mainly Studied  The nature of relational databases, the central modern type of database. Entity types represented as tables, holding relations.  Some basic mathematical concepts underpinning relational databases, and useful also in many other branches of CS.  Key aspects of how to develop the conceptual/logical design of relational databases.  In particular, how to achieve certain types of good structuring, to help achieve certain types of correctness and efficiency.  How to create and query databases using a particular database language, PostgreSQL (a version of SQL: very widely used in various forms). Initial Considerations What a database is, and how it relates to other types of data structure/repository in CS. Data integrity, data redundancy, data anomalies. Associative links between parts of a database, as opposed to pointing. Ways data is stored/linked in physical human media such as diaries, address books and timetables. Various complications in tables in human documents. Restricted type of table used in relational DBs. Entities and Relationships Any relational DB as consisting of entity types and relationships between them: Entity Relationship Model (ERM) in general. Specific ERMs for specific applications, and distinction from Entity Relationship Diagrams (ERDs). Entity Types as represented by tables. The question of what types of thing should correspond to “entity types,” and hence tables, depends on the application and your design judgment. Attribute Determination and Keys One or more attributes determining another attribute. Can also be described as that attribute being functionally dependent on the former attributes. Various notions of key, especially superkeys, candidate keys and primary keys … And foreign keys as the (main) implementation of relationships between entity types. Strength and Weakness Strong and weak relationships. (Also called identifying and non-identifying relationships respectively.) Weak entity types, as defined according to the strength of their relationships to other entity types and existencedependence with respect to those types. Depiction of strength or weakness in different styles of ERD. Connectivity, Cardinality & Participation Connectivity: uniqueness or multiplicity of entities at either end of a relationship. Cardinality: precise numerical info about how many entities allowed or required at either end of a relationship. Participation: optionality or mandatoriness of a relationship, in either direction. Overlap between these notions. Notation in ERDs. Table Representation of Relationships of Different Connectivities  Basic case is 1:M non-recursive. (Recursive is when two or more entity types in a relationship are the same.)  1:M recursive—can often be handled within a single table.  M:N, M:N:P, etc. standardly handled by breaking down into two, three, etc. 1:M relationships going to a new entity type: a “bridging” or “linking” type.  Symmetric relationships (e.g. marriage). Special problems of redundancy arise. Can be avoided by a special implementation involving two extra tables, together serving a bridging function.  Symmetric relationships are recursive and are either 1:1 or M:N. 1-1 Relationships  If you find yourself using a 1-1 relationship, especially when unchanging, mandatory in both directions, and non-recursive: ask yourself whether the two entity types could be combined into one without causing difficulty.  E.g., if there were an unchanging 1-1 relationship, mandatory both ways round, between employees and phone stations, you could probably combine into one entity type without difficulty, increasing efficiency of some operations.  But if the relationship changes frequently, easier to have 2 types/tables.  And if the relationship is optional in at least one direction, using one type leads to more wasted space. 1-1 Relationships, contd  Some good, standard uses of 1-1 relationships:  [As just mentioned] Cases where there is a significant amount of change or optionality.  Subtype/supertype relationships: naturally 1-1; useful to keep the types separate.  Symmetric 1-1 relationships such as marriage: only one entity type anyway. Other Representation Issues Multivalued attributes. OK in themselves in early stages of design, but should eventually be broken down into single-valued attributes in some way. A main divergence in ways of doing this is based on whether the different values are for stably identifiable sub-attributes. Generalization hierarchies. Exhaustiveness, disjointness. Normalization What normalization is and what role it plays in the database design process. The normal forms 1NF, 2NF, 3NF, BCNF.  4NF (two versions) was left as optional material. How tables can be transformed from lower normal forms to higher normal forms. That normalization and ER modeling are used concurrently to produce a good database design, helping to eliminate data redundancies & anomalies. That some situations benefit from non-normalization to gain efficiency for some operations. Creating ER Models/Diagrams Designing an ER model for a database is an iterative process, because, e.g.:  As you proceed, you think of new ways of conceiving what’s going on (much as in ordinary programming)  Multivalued attributes need to be re-represented  M:N relationships can be included as such at an early stage, but generally need to be replaced by bridging entity types at some point  1:1 relationships raise a red flag, though may be justified.  Special supertype/subtype notation needs eventually to be converted into more standard diagram notation, to correspond to the actual tables used.  Conversion to Normal Form (NB: different parts of the DB may have different needs) SQL Mainly, the module only covers basic SQL mechanisms for querying, creating and updating tables. See manual and textbook for much more if you want! MATHEMATICAL VIEW Tuples, Relations and Tables A relation on some sets A, B, C, … is simply a set of tuples all of the same length, where in each tuple the first element is from A, the second from B, etc. A relation is therefore a subset of the Cartesian product of those sets. A row is a tuple. Hence a table at any given moment induces a relation over the value domains of the table (augmented as appropriate with the value NULL). The table consists of not just the induced relation but also the attributes themselves, their domains, specification of primary and foreign keys, etc. Functionality of Relations  Functional relation from A, B, …C to some sets: for each choice of values from A, B, …, the relation contains at most one tuple starting with those values.  Also called a partial function.  Functional dependence relationship, i.e determination relationship X, Y, …, Z  U within a table: induces a functional relation from the value domains for X, Y, …, Z to the value domain for the determined attribute U. Relations from Entity Relationships  The connection between relationships in ERMs and mathematical relations.  E.g., the EMPLOYED-BY relationship from the People entity type to the Organizations entity type says that the database (at any moment) stores a relation from the People entity set to the Organizations entity set.  The connection between connectivity of a relationship between entity types and the issue of whether the corresponding relation is one-to-one, functional, etc.  The connection between mandatoriness of a relationship from entity set E to entity set F and notion of relation totality. (A mandatory relation is total on the set of current E things.) Some Operations on Sets in General Union, intersection, difference, symmetric difference and Cartesian product of two sets X and Y (of any sort). When X and Y are relations: Note the set of all possible concatenations of a tuple within X and a tuple within Y. I have called this the flattened Cartesian product of X and Y, notated as X  Y as opposed to X  Y. Often called the relational product in the DB world. (Shouldn’t be, but often is, called the Cartesian product.) Relational DB Operators & Relational Algebra Defines theoretical way of manipulating tables using “relational DB operators” that mainly manipulate the relations in the tables. • SELECT • UNION • PROJECT • DIFFERENCE • JOIN (various sorts) • PRODUCT • INTERSECT • (DIVIDE) Use of relational DB operators on existing tables produces new tables. Strong connection to SQL commands/operators. Relational algebra puts relational DB operators into a mathematical notation that is more convenient than, e.g., SQL operators. QUESTIONS?

Challenges in Natural Language Processing:

Related documents

Products

Support

Challenges in Natural Language Processing:

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib