Notes (Wrapup)

Wrapup
Amol Deshpande
CMSC424

DBMS at a glance
- Data Models
  - Conceptual representation of the data
- Data Retrieval
  - How to ask questions of the database
  - How to answer those questions
- Data Storage
  - How/where to store data, how to access it
- Data Integrity
  - Manage crashes, concurrency
  - Manage semantic inconsistencies
- Not a fully disjoint categorization!

DBMS at a glance
- Data Models
  - E/R model, relational model
  - Very simple and hence effective
    - Easy to make things complicated, very hard to keep them simple
  - No other data model has survived for so long
  - What is the future of XML?

DBMS at a glance
- Data Retrieval
  - How to ask questions of the database
    - Declarative languages are great
      - Hide complexity from users, can optimize things, can evolve easily
    - SQL: more or less declarative
  - How to answer those questions
    - Parsing --> Optimization --> Processing
    - Operators: hashing, sorting, joins, aggregation
    - Data structures
      - Hash indexes: good for equality queries
      - Tree indexes: for everything else
    - Optimization: complex, but a key piece of a database system

DBMS at a glance
- Data Storage
  - How/where to store data, how to access it
  - Need to be cognizant of the memory hierarchy
    - Memory is cheap to access; disk is very expensive to access
    - Further, disk is cheap to access sequentially, much more expensive to access randomly
      - Many of our decisions are influenced by this
  - RAID: surviving failures
  - Accessing data: indexes
  - What happens if a new form of storage comes along with different properties (say, holographic storage)?
    - We will need to rethink the tradeoffs, but we now know the approach

DBMS at a glance
- Data Integrity
  - Manage crashes, concurrency
    - Transactions, 2-phase locking
    - Write-ahead logging
    - DBMSs are pretty much the last word on concurrency/recovery
      - OSs don't come close to supporting anything like that
  - Manage semantic inconsistencies
    - Normalization, FDs
    - Not easy to identify tools for this, but we have learned how to think about them
      - Try to capture them in the E/R diagram as much as possible

Motivation: Data Overload
- We began the first lecture by discussing data overload
- Huge amounts of data generated every day
  - Much faster than our ability to process it
- Increasing ability to capture more enterprise data
- Web, blogs, RSS feeds, etc.
- Multimedia
  - Flickr and cellphone cameras have led a revolution in how people take pictures
  - Videos will be next
  - Not hard to imagine capturing every moment of your life
- Sensor/RFID data
  - Tiny sensors/RFID tags are just beginning to become ubiquitous
  - Billions of these, each generating a tiny amount of data every second, is still too much
- Biological/scientific data

Motivation: Data Overload
- Relational databases help for structured data
- But increasingly not sufficient
  - The things we want to do with data can't be expressed in SQL
    - E.g. with biological data, web
  - Too much unstructured data
- Distributed data generation creates additional headaches
  - Almost impossible to try to collect the data in one location
- Making sense of this requires not only advances in data processing, but also in data understanding/mining
  - Interdisciplinary efforts

Some Lessons from RDBMS
- But we can use the lessons learned from developing RDBMSs
- Data independence / abstraction is good
  - Hide details, even if initially it leads to inefficiency
- Look for structure
  - Even seemingly highly unstructured data might have structure
- Look for patterns in usage
  - Relational databases are fast because query processing is predictable
    - Unlike, say, OS workloads, which are very hard to optimize for
  - If you can identify patterns, you can probably optimize them
- Declarative languages are great
  - Say what you want, not how to get it
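
The last lesson ("say what you want, not how to get it") can be illustrated with a small sketch. It is not from the lecture: the enrollment table, its rows, and the function names are hypothetical, and Python's built-in sqlite3 module stands in for a full DBMS. The same question (average grade per course) is asked once declaratively in SQL and once with a hand-written loop.

```python
import sqlite3

# Hypothetical sample data: (student, course, grade)
ROWS = [
    ("alice", "CMSC424", 90.0),
    ("bob", "CMSC424", 80.0),
    ("carol", "CMSC424", 100.0),
    ("alice", "CMSC412", 70.0),
    ("bob", "CMSC412", 90.0),
]


def build_db():
    """Create an in-memory database holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE enrollment (student TEXT, course TEXT, grade REAL)")
    conn.executemany("INSERT INTO enrollment VALUES (?, ?, ?)", ROWS)
    return conn


def avg_grade_declarative(conn):
    """State WHAT is wanted; the engine decides how to compute it."""
    sql = (
        "SELECT course, AVG(grade) FROM enrollment "
        "GROUP BY course ORDER BY course"
    )
    return conn.execute(sql).fetchall()


def avg_grade_imperative(rows):
    """Spell out HOW: scan every row and keep running sums and counts."""
    totals = {}
    for _student, course, grade in rows:
        running_sum, count = totals.get(course, (0.0, 0))
        totals[course] = (running_sum + grade, count + 1)
    return [(course, s / n) for course, (s, n) in sorted(totals.items())]


if __name__ == "__main__":
    conn = build_db()
    print(avg_grade_declarative(conn))
    print(avg_grade_imperative(ROWS))
```

Both functions return the same answer, but only the loop commits to an access strategy. The SQL version leaves the system free to use an index, a different grouping algorithm, or a different storage layout without the query changing, which is the data independence point made above.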
