Course overview

CS 440

Database Management Systems

Lecture 1: Course overview

Welcome to CS440!

•

Arash Termehchy

•

Assistant Professor at EECS

•

Information & Data Management and Analytics (IDEA) Lab @ OSU

•

Research on databases and data analytics

Tell Us About You

•

Name

•

Department & program

•

Technical interests

•

Non-technical interests

•

How do you store and query your data?

This course is about data management

•

Manual processing: 1900

•

Mechanical punch-cards: 1900 - 1955

•

Stored-program computers: sequential record processing: 1955 - 1970

•

Online navigational network databases: 1965 - 1980

•

Relational Databases: 1980 - 1995

•

Post-relational and the Internet: 1995 -

Database management system (DBMS)

•

W. McGee, Generalization: Key to Successful Electronic Data Processing, Journal of the ACM, 1959.

•

Data processing was mostly ad-hoc programs

•

We need generalization (abstraction):

–

Operation: sort, select part of the file, …

–

File: A sequence of records

•

It makes our systems usable and scalable.

–

More people can use them

–

Easier to extend to a large number of large data sets
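
As a rough sketch of what such a generalization could look like, a file can be modelled as a sequence of records, with generic operations that work regardless of what the records contain. The record fields and the function names below are hypothetical and chosen only for illustration; they are not taken from McGee's paper.

```python
from typing import Callable, Iterable, NamedTuple


class Record(NamedTuple):
    # Hypothetical record layout, used only for this illustration.
    name: str
    department: str
    salary: int


# A "file" in the sense used above: a sequence of records.
File = list[Record]


def select(file: Iterable[Record], predicate: Callable[[Record], bool]) -> File:
    """Generic 'select part of the file': keep the records satisfying predicate."""
    return [r for r in file if predicate(r)]


def sort(file: Iterable[Record], key: Callable[[Record], object]) -> File:
    """Generic 'sort': return the records ordered by the given key."""
    return sorted(file, key=key)


employees: File = [
    Record("Ann", "EECS", 90),
    Record("Bob", "Math", 70),
    Record("Cy", "EECS", 80),
]

# The same two operations can be reused for any question about the file,
# instead of writing a new ad-hoc program for each one.
eecs_by_salary = sort(
    select(employees, lambda r: r.department == "EECS"),
    key=lambda r: r.salary,
)
print([r.name for r in eecs_by_salary])
```

The point of the sketch is only that select and sort know nothing about employees; any other file of records could reuse them unchanged.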

Generalization is the key

•

How to develop correct and usable generalizations for our data and queries?

–

Data & Query Model

– Relational model, Web data model, …

•

How to implement these models efficiently?

–

Database system internals

– Storage management, access methods, …

Course objective: Data models & systems

•

Learn the fundamental concepts and ideas

–

Foundational models, algorithms, and systems.

–

By reading and lectures.

•

Develop systems

–

Apply the lessons learned to interesting data problems.

–

By doing assignments.

This course is not about learning basic concepts in data management

•

We do not discuss

–

ER model, relational model, relational algebra, SQL, database programming

•

You should know them already

–

Take CS 340

•

We review some of them to refresh your memory.

The Era of Big Data

•

Technological shifts (Web, cheap hardware, mobile, sensors, …) created a staggering number of enormous data sets.

•

There exist both opportunities and challenges.

Opportunities are priceless!

The story of John Snow

“In the mid-1850s, Dr. John Snow plotted cholera deaths on a map, and in the corner of a particularly hard-hit buildings was a water pump. A 19th-century version of Big Data, which suggested an association between cholera and the water pump.”

Integrating data sets has saved millions of lives!

Paradigm shifting influence on scientific discovery

•

“The Fourth Paradigm: Data-Intensive Scientific Discovery”, Jim Gray

–

Empirical

–

Theoretical

–

Computational

–

Data-centric

•

The Sloan SkyServer database is a top-cited resource in the field of astronomy.

–

Astronomical observation => database query

Unreasonable effectiveness of data

•

A. Halevy, The Unreasonable Effectiveness of Data, IEEE Intelligent Systems, 2009.

•

More data outperforms complex statistical models in prediction and discovery.

–

Spread of diseases by analyzing Google query log

•

We do not need more complex statistical models.

Traditional systems cannot deal with today’s data sets.

•

The Large Hadron Collider can generate 500 exabytes per day.

•

The Sloan SkyServer will soon store 30 terabytes per day.

•

Advances in hardware outpaced DBMS technology.

Traditional systems cannot deal with the staggering number of data sets.

•

An RDBMS used to deal with a single static database.

•

We need to transform and/or integrate a large number of evolving data sets.

•

Impossible to do manually.

“If you’re a data integration expert, you always find jobs!”

Current systems are not built for scientists and normal users.

“… (in the next few years) we project a need for 1.5 million additional analysts in the United States who can analyze data effectively …”,

-- McKinsey Big Data Study, 2012

“It may take a PhD in computer science to successfully deploy a data analytics algorithm!”

Our plan

•

Learn the fundamental concepts and ideas

–

Foundational models, algorithms, and systems.

–

Textbooks, resources, and lectures.

•

Apply them to new problems

–

Apply the lessons learned to interesting database problems.

–

By doing assignments.

Learning the fundamentals: Lectures

•

Review and discuss the material.

•

Will be available on the course website after the class.

•

Provide the road map for studying

–

The course material can seem overwhelming.

•

Attendance is not required but encouraged.

•

Read the course material before the class.

•

Participate and ask questions!

Learning the fundamentals: Readings

•

Textbooks:

–

Database Management Systems, 3rd edition, R. Ramakrishnan and J. Gehrke.

•

Cow book

–

Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman, Jeff Ullman.

•

Free Online

–

Papers for newer material: posted on the course website.

Learning the fundamentals: Readings

•

Recommended

–

Database Systems: The Complete Book, 2nd edition, Hector Garcia-Molina, Jeffrey Ullman, and Jennifer Widom.

•

The complete book

–

Foundations of Databases, Serge Abiteboul, Richard Hull, Victor Vianu

•

Alice book

Learning the fundamentals: Exam

•

Midterm exam in class.

–

Closed books and notes

–

Tests your knowledge of the subjects discussed in the class.

–

40% of the overall grade

–

In class

•

No final exam

Apply your understanding

•

Seven assignments:

•

Announced on Piazza and the course website; posted on the course website.

•

Both written and programming.

•

Submit using TEACH

•

Write using a word processor and submit as a PDF.

•

Start early!!!

•

60% of the overall grade

How to get the most out of the course?

•

Communicate with the course staff

–

TA: Laxmi Ganesan

–

Piazza

•

Preferred method of communication

–

Office hours

•

Arash: Tuesday/Thursday 4:30 – 5:30

•

Laxmi: Friday 1 – 2 pm

–

Email the staff for other types of questions

•

Use [cs440] tag in the subject line.

•

Communicate with your peers on course materials and lectures.

•

Check Piazza and the course website for announcements or possible changes in the schedule.

What is next?

•

A review of the relational model, relational algebra, and SQL.

•

Assignment 1 will be posted tomorrow night!

•

You will refresh your memory by working on some problems on the relational model and database design.
