comp4_unit6a_audio_transcript

Introduction to Information and Computer Science: Databases and SQL
Audio Transcript
Slide 1
Welcome to Introduction to Information and Computer Science: Databases and
SQL. This is Lecture (a).
The component, Introduction to Information and Computer Science, is a basic
overview of computer architecture; data organization, representation and structure;
structure of programming languages; networking and data communication. It also
includes basic terminology of computing.
Slide 2
The objectives for Databases and SQL are to:
Define and describe the purpose of databases
Define a relational database
Describe data modeling and normalization
Describe the structured query language (SQL)
Define the basic data operations for relational databases and how to implement them in SQL
Design a simple relational database and create corresponding SQL commands
Examine the structure of a healthcare database component
Slide 3
It is important to review some computer science basics in order to understand the
details of information storage. Remember that for a computer, data consists of ones and
zeros. In other words, every data value is represented by a combination of binary ones
and zeros, or simply values of on and off, for example, the number 01000001
[zero-one-zero-zero-zero-zero-zero-one] on this slide. When a computer system examines
this value, it does not know what it represents. The meaning depends on the application’s
knowledge of the underlying data. If the application is a text editor, it knows that this
value represents a capital ‘A’ as defined by the American Standard Code for Information
Interchange, better known as ASCII [as key]. On the other hand, in the context of data
in a spreadsheet, the cell formatting may indicate that the stored binary number actually
represents the number 65. That binary data may also be used in many other ways,
including as a central processing unit instruction or as part of an audio or video file.
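As a small illustration of this point, the following Python sketch reads the same eight bits in two ways. The variable names are chosen for this example only and are not part of the lecture.

```python
# The eight bits from the slide, written as a string of ones and zeros.
bits = "01000001"

# Read as an unsigned binary integer, the bits are the number 65.
as_number = int(bits, 2)

# Read as an ASCII character code, the same value is a capital 'A'.
as_text = chr(as_number)

# The stored byte is identical in both cases; only the interpretation differs.
as_byte = bytes([as_number])

print(as_number)                # 65
print(as_text)                  # A
print(as_byte.decode("ascii"))  # A
```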
Slide 4
One very large component of computer systems is the management of data. Consider
the information maintained on a personal computer – this might include programs,
photos, music, videos, tax returns and class papers, just to name a few. Some files may
remain unchanged; others might be modified over time, such as revisions to a class
paper.
Health IT Workforce Curriculum Introduction to Information and Computer Science
Version 3.0 / Spring 2012
Databases and SQL
Lecture a
This material (Comp4_Unit6a) was developed by Oregon Health and Science University funded by the Department of Health and
Human Services, Office of the National Coordinator for Health Information Technology under Award Number IU24OC000015.
Now consider an electronic healthcare record (EHR) system that may contain
information for thousands or tens of thousands of patients. With this volume of
information, it is important that the data is stored efficiently, is quickly accessible,
and can be updated.
Slide 5
Data can be stored electronically in different ways. The first way is to store it in a simple
text file. Another is to store it in a spreadsheet, which is more powerful than a simple text
file. Finally, data can be stored in databases, which is the topic of this unit. Before
discussing databases, this lecture will provide information about the other options for
data storage and when they are appropriate to use.
Slide 6
A file is a collection of information, or data, stored in a single electronic location. How
that information is stored in files is also important. Files can contain text or data that is
not readable by humans. If data is to be accessed by a person, then it needs to be
human-readable; however, a computer system may use a different format – it just needs
to know how to interpret the data. For example, an audio file and a text file contain
information that is stored in different formats. A text editor cannot edit an audio file, and
a music player cannot play a text file. An audio file is not readable by humans, but its
data can be interpreted by a music player and converted to the music that humans
listen to.
Slide 7
Files are stored on file systems, which every computer has. Because of that, every
computer has the ability to create, use and store files. Files can be easily shared; email
and shared drives are some options for sharing. They're also used by many
applications; for example, a PowerPoint presentation is stored in a file. Also, many
scientific applications and instruments use input data files and/or generate output data
files. For example, genomic data is often stored in large data files that are searched
and parsed by special programs.
On the other hand, files have limitations. The security of files is limited to that of the file
system. Also, by default, multiple users cannot use the same file at the same time.
Usually, one user can open the file for editing; any additional users open a read-only
copy of the file. Finally, using files to store structured data with relationships can result
in redundancy and inconsistency as shown in the following example.
Slide 8
This slide shows a file that contains contact information for individuals and their
organizations. It contains names, such as Bill Robeson, Walter Schmidt and Mary
Small and the corresponding organization names and addresses – there are only two
different organizations in this small sample.
Slide 9
After review of the previous slide, answer the following questions:

Do Bill and Albert work for the same company?

What is the difference between Catherine and Walter’s information?

If a computer application were looking at the data, would it be able to tell that there
was an issue with the addresses for Catherine and Walter?

Can you sort this list by last name?

Could you sort a list of 10,000 contacts?
Slide 10
While it is easy to see that Bill and Albert work for the same company, note that in the
file, Bill’s company name is Community Hospital Inc. and Albert’s is Community
Hospital Incorporated. There is a similar issue with Catherine and Walter’s
information. Catherine’s address is 14 12th Street, with “Street” spelled out. Walter has the
same address, but in his case 14 12th St. uses the abbreviation for street.
Humans can easily handle these variations in data and determine that they are the
same. However, a computer system, even one with an artificial intelligence system,
would have significantly greater challenges in determining that the companies and
addresses are the same.
And while sorting these few entries may be feasible just looking at the list, sorting a file
with 10,000 contacts would be extremely time-consuming without the use of technology.
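The comparison problem described here can be sketched in a few lines of Python. The company names, addresses and the three full names are the ones quoted in this lecture; the variable names and the list layout are assumptions made for the example, since the transcript does not reproduce the file itself.

```python
# Values quoted in the lecture; the variable names are illustrative only.
bill_company = "Community Hospital Inc."
albert_company = "Community Hospital Incorporated"
catherine_address = "14 12th Street"
walter_address = "14 12th St."

# A program comparing the text exactly sees two different companies
# and two different addresses, even though a person sees one of each.
same_company = bill_company == albert_company
same_address = catherine_address == walter_address
print(same_company, same_address)  # False False

# Sorting, on the other hand, is easy for a program at any list size.
# Here the last word of each name is treated as the last name.
names = ["Mary Small", "Bill Robeson", "Walter Schmidt"]
by_last_name = sorted(names, key=lambda name: name.split()[-1])
print(by_last_name)  # ['Bill Robeson', 'Walter Schmidt', 'Mary Small']
```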
Slide 11
A bigger challenge might be if “Community Hospital, Inc.” becomes “Community
General”. If this change were done manually, or with an automated system, every single
instance of “Community Hospital” would have to be identified in the data. Additionally,
every different representation of “Community Hospital”, for example, “Community
Hospital, Inc.” and “Community Hospital Incorporated”, would also need to be located;
and there is no guarantee that the word “Community” was spelled correctly in every
instance.
Once all of the entries are identified, each one needs to be modified to correctly read
“Community General”. If done manually, this still has the potential for human data-entry
error and in a large system would be very time-consuming. If it is done using a simple
automated search-and-replace function, it may not take as long, but it may also make
unintended partial changes to other existing records, for example, Portland Community
Hospital being changed to Portland Community General.
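A minimal Python sketch of a simple search and replace shows the kinds of problems described for this slide. The first three entries come from the lecture; the misspelled fourth entry is a hypothetical record added to show a missed match.

```python
# Organization names as they might appear in the contact file.
records = [
    "Community Hospital, Inc.",
    "Community Hospital Incorporated",
    "Portland Community Hospital",  # a different organization
    "Comunity Hospital, Inc.",      # hypothetical misspelling
]

# A plain text replacement, applied to every record.
updated = [r.replace("Community Hospital", "Community General") for r in records]

for before, after in zip(records, updated):
    print(f"{before!r} -> {after!r}")
```

In this sketch the first two records keep their "Inc." and "Incorporated" endings, the Portland record is changed even though it names another organization, and the misspelled record is not changed at all.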
Slide 12
Spreadsheet applications were first developed for businesses to automate accounting
tasks. Today, spreadsheets are widely used for storing, manipulating and presenting
data. Today's spreadsheet applications perform calculations using predefined or
user-created formulas. They provide features for easily sorting and filtering data and can
even perform data analysis. Advanced spreadsheet users can create very complex
calculations and relationships between data.
Spreadsheets have become very powerful tools for data analysis and manipulation.
However, they still have the same limitations as plain text files as shown on the
following slide.
Slide 13
Here is an example of an OpenOffice Calc spreadsheet. (Other spreadsheet
applications include Microsoft Excel, IBM Lotus Symphony, Corel Quattro Pro, Apple
Numbers and Google Docs spreadsheets.) The data is organized into numbered
rows and lettered columns; column names can be provided in the first row. The data
itself does not look very different from the data in the simple text file; however, there are
vast options for manipulating and presenting this data on the menus above the data.
We can sort the data very easily and quickly, unlike plain text files. Regardless,
spreadsheets have the same problems as the text file: some data (company name and
address) is defined multiple times, which is inefficient and error-prone.
Slide 14
Since spreadsheets are just a special type of file, they have similar advantages and
disadvantages. While spreadsheets require a special application such as Microsoft
Excel, these applications are widely available. Spreadsheet applications provide
powerful calculations and basic sorting and filtering. But like files, they have limited
security, limited support for simultaneous access by multiple users, and a potential for
redundant and inconsistent data.
Spreadsheets are good for doing calculations on static snapshots of datasets, but they
aren't the best solution for long term storage and access of data.
Slide 15
So what exactly is a database? A database is a structured data collection which is
accessed electronically. In other words, it is information stored on a computer for
access through a computer program. The text file used in this lecture that contained the
contact information can be considered to be a very simple database – it contains
organized, though not necessarily consistent, information, that might be accessed
through a text editor. A relational database is a database that maintains relationships
between data elements; relational databases are the focus of this unit.
Slide 16
The concept of a relational database was first published by E.F. Codd in
“Communications of the ACM” in June 1970. Codd held the view that users should not
have to keep track of how the information is stored in a computer in order to use it. To
quote Codd, “Future users of large data banks must be protected from having to know
how the data is organized in the machine (the internal representation).”
In other words, users should not have to know whether the binary bits discussed
previously in this lecture represent a capital A or the number 65, or even what they are
related to – rather, the system should keep track of that information once it is provided
by the user.
So a relational database is an organized collection of data accessible by electronic
means where the information type and information relationships are maintained
internally by the system itself.
Slide 17
A relational database consists of one or more tables defined by the database designer
in a meaningful fashion. A table is a collection of information organized into rows and
columns. Each table contains one or more rows of data. The data in a row is ordered by
columns, and each column is of a known and specified type where the data and type
are independent. The order of rows in the table is irrelevant, but the order of the
columns in the row is significant.
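The idea of a table with typed columns and unordered rows can be sketched with Python's built-in sqlite3 module. SQLite is used here only because it ships with Python; the table name, column names and declared types are assumptions made for the example, and the three names are the contacts mentioned earlier in this lecture.

```python
import sqlite3

# An in-memory database that disappears when the program ends.
conn = sqlite3.connect(":memory:")

# Each column is declared with a name and a type.
conn.execute(
    """
    CREATE TABLE contact (
        contact_id INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    )
    """
)

# Each tuple becomes one row; values are listed in column order.
conn.executemany(
    "INSERT INTO contact (first_name, last_name) VALUES (?, ?)",
    [("Mary", "Small"), ("Bill", "Robeson"), ("Walter", "Schmidt")],
)

# Rows have no meaningful order of their own, so the query asks for one.
cursor = conn.execute(
    "SELECT first_name, last_name FROM contact ORDER BY last_name"
)
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(column_names)  # ['first_name', 'last_name']
print(rows)  # [('Bill', 'Robeson'), ('Walter', 'Schmidt'), ('Mary', 'Small')]
```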
Slide 18
A relational database has quite a number of advantages over files and spreadsheets.
Database systems are designed to be highly secure; access to data can be precisely
defined. In addition, databases are designed to be accessed and modified by multiple
users at the same time. Relationships between tables support organized data that
prevents data redundancy and inconsistency. The highly optimized underlying data
structures used by the relational database result in highly efficient and fast access.
Because a database system is designed for the specific purpose of data organization,
the basic operations of retrieving, adding, modifying and deleting data are more efficient
than with general-purpose applications and storage such as spreadsheets and files.
Furthermore, relationships and efficient access allow for complex queries and searches
of data.
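To make the point about relationships and redundancy concrete, here is a minimal sketch using Python's built-in sqlite3 module. The two-table layout, the table names and the column names are assumptions made for this example, not a design given in the lecture. The organization name is stored once, and each contact refers to it by number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The organization name is stored in exactly one row; each contact
# points to that row through org_id instead of repeating the name.
conn.executescript(
    """
    CREATE TABLE organization (
        org_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL
    );
    CREATE TABLE contact (
        contact_id INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        org_id     INTEGER NOT NULL REFERENCES organization(org_id)
    );
    INSERT INTO organization (org_id, name)
        VALUES (1, 'Community Hospital, Inc.');
    INSERT INTO contact (first_name, org_id)
        VALUES ('Bill', 1), ('Albert', 1);
    """
)

# Renaming the organization is a single change in a single place.
conn.execute(
    "UPDATE organization SET name = 'Community General' WHERE org_id = 1"
)

# A query that follows the relationship shows the new name for everyone.
rows = conn.execute(
    """
    SELECT contact.first_name, organization.name
    FROM contact
    JOIN organization ON organization.org_id = contact.org_id
    ORDER BY contact.first_name
    """
).fetchall()

print(rows)  # [('Albert', 'Community General'), ('Bill', 'Community General')]
```

Because the name exists only once, there is no second spelling to find and no risk of changing an unrelated record.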
On the other hand, databases are complex systems that require expertise to install,
maintain and use. There are free, open-source databases, but commercially
available databases can be very expensive. In comparison, files and spreadsheets are
more widely available and easy to use. Also, data in databases is not as easily
analyzed using complex data calculations. Instead, data is usually exported from
databases into a spreadsheet or data file for statistical software.
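Exporting data for analysis can be sketched as follows, again with Python's built-in sqlite3 module together with the csv module. The table and its contents are assumptions made for the example, and the output is written to an in-memory text buffer rather than a real file so that the sketch is self-contained.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO contact VALUES (?, ?)",
    [("Bill", "Robeson"), ("Mary", "Small")],
)

cursor = conn.execute(
    "SELECT first_name, last_name FROM contact ORDER BY last_name"
)

# Write a header row followed by one comma-separated line per database row.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([col[0] for col in cursor.description])
writer.writerows(cursor)

csv_text = buffer.getvalue()
print(csv_text)
```

Text in this comma-separated form can be opened by spreadsheet applications and read by most statistical software.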
Slide 19
This concludes Lecture (a) of Databases and SQL. There are several options for data
storage including files, spreadsheets or databases. Files and spreadsheets are widely
available and are good for data computations. Databases are very secure and
optimized systems for storing, accessing and modifying data over the long term.
Multiple users can access and modify data at the same time. Furthermore,
relationships are stored in a database along with the data which allows for less data
redundancy and inconsistency as well as for complex queries.
Slide 20
References slide. No audio.