Data structure and file structures - International University of Japan

Data Structure &
File Structure
Hun Myoung Park, Ph.D.,
Public Management and Policy Analysis Program
Graduate School of International Relations
International University of Japan
Outline
Data Structure
Language Translators
Software Development
Software Analysis and Design
Programming
Implementation & Maintenance
Documentation
Data Type
Abstract data types (ADTs) include:
  - Stack (LIFO: last in, first out)
  - Queue (FIFO: first in, first out)
  - List, tree, and graph
Primitive types: number, string, Boolean
Composite types (bundling of elements): arrays
and records
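The LIFO/FIFO difference can be shown in a few lines. This is a minimal Python sketch, assuming a plain list as the stack and `collections.deque` as the queue; both are implementation choices for illustration, not part of the ADT definitions.

```python
from collections import deque

# Stack (LIFO): push and pop at the same end of a Python list.
stack = []
for item in ["a", "b", "c"]:
    stack.append(item)  # push
stack_order = [stack.pop() for _ in range(len(stack))]  # pop until empty

# Queue (FIFO): enqueue at the rear, dequeue from the front.
queue = deque()
for item in ["a", "b", "c"]:
    queue.append(item)  # enqueue
queue_order = [queue.popleft() for _ in range(len(queue))]  # dequeue until empty

print(stack_order)  # ['c', 'b', 'a']
print(queue_order)  # ['a', 'b', 'c']
```

The same three items come out in reverse order from the stack and in arrival order from the queue.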
Data Structure
A way of storing and organizing data.
“A collection of related variables that can be
accessed individually or as a whole”
“A set of data items that share a specific
relationship”
Data structures implement abstract data types
Examples: arrays, records, and linked lists
Array (fixed-size sequence of elements) vs. list
(variable size)
Data Structure: Array
A sequenced collection of elements
Elements normally share the same data type,
although some languages allow mixed types
The array name alone refers to the whole array;
the name plus an index (subscript) refers to one element
a[0], a[1], a[2], … instead of a0, a1, a2, …
Multidimensional arrays use several indices,
e.g., a[0][3] (row 0, column 3)
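The indexing rules above can be sketched in Python, assuming Python lists stand in for arrays (unlike a fixed-size array, a Python list can grow). The `row_major_offset` helper is hypothetical and assumes row-major storage, as used in C; other languages may lay arrays out differently.

```python
# One-dimensional array: the name `a` refers to the whole collection,
# and a[i] (name plus index) refers to a single element.
a = [10, 20, 30, 40]
print(a[0], a[2])  # 10 30
print(len(a))      # 4 elements in the whole array

# Two-dimensional array as a list of rows: a2[row][column].
a2 = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
print(a2[0][3])  # 4 (row 0, column 3)


# If a 2-D array is stored row by row (row-major order), element [r][c]
# sits at offset r * n_cols + c from the start of the storage.
def row_major_offset(r: int, c: int, n_cols: int) -> int:
    return r * n_cols + c


flat = [x for row in a2 for x in row]   # the same data stored row by row
print(flat[row_major_offset(1, 2, 4)])  # 7, same as a2[1][2]
```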
Data Structure: Record
A collection of related elements (fields or
attributes) of an entity, possibly of different types
The record name refers to the whole structure
(e.g., student)
Each field has its own name (e.g., name, id, …)
and is accessed as record.field:
student.name, student.id, student.age, …
Array of records: student[1].name,
student[1].id, …, student[2].name,
student[2].id, …
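A record maps naturally onto a Python dataclass. In this sketch the `Student` class and its sample data are hypothetical; note that Python indexes from 0, so `students[1]` is the second record.

```python
from dataclasses import dataclass


@dataclass
class Student:
    # Hypothetical record type: one field per attribute of the entity.
    name: str
    id: int
    age: int


# A single record: the name `student` refers to the whole structure,
# and dot notation selects one field.
student = Student(name="Kim", id=1001, age=22)
print(student.name, student.id, student.age)  # Kim 1001 22

# An array of records: select the record by index, then the field by name.
students = [
    Student("Kim", 1001, 22),
    Student("Lee", 1002, 24),
    Student("Sato", 1003, 21),
]
print(students[1].name, students[1].id)  # Lee 1002
```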
Data Structure: Linked List
“A collection of data in which each element
contains the location of the next element.”
Each element (node) consists of data and a link
The data part contains the information to be processed
The link contains a pointer (address) that
identifies the next element in the list
The link of the last element is a null pointer,
marking the end of the list
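A minimal singly linked list sketch in Python, assuming a hypothetical `Node` class whose `link` field holds a reference to the next node, with `None` playing the role of the null pointer. Inserting a node only relinks its neighbors, whereas inserting into an array would shift the later elements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    data: int                      # information to be processed
    link: Optional["Node"] = None  # next node; None marks the end of the list


def build_list(values):
    """Build a singly linked list from an iterable and return its head."""
    head = None
    for value in reversed(list(values)):
        head = Node(value, head)   # new node points at the previous head
    return head


def traverse(head):
    """Follow the links from the head until reaching the null (None) link."""
    items = []
    node = head
    while node is not None:
        items.append(node.data)
        node = node.link
    return items


def insert_after(node, value):
    """Insert a new node after `node` by relinking; nothing is shifted."""
    node.link = Node(value, node.link)


head = build_list([3, 7, 9])
insert_after(head, 5)   # list becomes 3 -> 5 -> 7 -> 9
print(traverse(head))   # [3, 5, 7, 9]
```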
Array versus Linked List
File Structure
“An external collection of related data
treated as a unit”
Used to store data permanently in a
secondary (auxiliary) storage device
Examples: an MS Word document; information
displayed on the screen can also be treated as
an (output) file
Access methods: sequential access (one record
after another from beginning to end) versus
random access (directly to a specific record)
Sequential Files 1
Sequential access method
Each record is accessed one after another
from the beginning to end.
Updating combines the master file with a
transaction file of changes to produce a new master file
Advantages: cost savings (efficiency) and data security
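The master/transaction update can be sketched as a merge of two files sorted by the same key. This is a minimal in-memory sketch, assuming Python lists stand in for the files, at most one transaction per key, and hypothetical transaction codes `A` (add), `D` (delete), and `C` (change); `update_master` and `HIGH` are illustrative names. Invalid transactions (for example, changing a record that does not exist) go to an error list instead of the new master file.

```python
HIGH = float("inf")  # sentinel key: this "file" has no records left


def update_master(old_master, transactions):
    """Merge a sorted old master file with a sorted transaction file.

    old_master:   list of (key, data), sorted by key
    transactions: list of (key, code, data), sorted by key, where code is
                  "A" (add), "D" (delete) or "C" (change)
    Returns (new_master, error_report).
    """
    new_master, errors = [], []
    i = j = 0
    while i < len(old_master) or j < len(transactions):
        m_key = old_master[i][0] if i < len(old_master) else HIGH
        t_key = transactions[j][0] if j < len(transactions) else HIGH
        if m_key < t_key:
            # No transaction for this record: copy it unchanged.
            new_master.append(old_master[i])
            i += 1
        elif m_key > t_key:
            # Key not in the old master: only an add is valid.
            key, code, data = transactions[j]
            if code == "A":
                new_master.append((key, data))
            else:
                errors.append(transactions[j])
            j += 1
        else:
            # Keys match: change or delete; adding an existing key is an error.
            key, code, data = transactions[j]
            if code == "C":
                new_master.append((key, data))
            elif code != "D":
                errors.append(transactions[j])
                new_master.append(old_master[i])  # keep the original record
            # code "D": the record is simply not copied to the new master
            i += 1
            j += 1
    return new_master, errors


old_master = [(101, "Ann"), (102, "Bob"), (104, "Dan")]
transactions = [(102, "C", "Bobby"), (103, "A", "Cay"),
                (104, "D", None), (105, "C", "Eve")]
new_master, errors = update_master(old_master, transactions)
print(new_master)  # [(101, 'Ann'), (102, 'Bobby'), (103, 'Cay')]
print(errors)      # [(105, 'C', 'Eve')]
```

Both files are read once from beginning to end, which is what makes the update fit the sequential access method.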
Sequential Files 2
Indexed Files 1
Random access method
Consists of a data file and its index
The index lists each key of the data file
with the address (record number) of the
corresponding record
The index is sorted by the key values
(attributes) of the data file
To access a record: search the index for the
desired key, retrieve its address, and then
access the record directly
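The index lookup steps can be sketched in Python, assuming a list of records stands in for the data file (a record's address is its position) and a sorted list of (key, address) pairs is the index. Binary search with the standard `bisect` module is one way to search a sorted index; `data_file`, `index`, and `lookup` are hypothetical names.

```python
from bisect import bisect_left

# Data file: records (id, name) in arrival order; the address is the position.
data_file = [
    (1002, "Lee"),
    (1001, "Kim"),
    (1003, "Sato"),
]

# Index: (key, address) pairs sorted by key.
index = sorted((record[0], address) for address, record in enumerate(data_file))
index_keys = [key for key, _ in index]


def lookup(key):
    """Search the index for key, then read the record at its address."""
    pos = bisect_left(index_keys, key)
    if pos < len(index_keys) and index_keys[pos] == key:
        address = index[pos][1]
        return data_file[address]  # direct (random) access by address
    return None


print(lookup(1003))  # (1003, 'Sato')
print(lookup(9999))  # None
```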
Indexed Files 2
Hashed Files 1
Random access method
Uses a mathematical (hash) function to map a
key to an address
The user gives a key → the hash function maps the
key to an address → the address is passed to the OS
→ the record is retrieved
No index is needed
Hashing methods: direct, modulo division, and digit
extraction; a collision (two keys mapping to the
same address) requires a collision resolution method
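The hashing methods named above can be sketched as follows, under stated assumptions: addresses start at 0, the digit positions used for digit extraction are arbitrary, and `HashedFile` is a hypothetical in-memory stand-in that resolves collisions with linear probing (trying the next slot), which is only one of several collision resolution methods.

```python
def direct_hash(key: int) -> int:
    # Direct hashing: the key itself is the address (keys must be small
    # enough to fit within the table).
    return key


def modulo_division_hash(key: int, table_size: int) -> int:
    # Modulo division: address = key mod table_size (addresses start at 0 here).
    return key % table_size


def digit_extraction_hash(key: int, positions=(0, 2, 3)) -> int:
    # Digit extraction: take the digits at the chosen positions
    # (0-based, counted from the left) and join them into an address.
    digits = str(key)
    return int("".join(digits[p] for p in positions))


class HashedFile:
    """Hypothetical in-memory stand-in for a hashed file.

    Uses modulo division to find a record's home address and linear
    probing (try the next slot) to resolve collisions.
    """

    def __init__(self, size: int) -> None:
        self.size = size
        self.slots = [None] * size  # each slot holds (key, record) or None

    def _probe(self, key: int):
        # Yield the home address, then the following slots, wrapping around.
        home = modulo_division_hash(key, self.size)
        for step in range(self.size):
            yield (home + step) % self.size

    def insert(self, key: int, record) -> int:
        # Store in the first empty slot (or replace a record with the same key).
        for address in self._probe(key):
            slot = self.slots[address]
            if slot is None or slot[0] == key:
                self.slots[address] = (key, record)
                return address
        raise RuntimeError("hashed file is full")

    def lookup(self, key: int):
        for address in self._probe(key):
            slot = self.slots[address]
            if slot is None:
                return None  # reached an empty slot: key is not stored
            if slot[0] == key:
                return slot[1]
        return None


hashed = HashedFile(7)
print(hashed.insert(10, "Ann"))       # 3 (10 mod 7)
print(hashed.insert(17, "Bob"))       # 4 (17 mod 7 = 3 collides; next slot is 4)
print(hashed.lookup(17))              # Bob
print(digit_extraction_hash(379452))  # 394 (1st, 3rd and 4th digits)
```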
Hashed Files 2
File Systems
Control how data are stored and read
Unix/Linux: ext2, ext3, ext4, and others
Mac: HFS, HFS Plus
MS Windows: FAT (File Allocation Table),
FAT32, NTFS (New Technology File System)
Directories 1
A special type of file containing information
about other files
A directory itself is a file
An index telling where files are located
Organized as a tree (hierarchical structure)
Each directory except the root directory has
a parent directory.
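The tree organization can be modeled with parent links. This is a hypothetical in-memory `Directory` class, not how any real file system stores directories; it shows that every directory except the root has a parent, and that an absolute path can be rebuilt by walking those links up to the root.

```python
class Directory:
    """Hypothetical directory node; every node except the root has a parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}  # child name -> Directory
        if parent is not None:
            parent.children[name] = self

    def path(self):
        """Build the absolute path by walking parent links up to the root."""
        parts = []
        node = self
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return "/" + "/".join(reversed(parts))


root = Directory("")  # the root directory has no parent
home = Directory("home", root)
user = Directory("kucc625", home)
www = Directory("www", user)
print(www.path())   # /home/kucc625/www
print(root.path())  # /
```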
Directories 2
Root directory (/)
Working directory (current directory)
Parent directory versus child directory
Absolute path (from the root) versus relative path
(from the working directory)
Absolute: /home/kucc625/www
Relative: kucc625/www (assuming /home is the
working directory)
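Resolving the relative path against the working directory can be shown with Python's standard `pathlib.PurePosixPath`, which manipulates path strings without touching the disk. The working directory `/home` is the same assumption as on the slide.

```python
from pathlib import PurePosixPath

working_dir = PurePosixPath("/home")     # assumed working directory
relative = PurePosixPath("kucc625/www")  # relative path
absolute = working_dir / relative        # resolve against the working directory

print(absolute)                           # /home/kucc625/www
print(absolute.is_absolute())             # True
print(relative.is_absolute())             # False
print(absolute.relative_to(working_dir))  # kucc625/www
print(absolute.parent)                    # /home/kucc625 (the parent directory)
```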
Binary Files
A collection of data stored in the internal
format of the computer.
Each byte can take any of the 256 possible
8-bit patterns
Data can be characters, integers, floating-point
numbers, and/or other types of data.
Object files, images, videos, sounds, and
formatted text files (e.g., MS Word file) are
binary files
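To see what "internal format" means, the sketch below uses Python's standard `struct` module to write an integer and a double in binary form to a temporary file and read them back. The values and file name are arbitrary; the `<` prefix assumes little-endian byte order with standard sizes (4-byte int, 8-byte double).

```python
import struct
import tempfile
from pathlib import Path

# Pack an integer and a float in binary form
# ('<' = little-endian, 'i' = 4-byte int, 'd' = 8-byte double).
payload = struct.pack("<id", 123456, 3.5)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "numbers.bin"
    path.write_bytes(payload)
    raw = path.read_bytes()

print(len(raw))       # 12 bytes: 4 for the int + 8 for the double
print(raw[:4])        # b'@\xe2\x01\x00'
print(max(raw) > 127) # True: bytes outside the 7-bit ASCII range occur
number, value = struct.unpack("<id", raw)
print(number, value)  # 123456 3.5
```

The bytes are not readable characters; a program must know the layout (`<id`) to interpret them.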
Text Files
A file of characters, organized as a
sequence of lines of plain text
In ASCII, each character occupies one byte holding one of
128 codes (the MSB is 0; the remaining 7 bits are used)
Like any file, a text file is ultimately stored
as 0s and 1s
Readable in text editors and many other
applications
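The sketch below shows, in Python, that ASCII text is stored as one byte per character and that every ASCII code fits in 7 bits, leaving the most significant bit 0. The sample string is arbitrary.

```python
text = "Hi!"
codes = list(text.encode("ascii"))        # one byte per character
print(codes)                              # [72, 105, 33]
print([format(c, "08b") for c in codes])  # ['01001000', '01101001', '00100001']
print(all(c < 128 for c in codes))        # True: the MSB of every byte is 0
```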
ASCII Files 1
Text format containing ASCII characters
Classified by the delimiter separating data
items:
Free format (space-delimited)
Comma-delimited format, or comma-separated
values (CSV); text values may be enclosed in quotes
Tab-delimited format
Fixed format (no delimiter; each item occupies
fixed column positions)
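The four layouts can be compared by parsing the same hypothetical record (name, id, age) in each. This sketch uses Python's standard `csv` module for the delimited formats and string slicing for the fixed format; the column layout (12, 4, and 2 characters) is an assumption for illustration. The free-format line keeps only the surname because an unquoted space would split the name into two items.

```python
import csv
import io

# One hypothetical record (name, id, age) written in the four layouts.
csv_line = '"Smith, John",1001,35'                  # comma-delimited, text in quotes
tab_line = "Smith, John\t1001\t35"                  # tab-delimited
space_line = "Smith 1001 35"                        # free format (space-delimited)
fixed_line = f"{'Smith, John':<12}{1001:>4}{35:>2}" # columns 0-11, 12-15, 16-17


def parse_delimited(line, delimiter):
    # csv.reader handles quoting, so a comma inside quotes stays in the field.
    return next(csv.reader(io.StringIO(line), delimiter=delimiter))


def parse_fixed(line):
    # Fixed format: no delimiter; each field occupies known column positions.
    return [line[0:12].rstrip(), line[12:16].strip(), line[16:18].strip()]


print(parse_delimited(csv_line, ","))   # ['Smith, John', '1001', '35']
print(parse_delimited(tab_line, "\t"))  # ['Smith, John', '1001', '35']
print(space_line.split())               # ['Smith', '1001', '35']
print(parse_fixed(fixed_line))          # ['Smith, John', '1001', '35']
```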
ASCII Files 2
Files for Specific
Applications
Formatted for specific applications
MS Word (.doc & .docx) and Excel (.xls &
.xlsx) have their own formats.
Unlikely to be shared by multiple
applications.
One application (program) and its specific
data format.
One program, one data file?
References
Forouzan, Behrouz. 2013. Foundations of
Computer Science, 3rd ed. Cengage
Learning EMEA.