Data Structure & File Structure

Hun Myoung Park, Ph.D.
Public Management and Policy Analysis Program
Graduate School of International Relations
International University of Japan

Outline
- Data Structure
- Language Translators
- Software Development
- Software Analysis and Design
- Programming
- Implementation & Maintenance
- Documentation

Data Type
- Abstract data types (ADT) include:
  - Stack (LIFO)
  - Queue (FIFO)
  - List, tree, and graph
- Primitive type: number, string, boolean
- Composite type (bundling of elements): array and record

Data Structure
- A way of storing and organizing data.
- “A collection of related variables that can be accessed individually or as a whole”
- “A set of data items that share a specific relationship”
- Implements abstract data types
- Arrays, records, and linked lists
- Array (fixed-size sequence of elements) vs. list (variable size)

Data Structure: Array
- A sequenced collection of elements
- Elements may or may not share the same data type
- Elements are referred to by the array name, or by the array name and an index (subscript)
- a[0], a[1], a[2], … instead of a0, a1, a2, …
- The array name alone, a, refers to the whole array: a[0], a[1], …
- Multi-dimensional array: a[0][3]

Data Structure: Record
- A collection of related elements (fields or attributes) of an entity
- The name of a record is the name of the whole structure (e.g., student)
- Each field has its own name (e.g., name, id, …): student.name, student.id, student.age, …
- Array of records (e.g., student[1].name, student[1].id, …, student[2].name, student[2].id, …)

Data Structure: Linked List
- “A collection of data in which each element contains the location of the next element.”
- Each element consists of data and a link
- The data contain the information to be processed
- The link contains a pointer (address) that identifies the next element in the list.
- The last element's link is a null pointer: it holds data but points to no next element.

Array versus Linked List

File Structure
- “An external collection of related data treated as a unit”
- Used to store data permanently in a secondary (auxiliary) storage device
- Examples include an MS Word file and the display of information on the screen
- Sequential access (one record after another, from beginning to end) versus random access

Sequential Files 1
- Sequential access method
- Each record is accessed one after another, from beginning to end.
- Master and transaction files are used for updates
- Cost saving (efficiency) and data security

Sequential Files 2

Indexed Files 1
- Random access method
- Consists of a data file and its index
- An index contains the key of the data file and the address (record number) of the corresponding record
- The index is sorted by the key values (attributes) of the data file
- To access a record: find the desired key in the index, retrieve its address, and then access the record.

Indexed Files 2

Hashed Files 1
- Random access method
- Uses a mathematical function to map a key to an address
- The user gives a key; the hash function maps the key to an address; the address is passed to the OS; the record is retrieved
- No index is needed
- Hashing methods: direct, modulo division, and digit extraction; collisions (different keys mapped to the same address) must also be handled

Hashed Files 2

File Systems
- Control how data are stored and read
- Unix/Linux: ext2, ext3, ext4, and others
- Mac: HFS, HFS Plus
- MS Windows: FAT (File Allocation Table), FAT32, NTFS (New Technology File System)

Directories 1
- A special type of file containing information about other files
- A directory itself is a file
- An index telling where files are located
- Organized as a tree (hierarchical structure)
- Each directory except the root directory has a parent directory.
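
The tree organization of directories can be illustrated with a minimal Python sketch. It assumes POSIX-style paths and uses the standard pathlib module. The helper name ancestors is hypothetical, and the example path is only an illustration. The sketch walks from a directory up through its parents until it reaches the root, which is treated as its own parent.

```python
from pathlib import PurePosixPath


def ancestors(path):
    """Return the parent directories of `path`, nearest first, ending at the root."""
    return [str(parent) for parent in PurePosixPath(path).parents]


# Illustrative path; any absolute POSIX path would work.
www = PurePosixPath("/home/kucc625/www")

print(www.parent)            # the immediate parent directory
print(ancestors(str(www)))   # every ancestor, up to the root "/"

# The root directory has no parent of its own: pathlib returns the root itself.
root = PurePosixPath("/")
print(root.parent == root)
```

PurePosixPath only manipulates path strings, so the sketch runs without touching the actual file system.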
Directories 2
- Root directory (/)
- Working directory (current directory)
- Parent directory versus child directory
- Absolute path versus relative path
  - Absolute: /home/kucc625/www
  - Relative: kucc625/www (assuming /home is the working directory)

Binary Files
- A collection of data stored in the internal format of the computer.
- Use all 256 possible bit patterns of a byte (8 bits)
- Data can be characters, integers, floating-point numbers, and/or other types of data.
- Object files, images, videos, sounds, and formatted text files (e.g., an MS Word file) are binary files

Text Files
- A sequence of lines of plain text
- A file of characters.
- Each byte holds one of the 128 ASCII codes (the most significant bit is 0 and the remaining 7 bits are used)
- Even a text file ultimately stores data as 0’s and 1’s
- Readable in text editors and many other applications

ASCII Files 1
- Text format containing ASCII characters
- Formats differ by the delimiter (the character separating data items):
  - Free format (space delimited)
  - Comma delimited format, or comma separated values (CSV); text is enclosed in quotes
  - Tab delimited format
  - Fixed format

ASCII Files 2

Files for Specific Applications
- Formatted for specific applications
- MS Word (.doc & .docx) and Excel (.xls & .xlsx) have their own formats.
- Unlikely to be shared by multiple applications.
- One application (program) and its specific data format.
- One program, one data file?

References
- Forouzan, Behrouz. 2013. Foundations of Computer Science, 3rd ed. Cengage Learning EMEA.
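
As a supplement to the ASCII file formats listed above, the following Python sketch writes one record in space-delimited, comma-delimited, tab-delimited, and fixed formats and reads it back. The record contents, the column widths, and the helper names (to_delimited, from_delimited, to_fixed, from_fixed) are all hypothetical and chosen only for illustration. The delimited cases rely on the standard csv module, which quotes a text field only when it contains the delimiter.

```python
import csv
import io

# Hypothetical record (name, id, age); not taken from the slides.
FIELDS = ["Kim, Jane", "1001", "23"]
WIDTHS = [12, 6, 3]  # assumed column widths for the fixed format


def to_delimited(fields, delimiter):
    """Write one record as a delimited line, quoting text where needed."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerow(fields)
    return buf.getvalue()


def from_delimited(line, delimiter):
    """Parse one delimited line back into a list of fields."""
    return next(csv.reader(io.StringIO(line), delimiter=delimiter))


def to_fixed(fields, widths):
    """Write one record in fixed format: each field padded to its column width."""
    return "".join(field.ljust(width) for field, width in zip(fields, widths))


def from_fixed(line, widths):
    """Parse a fixed-format line by slicing at the column boundaries."""
    fields, pos = [], 0
    for width in widths:
        fields.append(line[pos:pos + width].rstrip())
        pos += width
    return fields


for name, delimiter in [("space", " "), ("comma", ","), ("tab", "\t")]:
    line = to_delimited(FIELDS, delimiter)
    print(name, repr(line), from_delimited(line, delimiter))

fixed = to_fixed(FIELDS, WIDTHS)
print("fixed", repr(fixed), from_fixed(fixed, WIDTHS))
```

In the space- and comma-delimited lines the name field is quoted because it contains both a space and a comma; in the tab-delimited and fixed lines no quoting is needed.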