FILE ORGANIZATION STUDY GUIDE

TABLE OF CONTENTS:

    1. Disk Storage Concepts
    2. Processing Files
    3. ISAM
    4. VSAM
    5. Access Method Services
    6. Variable Length Records
    7. Alternate Index
    8. Data Structures
    9. Relational Databases
    10. Normalization
    11. Structured Query Language (SQL)
    12. Embedded SQL
    Appendix A. COBOL with SQL
    Appendix B. Microcomputer SQL
    Appendix C. Modulus 11 in COBOL
    Appendix D. Modulus 11 in Pascal
    Appendix E. CALL an external module in Fujitsu COBOL
    Appendix F. VSAM COBOL Status Codes

DISK STORAGE CONCEPTS.

DASD - direct access storage device
    inexpensive
    fast & reliable
    online data storage
    virtual storage
    update records in place

CONCEPTS IN DATA ORGANIZATION.

logical record - one user data record as processed by the application program
physical record (block) - the data between the gaps
control interval - actual unit of data transfer in VSAM; contains the physical record plus control information

IBM mainframe DASD devices can be one of two types:
    FBA - fixed block architecture
    CKD - count key data

blocking - how many logical records in one physical record
    add 1 block for the EOF on an FBA device

Error detection and correction on an FBA device:
    CRC - cyclic redundancy check - write a control total with each record
    Parity checking is used between the host CPU and the controller, but CRC is used between the controller and the 3370.

CONCEPTS IN ACCESSING DATA.

access motion time (seek time) - select correct cylinder
head selection - electronically activate head over a track
rotational delay - half a rotation used as average
data transfer rate - kilobytes / second

PROCESSING FILES.

Definitions:

File    bit, byte, field, record, file, database
    - collection of related records.
    - called a data set in OS
    - called a cluster in VSAM
volatility - frequency of record addition & deletion
    static = low
    dynamic = high
activity - % of records accessed to total records
size - leave room for future growth

File processing - the manner in which the blocks are read from or written to.
Examples:
    sequential - used in batch processing with activity % over 60
    direct
    random
    indexed

File Access Methods:
    SAM - sequential access method
    DAM - direct access method
    ISAM - indexed sequential access method
    VSAM - virtual storage access method

VSAM:
    designed especially for DASD
    device independent
    positions individual records of a file on the storage medium without respect to the physical characteristics of the DASD.
    Note: a record is accessed by its displacement, in bytes, from the beginning of the cluster (file); called RBA (relative byte addressing)
    supports both sequential and direct access

SEQUENTIAL FILE ACCESS METHOD - REVIEW

Required mainframe DOS/VSE JCL:
    DLBL
    EXTENT
    ASSGN
    EXEC
Required COBOL connection:
    SELECT / ASSIGN - Environment Division
Which is first? JCL or COBOL?

Processing in COBOL:
    OPEN - determines which verbs will be allowed
    READ - determines how records will be accessed
    CLOSE - writes any records still in buffer, plus EOF

Importance of making backups:
    Sequential access:
        File rotation scheme
    Direct access:
        Updating THE master file (versus OLD and NEW master files)
        Audit trail
    Any file access:
        Offsite / secure storage
        Site redundancy

PROCESSING INDEXED FILES.

index - cross-reference table created and stored on disk
    - relates the key field to the corresponding address of the data
    - the key field is defined somewhere in the 01 record for the indexed file.

An indexed file is created sequentially, sorted by key field, in ascending order. There can be no duplicate key field values. After creation, the file can be accessed sequentially or randomly.

    ENVIRONMENT DIVISION.
    INPUT-OUTPUT SECTION.
    FILE-CONTROL.
        SELECT filename ASSIGN TO device
            ORGANIZATION IS INDEXED
            ACCESS IS          (one of: SEQUENTIAL (default), RANDOM, DYNAMIC)
            RECORD KEY IS
            FILE STATUS IS .

File status: a field in working storage defined as PIC XX
    holds status return codes in hexadecimal (hex)

Reading this file: READ statement form varies with type of access used.
    Examples: READ; READ NEXT; READ KEY IS
A dynamic access file can be processed with either sequential or random access.

To update the file in place using the REWRITE statement:
    1. obtain the value of the key field from transaction or algorithm
    2. open the indexed master file for I-O
    3. move the key field value to the key field of the master file record
    4. READ a master record
    5. make changes in the record area
    6. REWRITE the master record
    7. check system result by status code
You can REWRITE without an initial READ. Should you?

You can also DELETE the record, or WRITE new records:
    DELETE filename
    WRITE recordname

Use of ALTERNATE KEYS: see chapter on Alternate Index (AIX). Duplicate values of alternate keys are permitted.

PROCESSING RELATIVE FILES.

Relative files are another way (besides indexed files) to access files randomly. The key field is, or is converted to, the actual storage address for the record.
Advantage: fastest access. No index needed.
The key field, defined in working storage, is called the relative key. The programmer must calculate its value.
The file is created as slots with blank records wherever a record does not exist with the key value.
There may be an algorithm to compute the value of the key (see below).
Alternate keys are NOT allowed.
Variable length records are NOT allowed.

    SELECT      ASSIGN TO
        ORGANIZATION IS RELATIVE
        ACCESS IS
        RELATIVE KEY IS
        FILE STATUS IS .

To obtain the value of the relative key, if the key field is too large or not suitable (would leave too many gaps):
    1. Best if no conversion needed; key field IS relative key.
    2. HASHING algorithm (also known as randomizing).
    3. Other algorithms exist.
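
A minimal sketch of option 2, assuming the division-remainder method (divide the key by a prime and keep the remainder). It is written in Python rather than COBOL for brevity. The function names and the sample file size of 1000 records are illustrative assumptions, not part of the course material.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for the small divisors used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def next_prime_above(n: int) -> int:
    """Smallest prime strictly greater than n."""
    candidate = n + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


def relative_key(key_value: int, divisor: int) -> int:
    """Hash a numeric key to a relative record number.

    The remainder is in 0..divisor-1; 1 is added because relative
    record numbers start at 1, not 0.
    """
    return key_value % divisor + 1


# Assumed example: a file planned for about 1000 records.
divisor = next_prime_above(1000)
slot = relative_key(123456789, divisor)

# Two different keys can hash to the same slot (synonyms), so the
# program still needs a way to handle that case.
same_slot = relative_key(594, divisor) == relative_key(594 + divisor, divisor)
```

In a COBOL program the computed value would be moved to the RELATIVE KEY field before the READ or WRITE.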
Collisions from synonyms must be handled. A collision is two or more records assigned to the same space, called a bucket.
Note that a collision triggers an INVALID KEY and requires an algorithm to compute the new location of the record in the overflow area.
There are "digit analysis" programs which can recommend the best algorithm to minimize collisions.

Some hashing algorithms:
    1. Divide the key value by a prime number just larger than the number of records, and use the remainder as the relative key.
    2. Square the key value and truncate to the number of digits needed.
    3. Use extraction of digit(s) in a fixed position of the key.
    4. Split the key into two or more parts, add, truncate.
    5. Radix transformation. Convert key to another base.

Efficient processing of relative files:
It is NOT efficient to process relative files sequentially if the file was created with a hashing algorithm.
    1. Records are "out of order".
    2. Lots of unused disk space.

COBOL file processing statements:
    OPEN I-O
    READ or READ INTO - has several forms depending on type of access used:
        READ filename [AT END] - sequential read in sequential access
        READ filename KEY IS - random read in random or dynamic access
            KEY IS key field or KEY IS alternate key field - tells the system which field to use to read the file by
        READ filename NEXT - sequential read in dynamic access
    WRITE recordname INVALID KEY
    CLOSE
    START - used to establish the CRP (current record pointer) if you do not want to start at the beginning of the file; followed by a READ or READ NEXT for sequential read
    REWRITE recordname
    DELETE filename

File Status return codes:
The system returns a 2 character hexadecimal number after every I/O operation on an indexed file. These codes must be checked by the programmer after each I/O verb affecting the indexed file.
A value of '00' indicates successful processing. Any higher value is a warning or error, although some warnings are expected during normal processing.
Any value greater than '23' indicates a serious error, and processing should be terminated after showing the reason and closing all files.
The file status return codes may be used with DECLARATIVES / USE AFTER ERROR to let the system show the code if any I/O error occurs for the file. In this case, INVALID KEY would not be used.

EXAM 1 includes material to this point.

ISAM.

Indexed Sequential Access Method
An access method supporting sequential and direct file processing.
ISAM maintains the logical order of records by an index, or cross-reference file, even when the physical order is changed from additions or deletions of records.
To delete a record from ISAM, move HIGH-VALUES to the one-byte field at the start of the record. The record is NOT physically deleted.
When the file is first built, it is built sequentially, and in order. When records are then added during an update, they are added in entry order at the end of the file. Therefore, after an update, some of the records in an ISAM file will be "out of order" physically.
ISAM maintains a track index, prime data tracks, and overflow areas. The system maintains POINTERS to keep the logical order.
Each cylinder of an ISAM file has one index.
Each record of data is actually two parts: normal and overflow.

Problems with ISAM:
    1. After a time, ISAM files need to be reorganized (rebuilt) by the programmer, or performance suffers. Why?
    2. ISAM is limited by the physical size and characteristics of the storage device.
    3. Like any DASD file that is updated directly, care must be taken to provide:
        backups
        audit trails

VSAM.

Virtual Storage Access Method - a revolutionary IBM product that replaced ISAM: an access method for direct or sequential processing of fixed and variable length unblocked records on direct access devices.
Logical records are stored in VSAM format, invisible to the user.
Relative byte addressing (RBA) is used to describe the offset, in bytes, of a record from the beginning of the file.
However, the physical location of indexed records changes as records are added.
Index in virtual storage. All transparent to user.

ADVANTAGES:
    more efficient than ISAM because index structure is handled in virtual storage as much as possible
    no overflow areas; free space is available throughout
    overcomes physical limitations of DASD device (device independent) - why?
    allows sequential, indexed, and relative file access

DISADVANTAGES:
    sequential access alone would not be worth trouble of VSAM; however, indexed processing allows sequential access when preferred
    often, much wasted disk space for data space and catalog(s)
    Serious I/O errors can go undetected because VS COBOL does not handle VSAM error checking; the programmer must check the FILE STATUS codes.

VSAM processes three types of files:
    KSDS - key sequence data set
    ESDS - entry sequence data set
    RRDS - relative record data set
Records of a KSDS or ESDS file may be either fixed or variable length.
Records of a RRDS are always only fixed length.

KSDS file processing:
    A file of existing data is built sequentially in ascending order by key value with no duplicates of the key field value
    efficient use of the index; kept mostly in virtual storage; data and index are only physically stored on master catalog by a CLOSE statement
    accessed sequentially or randomly
    free space can be allocated (NOTE: default = no free space)
    free space allows for records to be added or lengthened easily
    can support variable length records
    does not have to be reorganized often by programmer
    VSAM recovers space when records are deleted or shortened
    KSDS is most frequently used VSAM organization

Definitions about the INDEX.
    Used to locate a record in a KSDS file
    both the index and the file are defined as a cluster
    Definition: the "main" index is called the prime index
    index relates the key field value to the RBA location in file
    RBA: value of the key field must not be altered during processing one record. Why?
    VSAM index is a file with one or more levels; each level is a set of control intervals
    each control interval has one index record that can have one or more index entries
    Definition: lowest level of the index is called the sequence set; one per control area
    records in all higher levels are collectively called the index set
    alternate index possible for KSDS; may have duplicates

VSAM CATALOGS and data storage concepts:

A catalog keeps track of file and space characteristics. A catalog is similar to VTOC but relates to VSAM data space, and VSAM manages the location of data for you. Therefore, no EXTENT card!

4 major VSAM components:
    MASTER CATALOG
    optional USER CATALOG(s)
    data space
    files (clusters).
CATALOG contains DATA SPACE; DATA SPACE contains CLUSTERS.

CLUSTER - VSAM data set = a file.
    DEFINE a cluster does NOT mean that data is put into it; use REPRO command in IDCAMS utility program. See chapter on Access Method Services.
    You do NOT choose blocks; assigned when cluster defined.
    Password protection and overwrite protection possible.

CONTROL INTERVAL (CI) - unit of data transfer between virtual storage & DASD.
    must be a binary multiple of 512 bytes; FBA device = 512
    maximum size is 32,768 bytes (CKD only)
    size is independent of the type of DASD used, but the size is chosen by VSAM for the device on which it is defined
    stores data records and control information about them
    size & number of control intervals per control area is fixed
    VSAM chooses size & number of logical records. Depends on:
        maximum record size
        size of VSAM I/O buffers
        the type of DASD device used
    COBOL programmer does NOT block VSAM records. No recording mode! No label records! No block contains!
    It is possible to have spanned records (larger than one control interval) in ESDS or KSDS.
CONTROL AREA (CA) - contains control intervals & free space
    free space for KSDS additions is defined with cluster
    adding records out of sequence may cause a control interval split; this can continue as long as there is free space
    Remember, VSAM reclaims space if record is shortened or deleted.
    A control area is normally the size of a cylinder on the DASD device
    Each Control Area has same number of Control Intervals

DATA SPACE - storage area for use exclusively by VSAM, defined in the VTOC.
    Contains continuous set of Control Intervals.
    VSAM also maintains a 'pool' of free space. What about disk efficiency?

CATALOG - contains information on VSAM data sets.
    This is an area on disk not used by files running under ISAM or SAM. We store SAM info in VTOC but VSAM names in the catalog.
    There is a Master Catalog (STUDENT.MCAT at DCC) and there can be User Catalog(s) if more security is needed. Also safer; if one catalog is damaged, only the files in it are affected.

ACCESS METHOD SERVICES.

Processing of VSAM files requires the use of the Access Method Services, IDCAMS, to access the VSAM files through the VSAM catalog. VSAM files are identified by the COBOL external name.

    // EXEC IDCAMS,SIZE=AUTO

syntax: COMMAND PARAMETER

Commands available are Functional or Modal commands.

functional:
    DELETE    release storage space; includes options
              delete cluster removes index, data, and any alternate index defined on it.
              PURGE    overwrite protected file
              ERASE    binary 0 overwrite sensitive data
    DEFINE    reserve catalog, space, or cluster
    REPRO     almost any file organization to another (repro = reproduce)
    PRINT     CHARACTER, HEX, or DUMP (both, by default)
    LISTCAT   see cluster, data, and index names; catalog
              Ex. LISTC ENTRIES(base.cluster.name) ALL
    VERIFY    was EOF set properly; uses CC condition code
    IMPORT    from another computer system
    EXPORT    to another computer system
    PARM SYNCHK    check syntax without altering any data!
modal:
    IF LASTCC = 0                  modal command reads condition code
        THEN -
            some IDCAMS command
        ELSE ;                     else optional
            another IDCAMS command.

Condition Codes used in LASTCC in the IF command:
    0 = successful processing
    4 = warning
    8 = error
    12 = error and function not performed
    16 = severe error; rest of job flushed

VSAM names require that each string of 8 characters (or less) be separated with a period. No trailing period. Name = 44 characters maximum length. First character MUST be a letter.

Parameters of the same command can be split over lines and a hyphen is the continuation character. The hyphen must be the last non-blank character on the line. Either do not use a hyphen, or use a semicolon, as terminator (to indicate the last line, that is not continued).
The order of parameters is not significant.
Must have matching parentheses total for each command, even though many individual lines do not.
Commands in col. 2 - 72.
Parameters must be separated. May use space or comma.
/* comments flanked by */
also can indent for readability

Always use SIZE= in VSAM. SIZE=AUTO leaves room for other modules to be loaded in the partition.

Passwords are optional for cluster and catalog. Default is not to require a password for clusters.
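
The naming rules stated above can be checked mechanically. A minimal sketch in Python (this is not an IDCAMS facility), covering only the rules given in this guide: parts of 8 characters or less separated by periods, no trailing period, 44 characters maximum, first character a letter. The function name is a hypothetical helper, and real VSAM naming has further restrictions that are not modeled here.

```python
def is_valid_vsam_name(name: str) -> bool:
    """Check a cluster name against the rules listed in this guide only."""
    if not name or len(name) > 44:
        return False
    first = name[0]
    if not (first.isascii() and first.isalpha()):
        return False
    # Each period-separated part must be 1 to 8 characters long.
    # A trailing (or doubled) period produces an empty part and fails here.
    return all(1 <= len(part) <= 8 for part in name.split("."))


examples = [
    "yourname.master.file",      # the name used in this guide
    "yourname.master.file.",     # trailing period
    "1abc.file",                 # does not start with a letter
    "toolongname.file",          # first part has 11 characters
]
results = {name: is_valid_vsam_name(name) for name in examples}
```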
    READPW parameters - read only
    UPDATEPW parameters - update
    MASTERPW - all operations permitted

Normal order of IDCAMS commands:
    DELETE
    DEFINE
    REPRO
    Do some COBOL processing
    PRINT results

    DEFINE CLUSTER (NAME(yourname.master.file)
        VOLUME(DAP000)
        RECORDS(#)             - allow for additions - future growth
        RECORDSIZE(80,80)      - 1st = average size, 2nd = maximum size
        KEYS(x,y)              - x = length of field
                                 y = offset = # characters displaced from beginning of record
        FREESPACE(10,20))      - (CI%,CA%) room for added records
        DATA (NAME(yourname.master.file.data))      - otherwise system supplies odd names
        INDEX (NAME(yourname.master.file.index))
        CATALOG(STUDENT.MCAT/IOXYE)                 name of ours

Also, use of PARM SYNCHK allows the Access Method Services to check the syntax of your IDCAMS commands without taking action or altering any actual data.

In the above example, the name of the VSAM cluster used is yourname.master.file but you would use your own unique name, as long as it has periods at least each 8 characters and is no longer than 44 characters including the periods.

To print your cluster:

    // JOB
    // DLBL extname
    // EXEC IDCAMS,SIZE=AUTO
       PRINT INFILE(extname) CHARACTER
    /*
    /&

COBOL, DITTO, IDCAMS and JCL CONNECTIONS.

    SELECT filename ASSIGN TO SYS015-one            [SAM file]
    FD
        RECORDING MODE F
        LABEL RECORDS ARE STANDARD.
    01      PIC X(two)

    SELECT filename ASSIGN TO SYS016-three          [VSAM file]
        ORGANIZATION IS INDEXED
        ACCESS IS RANDOM
        RECORD KEY IS four
        FILE STATUS IS five.
    FD  filename.               [no recording mode and no label records for VSAM]
    01  recordname.
        05 four PIC X( ).
    WORKING-STORAGE SECTION
    01  five PIC XX.
    _________________________________________________
    // ASSGN SYS015,DISK
    // DLBL one,'SAM unique file-id here',0,SD
    // EXTENT SYS015,...
    // DLBL three,'yourname.master.file',,VSAM
    // EXEC ,SIZE=AUTO
    _________________________________________________
    // EXEC IDCAMS,SIZE=AUTO
       DELETE ...
       DEFINE CLUSTER (NAME(yourname.master.file) ...
       REPRO INFILE(one, ENV(RECFM(F),BLKSZ(two))) OUTFILE(three)
    _________________________________________________
    // EXEC DITTO                                   [for the SAM file only]
    $$DITTO SPR FILEIN=one
    /*
    _________________________________________________
    // EXEC IDCAMS,SIZE=AUTO                        [for the VSAM file only]
       PRINT INFILE(three) CHARACTER
    /*

    one   - COBOL external name for the SAM file. Max. 7 characters.
    two   - length of the logical record in the SAM file.
    three - COBOL external name for the VSAM file. Max. 7 characters.
    four  - name of the key field. Should be PIC X in COBOL 85, but OK if not.
    five  - name of the field used to store the two-digit status codes (in Hex).

EXAM 2 includes material to this point.

VARIABLE LENGTH RECORDS.

IDCAMS
    DEFINE CLUSTER RECORDSIZE(x,y)                      where y > x
    REPRO INFILE( , ENV(RECFM(F),BLKSZ(nn)))            inputs SAM file, fixed (F) length of (nn)
          OUTFILE( )

DITTO may be used with the CVS (Card to VSam) command to load cards to a VSAM cluster.

Variable length records may be used in ESDS and KSDS files (not RRDS).

Methods of defining variable length records in COBOL.
    1. record descriptions of different lengths in one FD or
    2. table defined as OCCURS DEPENDING ON or
    3. RECORD CONTAINS range or
    4. RECORDING MODE IS V (sequential files only)

Example:
    FD  class-file
        RECORD CONTAINS 27 TO 627 CHARACTERS.
    01  class-record.
        05 class-name PIC X(6).
        05 room-number PIC X(7).
        05 teacher-name PIC X(12).
        05 number-of-students PIC 9(2).
        05 class-table OCCURS 0 TO 20 TIMES DEPENDING ON number-of-students.
            10 student-number PIC X(9).
            10 student-name PIC X(21).

How many elements? How much storage is required for this table? How could this table be processed? If you knew a student number, could you get the person's name?

Notes on the use of OCCURS .. DEPENDING ON element-counter.

1. Because of the possibility of the element-counter causing the compiler to allocate dynamically the storage beyond the intended range, define the table as the last part of storage when possible.
   The next field in storage may be overlapped upon. If the next area is instead used by another program, this has been known to 'bring down' CICS on-line processing.
   Example:
       01
           05 table occurs depending on X
           05 table occurs depending on Y
   Unreliable; must be put in separate records.

2. A subscript can be out of range when the table is not as large as the maximum size. OCCURS .. DEPENDING ON does not check for a subscript out of range.
   Consider: OCCURS X TO Y TIMES DEPENDING ON Z
             X <= Z <= Y
   An error occurs when a subscript is used that is > Z when the table is not as large as Y.

3. The compiler calculates the size of the table dynamically. There are three conditions that cause this calculation:
   a. when a file is read in and the value of the element-counter is a field in that record
   b. when a new value is moved to the element-counter (however, if the element-counter is redefined a value may be moved to that field without causing a recalculation of the record length)
   c. after a record is written, the length is set to the maximum size to accommodate the next record

ALTERNATE INDEX.

Purpose: allow access to a record by secondary (alternate) key

    FILE-CONTROL.
        SELECT      ASSIGN
            ORGANIZATION IS
            ACCESS IS
            RECORD KEY IS
            ALTERNATE RECORD KEY IS      [WITH DUPLICATES]
            [ALTERNATE RECORD KEY IS     [WITH DUPLICATES]]      can have more than one
            FILE STATUS IS .

Requires in VSAM the following commands in IDCAMS in this order:
    1. define the base cluster:         DEFINE CLUSTER
    2. define the alternate index:      DEFINE AIX
    3. define a logical path:           DEFINE PATH
    4. build the alternate index:       BLDINDEX
    5. process it and print it

Note that printing the AIX will only show a cross-reference between the primary and alternate keys. To see the data sorted by the alternate key, print the PATH.
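
The difference between printing the AIX and printing the PATH can be pictured with a small model. This is a sketch in Python, not VSAM itself: the base cluster is a dictionary keyed by primary key, and the record layout, field names, and function names are made up for illustration.

```python
from collections import defaultdict

# Assumed sample data: primary key (student number) -> record.
base_cluster = {
    "1001": {"student_number": "1001", "advisor": "SMITH"},
    "1002": {"student_number": "1002", "advisor": "JONES"},
    "1003": {"student_number": "1003", "advisor": "SMITH"},
}


def build_alternate_index(base, alt_field):
    """Cross-reference: alternate key value -> list of primary keys.

    Duplicate alternate keys are allowed, so each entry holds a list.
    Entries are kept in alternate key order.
    """
    index = defaultdict(list)
    for primary_key in sorted(base):
        index[base[primary_key][alt_field]].append(primary_key)
    return dict(sorted(index.items()))


def read_through_path(base, alt_index):
    """Yield full base records in alternate key order."""
    for alt_key, primary_keys in alt_index.items():
        for primary_key in primary_keys:
            yield base[primary_key]


aix = build_alternate_index(base_cluster, "advisor")
path_order = [rec["student_number"] for rec in read_through_path(base_cluster, aix)]
```

Here `aix` holds only keys (what a print of the AIX would show), while `path_order` comes from the complete records reached through it (what a print of the PATH would show).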
See the following sample code:

    // DLBL base,'example.base',,VSAM
    // DLBL aix,'example.aix',,VSAM
    // DLBL base1,'example.path',,VSAM
    // EXEC IDCAMS,SIZE=AUTO
       first: delete, define & repro the base cluster
       DEFINE AIX (NAME(example.aix)RELATE(example.base)
           VOLUME(DAP000)
           RECORDS(n)
           RECORDSIZE(a,m)
           KEYS(x,y)
           NONUNIQUEKEY           if duplicates allowed
           SHAREOPTIONS(2)        - must be in define cluster also
           UPGRADE)
       DEFINE PATH (NAME(example.path) PATHENTRY(example.aix))
       BLDINDEX INFILE(base) OUTFILE(aix)
       PRINT INFILE(base) CHARACTER
       PRINT INFILE(aix) CHARACTER
       PRINT INFILE(base1) CHARACTER
    /*
    /&

PROCESSING ALTERNATE INDEXED VSAM KSDS FILES.

The alternate key must be fixed length & in fixed position.
Duplicates in alternate index keys may be used. To process, move the value you wish to search for to the alternate key and READ KEY IS to find the first match. Then READ NEXT which will read sequentially each succeeding record with that key. When there are no more matches, "AT END" is triggered. This combination of random then sequential access requires ACCESS IS DYNAMIC in COBOL and uses status codes 02 and 10.

    1st record:  READ file KEY IS dataname [IN record]
                 (this record is called the "record of reference")
    then:        READ filename NEXT
                 (sequential read until key not = the key of the record of reference)

DUPLICATES require: NONUNIQUEKEY in IDCAMS and WITH DUPLICATES in COBOL

The path is a logical, not physical concept required by the access method; no space is defined for it.

Definition: the alternate index is built on a cluster called the base cluster.

The alternate index is an indexed cluster, so you may supply a DATA name and INDEX name if you wish. You actually process the PATH in your application program, however.

The technical file structure (see the next chapter) needed to support an alternate index is either a MULTIPLE LINKED LIST or an INVERTED FILE. Both structures contain records for each alternate key value and pointers to the original file.
VSAM uses the inverted file for a KSDS alternate index, which references the alternate key to the record key in alternate key order.

Updates to a VSAM file with alternate indexing mean that updates must also be made to the structure supporting the alternate indexing. This reduces performance! Although VSAM handles this for you when you use the UPGRADE option, alternate indexing should not be installed unless necessary.

VSAM currently does not provide for retrieval of a record based on more than one alternate key at a time.

The path is a logical concept, so the IDCAMS PRINT of the path shows all records in the base cluster but in order of the alternate key. A PRINT of the AIX shows the alternate key & the primary key, in order of the alternate key. In other words, the AIX is only the cross-reference of the keys.

A path may be specified over a base cluster for a reason other than using alternate indexing. For example, the path could provide an alias with different protection attributes.

System conventions include SHAREOPTIONS(2) so that both the path and base cluster may be accessed, much like two programs trying to access one file. There are several levels of protection. Use a 2 for KSDS alone, and a 3 under CICS.

The external name of the path must be the same as the external name of the base with a number at the end, beginning at 1 for the first alternate index, and never exceeding the 7 character maximum.
    ex. XYZ, XYZ1, XYZ2
    ex. XYZMAST, XYZMAS1, XYZMAS2 ... XYZMA10 etc.

IMPORTANT: despite what the IBM VSAM manuals say, the DCC implementation does not work like this! You must use no more than 6 letters for the base name, followed by a '1' to make the path name.
    Example: if the base is XYZMAS, the path is XYZMAS1

When you work with DLBLs, remember that a job can have all the DLBLs defined at the top of the job, and they will work throughout the entire job.
However, if you choose to repeat a DLBL, you must repeat all DLBLs (and related EXTENTs and ASSGNs if needed), since having just one "resets" the system's knowledge of which ones are in place for that job step.

PROCESSING DUPLICATES IN AN ALTERNATE INDEX

To do this, three conditions must be true:
    In COBOL, ACCESS must be DYNAMIC
    In COBOL, must have WITH DUPLICATES under the alternate key
    IDCAMS DEFINE AIX must have NONUNIQUEKEY

To process a record by its alternate key, move the value you wish to search for to the alternate key field, and code

    READ filename KEY IS alternate key

This will find the first record with the matching alternate key value. If it is the only record with the alternate key value, the file status code is '00'. However, if it is the first of several records that have the same alternate key value, the file status code will be '02'. You would then switch to sequential processing using a READ NEXT statement to find the desired record, usually in conjunction with an IF statement to match a third field for a particular value.

If you set up a loop to read sequentially all the records with the same alternate key value, the last matching record (the last duplicate) will give you a file status value of '00'. If you continue to READ sequentially, you will read the rest of the records in the file from that point, and eventually get a value of '10', which logically means "end of file" in the sense that there are no more records when reading sequentially.

One method to avoid reading more records than you need to is to set up a loop when you get a file status value of '02'. Within the loop, use sequential processing to search for the desired value of another field that will find the correct record from within those with a matching alternate key value. You can stop the loop at that point with a switch. Be sure also to set the switch once the value changes to '00', which means you have read the last duplicate and still not found what you are looking for.
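
The search loop described above can be sketched as follows. This is a toy in-memory model in Python, not COBOL or VSAM: the class, its method names, and the sample records are all invented for illustration, and it only imitates the status codes '00', '02', and '10' as this guide describes them. The use of '23' for "no record with that key" is an assumption of the sketch.

```python
class AlternateKeyFile:
    """Toy stand-in for a KSDS read through an alternate-key path."""

    def __init__(self, records, alt_field):
        self.alt_field = alt_field
        # Keep records in alternate key order; sorted() is stable, so
        # duplicates stay in their original relative order.
        self.records = sorted(records, key=lambda r: r[alt_field])
        self.position = 0  # index of the next record for read_next

    def _status_for(self, i):
        """'02' if the following record has the same alternate key, else '00'."""
        following = self.records[i + 1] if i + 1 < len(self.records) else None
        if following and following[self.alt_field] == self.records[i][self.alt_field]:
            return "02"
        return "00"

    def read_key(self, value):
        """Imitates READ filename KEY IS alternate-key."""
        for i, record in enumerate(self.records):
            if record[self.alt_field] == value:
                self.position = i + 1
                return self._status_for(i), record
        return "23", None  # assumption: '23' means no record with that key

    def read_next(self):
        """Imitates READ filename NEXT."""
        if self.position >= len(self.records):
            return "10", None  # no more records to read sequentially
        i = self.position
        self.position += 1
        return self._status_for(i), self.records[i]


def find_record(file, alt_value, other_field, wanted):
    """Search the duplicates of alt_value for the record whose other_field matches."""
    status, record = file.read_key(alt_value)
    while status in ("00", "02"):
        if record[other_field] == wanted:
            return record            # found: stop the loop
        if status == "00":
            return None              # last duplicate read, still not found
        status, record = file.read_next()
    return None


students = [
    {"name": "ALICE", "advisor": "ADAMS"},
    {"name": "BOB", "advisor": "ADAMS"},
    {"name": "CAROL", "advisor": "SMITH"},
]
found = find_record(AlternateKeyFile(students, "advisor"), "ADAMS", "name", "BOB")
missing = find_record(AlternateKeyFile(students, "advisor"), "ADAMS", "name", "ZED")
```

Returning as soon as the status is '00' plays the part of the switch: the loop never reads past the last duplicate into records with a different alternate key.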
Loop until the switch is set.

File Status code summary on a READ with alternate indexing with duplicates:
    '00'    successfully found only record, or last record if more than one
    '02'    successfully found first matching record, but there are more
    '10'    no more records to process sequentially

A file status code of '02' may also be obtained on a WRITE or a REWRITE statement if there are already other records with the same alternate key value.

COBOL programmers may also use the START statement to position the READ statement at the appropriate record. This could be done to position the first record at somewhere other than the beginning of the file (sequential or dynamic access). START can also give options to prevent errors if there is no exact match for the first value of the key field value you select.

The format of the START statement, with three options, is:
    START filename KEY IS EQUAL TO dataname
    START filename KEY IS GREATER THAN dataname
    START filename KEY IS NOT LESS THAN dataname

If you use the START statement with no KEY IS phrase, the primary key is assumed.

EXAM 3 includes material to this point.

DATA STRUCTURES.

Pointer: A field associated with one piece of data used to identify the location of another piece of data. If the data is reorganized the pointers must be changed. If pointers are destroyed they can be very difficult to reconstruct.

Stack: all insertions and deletions are made at the same end of the data structure. Last in - first out.

Queue: all insertions occur at one end and all deletions at the other; First in - first out.

Sorted List: insertions and deletions may occur anywhere; elements are maintained in logical order based on a key field

Inverted List: a table, list, index, or directory of data addresses that indicate all the records with something in common
    An index is usually used to speed processing.
    Indexes can be created on secondary keys, which may not be unique to one record.
    An index is more compact than the data it references, but an index can be very large.
    An index can be treated as a file itself, on which an index can be created.
    Example: VSAM alternate index.

Linked List: when the above structures are used with a system of pointers from one element to the next. Also called a Chain.

Tree Structure: each element of the structure (except the root) has one pointer to it, but may have zero or many pointers from it to other elements. Element also called node.

Binary Tree: each element has at most two pointers from it. Very efficient for processing. In a sequence tree, there is a left pointer from an element which points to the element in the next level with a lesser value, and a right pointer - next level, greater value. This kind of tree is dependent on how the records are loaded. The worst case for a tree occurs when the records are already sorted (the tree becomes a one-sided linked list).

DATA MODELS.

Data Model: an abstract representation, or description, of real objects and their associations.

DBMS: database management system

Schema: the internal model designed by the data base administrator and seen by the programmers
    includes the data base management system, operating system access methods, and other programs

Subschema: the data as seen by a user of a specific data processing technology; also called external model

DATABASE MODELS:

Models: Hierarchical, Network, Relational, Object-oriented, XML

Hierarchical: Parent / child structure. No record has more than one parent.
    Example: IBM's IMS (Information Management System)

Network: Parent / child structure but more flexible than the hierarchical and with less redundancy.
    Example: TOTAL

Relational: A relationship between data elements based on two-dimensional tables that is easiest to use and does not depend on any specific access.
    Examples: ORACLE, SQL/DS, dBASE III, dBASE IV, RBase, ACCESS, many others.
    Note: a vendor's definition of what makes a database relational does not always agree with the original Codd specifications. Many so-called relational databases lack certain features. Also, do not confuse a relational database product with a simple file manager (often called a flat file).

Object-oriented: Properties of data are stored along with the data, including the function of inheritance. Used in C++

XML: XML is a recent system developed to deal with real-world data, which does not always have a fixed record structure. This should be interesting to watch, since it has the potential to change completely how we deal with information.

RELATIONAL DATABASES.

database: collection of shared related files.
    Note: if you are working with just one file, it's not really a database. Many programs that work with one file should not be called database management software; they are instead more properly called flat-file managers.

record: collection of related fields.

relational: refers to the qualities of a record, that every field in a record is directly related to the key field of that record.
    Bad example: Student name, student number, advisor, name of advisor's cat

Using databases:
    a. Design the structure of the database
    b. Load the data
    c. Query the database

Relational database structure.
    Uses two-dimensional tables.
    Each column contains a single value.
    The order of the columns does not matter.
    Each row is different.
    The order of rows does not matter.
    There are rules for a process called Normalization that governs good design of relational structures.

Terminology used in relational databases:

    Relational theory     Relational DBMS     File processing
    _________________     _______________     _______________
    Relation              Table               File
    Tuple                 Row                 Record
    Attribute             Column              Field

Programming with a relational database.

The relational model provides data independence from the physical database. If the database is changed, application programs do not have to be rewritten. A disadvantage is much data redundancy.
Also, there is not as much efficiency in high volume applications as with the other structures (network and hierarchical).

Data definition concepts:
  a table has columns and rows
  the intersection of a row and a column is called a field
  domain: set of all possible values in a field
  view: a logical table derived from tables; not physically stored; may be processed like a table
  index: changes the logical order of data; SQL decides when to use indexes
  referential integrity: must not delete a record from one table if that record is needed by another

NORMALIZATION.

Normalization is the application of a set of rules for the most efficient design of a relational data base, to eliminate or at least reduce problems caused by insertion, deletion, or modification of records.

First normal form: each relation contains no repeating groups (these should be in a table by themselves)
Second - fifth normal form: more rules to make the database as efficient as possible for data integrity.
  example: no field in a record that is not directly related to the key field
There are several other levels of normalization.

The way a database is structured when it has multiple files being shared affects the efficiency of information storage and retrieval. A database must store all needed data, and it must be able to connect that data logically, even across files. The following are design principles, not to be violated without a very good reason:

1. Assign a unique key field to each record in a file.
2. If two fields are always related to each other, put them in the same record.
3. If the same field is found in more than one record, make separate files and put the key field of the record in both files to allow the connection to be made from one to the other.
4. Do not repeat a data type within a record. Instead construct two files with a common key field.
5. Do not put any field in a record that is not directly related to the key field (remember the Advisor's Cat).
   Construct another file and JOIN it to the first file by a field with common values.
6. Adding indexes for secondary keys, or additional files, can save time in some data retrieval operations, but this will add to the use of disk space and the complexity of the overall design.
7. Use enough columns (fields) so that any potential data can be queried or sorted.
8. Give each column enough width for any possible entry.
9. Make columns that may be compared with one another the same data type.

STRUCTURED QUERY LANGUAGE (SQL).

SQL/DS - Structured Query Language / Data System
  IBM product
  ANSI standard for relational databases, Feb. 1985
  Concept of relational model invented by E. F. Codd, IBM, 1970
  First fully relational DBMS, 1979, Oracle Corp.
  IBM versions:
    SQL/DS for DOS, 1982
    SQL for VM, 1983
    SQL for MVS, 1984

Description:
  fourth-generation relational database query language. A fourth-generation language specifies what to do but not how to do it.
  non-procedural language to create, access, and modify data
  has important data security and integrity features

Methods of accessing SQL:
  ISQL - runs as a CICS transaction and allows testing of commands through QMF
  SQL engine in a database management application package or Internet search engine
  Batch SQL
    // EXEC ARIDBS (SQL utility - see IBM manual #5046)
  Embedded SQL
    embedded into a high-level host language like COBOL, Assembler, or PL/1
    embedded SQL commands must first be precompiled
    SQL uses the VSE/VSAM access method
  Microcomputer SQL implementations, such as ACCESS

Data definition statements:
  CREATE (table, view, or index)
  GRANT, REVOKE (access to tables by other users)
  DROP (delete a table and its data)
Data query statement:
  SELECT (for queries - retrieve data)
Data manipulation statements:
  UPDATE, INSERT, INPUT, DELETE, ALTER

Syntax:
  names are short-ID (to 8 characters) or long-ID (to 18)
  short-ID used for databases, table space
  long-ID used for tables, indexes, views, columns
  names usually start with letters; short
  may start with # or $; long may contain numbers but not special characters or spaces. Exception: long-ID may use the underscore
  avoid SQL reserved words as user-defined names; there are over 300 reserved words

Types of data:
  integer            -2,147,483,648 to +2,147,483,647
  smallint           -32,768 to +32,767
  tinyint            0 to 255
  float              floating point, -79 decimal positions to +75
  decimal(m,n)       m = total digits, n = decimal places
  char(n)            fixed length (n) data
  varchar(n)         to 254 characters
  long varchar(n)    between 254 and 32,767
  entries may be NULL (no value); displays as a ?

Catalogs in SQL:
  catalogs are tables themselves
  syntax: USERID.TABLENAME if user has granted rights
  SYSTEM.SYSCATALOG     shows all tables
  SYSTEM.SYSCOLUMNS     detailed list of columns in all tables
  SQLDBA.SYSUSERLIST    users (only DBA sees passwords)
  while a table is created, altered, or deleted, the system locks out any operation on that table and temporarily locks out the system catalog. SQL recognizes and can break deadlocks

Examples of useful queries of catalogs:
  SELECT * FROM SQLDBA.SYSUSERLIST
  SELECT * FROM SYSTEM.SYSCOLUMNS WHERE CREATOR = user
  SELECT * FROM SYSTEM.SYSPROGAUTH
  SELECT * FROM SYSTEM.SYSCATALOG
  SELECT * FROM SYSTEM.SYSVIEWS

CREATING TABLES:
  CREATE TABLE tablename (columnname type, ... )
  may use NOT NULL after the type to show that there must be data in that field
  if a table already exists, error message

DELETING TABLES:
  DROP TABLE tablename

RETRIEVING DATA FROM A TABLE:
  command syntax: SELECT columnname,... FROM tablename,... [WHERE condition]
  SELECT specifies columns (which fields are displayed)
  FROM specifies tables (from which data file)
  WHERE specifies rows (for which records)
  Select query shows cost estimate and row count.
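The CREATE TABLE and SELECT syntax above can be tried out with a minimal sketch. It assumes Python's built-in sqlite3 module rather than SQL/DS, so data types and error messages will differ from the mainframe; the INVENTORY columns and the sample rows are hypothetical and serve only as illustration.

```python
import sqlite3


def build_inventory():
    """Create a small in-memory INVENTORY table (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    # NOT NULL after the type means the field must contain data.
    conn.execute(
        "CREATE TABLE INVENTORY ("
        " PARTNO INTEGER NOT NULL,"
        " DESCRIPTION VARCHAR(20),"
        " QONHAND SMALLINT)"
    )
    conn.executemany(
        "INSERT INTO INVENTORY VALUES (?, ?, ?)",
        [(207, "GEAR", 75), (221, "BOLT", 650), (241, "CAM", 150)],
    )
    conn.commit()
    return conn


def parts_with_stock_over(conn, minimum):
    """SELECT picks the columns, FROM the table, WHERE the rows."""
    cur = conn.execute(
        "SELECT PARTNO, DESCRIPTION FROM INVENTORY WHERE QONHAND > ?",
        (minimum,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_inventory()
    for row in parts_with_stock_over(conn, 100):
        print(row)
```

Without a sort clause the order of the returned rows is not guaranteed, which matches the relational rule that the order of rows does not matter.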
  * = all columns
    Example: SELECT * FROM INVENTORY
  DISTINCT removes duplicate rows, so that the columns which are displayed contain different data in each row
    Example: SELECT DISTINCT PARTNO FROM INVENTORY

Retrieving only some rows which meet a condition: WHERE
  Alphanumeric data is placed in quotes. Ex. WHERE PART = 'CAM'
  Arithmetic operators: + - * /
  Relational operators: = < > <= >= <>
  Statistical functions: AVG MAX MIN SUM COUNT
  Logical operators: AND, OR, NOT used to form a compound condition
    AND is evaluated before OR; parentheses are evaluated first

Additional operators used in a query: BETWEEN, IN, NULL, LIKE
  Examples:
    WHERE partno BETWEEN 241 AND 266
    WHERE major IN ('CIS', 'ACC', 'BAT')
    WHERE description IS [NOT] NULL
    WHERE description LIKE 'large %'
  Note: LIKE works with alphanumeric data only and finds any number of appearances of that character string
  Special characters in search expression: _ and %
    _ (underscore) means ignore a single character in the string
    % (percent) means ignore zero or more characters in the string
  Examples:
    'CIS%'    finds all courses starting with CIS
    '%A%'     finds BAT, ACC, CPA, etc.
    'C_S'     finds CIS and CPS etc.

Sorting the output:
  ORDER BY columnname [DESC], ...
  Examples:
    ORDER BY PARTNO, SUPPNO DESC
    ORDER BY 2, 1
    ORDER BY 2 DESC, 1

Summarizing:
  GROUP BY columnname, ...

Using calculated fields:
  SELECT suppno, price*qonorder AS ordercost
  FROM QUOTATIONS
  WHERE price*qonorder > 100
This results in a column name of ORDERCOST after SUPPNO (ORDER itself is a reserved word, so it cannot serve as the name). The addition of 'AS' is useful to provide a more meaningful column name, if supported by the SQL version.

RETRIEVING DATA FROM MULTIPLE TABLES:

This is one of the most powerful relational database capabilities. The operation is called JOIN and allows for an unplanned logical connection between the tables. There must be a column in each table with data in common for a join. The column names do not need to match, but the data must.
If duplicate column names exist in the joined tables, each reference to a column name must either use an alias, or be a qualified name (containing the name of the table, a period, and the column name), to eliminate ambiguity.

  SELECT ..
  FROM tablename1, tablename2
  WHERE tablename1.columnname = tablename2.columnname AND ...

The WHERE condition shows the relationship between the tables. If a columnname is found in both of the tables, the command requires a prefix of the tablename and a period so the system knows which column is desired. Column names that are only in one table require no prefix. There is also normally some other part of the condition (AND, OR).

ALIAS

It is possible to join a table to itself to solve certain queries. This is done by using an alias for the name of the table, so the same table is referred to by two different names when you join them in the query. In the following example, only one table is used, but it is given the aliases FIRST and SECOND.

  SELECT DISTINCT FIRST.columnname
  FROM tablename FIRST, tablename SECOND
  WHERE FIRST.columnname = SECOND.columnname AND …;

OTHER SQL COMMANDS:

Subqueries
  SELECT .. WHERE join AND .. IN (SELECT ...)
  There are several types, including nested queries and correlated subqueries.
VIEWS
  Views are logical tables.
  can make complicated queries simpler
  can grant access to a view to see only part of a table
GRANT TO and REVOKE FROM
  GRANT privilege ON table TO ...
INPUT - begin adding data to a table
INSERT - add one row with specific values
HELP
  Examples: HELP SELECT or HELP 'CREATE VIEW'
UPDATE - change one value or values in a row or rows
DELETE FROM WHERE - delete row(s) from table
ALTER - change column structure
COMMIT WORK - save changes
ROLLBACK WORK - if error occurs, ignore changes

OTHER SQL FUNCTIONS:
  YEAR(birthdate)
  HOUR(current time)
  SUBSTR(fname,1,1)
  DECIMAL(invtotal,9,2)
  Ex. SELECT DECIMAL (AVG (fieldname), 7,3)

EMBEDDED SQL.
Using embedded SQL in Fujitsu MicroCOBOL is not possible with the limited free version. The following notes are mainframe-based.

Preprocessing changes SQL code to be processed by the host compiler and converts SQL statements to access modules in machine code, stored in the SQL/DS database and called by the host program.

The complete development cycle for COBOL is as follows:
1. Define tables
2. Code the host program with embedded SQL top-down
3. Precompile (converts SQL statements to CALL statements)
4. Bind (system finds access paths for each SQL request)
5. Compile (normal, although SQL shown)
6. Link-edit (normal)
7. Execute
(steps 3 - 7 done by special JCL)

Writing the program:
  begin each SQL statement with EXEC SQL in area B
  end each SQL statement with END-EXEC
  define your host variables (will be prefaced by a colon when used in a SQL command) in the DECLARE SECTION
  include the SQL Communication Area (for error handling)
    EXEC SQL INCLUDE SQLCA END-EXEC
  establish error handling
    WHENEVER a problem occurs
    negative numbers are errors
    0 = successful processing; 100 = EOF
  CONNECT to SQL by a USERID and PASSWORD
  SQL DECLARE CURSOR
    a cursor is a pointer to the database
    associates a query with a name
    the query may return one or many rows, called the "active set"
    verbs are OPEN, FETCH, PUT, DELETE, UPDATE, CLOSE
    FETCH functions like a COBOL "READ INTO"
  COMMIT WORK RELEASE - make changes and release database
  ROLLBACK WORK RELEASE - ignore changes and release

The program in Appendix A is a working mainframe COBOL program with an embedded SQL query. This program was written to provide a simple demonstration of the technique by which a SQL query can substitute for a great deal of COBOL procedural code.

Direct references to SQL have a pair of instructions surrounding them: EXEC SQL and END-EXEC. This indicates that the precompiler will first translate only the statements between these instructions into COBOL CALL statements.
Then the COBOL compiler, invoked by rather elaborate JCL, can translate the precompiled SQL as calls to object modules along with the rest of the regular COBOL. Technically, the jobstream which is actually sent does not contain the source code; the job entry control language punches the appropriate source code from another file and any required data at the required time.

The embedded SQL query is in lines 53 - 56. The demonstration query in this program retrieves all the records in the SCHEDULES table, except those scheduled in a building whose identification starts with the characters "DU". The data will be sorted by course number.

Line 52 assigns the query to a cursor, named in this case C1. This is not a cursor in the sense of an indication of where you are on a monitor; the cursor defines a pointer to the desired record.

Lines 80 and 81 show a FETCH C1 INTO, which functions as a COBOL READ INTO statement, reading the desired data from the SQL database. The datanames which begin with colons on line 81 are defined in the DECLARE section, lines 22 - 29. Although the names are the same as the attribute names in the query (for example, 'course'), this is not required. The colon in front of the name indicates that the dataname is a COBOL field which will accept a SQL attribute.

Some very powerful and convenient program control logic is interesting to note here. SQL queries give a return code: a negative number is an error; a 0 means processing was successful; a positive number is a warning, even though processing continues. Lines 62 and 83 refer to SQLCODE = 100, which is a specific warning indicating an end-of-file condition, or no more records in the table.

The SQL WHENEVER commands in lines 69 and 70 may be the most time-saving lines in the program.
If a SQL query return code is a warning, ignore it and continue; if a return code indicates that an error, any error, occurred, control is transferred unconditionally to the error-handling paragraph, ERRCHK, which begins on line 91. This avoids the programmer having to code individual error-handling conditional logic throughout the program. Just say, if there is ever an error, go there. This is perhaps the best example of the fourth-generation principle: tell the computer what you want but let it figure out how.

If you need more control, you can still have it. Although the technique was not used in this program, "WHENEVER statements may be specified in more than one place in a program. A WHENEVER statement is applicable to all SQL statements that follow it, until the end of the program or the next WHENEVER statement."

Finally, line 111 in the ERRCHK paragraph contains a ROLLBACK WORK statement which causes any change to any table made during this program to be undone. Only if no errors occur, and the ERRCHK routine is never called, will the program store any changes made, by means of the COMMIT WORK RELEASE instruction in line 64. Up until that statement, a ROLLBACK WORK (called from a WHENEVER) can keep any updates from becoming permanent, avoiding possible corruption of data and allowing the programmer to try again on the original data.

During the precompile, SQL commands are read top-down. The precompiler will not follow COBOL logic. Therefore the DECLARE cursor must come before the OPEN cursor, or the OPEN will be flagged as an error, and so on.

Submit the precompile job, not COBOL code. The special JCL will precompile then compile, link, and execute.
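The fetch loop and the commit-or-rollback logic described above can be sketched outside COBOL. The following is a minimal illustration that assumes Python's built-in sqlite3 module instead of SQL/DS: a returned None stands in for SQLCODE = 100, and a try/except block stands in for WHENEVER SQLERROR GO TO ERRCHK. The AUDIT table, the make_db and move_course helpers, and all sample data are hypothetical.

```python
import sqlite3


def make_db():
    """Build an in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SCHEDULES (COURSE TEXT NOT NULL, ROOM TEXT)")
    conn.execute("CREATE TABLE AUDIT (COURSE TEXT NOT NULL, ROOM TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO SCHEDULES VALUES (?, ?)",
        [("CIS211", "WA110"), ("CIS111", "DU205"), ("ACC101", "HU301")],
    )
    conn.commit()
    return conn


def list_schedule(conn):
    """Open a cursor on the query and fetch one row at a time."""
    cur = conn.cursor()
    cur.execute(
        "SELECT COURSE, ROOM FROM SCHEDULES"
        " WHERE ROOM NOT LIKE 'DU%' ORDER BY COURSE"
    )
    rows = []
    while True:
        row = cur.fetchone()  # None plays the role of SQLCODE = 100
        if row is None:
            break
        rows.append(row)
    cur.close()
    return rows


def move_course(conn, course, new_room):
    """Change a room and log it; keep both changes or neither."""
    try:
        conn.execute(
            "UPDATE SCHEDULES SET ROOM = ? WHERE COURSE = ?",
            (new_room, course),
        )
        conn.execute("INSERT INTO AUDIT VALUES (?, ?)", (course, new_room))
        conn.commit()      # like COMMIT WORK: make the changes permanent
        return True
    except sqlite3.Error:
        conn.rollback()    # like ROLLBACK WORK: undo the partial update
        return False


if __name__ == "__main__":
    conn = make_db()
    print(move_course(conn, "CIS211", "WA120"))
    print(move_course(conn, "ACC101", None))  # AUDIT rejects the NULL room
    print(list_schedule(conn))
```

In the second call the UPDATE itself succeeds, but the failed INSERT triggers the rollback, so the SCHEDULES row keeps its original room. This is the same all-or-nothing behavior the guide describes for ERRCHK.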
Programming:

In the SQL precompile job:
  put any JCL statements that the job needs to run
  put your userid and password
  reference file name, type, disk which has COBOL source code

In the COBOL source code:
  WS error-handling fields STEP-DENOTER and DECODED-SQLCODE in working-storage to be used by SQL error output routines
  WS fields for userid and password as SQL declared variables in the DECLARE SECTION; best method is to ACCEPT them from card input, so they are not shown on a printout of the job
  when defining host variables, use COMP for smallint, and COMP-3 for decimal host fields
  INCLUDE the SQLCA module for error handling routines (both errors and warnings)
  move a message to STEP-DENOTER before each SQL command
  write your embedded SQL commands, using a colon in front of COBOL host variables used in the SQL command
  use standard COBOL techniques for output formatting
  reproduce the recommended error handling paragraph ERRCHK, which is invoked by the SQL WHENEVER clauses

Handling SQL return codes:
  negative numbers are errors
    ex. -504
    use HELP -504 in ISQL to see info on this error
  a 0 means successful processing
  positive numbers are warnings
  a 100 means no more rows remain in the active set; in other words, the EOF has been reached

APPENDIX A. Sample COBOL program with embedded SQL query.

001  IDENTIFICATION DIVISION.
002  PROGRAM-ID. SQLDEMO.
003  AUTHOR. M.K.FINLEY.
004
005  ENVIRONMENT DIVISION.
006  CONFIGURATION SECTION.
007  SOURCE-COMPUTER. IBM390 WITH DEBUGGING MODE.
008  OBJECT-COMPUTER. IBM390 WITH DEBUGGING MODE.
009  INPUT-OUTPUT SECTION.
010  FILE-CONTROL.
011      SELECT PRINT-FILE
012          ASSIGN TO SYS011-SQLPRT.
013
014  DATA DIVISION.
015  FILE SECTION.
016  FD  PRINT-FILE
017      LABEL RECORDS ARE OMITTED.
018  01  PRINT-REC                PIC X(133).
019  WORKING-STORAGE SECTION.
020  01  STEP-DENOTER             PIC X(50) VALUE SPACES.
021  01  DECODED-SQLCODE          PIC --------999.
022      EXEC SQL BEGIN DECLARE SECTION END-EXEC.
023  01  COURSE                   PIC X(9)  VALUE SPACES.
024  01  TITLE                    PIC X(20) VALUE SPACES.
025  01  ROOM                     PIC X(6)  VALUE SPACES.
026  01  INSTRUCTOR               PIC X(20) VALUE SPACES.
027  01  USERID                   PIC X(8)  VALUE SPACES.
028  01  PASSW                    PIC X(8)  VALUE SPACES.
029      EXEC SQL END DECLARE SECTION END-EXEC.
030  01  PRINT-LINE VALUE SPACES.
031      05  CARRIAGE-CONTROL     PIC X.
032      05  COURSE-OUT           PIC XXXXXXBXBXX.
033      05  FILLER               PIC X(3).
034      05  TITLE-OUT            PIC X(20).
035      05  FILLER               PIC X(3).
036      05  ROOM-OUT             PIC XXXBXXX.
037      05  FILLER               PIC X(3).
038      05  INSTRUCTOR-OUT       PIC X(20).
039      05  FILLER               PIC X(65).
040  01  MESSAGE-LINE VALUE SPACES.
041      05  CARRIAGE-CONTROL     PIC X.
042      05  SQLMESSAGE           PIC X(132).
043      EXEC SQL INCLUDE SQLCA END-EXEC.
044
045  PROCEDURE DIVISION.
046  000-MAIN-MODULE.
047      OPEN OUTPUT PRINT-FILE.
048      MOVE 'SQLMKF NOW EXECUTING.' TO SQLMESSAGE.
049      WRITE PRINT-REC FROM MESSAGE-LINE AFTER 3.
050      PERFORM INITIALIZE-SQL THROUGH INITIALIZE-EXIT.
051      MOVE 'DECLARE' TO STEP-DENOTER.
052      EXEC SQL DECLARE C1 CURSOR FOR
053          SELECT COURSE, TITLE, ROOM, INSTRUCTOR
054          FROM SCHEDULES
055          WHERE ROOM NOT LIKE 'DU%'
056          ORDER BY COURSE
057      END-EXEC.
058      EXEC SQL OPEN C1 END-EXEC
059      MOVE 'COURSE TITLE ROOM INSTR.' TO SQLMESSAGE.
060      WRITE PRINT-REC FROM MESSAGE-LINE AFTER 2.
061      PERFORM PROCESS-SQL THROUGH PROCESS-EXIT
062          UNTIL SQLCODE = 100.
063      EXEC SQL CLOSE C1 END-EXEC.
064      EXEC SQL COMMIT WORK RELEASE END-EXEC.
065      CLOSE PRINT-FILE.
066      STOP RUN.
067  INITIALIZE-SQL.
068      MOVE 'WHENEVERS' TO STEP-DENOTER.
069      EXEC SQL WHENEVER SQLWARNING CONTINUE END-EXEC.
070      EXEC SQL WHENEVER SQLERROR GO TO ERRCHK END-EXEC.
071      ACCEPT USERID.
072      ACCEPT PASSW.
073      MOVE 'CONNECT' TO STEP-DENOTER.
074      EXEC SQL CONNECT :USERID IDENTIFIED BY :PASSW END-EXEC.
075  INITIALIZE-EXIT.
076      EXIT.
077  PROCESS-SQL.
078      MOVE 'FETCH ' TO STEP-DENOTER.
079      EXEC SQL
080          FETCH C1 INTO
081          :COURSE, :TITLE, :ROOM, :INSTRUCTOR
082      END-EXEC.
083      IF SQLCODE NOT = 100
084          MOVE COURSE TO COURSE-OUT
085          MOVE TITLE TO TITLE-OUT
086          MOVE ROOM TO ROOM-OUT
087          MOVE INSTRUCTOR TO INSTRUCTOR-OUT
088          WRITE PRINT-REC FROM PRINT-LINE AFTER 1.
089  PROCESS-EXIT.
090      EXIT.
091  ERRCHK.
092 **********************************************************
093 * THE NEXT ROUTINE PRINTS THE SQLCA STRUCTURE
094 * - SQL CODE = SQL RETURN CODE
095 * - SQL ERRM = SQL ERROR MSG
096 * - SQL ERRP = MODULE DETECTING ERROR
097 * - SQL ERRD = INTERNAL ERROR VALUES
098 * - SQL WARN = SQL WARNING STRUCTURE
099 **********************************************************
100      DISPLAY SPACES.
101      DISPLAY 'CHANGES WILL BE BACKED OUT'.
102      DISPLAY SPACES.
103      DISPLAY 'A PROBLEM HAS BEEN DETECTED IN THE'.
104      DISPLAY SPACES.
105      MOVE SQLCODE TO DECODED-SQLCODE.
106      DISPLAY STEP-DENOTER.
107      DISPLAY SPACES.
108      DISPLAY 'SQLCODE: = ' DECODED-SQLCODE.
109      DISPLAY 'SQLERRM: = ' SQLERRMC.
110      DISPLAY 'SQLERRP: = ' SQLERRP.
111      EXEC SQL ROLLBACK WORK RELEASE END-EXEC.
112  ERRCHK-EXIT.
113      EXIT.

APPENDIX B. Microcomputer SQL.

Affordable versions of SQL became available on microcomputers in the late 1980s. These early versions, especially shareware programs, were quite limited, with many commands not supported. dBASE IV implemented a fairly complete version, but its use was cumbersome at best.

SQL on a microcomputer can be used to learn the query syntax. You can also translate a query by example (QBE) into SQL, and examine the results to help learn SQL. Microsoft ACCESS supports SQL, although the steps to get to it are not intuitive, since SQL does not initially appear on the pull-down menus! Why are they hiding this powerful tool?

How to use SQL in Microsoft ACCESS:
1. Create or load a table in ACCESS.
2. Click on the Queries tab under Objects in the Database window.
3. Choose NEW in the menu of that window.
4. Choose DESIGN VIEW in the New Query window and click OK.
5. A Show Table window appears automatically. Click Close.
6. Choose SQL View from the View menu.
   A new window with SELECT; appears.
7. Type in a query. Be sure it ends with the semicolon.
8. To run the query (see the results), click on the red exclamation point icon, or select Query, Run from the menu.
9. Depending on window sizes, your query window may disappear. Find the window named Query1 (the default first query name). To edit your query, click on View, SQL View.
10. You can save the query by clicking on the Save icon. The default name is Query1 but you can change the name to a more meaningful one. The next time you load the database, the query will be loaded as well and will appear as a choice in the Database window from the Query tab.
11. If you edit a query and save it, it will overwrite the existing query. To keep the existing query and add a new query, choose File, SAVE AS and give it a different name.

To join two tables in Microsoft Access:
  Load the first table
  On the Database window, click on Tables
  From the top menu, Insert, Table
  From the New Table window, click on Import Table & click OK
  Navigate to the second table and double-click on it
  In the Import window, select what you want to bring in (tables, queries, etc.)
  You should now have the second table in the Database window with your first table

You are now ready to write SQL queries using both tables. Access provides the INNER JOIN ... ON form of the join, which was added to standard SQL in SQL-92. You may use it, or you may use the older WHERE form. See the two examples below:

  SELECT songlist.song, number, publisher
  FROM songlist INNER JOIN composers
  ON songlist.composer = composers.name;

is the same query as the WHERE form:

  SELECT songlist.song, number, publisher
  FROM songlist, composers
  WHERE songlist.composer = composers.name;

Differences between Standard (ANSI) SQL and Microsoft Access SQL

Access SQL is a subset of standard SQL, probably because Microsoft provides other ways within Access to create and manipulate data and expects that you will use those, not SQL.
Microsoft even hides the SQL menu from you, until you are in a particular design query environment. This is unfortunate.

Not supported in Access SQL:
  CREATE TABLE - done using Access Table menu
  CREATE VIEW and DROP VIEW - however, you can simulate a VIEW by using SELECT … INTO
  UPDATE
  DELETE
  GRANT
  REVOKE
  NOT NULL in WHERE conditions. This is because Access won't let you specify that a field could be NULL when designing the table.
  Wildcard characters "_" or "%" - however, the "*" is used as a wildcard character
  Data types CHAR and VARCHAR (use TEXT in table design)

Special notes:
  Access allows either single or double quotes for strings.
  Access SQL uses and recommends INNER JOIN ON, but you can still join two tables using the older WHERE form (WHERE table.field = table.field AND ...)
  When using GROUP BY, you must list each field in the SELECT.
  Access does not recommend joining a table to itself, but it is supported.
  Access SQL assumes ALL in a SELECT when you don't use DISTINCT

Using PL/SQL in ORACLE 9I

SQL implementations by different vendors often have minor differences in features and syntax, particularly in two areas: data types supported, and output formatting. The following is a quick reference to some of the features of PL/SQL I found that do not match the batch SQL used at DCC by running jobs with // EXEC ARIDBS. The textbook reference used for the following information is Chapters 2, 3 and 4 of Guide to Oracle9I by Morrison & Morrison, published 2003 by Thomson Course Technology. Please note this is not meant to be a comprehensive list, nor is it meant to imply one SQL implementation is preferable to another.

Data types:
  NCHAR and NVARCHAR data types are not supported in batch SQL.
  DATE, TIMESTAMP, INTERVAL YEAR TO MONTH, INTERVAL DAY TO SECOND, LOB data types are not supported in batch SQL.

Constraints:
  Constraints, including foreign and composite keys, are handled differently in batch SQL, except that NOT NULL is the same.
Query commands:
  The DESCRIBE command is not found in batch SQL; use SELECT instead.
  The RENAME command is not supported in batch SQL.
  DATE and INTERVAL values are handled differently; check the manuals.
  The CREATE SEQUENCE and USER SEQUENCE commands are not found in batch SQL.
  Single row number functions, single row character functions and single row date functions are much more limited in batch SQL than in Oracle.
  Oracle's SQL*Plus functions are not supported in batch SQL.
  The exponentiation operator, **, does not work in batch SQL.

Procedural language constructs:
  SQL by itself is not a procedural language, so it should not be expected to have programming constructs like IF THEN or WHILE LOOP that are contained in Oracle's implementation.

APPENDIX C. MODULUS 11 Check Digit algorithm in COBOL.

// JOB 21100FAC 01 FAC MATT FINLEY - INSTRUCTOR
// OPTION DECK
// EXEC IGYCRCTL,SIZE=IGYCRCTL
 PROCESS LIB
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MODCALL.
       AUTHOR. MADISON K. FINLEY.
       DATE-WRITTEN. 9-27-99.
       ENVIRONMENT DIVISION.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  NEEDED-FIELDS.
           05  PRODUCT            PIC 99.
           05  THE-SUM            PIC 999.
           05  REMAINDER-DIGIT    PIC 99.
           05  DONT-NEED          PIC 99.
       LINKAGE SECTION.
       01  WORK-AREA.
           05  DIGIT-1            PIC 9.
           05  DIGIT-2            PIC 9.
           05  DIGIT-3            PIC 9.
           05  DIGIT-4            PIC 9.
           05  CHECK-DIGIT        PIC 9.
       PROCEDURE DIVISION USING WORK-AREA.
       000-MAIN-PARA.
      *    DISPLAY "MODCALL IS RUNNING.".
           COMPUTE THE-SUM = 0.
           COMPUTE PRODUCT = DIGIT-1 * 5.
           ADD PRODUCT TO THE-SUM
           COMPUTE PRODUCT = DIGIT-2 * 4.
           ADD PRODUCT TO THE-SUM
           COMPUTE PRODUCT = DIGIT-3 * 3.
           ADD PRODUCT TO THE-SUM
           COMPUTE PRODUCT = DIGIT-4 * 2.
           ADD PRODUCT TO THE-SUM
           DIVIDE THE-SUM BY 11 GIVING DONT-NEED
               REMAINDER REMAINDER-DIGIT.
           COMPUTE CHECK-DIGIT = 11 - REMAINDER-DIGIT.
      *    DISPLAY CHECK-DIGIT.
       000-EXIT.
           EXIT.
       100-RETURN-PARA.
           EXIT PROGRAM.
       100-EXIT.
/*
/&

APPENDIX D. MODULUS 11 Check Digit algorithm in Pascal.

program modulus(input,output);
{ Calculates a check digit using Modulus 11 }
{ Written by M.K.Finley.
  9/27/85 }
{ Last revised 4/29/03. Bug fix 8/28/07. }
uses CRT;
const max = 19;
type datatype = 0..max;
     indextype = 1..max;
     arraytype = array [indextype] of datatype;
var digit,weight,product,sum,remainder,check,total : integer;
    answer : char;
    counter : 1 .. max;
    number : arraytype;

procedure welcome;                     {initial screen}
begin
  writeln(' MODULUS.PAS  by M.K.Finley  9/27/85.');
  writeln;
  writeln(' Last revised 4/28/03.');
  writeln;
  writeln(' This program calculates what the check digit is, using');
  writeln(' the Rule of Modulus 11. Use of a check digit can prevent');
  writeln(' unauthorized or invalid account numbers.');
  writeln;
end; {welcome}

procedure initialize;                  {zero accumulators}
begin
  answer:='a'; product:=0; sum:=0; total:=1
end; {initialize}

procedure choice;                      {set switch to continue or exit}
begin
  write(' A. Calculate check digit.   B. End program. ');
  readln(answer);
  writeln
end; {choice}

procedure howmany;                     {length of input number}
begin
  { ask again while the answer is 1: a number needs at least }
  { one digit in addition to the check digit                 }
  repeat
    writeln;writeln;
    write(' How many digits is the number, including the check digit? ');
    readln(total)
  until total <> 1;
  total:= total - 1;
  writeln
end; {howmany}

procedure input;                       {input & store each digit in an array}
begin
  for counter:= 1 to total do
    begin
      write(' Enter number in position # ',counter, ' ( L to R ): ');
      digit:=-1;
      readln(digit);
      while (digit < 0) or (digit > 9) do
        begin
          write(' Error. Try # ',counter, ' again. ');
          readln(digit)
        end;
      number[counter]:=digit
    end;
  writeln
end; {input}

procedure calculate;                   {compute & display Modulus 11 formula}
begin
  weight:=total + 1;
  for counter:= 1 to total do
    begin
      write(' Number = ',number[counter]);
      write(' weight = ',weight);
      product:=number[counter] * weight;
      write(' product = ',product);
      sum:=sum + product;
      writeln(' sum = ',sum);
      if weight > 2 then weight:= weight - 1;
    end;
  remainder:= sum mod 11;
  write(' Remainder = ',remainder);
  check:= 11 - remainder;
  if check = 11 then check:= 1;
  if check = 10 then check:= 0;
  writeln(' Check = ',check);
end; {calculate}

procedure results;                     {display the answer}
begin
  writeln;write(' The number was: ');
  for counter:= 1 to total do write(number[counter]);
  writeln;write(' The check digit would be: ');
  writeln(check);
  writeln;write(' The new number should be: ');
  for counter:= 1 to total do write(number[counter]);writeln(check);
  writeln
end; {results}

begin                                  {MAIN LOGIC}
  clrscr;
  welcome;
  initialize;
  choice;
  clrscr;
  while (answer <> 'b') and (answer <> 'B') do
    begin
      howmany;
      input;
      calculate;
      results;
      initialize;
      choice;
      clrscr
    end
end. {PROGRAM}

APPENDIX E. Call an external module in Fujitsu COBOL
(Instructions adapted by Dean Finley from Gary Fidler)
______________________________________________________________

Write the calling program. Ex. CALLING.COB
Write the called program. Ex. CALLED.COB
In Programming Staff, click the Project button then click Open - make sure you are in the folder with the source code
In the Open Project dialog box, enter a project name ex. CALLTEST.PRJ then click Open
In the new Open Project dialog box, click Yes to create the file
In the Target Files dialog box, click Add then click OK
In the Dependent Files dialog box, browse to and add files Ex.
  CALLING.COB and CALLED.COB
In the list box, select CALLING.COB, click Main Program, then click OK
In the CALLTEST.PRJ window, click Build
When you get the Make ended message, click OK
Close the editor window that opens
In the CALLTEST.PRJ window, click Execute
In the Runtime Environment window, click OK
Check your output file

APPENDIX F. VSAM COBOL Status Codes.

CODE:  CAUSE & ACTION TO TAKE:
____   _______________________
00     Successful processing
02     Duplicate key on a READ in sequential access (more records match)
04     Wrong length record on a READ
05     Optional file is not present (I'm not sure how to get this error)
10     End of file in a READ using sequential access when there are duplicates in Dynamic access
20     Invalid key (I'm not sure how to get this error)
21     Invalid key, sequence error (rewrite a record that doesn't exist). Don't change the value of the key between READ and REWRITE.
22     Duplicate key (record already exists) on WRITE or REWRITE
23     Key not found (record does not exist) on READ, DELETE or START
24     Attempted to write beyond the boundary of the file
30     Hardware parity error (let us know if you get this error)
35     Required file missing. Check on an open if the file even exists.
37     Device conflict. Did you open a tape for I/O? Did you open KSDS or RRDS as I/O when it is an empty file? Check also the COBOL SELECT for the correct ACCESS usage.
39     Can't OPEN; file attributes conflict (check record length, the length of the primary key, the length of the alternate key). Also check the COBOL SELECT for the correct ORGANIZATION usage.
41     Can't OPEN a file already open
42     Can't CLOSE a file already closed
43     Error on REWRITE or DELETE in sequential access only, when you did not READ the record first to establish the record pointer.
46     Sequential access only: READ shows no more records. Did you READ past the EOF?
47     Can't READ when file was not opened, or opened for output.
48     Can't WRITE when file was not opened for output or I-O
49     Can't DELETE or REWRITE because file not opened for I/O
90     Undocumented error. This code occurs when there is an error when none of the other status codes can explain the error. Difficult to diagnose this one.
91     VSAM password failed. Did you create the file with one?
92     Logic error. Make sure the record description is the same length as specified in the RECORDSIZE(). File not open? File opened the wrong way? File already open? Reading past the EOF? Many of these errors are now reported with the 30s codes.
93     Are you working with an alternate index? If so, did you put SHAREOPTIONS(2) in BOTH the DEFINE CLUSTER and the DEFINE AIX? It has to be in both. Let us know if you get 93 for any other reason.
94     In a sequential READ or READ NEXT, the current record is undefined
95     Incomplete or invalid information for the VSAM file. Error in the DLBL for the VSAM file? File not in the catalog? Check that the KEY(S) in IDCAMS is correct for the record key in the COBOL code. Check your PIC clauses and calculate the displacement of the key field by adding up the total length of all PIC clauses before the key field. A key field in position 1 is a displacement of zero. Record key different from Keys in the Define Cluster?
96     No DLBL for the VSAM file. Did you use the AIX name instead of the PATH name with a cluster that uses an alternate index?
97     OPEN was successful and integrity verified, but the cluster was not properly closed during the last processing run.

FILE ORGANIZATION STUDY GUIDE FOR CIS 211

Madison K. Finley
Associate Dean of Academic Affairs Emeritus
Adjunct Lecturer, Department of Engineering, Architecture and Computer Technologies
Adjunct Lecturer, Department of Performing, Visual Arts and Communications
Certified Computing Professional (C.C.P.)
I gratefully acknowledge the support of the DCC Computer Information Systems faculty, staff, and students for all that I have learned over the last twenty-five years. Specific recognition goes to Professor Emeritus William G. Kleinhomer for his lasting contributions to the development of this course, to Dr. Frank Whittle for his knowledge and continuing support, and to Gary Fidler for all his assistance in adapting to a new system.

Copyright ©2007 by Madison K. Finley
All rights reserved. This book or parts thereof may not be reproduced or used in any form or any means without written permission of the author. Making copies of this book, or any portion, is a violation of United States copyright laws.

Throughout this guide, trademarked names are used. Rather than place a trademark symbol in every occurrence of a trademarked name, please note that the names are used only in an editorial fashion to the benefit of the trademark owner with no intention of infringement of the trademark.

First printing: August 1987.
Eleventh printing: September 2007.

Dutchess Community College
Poughkeepsie, New York 12601