2.1 Difference Between Data and Information

Data
A collection of raw facts that are not organized and have no meaning on their own.

Information
Data that has been organized so that it is meaningful and useful for decision making.

2.2 Data Processing Cycle

All data should go through the stages of the data processing cycle in a logical order:
1. Data collection
2. Data preparation
3. Data input
4. Data processing
5. Information output

Data Collection
The activity of collecting raw data from the outside world so that it can be put into an information system. It can be done by survey, observation, interview or experiment. The collected raw data is recorded in a document called the source document.

Data Preparation
A pre-processing stage that involves:
- Data-logging: keep track of incoming data.
- Manual validity check: check for completeness, check whether the contextual information is included, and check whether the answers are reasonable and legible.
- Data categorization: divide the raw data into different groups, and sort the raw data into a specific order to make the subsequent data entry easier.

Data Input and Sources of Error
Mistakes can be made during data entry. The three types of error caused by manual input are data source errors, transcription errors and transposition errors.

Error type          | Source of error                                | Example
Data source error   | The data source provides incorrect data.       | An interviewee reports an incorrect telephone number.
Transcription error | Data is read or typed in incorrectly.          | Typing ‘1’ as ‘l’, or ‘o’ as ‘0’.
Transposition error | Two consecutive digits are swapped.            | Typing 61 when you intend to type 16.
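Transcription and transposition errors can often be told apart by comparing the value keyed in with the value in the source document. The function below is a hypothetical heuristic written for illustration (its name and rules are assumptions, not part of these notes). A data source error cannot be caught this way, because the source document itself already holds the wrong value.

```python
def classify_entry_error(source: str, entered: str) -> str:
    """Hypothetical heuristic comparing a source-document value with the typed value.

    - identical values                          -> "no error"
    - two adjacent characters swapped           -> "transposition error"
    - same length, exactly one character wrong  -> "transcription error"
    - anything else                             -> "other error"
    """
    if source == entered:
        return "no error"
    if len(source) == len(entered):
        # Positions where the two strings differ.
        diffs = [i for i, (a, b) in enumerate(zip(source, entered)) if a != b]
        if len(diffs) == 2 and diffs[1] == diffs[0] + 1:
            i = diffs[0]
            # The two neighbouring characters appear in the opposite order.
            if source[i] == entered[i + 1] and source[i + 1] == entered[i]:
                return "transposition error"
        if len(diffs) == 1:
            return "transcription error"
    return "other error"


print(classify_entry_error("16", "61"))      # transposition error
print(classify_entry_error("1001", "l001"))  # transcription error
```

A real system would compare many fields this way; the sketch only shows why the two error types leave different patterns in the data.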
Data Validation
The process of comparing data with a set of rules or values to make sure that the data is reasonable and valid.

Validity check       | Function
Field presence check | Ensures that all necessary fields are present.
Field length check   | Ensures that the data has the correct number of characters or digits.
Range check          | Ensures that the data value is within a predetermined range.
Format check         | Ensures that the data follows a known pattern.
Check digit          | A check digit, calculated from the other digits using a mathematical formula, is added to numeric data so that the number can check itself (self-checking).

Data Verification
A control to check whether the input data matches the data in the source document. Two commonly used methods are ‘input data twice’ and ‘double data entry’.
- Input data twice: an operator inputs the data twice, and the computer checks the second entry against the first one.
- Double data entry: two operators enter the same data into two different files, and the computer checks the files for discrepancies.

Hierarchical Structure of Data
Database > database table > record > field
- A database is a collection of related data files.
- A database table is a file containing a collection of records with the same record structure.
- A record contains all the data relating to one item.
- Each piece of data in a record is stored in a specific format called a field.

Key Field
One of the fields in a record that contains a unique value, used to identify a specific record in a table.

Database Table
The Student Information Database contains several tables (Table 1 to Table 4). Table 1 is shown below: each row is a record, each column is a field, and ID is the key field.

Table 1
ID   | Name     | Gender | Date of Birth
A001 | Chu SW   | M      | 19/9/1979
A002 | Chan Y   | M      | 15/6/1979
A003 | Au FH    | F      | 1/11/1980
A004 | Chu YY   | M      | 15/6/1979
A005 | Leung FH | F      | 15/6/1980

Processing Data
The type of data processing method we use depends on the result we hope to achieve.
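The validity checks listed under Data Validation can each be written as a small function. This is a minimal sketch assuming hypothetical rules for a record shaped like those in Table 1: an ID of one capital letter followed by three digits, and a year of birth between 1900 and 2100. The function names are illustrative. For the check digit it assumes the modulus-11 scheme used by ISBN-10, which is one common choice; the notes do not say which formula is meant.

```python
import re
from datetime import datetime

# Hypothetical rules for a record shaped like Table 1.
REQUIRED_FIELDS = ("ID", "Name", "Gender", "Date of Birth")


def field_presence_check(record: dict) -> bool:
    # Every required field must exist and must not be empty.
    return all(record.get(field) for field in REQUIRED_FIELDS)


def field_length_check(record: dict) -> bool:
    # Assumed rule: an ID has exactly 4 characters, e.g. "A001".
    return len(record["ID"]) == 4


def format_check(record: dict) -> bool:
    # Assumed pattern: one capital letter followed by three digits.
    return re.fullmatch(r"[A-Z]\d{3}", record["ID"]) is not None


def range_check(record: dict) -> bool:
    # Assumed range: year of birth from 1900 to 2100 (dates written d/m/yyyy).
    born = datetime.strptime(record["Date of Birth"], "%d/%m/%Y")
    return 1900 <= born.year <= 2100


def failed_checks(record: dict) -> list:
    # Stop early if fields are missing, since the other checks need them.
    if not field_presence_check(record):
        return ["field presence"]
    checks = {"field length": field_length_check,
              "format": format_check,
              "range": range_check}
    return [name for name, check in checks.items() if not check(record)]


def mod11_check_digit(digits: str) -> str:
    # Assumed scheme (as in ISBN-10): weight the digits n+1, n, ..., 2 from the
    # left, then choose the check digit that makes the weighted total
    # (with the check digit at weight 1) divisible by 11. 10 is written "X".
    weights = range(len(digits) + 1, 1, -1)
    total = sum(int(d) * w for d, w in zip(digits, weights))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


good = {"ID": "A001", "Name": "Chu SW", "Gender": "M", "Date of Birth": "19/9/1979"}
bad = {"ID": "A01", "Name": "Chu SW", "Gender": "M", "Date of Birth": "19/9/1879"}
print(failed_checks(good))            # []
print(failed_checks(bad))             # ['field length', 'format', 'range']
print(mod11_check_digit("030640615")) # 2
```

With this scheme the digits 030640615 get check digit 2 (the ISBN 0-306-40615-2), and a single mistyped digit changes the weighted total so that it is no longer divisible by 11.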
There are three types of data processing in database management:
- Sorting: organizing a list of records in a specific order.
- Searching: retrieving a specific record of data from a database.
- Merging: combining the records of two tables into a new table.

Sorting
The process of rearranging the records in a table into a specific order. A sort key is the field whose value is used as the reference for rearranging the records. When the records are sorted, their order is physically rearranged in the table according to the sort key values.

Original table
Record number | ID   | Name
1             | A001 | Chu SW
2             | A002 | Chan Y
3             | A003 | Au FH
4             | A004 | Chu YY

Sorted in descending order (sort key: ID)
Record number | ID   | Name
1             | A004 | Chu YY
2             | A003 | Au FH
3             | A002 | Chan Y
4             | A001 | Chu SW

Sorted in ascending order (sort key: Name)
Record number | ID   | Name
1             | A003 | Au FH
2             | A002 | Chan Y
3             | A001 | Chu SW
4             | A004 | Chu YY

Searching Data
Purpose: to find specific information in a large database.

Sequential Search
Records are searched one by one until either the target record is found or all the records have been searched. It is usually applied to an unsorted database table and is only feasible for tables containing a small number of records.

Example: looking for the information of Frank. Records are checked in turn from 001, and the record is found at record 008.

Record number | Name    | Weight (kg) | Sex
001           | John    | 50          | M
002           | Mary    | 48          | F
003           | Susan   | 49          | F
004           | Luke    | 62          | M
005           | Matthew | 70          | M
006           | Mark    | 65          | M
007           | Winnie  | 45          | F
008           | Frank   | 58          | M
009           | James   | 72          | M

Binary Search
An algorithm for searching for a target record in a sorted database table. Steps in a binary search:
1. Check the record at the mid-point of the table.
2. If the target record is located, the search is completed. Otherwise, proceed to step 3.
3. Select the half of the list that should contain the target record.
4. Repeat step 1 until either the target record is found or the list of records to be searched is empty.

Example: looking for ‘James’ in the table sorted by name.

Position | Record number | Name    | Weight (kg) | Sex
1        | 008           | Frank   | 58          | M
2        | 009           | James   | 72          | M
3        | 001           | John    | 50          | M
4        | 004           | Luke    | 62          | M
5        | 006           | Mark    | 65          | M
6        | 002           | Mary    | 48          | F
7        | 005           | Matthew | 70          | M
8        | 003           | Susan   | 49          | F
9        | 007           | Winnie  | 45          | F

1st search: locate $(1 + 9) / 2 = 5$, the 5th record, ‘Mark’. Since ‘James’ < ‘Mark’, search the upper records, 1 to 4.
2nd search: locate $\lfloor (1 + 4) / 2 \rfloor = 2$ (integral part), the 2nd record, ‘James’. Record found!

Merging Data
A merged table contains all the records of the source tables and preserves the record structure and the sorting order of the source tables.

Source table 1 (sorted by Name)
Name  | Mark
Mary  | 56
Paul  | 62
Peter | 45

Source table 2 (sorted by Name)
Name    | Mark
David   | 80
James   | 75
William | 45

Merged table (sorting order preserved)
Name    | Mark
David   | 80
James   | 75
Mary    | 56
Paul    | 62
Peter   | 45
William | 45

2.3 Processing Information

Information is processed for a specific purpose.

Reorganization of Information
Includes presenting the information in different structures or manipulating the information from existing records. Common manipulations are filtering and sorting, and statistical calculation.

Conversion of Information
Information can be represented in various forms that are favourable for subsequent operations or for specific devices. Examples:
- Grading system: examination grades are stored as numbers so that computers can manipulate them more efficiently.
- Analysis of fingerprint images: scanned images of fingerprints can be analyzed for special characteristics, so the computer is able to identify a person even if his/her current fingerprints are slightly different from the stored ones.
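The three database operations described under Processing Data in section 2.2 (sorting, searching and merging) can be tried out on small in-memory copies of the example tables. This is a minimal, hypothetical sketch: the records are Python tuples rather than rows of a real database table, and the function names are illustrative. Source table 1 of the merge example is taken in alphabetical order (Mary, Paul, Peter).

```python
# Hypothetical in-memory copies of the example tables (not a database API).
STUDENTS = [("A001", "Chu SW"), ("A002", "Chan Y"), ("A003", "Au FH"), ("A004", "Chu YY")]
WEIGHTS = [  # (name, weight in kg, sex) in record-number order 001 to 009
    ("John", 50, "M"), ("Mary", 48, "F"), ("Susan", 49, "F"),
    ("Luke", 62, "M"), ("Matthew", 70, "M"), ("Mark", 65, "M"),
    ("Winnie", 45, "F"), ("Frank", 58, "M"), ("James", 72, "M"),
]


def sequential_search(records, target):
    """Check records one by one; return (1-based position, comparisons made)."""
    for position, record in enumerate(records, start=1):
        if record[0] == target:
            return position, position
    return None, len(records)


def binary_search(records, target):
    """Records must be sorted by name; return (1-based position, records checked)."""
    low, high, checked = 1, len(records), 0
    while low <= high:                 # stop when the list to search is empty
        mid = (low + high) // 2        # integral part of the mid-point
        checked += 1
        name = records[mid - 1][0]
        if name == target:
            return mid, checked
        if target < name:
            high = mid - 1             # continue with the upper records (low..mid-1)
        else:
            low = mid + 1              # continue with the lower records (mid+1..high)
    return None, checked


def merge_sorted(table_a, table_b):
    """Merge two tables already sorted by their first field, keeping that order."""
    merged, i, j = [], 0, 0
    while i < len(table_a) and j < len(table_b):
        if table_a[i][0] <= table_b[j][0]:
            merged.append(table_a[i])
            i += 1
        else:
            merged.append(table_b[j])
            j += 1
    return merged + table_a[i:] + table_b[j:]  # append whatever is left over


by_id_descending = sorted(STUDENTS, key=lambda r: r[0], reverse=True)
by_name_ascending = sorted(STUDENTS, key=lambda r: r[1])
merged = merge_sorted([("Mary", 56), ("Paul", 62), ("Peter", 45)],
                      [("David", 80), ("James", 75), ("William", 45)])

print([r[0] for r in by_id_descending])       # ['A004', 'A003', 'A002', 'A001']
print(sequential_search(WEIGHTS, "Frank"))     # (8, 8)
print(binary_search(sorted(WEIGHTS), "James")) # (2, 2)
print([name for name, _ in merged])
```

The binary search finds James after checking 2 records (Mark, then James), as in the worked example, while the sequential search needs 8 comparisons to reach Frank. The merge compares only the front record of each table at every step, which is why both inputs must already be sorted.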
Communication of Information
The exchange of information between two different systems. A protocol is a set of communication rules by which different computer systems communicate. TCP/IP is a common communication protocol that is used on the Internet.

Transmission of Information
Sending information from one device to another. Transmission can be categorized as serial or parallel.

                            | Serial transmission | Parallel transmission
Bits transmitted at a time  | 1 bit               | 8 bits or more
Cost of transmission medium | Low                 | Relatively higher
Suitable distance           | Long                | Short

2.4 Batch Processing and Real-time Processing

Batch Processing
A mode of operation in which the computer processes a series of jobs or data files without the user’s interaction.
Procedures:
1. Accumulate data or jobs.
2. Create a batch file that instructs the computer on the work to be done.
3. The computer processes the data and jobs automatically, as instructed, at a specified time.
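The batch processing procedure can be pictured with a toy program. This is a hypothetical sketch, assuming that the "jobs" are sales amounts collected during a day and that the batch run simply totals them; the class and method names are illustrative, and the `run` method stands in for the instructions a batch file would hold.

```python
class BatchProcessor:
    """Hypothetical sketch: data is accumulated first and processed together later."""

    def __init__(self):
        self.pending = []  # accumulated data that has not been processed yet

    def collect(self, amount):
        # Step 1: accumulate data; nothing is processed when it arrives.
        self.pending.append(amount)

    def run(self):
        # Step 3: at the specified time, process the whole batch with no user interaction.
        batch, self.pending = self.pending, []
        return {"records": len(batch), "total": sum(batch)}


processor = BatchProcessor()
for amount in (120, 80, 50):
    processor.collect(amount)
print(processor.pending)  # [120, 80, 50]: collected but not yet processed
print(processor.run())    # {'records': 3, 'total': 250}
print(processor.pending)  # []
```

Until `run` is called, no total is available, which illustrates the time lag between data collection and data processing in batch processing.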
Real-time Processing
A mode of operation in which a job is handled as fast as possible upon request.

Comparison
Batch processing                                               | Real-time processing
There is a time lag between data collection and data processing. | Response time is short.
Information may be outdated.                                   | Information is always up to date.
Resources are used more efficiently.                           | Longer system idle time, hence lower system utilization.

2.5 Denary, Binary and Hexadecimal Number Systems

Number System
A general number system of base b uses b digits, and a number is formed by a combination of these b digits.

Number system | Uses                                   | Digits contained
Denary        | Daily counting and calculation         | 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Binary        | Computer systems                       | 0, 1
Hexadecimal   | Communication between computer and programmer | 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F (A to F stand for 10 to 15 respectively)

Number System Conversion

Binary to denary
Evaluate the place values of the digits. The binary number $1011_2$ in its expanded form is:

Binary digit | 1              | 0              | 1              | 1
Place value  | $2^3$          | $2^2$          | $2^1$          | $2^0$
Digit value  | $1 \times 2^3$ | $0 \times 2^2$ | $1 \times 2^1$ | $1 \times 2^0$

$1011_2 = 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = 11_{10}$

Hexadecimal to denary
Evaluate the place values of the digits. The hexadecimal number $\mathrm{2CA9}_{16}$ in its expanded form is:

Hexadecimal digit | 2               | C                | A                | 9
Place value       | $16^3$          | $16^2$           | $16^1$           | $16^0$
Digit value       | $2 \times 16^3$ | $12 \times 16^2$ | $10 \times 16^1$ | $9 \times 16^0$

$\mathrm{2CA9}_{16} = 2 \times 16^3 + 12 \times 16^2 + 10 \times 16^1 + 9 \times 16^0 = 11433_{10}$

Denary to a number system with base b
Divide the denary number by b repeatedly until the quotient is smaller than b. The answer is obtained by writing the last quotient first, followed by the remainders in reverse order (the last remainder first).
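Both directions of conversion can be sketched in a few lines: evaluating place values for any base up to 16, and repeated division for denary to base b. The function names are hypothetical, and the sketch handles non-negative whole numbers only.

```python
DIGITS = "0123456789ABCDEF"  # A to F stand for 10 to 15


def to_denary(number: str, base: int) -> int:
    """Add up digit value x place value, e.g. 1011 in base 2 -> 11."""
    value = 0
    for power, symbol in enumerate(reversed(number.upper())):
        digit = DIGITS.index(symbol)  # raises ValueError for unknown symbols
        if digit >= base:
            raise ValueError(f"{symbol!r} is not a base-{base} digit")
        value += digit * base ** power
    return value


def from_denary(n: int, base: int) -> str:
    """Repeated division by the base, as described above (n >= 0 only)."""
    if n < 0:
        raise ValueError("only non-negative numbers are handled")
    remainders = []
    while n >= base:               # divide until the quotient is smaller than b
        n, r = divmod(n, base)
        remainders.append(DIGITS[r])
    # Last quotient first, then the remainders in reverse order.
    return DIGITS[n] + "".join(reversed(remainders))


print(to_denary("1011", 2))    # 11
print(to_denary("2CA9", 16))   # 11433
print(from_denary(11, 2))      # 1011
print(from_denary(11433, 16))  # 2CA9
```

For example, $11433 \div 16$ gives quotient 714 remainder 9, $714 \div 16$ gives 44 remainder 10 (A), and $44 \div 16$ gives 2 remainder 12 (C); writing the last quotient 2 followed by C, A, 9 gives 2CA9. Python's built-in `int("2CA9", 16)` and `format(11433, "X")` can be used to cross-check the results.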
Binary to hexadecimal
Group the digits of the binary number in fours, starting from the rightmost digit (add leading zeros to complete the leftmost group if necessary). Replace each group of four digits with the equivalent hexadecimal digit.

Hexadecimal to binary
Convert each digit of the hexadecimal number into a group of four binary digits, then join all the groups together to obtain the binary number.

Addition and Subtraction in Different Number Systems
The rules are the same as in the denary system:
- A ‘carry’ is generated when the sum of the digits in a column equals or exceeds the base value.
- A ‘borrow’ from the digit on the left is necessary when a larger digit is subtracted from a smaller one.

2.6 Bit and Byte

Bit
- A single binary digit.
- The basic unit for storing data on a computer.
- Able to hold only two distinct values (0 or 1).
- With many bits, we can represent numbers large enough for practical use. 8 bits can represent $2^8 = 256$ different values, so they can hold numbers from 0 up to $255_{10}$.
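The grouping, carry and borrow rules in section 2.5 can be sketched as column-by-column string operations. The function names are hypothetical; `subtract_in_base` assumes the first number is not smaller than the second, and the helpers work for any base up to 16.

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_hex(bits: str) -> str:
    # Pad on the left to a multiple of 4 digits, then replace each group of four.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)


def hex_to_binary(hex_number: str) -> str:
    # Each hexadecimal digit becomes exactly four binary digits.
    return "".join(format(HEX_DIGITS.index(d), "04b") for d in hex_number.upper())


def add_in_base(a: str, b: str, base: int) -> str:
    width = max(len(a), len(b))
    a, b = a.upper().zfill(width), b.upper().zfill(width)
    result, carry = [], 0
    for da, db in zip(reversed(a), reversed(b)):  # rightmost column first
        total = HEX_DIGITS.index(da) + HEX_DIGITS.index(db) + carry
        carry, digit = divmod(total, base)        # carry when total >= base
        result.append(HEX_DIGITS[digit])
    if carry:
        result.append(HEX_DIGITS[carry])
    return "".join(reversed(result))


def subtract_in_base(a: str, b: str, base: int) -> str:
    width = max(len(a), len(b))
    a, b = a.upper().zfill(width), b.upper().zfill(width)
    result, borrow = [], 0
    for da, db in zip(reversed(a), reversed(b)):
        diff = HEX_DIGITS.index(da) - HEX_DIGITS.index(db) - borrow
        if diff < 0:
            diff += base   # borrow one unit (worth `base`) from the digit on the left
            borrow = 1
        else:
            borrow = 0
        result.append(HEX_DIGITS[diff])
    return "".join(reversed(result)).lstrip("0") or "0"


print(binary_to_hex("101101"))         # 2D
print(hex_to_binary("2CA9"))           # 0010110010101001
print(add_in_base("1011", "1", 2))     # 1100
print(add_in_base("FF", "1", 16))      # 100
print(subtract_in_base("1100", "1", 2))  # 1011
```

In binary, $1 + 1 = 2$ already equals the base, so it is written as 0 with a carry of 1; this is why the carry rule says "equals or exceeds".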
Byte
- Consists of 8 bits.
- The smallest unit of data that the microprocessor can address.

Unit     | Abbreviation | Value
Kilobyte | KB           | $2^{10}$ = 1,024 bytes
Megabyte | MB           | $2^{20}$ bytes = 1,024 KB
Gigabyte | GB           | $2^{30}$ bytes = 1,024 MB
Terabyte | TB           | $2^{40}$ bytes = 1,024 GB

Notation of Bit and Byte
- A lowercase ‘b’ is the abbreviation for bit, and a capital ‘B’ is the abbreviation for byte.
- bps stands for bits per second; Bps stands for bytes per second.

2.7 Character Coding Systems

Use of Character Coding Systems
A character coding system is a way to represent data in a form that can be manipulated efficiently in a computer.

ASCII
- A common coding system for computers and communication devices.
- Each code represents either a printable character or a non-printable character.
- Each character takes up 8 bits, but the first (leftmost) bit is always set to ‘0’, so only 7 bits are used for each character, giving $2^7 = 128$ different characters.

Chinese Character Coding Systems
A one-byte coding system does not have enough space to hold the characters of most Asian languages. Chinese characters are usually represented in Big5 code, GB code or Unicode.
- Big5 code: mainly used to represent traditional Chinese; uses two bytes to represent one Chinese character.
- GB code: the standard code for simplified Chinese; uses two bytes to represent one Chinese character.
- Unicode: an international standard code that sets the codes for the commonly used characters in the world.
Reading text with the wrong coding system may produce strange characters.
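The coding systems above can be compared using Python's built-in codecs `big5`, `gb2312` and `utf-8` (UTF-8 is one common way of storing Unicode characters as bytes). This is a small sketch; the helper names are hypothetical, and the character 中 is an arbitrary example.

```python
def encode_in(text: str, codecs=("big5", "gb2312", "utf-8")) -> dict:
    """Return the bytes produced for `text` by each coding system."""
    return {codec: text.encode(codec) for codec in codecs}


def ascii_bits(ch: str) -> str:
    """Show the 7-bit ASCII code of `ch` in 8 bits; the leftmost bit is always 0."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return format(code, "08b")


for codec, data in encode_in("中").items():
    print(codec, data.hex(" ").upper(), f"({len(data)} bytes)")

print(ascii_bits("A"))  # 01000001

# Reading Big5 bytes as if they were GB code gives the wrong character.
garbled = "中".encode("big5").decode("gb2312", errors="replace")
print(garbled, garbled == "中")
```

Big5 and GB code each use two bytes for 中 but assign it different byte values, so decoding with the wrong table shows a different character; this is the garbling described above.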