2.2 Data Processing Cycle

2.1 Difference Between Data and
Information
 Data
 A collection of raw facts that are not
organized and have no meaning on their own
 Information
 Data that has been organized.
 Meaningful and useful for decision making.
2.2 Data Processing Cycle
 Data Processing Cycle
 All data should go through the stages of the
processing cycle in a logical order:
Data collection → Data preparation → Data input → Data processing → Information output
 Data Collection
 An activity of collecting raw data from the
outside world so that it can be put into an
information system
 Can be done by survey, observation,
interview and experiment
 The raw data is collected in a file called a
source document.
 Data Preparation
 A pre-process that involves:
 Data-logging
 Keep track of incoming data.
 Manual validity check
 Check for completeness.
 Check whether the contextual information is included.
 Check whether the answers are reasonable and legible.
 Data categorization
 Divide the raw data into different groups.
 Sort the raw data into a specific order to facilitate the subsequent data entry procedure.
 Data Input and Sources of Error
 Mistakes can be made during data entry.
 The three different types of errors caused
by manual input are data source error,
transcription error and transposition error.
Error | Source of error | Example
Data source error | The data source provider supplies incorrect data. | An interviewee reports an incorrect telephone number.
Transcription error | Data is read or typed in incorrectly. | Mistaking ‘1’ for ‘l’ or ‘o’ for ‘0’.
Transposition error | Two consecutive digits are swapped. | Typing 61 when you intend to type 16.
 Data Validation
 The process of comparing data with a set of
rules or values to make sure that the data is
reasonable and valid.
Validity check | Function
Field presence check | Ensure that all necessary fields are present.
Field length check | Ensure that the data has the correct number of characters or digits.
Range check | Ensure that the data value is within a predetermined range.
Format check | Ensure that the form of the data follows a known pattern.
Check digit | A check digit, calculated using a mathematical formula, is added to numeric data for self-checking.
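The validity checks above can be sketched in Python as follows. This is a minimal illustration only: the function names are hypothetical, and the check digit uses a simple modulus-10 digit sum chosen for this sketch, since the slides do not state which formula is intended.

```python
import re


def field_presence_check(record, required_fields):
    """Field presence check: every required field exists and is not empty."""
    return all(record.get(name) not in (None, "") for name in required_fields)


def field_length_check(value, expected_length):
    """Field length check: the value has exactly the expected number of characters."""
    return len(value) == expected_length


def range_check(value, low, high):
    """Range check: the value lies within a predetermined range (inclusive)."""
    return low <= value <= high


def format_check(value, pattern):
    """Format check: the whole value matches a known pattern."""
    return re.fullmatch(pattern, value) is not None


def append_check_digit(digits):
    """Append a check digit so that the digit sum becomes a multiple of 10.

    The modulus-10 formula is an assumption made for illustration.
    """
    check = (10 - sum(int(d) for d in digits) % 10) % 10
    return digits + str(check)


def check_digit_is_valid(digits_with_check):
    """Self-check: the digit sum, including the check digit, is a multiple of 10."""
    return sum(int(d) for d in digits_with_check) % 10 == 0
```

A plain digit sum catches a single mistyped digit but not a transposition error, which is why practical check digit schemes usually weight each position differently.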
 Data Verification
 A control to check whether the input data
matches that in the source document
 Two commonly used methods: ‘input data
twice’ and ‘double data entry’
Input Data Twice
 An operator inputs the data twice.
 Computer checks second entry against first
one.
Double Data Entry
 Two operators enter the same data into two
different files.
 Computer checks for discrepancies.
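As a rough sketch of how the computer might check for discrepancies in double data entry, the Python function below compares two files record by record and field by field. The name `find_discrepancies` and the dictionary layout of the records are assumptions made for this example.

```python
def find_discrepancies(first_file, second_file):
    """Return (record index, field, first value, second value) for each mismatch."""
    if len(first_file) != len(second_file):
        raise ValueError("both files must contain the same number of records")
    discrepancies = []
    for index, (first, second) in enumerate(zip(first_file, second_file)):
        # Compare every field that appears in either copy of the record.
        for field in sorted(set(first) | set(second)):
            if first.get(field) != second.get(field):
                discrepancies.append(
                    (index, field, first.get(field), second.get(field))
                )
    return discrepancies
```

Any reported discrepancy would then be resolved by looking at the source document.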
 Hierarchical Structure of Data
Database >> database table >> record >> field
 A database is a collection of related data files.
 A database table is a file containing a
collection of records with the same record
structures.
 A record contains all the data relating to one
item.
 Each piece of data in a record is stored in a
specific format in a unit called a field.
 Key Field
One of the fields in a record which
contains a unique value for identification
of a specific record from a table
 Database table
Table 1: Student Information
ID | Name | Gender | Date of Birth
A001 | Chu SW | M | 19/9/1979
A002 | Chan Y | M | 15/6/1979
A003 | Au FH | F | 1/11/1980
A004 | Chu YY | M | 15/6/1979
A005 | Leung FH | F | 15/6/1980
[Figure: a database containing Tables 1 to 4. In Table 1, each row is a record, each column is a field, and ID is the key field.]
 Processing Data
 The type of data processing method we use
depends on the result we hope to achieve.
 There are three types of data processing in
database management
sorting
 to organize a list of records in a specific order
searching
 to retrieve a specific record of data from a
database
merging
 to combine the records of two tables into a
new table
 Sorting
 The process of rearranging the records in a
table in a specific order
 A sort key is the field whose value is
used as the reference for rearranging
records in a sorting process.
 When the records are sorted, the order
of records is physically rearranged in the
table according to the sort key value.
Example: original table
Record number | ID | Name
1 | A001 | Chu SW
2 | A002 | Chan Y
3 | A003 | Au FH
4 | A004 | Chu YY
Sorted in descending order (sort key: ID)
Record number | ID | Name
1 | A004 | Chu YY
2 | A003 | Au FH
3 | A002 | Chan Y
4 | A001 | Chu SW
Sorted in ascending order (sort key: Name)
Record number | ID | Name
1 | A003 | Au FH
2 | A002 | Chan Y
3 | A001 | Chu SW
4 | A004 | Chu YY
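The sorting example above can be reproduced with a short Python sketch. Storing the records as a list of dictionaries is an assumption made for illustration; the field names `id` and `name` stand in for the ID and Name columns.

```python
from operator import itemgetter

# The four records of the example table.
records = [
    {"id": "A001", "name": "Chu SW"},
    {"id": "A002", "name": "Chan Y"},
    {"id": "A003", "name": "Au FH"},
    {"id": "A004", "name": "Chu YY"},
]

# Sort key: ID, descending order.
by_id_descending = sorted(records, key=itemgetter("id"), reverse=True)

# Sort key: Name, ascending order.
by_name_ascending = sorted(records, key=itemgetter("name"))
```

`sorted` returns a new list and leaves `records` unchanged. To rearrange the table itself, as described in the slides, `records.sort(key=...)` could be used instead.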
 Searching Data
 Purpose: To find specific information from a
large database.
 Sequential Search
 Records are searched one by one until
either the target record is found or all
the records have been searched.
 Usually applied to an unsorted database
table
 Only feasible for database tables
containing a small number of records
Example: looking for the record of Frank
Record number | Name | Weight (kg) | Sex
001 | John | 50 | M
002 | Mary | 48 | F
003 | Susan | 49 | F
004 | Luke | 62 | M
005 | Matthew | 70 | M
006 | Mark | 65 | M
007 | Winnie | 45 | F
008 | Frank | 58 | M
009 | James | 72 | M
Record found at record number 008, after checking eight records one by one.
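A minimal Python sketch of the sequential search in the example above. The function name and the list-of-dictionaries layout are assumptions for illustration.

```python
def sequential_search(records, target_name):
    """Check records one by one; return the 1-based position of the match, or None."""
    for position, record in enumerate(records, start=1):
        if record["name"] == target_name:
            return position
    return None  # every record was checked without finding the target


names = ["John", "Mary", "Susan", "Luke", "Matthew",
         "Mark", "Winnie", "Frank", "James"]
table = [{"name": name} for name in names]
```

With this table, `sequential_search(table, "Frank")` has to examine eight records before it finds the target, which is why the method suits only small tables.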
 Binary Search
 an algorithm for searching a target
record in a sorted database table
 Steps in a binary search
1. Check the record at the mid-point of the
table.
2. If the target record is located, the search is
completed. Otherwise, proceed to step 3.
3. Select the half of the list that should contain
the target record.
4. Repeat step 1 until either the target record
is found or the record list to be searched is
empty.
Example: looking for the record of James in a table sorted by name
Position | Record number | Name | Weight (kg) | Sex
1 | 008 | Frank | 58 | M
2 | 009 | James | 72 | M
3 | 001 | John | 50 | M
4 | 004 | Luke | 62 | M
5 | 006 | Mark | 65 | M
6 | 002 | Mary | 48 | F
7 | 005 | Matthew | 70 | M
8 | 003 | Susan | 49 | F
9 | 007 | Winnie | 45 | F
1st search: locate $(1 + 9) / 2 = 5$, the 5th record = ‘Mark’. Target ‘James’ < ‘Mark’, therefore search the upper records 1 to 4.
2nd search: locate $(1 + 4) / 2 = 2.5$, take the integral part, the 2nd record = ‘James’. Record found!
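The binary search steps can be sketched in Python as below. The function name and the returned step count are additions for illustration. The code uses 0-based list indices internally and converts to a 1-based position on return.

```python
def binary_search(sorted_names, target):
    """Return (1-based position or None, number of records checked)."""
    low, high = 0, len(sorted_names) - 1
    steps = 0
    while low <= high:                 # stop when the list to search is empty
        mid = (low + high) // 2        # mid-point, integral part only
        steps += 1
        if sorted_names[mid] == target:
            return mid + 1, steps
        if target < sorted_names[mid]:
            high = mid - 1             # keep the upper half of the table
        else:
            low = mid + 1              # keep the lower half of the table
    return None, steps


sorted_names = ["Frank", "James", "John", "Luke", "Mark",
                "Mary", "Matthew", "Susan", "Winnie"]
```

Searching for ‘James’ checks ‘Mark’ first and then ‘James’, so the record is found in two steps, compared with up to nine in a sequential search of the same table.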
 Merging Data
 A merged table
 contains all the records of the source
tables
 preserves the record structure and
sort order of the source tables
Source table 1
Name | Mark
Mary | 56
Peter | 45
Paul | 62
Source table 2
Name | Mark
David | 80
James | 75
William | 45
Merged table (sorting nature preserved)
Name | Mark
David | 80
James | 75
Mary | 56
Peter | 45
Paul | 62
William | 45
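One way to merge two tables while keeping their order is the two-pointer merge sketched below. It assumes both source tables are already sorted by the same key. The function name and the sample records are hypothetical and are not taken from the slides.

```python
def merge_tables(table_a, table_b, key):
    """Merge two tables that are each sorted by `key` into one sorted table."""
    merged = []
    i = j = 0
    while i < len(table_a) and j < len(table_b):
        # Take whichever front record comes first in the sort order.
        if key(table_a[i]) <= key(table_b[j]):
            merged.append(table_a[i])
            i += 1
        else:
            merged.append(table_b[j])
            j += 1
    merged.extend(table_a[i:])  # at most one of these two slices is non-empty
    merged.extend(table_b[j:])
    return merged


# Hypothetical (name, mark) records, each table sorted by name.
table_a = [("Amy", 70), ("Carl", 55), ("Eve", 62)]
table_b = [("Ben", 80), ("Dan", 75)]
merged = merge_tables(table_a, table_b, key=lambda record: record[0])
```

The merged table contains every record from both sources, and each record keeps the same (name, mark) structure.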
2.3 Processing Information

 Information is processed for a specific
purpose.
 Reorganization of Information
 Includes presenting the information with
different structures or manipulating the
information from existing records
 Common manipulations
 Filtering and sorting
 Statistical calculation
 Conversion of Information
 Information can be represented in various
forms that are favourable for subsequent
operations or for specific devices.
 Examples
 Grading system
 Store examination grades by number for
more efficient manipulation by computers.
 Analysis of fingerprint images
 Scanned images of fingerprints can be
analyzed for special characteristics.
 The computer is able to identify the person
even if his/her current fingerprints are
slightly different from the stored ones.
 Communication of Information
 the exchange of information between two
different systems
 Protocols: a set of communication rules by
which different computer systems
communicate
 TCP/IP is a common communication
protocol that is used on the Internet.
 Transmission of Information
 Send information from one device to
another.
 Can be categorized as serial and parallel
Feature | Serial Transmission | Parallel Transmission
Transmission rate | 1 bit at a time | 8 bits or more at a time
Cost of transmission medium | Low | Relatively higher
Suitable distance | Long | Short
2.4 Batch Processing and Real-time
Processing
 Batch Processing
 A mode of operation in which the computer
processes a series of jobs or data files
without the user’s interaction
 Procedures
 Accumulate data or jobs.
 Create a batch file that tells the
computer what work is to be done.
 The computer processes the data and
jobs automatically, as instructed, at a
specified time.
 Real-time Processing
 A mode of operation in which the program
handles a job as fast as possible
upon request
 Comparison
Batch Processing | Real-time Processing
There is a time lag between data collection and data processing. | Response time is short.
Information may be outdated. | Information is always up to date.
More efficient use of resources | Longer system idle time, hence lower system utilization
2.5 Denary, Binary and
Hexadecimal Number System
 Number System
 A general number system of base b uses b
digits
 A number is formed by a combination of
these b digits
Number system | Denary | Binary | Hexadecimal
Uses | Daily counting and calculation | Computer systems | For computer-programmer communication
Digits contained | 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 | 0, 1 | 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F (A – F stand for 10 – 15 respectively)
 Number System Conversion
 Binary to Denary
 Evaluate the place values of the digits
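 The binary number $1011_2$ in its
expanded form is,
Binary digit | 1 | 0 | 1 | 1
Place value | $2^3$ | $2^2$ | $2^1$ | $2^0$
Digit value | $1 \times 2^3$ | $0 \times 2^2$ | $1 \times 2^1$ | $1 \times 2^0$
 $1011_2 = 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = 11_{10}$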
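The place-value expansion above can be written as a short Python sketch. The function name is hypothetical, and the input is assumed to be a string of ‘0’ and ‘1’ characters.

```python
def binary_to_denary(binary_digits):
    """Sum digit x place value, working from the rightmost digit."""
    total = 0
    place_value = 1                      # 2^0 for the rightmost digit
    for digit in reversed(binary_digits):
        total += int(digit) * place_value
        place_value *= 2                 # next place value: 2^1, 2^2, ...
    return total
```

For the example in the slide, `binary_to_denary("1011")` adds 1 + 2 + 0 + 8 and gives 11.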
 Hexadecimal to Denary
 Evaluate the place values of the digits
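 The hexadecimal number $\mathrm{2CA9}_{16}$ in its
expanded form is,
Hexadecimal digit | 2 | C | A | 9
Place value | $16^3$ | $16^2$ | $16^1$ | $16^0$
Digit value | $2 \times 16^3$ | $12 \times 16^2$ | $10 \times 16^1$ | $9 \times 16^0$
 $\mathrm{2CA9}_{16} = 2 \times 16^3 + 12 \times 16^2 + 10 \times 16^1 + 9 \times 16^0 = 11433_{10}$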
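The same expansion works for any base up to 16 once the letters A to F are mapped to 10 to 15. The sketch below is one possible generalization; the names `HEX_DIGITS` and `to_denary` are assumptions for this example.

```python
HEX_DIGITS = "0123456789ABCDEF"  # a digit's position in this string is its value


def to_denary(number, base):
    """Convert a digit string in the given base (2 to 16) to a denary integer."""
    total = 0
    for power, digit in enumerate(reversed(number.upper())):
        value = HEX_DIGITS.index(digit)  # raises ValueError for unknown symbols
        if value >= base:
            raise ValueError(f"digit {digit!r} is not valid in base {base}")
        total += value * base ** power   # digit value x place value
    return total
```

Here `to_denary("2CA9", 16)` evaluates 9 + 160 + 3072 + 8192, which matches the 11433 worked out above.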
 Denary to a number system with base b
 Divide the denary number by b
repeatedly, recording the remainder at
each step, until the quotient is smaller
than b.
 Obtain the answer by writing the final
quotient followed by the remainders in
reverse order.
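The repeated-division method can be sketched in Python as follows. The names are hypothetical, and the function is limited to bases 2 to 16 so that the digits 0 to 9 and A to F are sufficient.

```python
BASE_DIGITS = "0123456789ABCDEF"


def from_denary(number, base):
    """Convert a non-negative denary integer to a digit string in base 2 to 16."""
    if number < 0 or not 2 <= base <= 16:
        raise ValueError("expects number >= 0 and 2 <= base <= 16")
    remainders = []
    quotient = number
    while quotient >= base:              # stop once the quotient is below b
        quotient, remainder = divmod(quotient, base)
        remainders.append(BASE_DIGITS[remainder])
    # Final quotient first, then the remainders in reverse order.
    return BASE_DIGITS[quotient] + "".join(reversed(remainders))
```

For example, dividing 11 by 2 gives remainders 1, 1, 0 and a final quotient of 1, so `from_denary(11, 2)` returns "1011".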
 Binary to hexadecimal
 Group the digits of the binary number
into fours, starting from the right-hand
side. Add leading zeros if the leftmost
group has fewer than four digits.
 Replace each group of four digits
by the equivalent hexadecimal digit.
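A minimal Python sketch of the grouping method, assuming the input is a string of ‘0’ and ‘1’ characters. The function name is hypothetical.

```python
def binary_to_hex(binary_digits):
    """Group binary digits in fours from the right and map each group to a hex digit."""
    padding = (-len(binary_digits)) % 4          # leading zeros for the leftmost group
    padded = "0" * padding + binary_digits
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join("0123456789ABCDEF"[int(group, 2)] for group in groups)
```

For instance, "10110010101001" is padded to "0010 1100 1010 1001", and the four groups map to 2, C, A and 9.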
 Hexadecimal to binary
 Convert each digit of the hexadecimal
number to a group of four binary digits.
 Obtain the binary number by joining
all the groups of binary digits together.
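The reverse conversion can be sketched the same way. Each hexadecimal digit becomes a group of exactly four binary digits. The function name is hypothetical.

```python
def hex_to_binary(hex_digits):
    """Replace each hexadecimal digit with its four-digit binary group."""
    groups = [format(int(digit, 16), "04b") for digit in hex_digits]
    return "".join(groups)
```

The result keeps any leading zeros of the first group, so `hex_to_binary("2CA9")` gives "0010110010101001".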
 Addition and Subtraction of
Different Number Systems
 The rules are the same as in the denary
system.
 A ‘carry’ is generated when the sum of
digits reaches or exceeds the base value.
 A ‘borrow’ from the left digit is necessary if
a larger digit is subtracted from a smaller
one.
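The carry rule can be illustrated with a column-by-column addition in Python. This is a sketch under stated assumptions: the names are hypothetical, the inputs are digit strings in the same base (2 to 16), and only addition is shown.

```python
ALL_DIGITS = "0123456789ABCDEF"


def add_in_base(a, b, base):
    """Add two digit strings column by column, carrying into the next column."""
    width = max(len(a), len(b))
    a = a.upper().rjust(width, "0")      # pad the shorter number with leading zeros
    b = b.upper().rjust(width, "0")
    result = []
    carry = 0
    for digit_a, digit_b in zip(reversed(a), reversed(b)):
        total = ALL_DIGITS.index(digit_a) + ALL_DIGITS.index(digit_b) + carry
        carry, digit = divmod(total, base)   # carry is 1 when total >= base
        result.append(ALL_DIGITS[digit])
    if carry:
        result.append(ALL_DIGITS[carry])
    return "".join(reversed(result))
```

In binary, 1 + 1 already equals the base, so `add_in_base("1", "1", 2)` writes 0 and carries 1, giving "10".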
2.6 Bit and Byte
 Bit
 A single binary digit
 The basic unit for storing data on a
computer
 Able to hold two distinct values only
 With many bits, we can represent a
number large enough for practical use.
 8 bits can represent $2^8 = 256$ different
values, and so can hold a number as large
as $255_{10}$.
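The relationship between the number of bits and the values they can hold can be checked with a small Python sketch. The function names are hypothetical, and the values are assumed to be non-negative integers starting from 0.

```python
def distinct_values(bit_count):
    """Each extra bit doubles the number of distinct values."""
    return 2 ** bit_count


def largest_value(bit_count):
    """Counting starts at 0, so the largest value is one less than the count."""
    return 2 ** bit_count - 1
```

With 8 bits these give 256 and 255 respectively, matching the figures above.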
 Byte
 Consists of 8 bits
 The smallest addressable unit of data in the
microprocessor
Unit | Abbreviation | Value
Kilobyte | KB | $2^{10}$ bytes = 1,024 bytes
Megabyte | MB | $2^{20}$ bytes = 1,024 KB
Gigabyte | GB | $2^{30}$ bytes = 1,024 MB
Terabyte | TB | $2^{40}$ bytes = 1,024 GB
 Notation of bit and byte
 ‘b’ as an abbreviation for bit
 A capital letter ‘B’ for byte
 bps stands for bits per second
 Bps stands for bytes per second
2.7 Character Coding Systems
 Use of Character Coding System
 A way to represent data in a form that can
be manipulated efficiently in a computer
 ASCII
 A common system for computers and
communication devices
 Each code represents either a printable
character or a non-printable character.
 Each character takes up 8 bits, but the
first bit is always set to ‘0’, so only 7
bits are used for each character, giving
$2^7 = 128$ different characters.
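The 8-bit pattern of an ASCII character can be inspected with the short Python sketch below. The function name is hypothetical; `ord` and `format` are Python built-ins.

```python
def ascii_bits(character):
    """Return the 8-bit pattern of a single ASCII character."""
    code = ord(character)
    if code > 127:
        raise ValueError("not an ASCII character")
    return format(code, "08b")   # 8 bits, padded with leading zeros
```

For example, ‘A’ has code 65, so `ascii_bits("A")` returns "01000001", with the first bit set to 0.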
 Chinese Character Coding
Systems
 A one-byte coding system does not have
enough code space to hold the characters of
most Asian languages.
 Chinese characters are usually represented
in Big5 code, GB code and Unicode.
 Big5 code
 Mainly used to represent traditional
Chinese
 Uses two bytes to represent one
Chinese character
 GB code
 Standard code for simplified Chinese
 Uses two bytes to represent one
Chinese character
 Reading text with the wrong coding system
may produce strange characters.
 Unicode
 An international standard code that sets
the codes for commonly-used characters in
the world
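The effect of reading text with the wrong coding system can be demonstrated with Python's built-in codecs. This is a sketch only: the sample text is arbitrary, and the codec names `big5`, `gb2312` and `utf-8` are Python's names for these coding systems.

```python
text = "中文"  # two Chinese characters, chosen only as a sample

big5_bytes = text.encode("big5")      # two bytes per character
gb_bytes = text.encode("gb2312")      # two bytes per character, different codes
utf8_bytes = text.encode("utf-8")     # one Unicode encoding of the same text

# Reading Big5 bytes as if they were GB code yields the wrong characters.
misread = big5_bytes.decode("gb2312", errors="replace")

# Reading them with the correct coding system restores the text.
restored = big5_bytes.decode("big5")
```

The byte sequences for the same two characters differ between Big5 and GB code, which is why the receiver needs to know which coding system was used.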