Data Processing

Data
Data is the term used for facts that serve no useful purpose until they have been converted into a more meaningful form by data processing operations.

Data processing
Data processing refers to a class of programs that organize and manipulate data, usually large amounts of numeric data. Computer data processing is any process in which a computer program takes in data and summarizes, analyzes, or otherwise converts it into usable information. Accounting programs are the prototypical examples of data processing applications. In contrast, word processors, which manipulate text rather than numbers, are not usually referred to as data processing applications.

Evolution of data processing
Data processing (the collecting, manipulating, and distributing of data) has been practiced since the earliest recorded history. Its methods have evolved from manual (data processing), to electromechanical (automatic data processing), to electronic or computerized. Electronic data processing is often referred to simply as data processing.

Purpose of data processing
Whether manual, electromechanical, or electronic, the purpose of data processing remains the same: to organize raw data into the meaningful information needed for decision-making. In common parlance, data and information are used interchangeably, but strictly speaking they are distinct. A list of the day's checks and deposit slips (the data) means very little to a bank manager until it has been turned into a summary report (information) giving the total number and dollar value of deposits and withdrawals. Information, then, is data that has been evaluated, organized, and processed so that it can be used in decision-making. To be of value, information must be delivered to the right person, at the right time, and in the right place.
It must be accurate, timely, complete, concise, and relevant.

Data processing is a specialist activity concerned with the systematic recording, arranging, filing, processing, and dissemination of facts relating to physical events.

Steps in data processing

Input
Three steps are involved when inputting data into the computer: collection, verification, and coding. Collection is gathering the data from a variety of sources and assembling it. Verification is checking the data to determine whether it is accurate and complete, and whether it should be included for processing. Coding is translating the data into machine-readable form. Data punched into IBM cards is one example of coding.

Process
During processing or manipulation, one or more of the following tasks may be performed on the input data.
a) Classifying. Data are organized by characteristics meaningful to the user. For example, a student may be identified by Social Security number, class, and exam number.
b) Sorting. The data may be arranged in a particular sequence to facilitate processing.
c) Calculating. Calculations may be required, for example to determine a patient's account balance or a student's grade point average.

Output
Output activities include retrieving, converting, storing, and communicating.
a) Retrieving involves pulling information from storage devices for use by the decision-maker.
b) Converting means translating information from the form the computer uses to store it into a form the user can understand (such as a CRT display or a printed report).
c) Storing involves transferring the data onto a storage medium, such as a disk or tape file, for future use.
d) Communicating takes place when the relevant, accurate information is in the right place at the right time.

A more detailed data processing model

Data processing cycle
At the machine level, a cycle is the sequence of steps that a computer performs repeatedly while executing a program.
The computer's central processing unit (CPU) continuously works through a loop: fetching a program instruction from memory, fetching any data it needs, operating on the data, and storing the result in memory, before fetching the next instruction. For data processing as a whole, the cycle consists of the stages described below.

Collection
Data must be collected before it can be used. Collecting data can be very time consuming and, at times, tedious. It may involve travelling or sitting for long periods, and many employees would rather avoid it.

Preparation
After data is collected, it must be prepared for processing, because raw data usually cannot be processed as it stands. Computers are now the main tools of data processing, so the collected data should be prepared for the computer: a code should be assigned to each type of response or phenomenon. This step is quite technical, and if it is not done well the results will not be valid.

Methods of Data Collection and Preparation
Data can be collected manually or by an automated mechanism.

Manual collection of data
Data can be collected manually using the following methods.
1. Observation Method
2. Interview Method
3. Questionnaires/Schedules

Observation Method
Observation becomes a scientific tool and a method of data collection when it serves a formulated research purpose, is systematically planned and recorded, and is subjected to checks and controls on validity and reliability.
Main advantages are:
- Subjective bias is largely eliminated
- The information relates to what is currently happening
- The method is independent of the respondent's willingness to respond

Main limitations are:
- It is expensive
- The information it provides is very limited
- Unforeseen factors may interfere with the observation task

Interview Method
The interview method of collecting data involves presenting oral-verbal stimuli and receiving replies as oral-verbal responses.

Advantages
- More information, and in greater depth, can be obtained
- Resistance may be overcome by a skilled interviewer
- Greater flexibility: there is an opportunity to restructure questions
- The observation method can also be applied while recording verbal answers
- Personal information can be obtained
- Spontaneous, and therefore more honest, responses are possible

Disadvantages
- It is an expensive method
- Interviewer bias
- Respondent bias
- It is time consuming
- The organization required for selecting, training, and supervising the field staff is complex and poses formidable problems
- Establishing the rapport needed for free and frank responses is very difficult

Data Collection through Questionnaires
This method is popular in major studies. In brief, a questionnaire is sent (by post) to the persons concerned with a request to answer the questions and return it. A questionnaire consists of a number of questions printed in a definite order on a form.
The questionnaire is mailed to respondents, who are expected to read and understand the questions and write their replies in the space provided.

Advantages of this method:
- Low cost, even when the universe is large and widespread
- Free from interviewer bias
- Respondents have adequate time to think through their answers
- Respondents who are not easily approachable can also be reached conveniently
- Large samples can be used

Disadvantages:
- Low rate of return
- Respondents need to be educated and cooperative
- Built-in inflexibility
- Possibility of ambiguous replies or omitted items
- The method is slow

The collected data are then prepared in a structured form before the input stage.

Devices that can be used to collect data

Mark Sense Cards
Cards are divided into boxes that can be marked. A mark sense reader then scans the card and detects where marks have been made.
Examples: answer sheet, lottery ticket.
Advantages:
- Simple to use
- Fast to enter data
- Data entry is very accurate

Bar Codes
Bar codes appear on almost every item you can buy, from books and newspapers to tins of beans. A bar code is a series of lines used to represent information. The information stored is as follows:
- The country the product comes from (more precisely, the country in which the manufacturer's code was registered)
- The code for the manufacturer of the product
- The code for the product name and size
- A check digit used to ensure the data is entered correctly

In a shop, the bar code is scanned, and further information, such as the price and the quantity of that product in stock, is retrieved from the shop's computerized database. Using bar codes means that items do not have to be individually priced: to change a price, only a single field in the database needs to be changed. Other advantages are that shop staff cannot enter prices incorrectly and that items are scanned much more quickly, so there is less queuing and fewer tills need to be open.

Magnetic Stripes
Magnetic stripes can be seen on train tickets and on bank or credit cards.
These stripes hold a small amount of data (64 characters) and can be read by a magnetic stripe reader (card reader) connected to a computer system. They provide a quick and accurate way of entering details into a computer system and are simple to operate.

Smart Cards
Most bank and credit cards are now smart cards. These cards have their own processor and memory, which can hold up to 64 KB of data. The stored data can be updated, and the processor can run simple programs.

Magnetic Ink Character Recognition
Special magnetic ink is used to print details that can be read by a magnetic ink reader. Cheques have the bank account number and sort code printed in magnetic ink. Advantages of using magnetic ink on cheques include:
- Bundles of cheques can be processed very quickly
- It is very difficult to forge a cheque
- The ink can be read by the reader even if the cheque gets marked or dirty

Optical Character Recognition
Hard copy is scanned, and the image is then examined by OCR software that recognizes text. Most scanners come with OCR software. Note that OCR software usually recognizes only printed text, not handwriting, and even then it may recognize only certain fonts. The main advantage of OCR is the time saved by not having to retype documents.

Input
Data input is another time-consuming stage of data processing. In large companies, many people may be needed for several days to complete this work. The cost involved is so high that many businesses outsource data entry in order to save money. The volume of data to be entered each day is so large that specialist data entry companies have been set up, and many people also work as freelancers in this area. Data entry is a simple process that does not require much education. What it does require is speed and accuracy.

Input Methods
Data input can be direct or indirect.
Direct Input
Direct data are machine-readable data that can be fed into the system directly. Converting data from human-readable form is time consuming and error prone, and direct input avoids this step.
Example: a credit card reader is a direct input device. The credit card carries a magnetic stripe that holds vital information, such as the owner's code and account details. The card is inserted into the card reader, which reads the details. The card number is then recorded and the transaction amount is charged to the account.

Indirect Input
Indirect data are in human-readable form and therefore have to be converted into machine-readable form. This conversion is time consuming and error prone, and it is a major bottleneck in data processing. The keyboard, mouse, and joystick are examples of indirect input devices.

Checking Data Entry
When data is entered into a computer system, it is important to try to ensure that it is correct. Two methods are used to do this:

1. Validation
The computer system checks that the data meets certain requirements. These checks do not ensure that the data is correct, only that it is acceptable. Validation checks and how they work:
a. Presence Check - Does not allow the user to continue until an entry has been made.
b. Range Check - Ensures the data is between a specified lower and upper limit.
c. Field Length Check - Ensures the correct number of characters has been used.
d. Data Type Check - Ensures the correct type of data (number, text, date) has been entered.
e. Check Digits - A common form of validation used in bar codes and other numeric codes. The check digit is the last digit of the code and is calculated from the other digits in the code. When the number is entered, the computer carries out the same calculation and compares its answer with the check digit entered.
If the two are the same, the number is accepted as having been entered correctly.

2. Verification
Data is checked to ensure that it has been entered exactly as intended. There are two methods of verification:
a. Double Entry - The user enters the same data twice, and if both entries are exactly the same the entry is assumed to be correct.
b. Visual Checks - Once the data has been entered, the computer asks the user to read it back and click YES or NO depending on whether the user thinks the data is correct.

Processing
After input comes the processing itself. At this stage, various methods are used to manipulate the data that has been entered. In the past, when computers were unavailable or uncommon, people had to do this by hand, and it was a herculean task. With the advent of computers, the process has become far easier. Many software programs are available that process large volumes of data in a very short time. Some are general purpose, while others are designed for specific industries and processes. Often a single click of a button is enough to produce the output.

Output and interpretation
The point of data processing is to provide information that will guide future company policies, which makes output very important. When the output is available, it should be interpreted in a way that makes it useful to the company. Without interpretation, the company does not benefit from the whole process. The output can be presented using devices such as monitors and printers. These are the most common methods of data output, but often a different method is used.

Output to File
The data processing cycle shows that output from one process is often used as input for another process. Output can be saved as a data file on backing storage, from where it can be loaded as input for a process when required.
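The idea of output to file can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names, the file name daily_summary.csv, and the sample bank transactions are hypothetical and do not come from any particular system. The first function plays the role of one process that writes its output to a data file. The second plays the role of a later process that loads that file as its input.

```python
import csv
import os
import tempfile


def summarize_transactions(transactions, path):
    """Process 1: total the transactions by type and save the summary as a data file."""
    totals = {}
    for kind, amount in transactions:
        count, value = totals.get(kind, (0, 0.0))
        totals[kind] = (count + 1, value + amount)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["type", "count", "total"])
        for kind in sorted(totals):
            count, value = totals[kind]
            writer.writerow([kind, count, f"{value:.2f}"])


def load_summary(path):
    """Process 2: load the saved file as input and return {type: (count, total)}."""
    with open(path, newline="") as f:
        return {
            row["type"]: (int(row["count"]), float(row["total"]))
            for row in csv.DictReader(f)
        }


def run_example():
    # Hypothetical sample data: (transaction type, dollar amount).
    transactions = [("deposit", 120.0), ("withdrawal", 40.0), ("deposit", 75.5)]
    path = os.path.join(tempfile.mkdtemp(), "daily_summary.csv")
    summarize_transactions(transactions, path)  # output of the first process
    return load_summary(path)                   # input of the second process
```

Calling run_example() writes the summary to a temporary directory and reads it back. In a real system the two steps would usually run at different times, with the file on backing storage acting as the link between them.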
Output activities
Output activities include retrieving, converting, storing, and communicating.
- Retrieving involves pulling information from storage devices for use by the decision-maker.
- Converting means translating information from the form the computer uses to store it into a form the user can understand (such as a CRT display or a printed report).
- Storing involves transferring the data onto a storage medium, such as a disk or tape file, for future use.
- Communicating takes place when the relevant, accurate information is in the right place at the right time.

Storage
The last stage in data processing is storage. Both the input data and the results of processing must be stored safely so that they can be used again. If the results are not stored, there is no sound basis for future comparison. Like the output, the input data may be valuable and should be recorded and stored safely.

Data storage is the holding of data in an electromagnetic form for access by a computer processor. There are two main kinds of storage:
- Primary storage holds data in random access memory (RAM) and other memory devices built into the computer.
- Secondary storage holds data on external storage devices such as hard disks, tapes, and CDs.

Hard disks
Often called a disk drive, hard drive, or hard disk drive, this device stores large amounts of data and provides relatively quick access to it. The information is stored on magnetically coated rotating disks called 'platters'.

Floppy disks
A floppy disk is a type of magnetic disk storage that consists of a flexible disk with a magnetic coating. Almost all floppy disks for personal computers have a capacity of 1.44 megabytes. Floppy disks are readily portable and have been very popular for transferring software from one PC to another. They are, however, very slow compared to hard disks and have very limited storage capacity.
Increasingly, therefore, computer manufacturers are not including floppy disk drives in their products as a built-in storage option.

Tape storage
Tape is used as an external storage medium. It consists of a long strip of flexible plastic material with a magnetic coating that stores the data. A tape drive is the device that positions the tape, writes to it, and reads from it. A tape cartridge is a portable tape in a protective case.

Optical disks
An optical disc is a storage medium that can be written to and read using a low-powered laser beam. Data is recorded as a pattern of tiny dots on the surface of the disc. A laser reads these dots, and the reflected signal is converted to an electrical signal and finally back into the original data.

CD-R
Compact Disc-Recordable (CD-R) discs have become a universal data storage medium worldwide. They are increasingly popular for music recording and for file storage or transfer between personal computers. CD-R discs are write-once media: once used, they cannot be erased or re-recorded. CD-R discs can be played back in any audio CD player or CD-ROM drive, as well as in many DVD players and drives.

CD-RW
Compact Disc-Rewritable (CD-RW) discs can be erased and re-recorded over and over again. CD-RW discs can only be used in CD players, CD-ROM drives, and DVD players and drives that are CD-RW playback-compatible.

DVD
A DVD (Digital Versatile Disc or Digital Video Disc) is a high-density optical disc with a large capacity for storing data, pictures, and sound. The capacity of a single-sided, single-layer DVD is 4.7 GB, approximately 7 times that of a compact disc.

Data Processing Systems
A data processing system is a system that processes data which has either been captured and encoded in a format the system can recognize, or has been created and stored. Data processing systems can be divided into the following categories according to their method of operation.
1. Batch Processing
2. Real-Time Processing

Batch Processing Systems
Batch processing is the non-continuous processing of data, instructions, or materials. In data transmission, batch processing is used for very large files or where a fast response time is not critical. The files to be transmitted are gathered over a period and then sent together as a batch. Much data processing is done using batch processing (also known as serial, sequential, or offline processing). Batch processing involves processing transactions on the computer at specified times. A telephone billing system, for example, is normally run in batch mode. The variable information about phone usage (call records: numbers, times, durations, rates, and so on) is allowed to accumulate and is then entered as a batch or group at a central computer site or other location. On a predetermined date at a predetermined time, the computer produces all of the phone bills and related information at the same time.

Advantages
- It allows computer resources to be shared among many users and programs.
- It shifts job processing to times when the computing resources are less busy.
- It avoids leaving the computing resources idle while waiting for minute-by-minute manual intervention and supervision.
- By keeping the overall rate of utilization high, it better amortizes the cost of a computer, especially an expensive one.

Disadvantages
- There is always a delay before work is processed and returned.
- Batch processing usually involves an expensive computer and a large number of trained staff.

Real-Time Processing Systems
Real-time processing is data processing that appears to take place, or actually takes place, instantaneously upon data entry or receipt of a command. In computer science, real-time computing, or reactive computing, is the study of hardware and software systems that are subject to a "real-time constraint", that is, an operational deadline from event to system response. Real-time programs must guarantee a response within strict time constraints.
Real-time response times are often understood to be on the order of milliseconds, and sometimes microseconds. In contrast, a non-real-time system is one that cannot guarantee a response time in every situation, even if a fast response is the usual result.
Examples: anti-missile defense systems, airplane landing control systems, electronic fund transfer systems, and ticket reservation systems.

Advantages
- There is no significant delay in response.
- Information is always up to date.
- Output from the computer may be used to adjust and improve the input.

Disadvantages
- A computer must be dedicated solely to the task.
- The computer must be continually online.

Other processing systems
Online Processing - This method uses Internet connections and equipment directly attached to a computer. It is used mainly for information recording and research.
Distributed Processing - This method is commonly used by remote workstations connected to one large central workstation or server. ATMs are good examples of this data processing method.
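The difference between the batch and real-time modes described in this section can be sketched in Python. This is a simplified illustration under stated assumptions: the class names, the customer names, and the tariff of 10 cents per minute are hypothetical. The second class shows only the idea of processing each record as soon as it arrives. A genuine real-time system must also guarantee its response deadlines, which this sketch does not attempt.

```python
from collections import defaultdict

RATE_CENTS_PER_MINUTE = 10  # hypothetical tariff, used only for illustration


class BatchBilling:
    """Batch mode: call records accumulate and are processed together later."""

    def __init__(self):
        self.pending = []

    def record_call(self, customer, minutes):
        # The record is only stored; no processing happens yet.
        self.pending.append((customer, minutes))

    def run_batch(self):
        # At the scheduled time, every accumulated record is processed at once.
        bills = defaultdict(int)
        for customer, minutes in self.pending:
            bills[customer] += minutes * RATE_CENTS_PER_MINUTE
        self.pending.clear()
        return dict(bills)


class ImmediateBilling:
    """Each record is processed on arrival, so balances are always up to date."""

    def __init__(self):
        self.balances = defaultdict(int)

    def record_call(self, customer, minutes):
        self.balances[customer] += minutes * RATE_CENTS_PER_MINUTE
        return self.balances[customer]  # current balance in cents
```

With BatchBilling, nothing is billed until run_batch() is called, which mirrors the delay listed as a disadvantage of batch processing. With ImmediateBilling, every call to record_call() returns the customer's current balance, so the information is always up to date, but the program has to be available whenever a record arrives.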