Data Conversion

©Ian Sommerville 2004 Software Engineering, 7th edition. Chapter 4

Definition

Data conversion is the transformation of data into a particular format. The data may be in documents written in Word that we want to convert to database tables. It may be on another machine, or it may be in a different encryption. In each case we need to convert it to a particular format.

Tasks Involved

1. Data Loading
2. Data Scrubbing
3. Data Uploading
4. Data Validation

Tasks Involved
1. Data Loading

Data loading is done by either

Manual Data Loading
• keying in the needed data from scratch

OR

Legacy System Extraction
• extraction of data from legacy systems

Tasks Involved
1. Data Loading
1a. Manual Data Loading

Decide how you want to load your data, making sure you CLEARLY define the format, null values, etc. Write a throwaway program that performs a one-time load of the data. This task should be placed on your project plan and can be started as soon as your database tables are complete.

Tasks Involved
1. Data Loading
1b. Legacy System Extraction

The objective of this task is to find any data that currently exists in legacy systems that you can use, and to convert it to your format. It helps to find or develop tools that allow you to quickly export legacy data into a standard format, such as text files. Often the biggest challenge is understanding the legacy data schema, the “map” that tells you where in the databases you can find the full set of information that would be valuable to migrate to your system.
Tools exist that can “reverse engineer” schemas by a trial-and-error process of exploring the legacy database and producing outputs that are comparable with screens or reports.

Tasks Involved
2. Data Scrubbing

Data scrubbing involves cleaning, re-formatting, and appropriately mapping legacy data so that it can be smoothly uploaded into your system. Executing this phase well yields much of the benefit of a good data conversion: avoid disruption from changes in key system functions, clean up “bad data”, and plan for the upload of correct data.

Tasks Involved
3. Data Uploading

The idea is to quickly and smoothly upload converted data by providing the data exactly in the format specified.

Tasks Involved
4. Data Validation

Data conversion is complex, and it can be difficult to anticipate with 100% accuracy exactly how converted data will appear after you go through the steps of loading the data. Data can become corrupted during extraction; unusual data can be overlooked and re-formatted incorrectly (for example, a small number of international phone numbers or addresses in a database with 95%+ US data); and upload scripts can hiccup. A robust range of checks should be run to catch mistakes and to tune and refine the data conversion. Some data is more important than the rest, such as money amounts; totals and ranges should be used to ensure these fields are feasible.
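
The validation checks described above (record counts, money totals, and ranges) can be sketched in a few lines of Python. This is only an illustrative sketch, not a prescribed method: the function name validate_conversion, the field name "amount", and the range limits are all hypothetical, and rows are assumed to be already loaded as lists of dictionaries.

```python
from decimal import Decimal


def validate_conversion(source_rows, converted_rows, amount_field="amount",
                        min_amount=Decimal("0"), max_amount=Decimal("1000000")):
    """Return a list of problems found; an empty list means every check passed.

    All names and limits here are illustrative assumptions.
    """
    problems = []

    # Check 1: record counts should match, otherwise rows were lost or duplicated.
    if len(source_rows) != len(converted_rows):
        problems.append(
            f"row count mismatch: source={len(source_rows)}, "
            f"converted={len(converted_rows)}"
        )

    # Check 2: the control total of money amounts should be unchanged.
    # Decimal avoids the rounding errors that float arithmetic would introduce.
    source_total = sum((Decimal(str(r[amount_field])) for r in source_rows),
                       Decimal("0"))
    converted_total = sum((Decimal(str(r[amount_field])) for r in converted_rows),
                          Decimal("0"))
    if source_total != converted_total:
        problems.append(
            f"total mismatch: source={source_total}, converted={converted_total}"
        )

    # Check 3: every converted amount should fall inside a feasible range.
    for index, row in enumerate(converted_rows):
        amount = Decimal(str(row[amount_field]))
        if not (min_amount <= amount <= max_amount):
            problems.append(
                f"row {index}: {amount_field}={amount} is outside "
                f"[{min_amount}, {max_amount}]"
            )

    return problems


# Example use with made-up data: the second converted row was corrupted.
legacy = [{"amount": "10.50"}, {"amount": "20.00"}]
converted = [{"amount": "10.50"}, {"amount": "-5.00"}]
for problem in validate_conversion(legacy, converted):
    print(problem)
```

Returning a list of problems rather than stopping at the first failure lets one validation run report every issue, which fits the goal of tuning and refining the conversion over several passes.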