Chapter 7, Populating a Database

Chapter Overview
   A. Transferring and Transforming Data
   B. Introducing Microsoft Data Transformation Services (DTS)
   C. Transferring and Transforming Data with DTS Graphical Tools
   D. Working with DTS Packages
   E. Using the Bulk Copy Program (Bcp) and the BULK INSERT Transact-SQL Statement

Chapter 7, Lesson 1 Transferring and Transforming Data

1. Data Import Preparation Tasks
   A. Verify the internal consistency of the data at the data source.
   B. Determine whether additional columns are needed to hold values that are assumed, rather than stated, in the source data.
   C. Determine whether the format of the source data requires modification, for example because of the way the data is formatted or because it is hard to read.
   D. Determine whether existing data columns need to be aggregated or separated.
   E. Determine whether the data import is a one-time task or a regularly occurring task.
   F. Determine your access rights to the source data.
2. Data Transformations
   A. Can data transformations be made in the data source?
   B. Should temporary tables and Transact-SQL statements be used to scrub and cleanse the data after it is imported into SQL Server?
   C. Can Data Transformation Services be used to change the data during the import process itself?
   D. Will the transformations be applied to a regularly scheduled import, or is the import a one-time task?
3. Data Transfer Tools
   A. DTS is a set of graphical tools and programmable objects used to import, export, and transform data to and from a wide range of data sources.
   B. Bcp is a command-prompt utility used to copy data from a text file to a SQL Server 2000 table or view, or from a SQL Server 2000 table or view to a text file.
   C. The BULK INSERT statement is a Transact-SQL statement used to copy data from an ASCII text file to a SQL Server 2000 table or view.
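Item 1.A of Lesson 1 asks you to verify the internal consistency of the data at the data source before importing it. Part of that check can be scripted. The following minimal Python sketch assumes the source is a delimited text file and treats a row as "consistent" when it has the expected number of fields and contains no empty fields. The function name `check_consistency` and both rules are illustrative assumptions. They are not part of DTS, Bcp, or BULK INSERT.

```python
import csv
import io


def check_consistency(text, expected_columns, delimiter=","):
    """Return (row_number, problem) pairs for rows that fail two simple checks.

    Hypothetical rules for illustration: each row must have exactly
    `expected_columns` fields, and no field may be empty or all white space.
    """
    problems = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for row_number, row in enumerate(reader, start=1):
        if len(row) != expected_columns:
            problems.append(
                (row_number, f"expected {expected_columns} fields, found {len(row)}")
            )
        elif any(not field.strip() for field in row):
            problems.append((row_number, "empty field"))
    return problems


if __name__ == "__main__":
    sample = "1,Smith,WA\n2,Jones\n3,,OR\n"
    for row_number, problem in check_consistency(sample, expected_columns=3):
        print(row_number, problem)
```

For the sample above, the script reports row 2 (too few fields) and row 3 (an empty field). Rows like these show where the decisions in items 1.B through 1.D apply.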
Chapter 7, Lesson 2 Introducing Microsoft Data Transformation Services (DTS)

1. DTS Process
   A. The DTS process consists of
      1. Connecting to a data source and a data destination
      2. Moving data from the data source to the data destination
      3. Performing transformations, if desired, as data is moved to the data destination
   B. Data source and destination types include
      1. Native OLE DB
      2. ODBC
      3. ASCII text files
      4. Customized third-party OLE DB providers
2. DTS Connections
   A. DTS packages require a data source and a data destination.
   B. Packages can connect to a standard database, a Microsoft Excel spreadsheet, an HTML source, or any OLE DB or ODBC provider.
   C. Packages can also connect to a delimited or fixed-field text file.
   D. Packages can connect to an intermediate Microsoft Data Link (.udl) file that stores connection information.
3. DTS Tasks
   A. A task is a discrete unit of work within a DTS package.
   B. Tasks can execute sequentially or in parallel.
   C. SQL Server provides a wide range of tasks that are accessible from the DTS Import/Export wizard and DTS Designer.
   D. The DTS Parallel Data Pump task can be accessed only programmatically.
4. DTS Tasks That Copy Data and Data Objects
   A. The Bulk Insert task runs the BULK INSERT statement from within a DTS package.
   B. The Execute SQL task runs one or more Transact-SQL statements from within a DTS package.
   C. The Copy SQL Server Objects task copies SQL Server database-specific objects between SQL Server instances.
   D. The Transfer Database task moves or copies an entire database between SQL Server instances.
   E. The Transfer Error Messages task copies user-defined error messages between SQL Server instances.
   F. The Transfer Logins task copies logins between SQL Server instances.
   G. The Transfer Jobs task copies jobs between SQL Server instances.
   H. The Transfer Master Stored Procedures task copies stored procedures in the master database between SQL Server instances.

Outline, Chapter 7 Microsoft SQL Server 2000 System Administration
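Item 1 of Lesson 2 describes the DTS process in three steps: connect, move, and optionally transform. The same shape can be reproduced outside DTS. The following Python sketch is an analogy and does not use DTS. It uses two in-memory SQLite databases (Python's standard `sqlite3` module) as stand-ins for the data source and the data destination. The table layout and the function names `run_transfer` and `demo` are hypothetical. The per-row transform trims and recases strings, similar in spirit to the Trim String and Uppercase String transformations listed later in this lesson.

```python
import sqlite3


def run_transfer(source, destination, transform=None):
    """Move every row of source.customers into destination.customers.

    `transform`, if given, is applied to each row while it is being moved,
    mirroring the optional third step of the DTS process.
    """
    if transform is None:
        transform = lambda row: row  # no transformation: copy rows as-is
    rows = source.execute("SELECT id, name, state FROM customers ORDER BY id")
    destination.executemany(
        "INSERT INTO customers (id, name, state) VALUES (?, ?, ?)",
        (transform(row) for row in rows),
    )
    destination.commit()


def demo():
    # Step 1: connect to a data source and a data destination (in-memory stand-ins).
    source = sqlite3.connect(":memory:")
    destination = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE customers (id INTEGER, name TEXT, state TEXT)")
    source.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "  smith ", "wa"), (2, "jones", "or")],
    )
    destination.execute("CREATE TABLE customers (id INTEGER, name TEXT, state TEXT)")

    # Steps 2 and 3: move the rows, trimming and capitalizing names and
    # upper-casing state codes on the way to the destination.
    run_transfer(
        source,
        destination,
        lambda row: (row[0], row[1].strip().title(), row[2].upper()),
    )
    return destination.execute(
        "SELECT id, name, state FROM customers ORDER BY id"
    ).fetchall()


if __name__ == "__main__":
    print(demo())
```

Keeping the transform as a separate function makes the third step optional. Passing no transform gives a plain copy, which corresponds to the Copy Column transformation.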
5. DTS Tasks That Transform Data
   A. The Transform Data task copies data from a data source, transforms it, and inserts the transformed data into a data destination.
   B. The Data Driven Query task analyzes each source row and, based on that analysis, selects, customizes, and executes a Transact-SQL operation (such as an INSERT, UPDATE, or DELETE) for the row.
6. DTS Tasks That Function as Jobs
   A. The ActiveX Script task runs any ActiveX script.
   B. The Dynamic Properties task retrieves data from an outside source and assigns the retrieved values to selected DTS package properties.
   C. The Execute Package task runs other DTS packages from within a DTS package.
   D. The Execute Process task runs an executable program from within a DTS package.
   E. The File Transfer Protocol task downloads data from an outside source using File Transfer Protocol (FTP).
   F. The Send Mail task sends e-mail from within a DTS package. It requires a MAPI client on the SQL Server computer.
7. DTS Transformations
   A. The Copy Column transformation copies one or more columns without transformation.
   B. The ActiveX Script transformation uses an ActiveX script to perform transformations.
   C. The Date Time String transformation converts the format of a date or time value.
   D. The Lowercase String transformation converts string data to lowercase and can change the data type.
   E. The Uppercase String transformation converts string data to uppercase and can change the data type.
   F. The Middle of String transformation extracts a substring from string data and can perform case conversion.
   G. The Trim String transformation removes white space and can perform case conversion.
   H. The Read File transformation copies the contents of a text file to a destination column.
   I. The Write File transformation copies the contents of a source column to a text file.
8. DTS Package Workflow
   A. By default, tasks that have no precedence constraints execute in parallel within a package.
   B. A task with an Unconditional precedence constraint runs when the prior task completes, whether the prior task succeeded or failed.
   C. A task with an On Success precedence constraint runs only if the prior task succeeds.
   D. A task with an On Failure precedence constraint runs only if the prior task fails.
9. DTS Package Storage
   A. When you save a DTS package to Microsoft SQL Server 2000, it is stored in the msdb database on the SQL Server instance you specify.
   B. When you save a DTS package to Meta Data Services, it is stored in the repository database in Meta Data Services on the local computer. This option allows you to track data lineage.
   C. When you save a DTS package as a Microsoft Visual Basic file, it is stored as Visual Basic code that can be edited using Visual Basic or Microsoft Visual C++.
   D. When you save a DTS package as a structured storage file, it is stored in an operating system file.
10. DTS Tools
   A. The DTS Import/Export wizard creates simple DTS packages with minimal transformations and complexity.
   B. DTS Designer allows the creation of complex packages with complex transformations and workflows.
   C. Visual Basic and Visual C++ allow programmers to create packages with very fine control over complex package operations.
   D. The DTS Run utility and the Dtsrun command are tools for executing DTS packages from a command prompt.

Chapter 7, Lesson 3 Transferring and Transforming Data with DTS Graphical Tools

1. DTS Import/Export Wizard
   A. Available from the Microsoft SQL Server program group and from within SQL Server Enterprise Manager
   B. Specify a data source and a data destination.
   C. Define the data to be imported, using a Transact-SQL query to limit the data if necessary.
   D. Select or create the destination tables if necessary.
   E. Change the default column mappings if necessary.
   F. Choose the objects to transfer, if applicable.
   G. Save and/or schedule the DTS package.
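Steps B through E of the DTS Import/Export wizard (specify a source and destination, limit the rows with a query, create the destination table, adjust column mappings) can be imitated in a few lines of Python. This is a hypothetical sketch and does not replace the wizard. SQLite stands in for both servers, `copy_with_mapping` is an invented name, and the destination columns are created without data types for brevity. Table and column names are inserted directly into the SQL text, so they must come from a trusted source.

```python
import sqlite3


def copy_with_mapping(source, destination, query, dest_table, column_map):
    """Copy the rows returned by `query` into `dest_table` on `destination`.

    `column_map` renames source columns (as named in the query result) to
    destination columns; unmapped columns keep their source names.
    """
    cursor = source.execute(query)  # the query limits which rows are copied
    source_columns = [description[0] for description in cursor.description]
    dest_columns = [column_map.get(name, name) for name in source_columns]
    column_list = ", ".join(dest_columns)
    placeholders = ", ".join("?" for _ in dest_columns)
    # Create the destination table if needed (untyped columns; SQLite allows this).
    destination.execute(f"CREATE TABLE IF NOT EXISTS {dest_table} ({column_list})")
    destination.executemany(
        f"INSERT INTO {dest_table} ({column_list}) VALUES ({placeholders})", cursor
    )
    destination.commit()


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE orders (order_id INTEGER, cust TEXT, amount REAL)")
    src.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "Ann", 50.0), (2, "Bob", 150.0), (3, "Cy", 300.0)],
    )
    dst = sqlite3.connect(":memory:")
    copy_with_mapping(
        src,
        dst,
        "SELECT order_id, cust FROM orders WHERE amount > 100 ORDER BY order_id",
        "big_orders",
        {"cust": "customer_name"},
    )
    print(dst.execute("SELECT order_id, customer_name FROM big_orders").fetchall())
```

The WHERE clause in the query plays the role of step C, and the `column_map` dictionary plays the role of the wizard's column-mapping page in step E.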
2. DTS Designer
   A. Available from the Data Transformation Services container in SQL Server Enterprise Manager
   B. Can open an existing DTS package from the file system, the Local Packages container, or the Meta Data Services Packages container, or can create a new package
   C. Select and define data connections.
   D. Select and define tasks, including transformations and jobs.
   E. Create workflow constraints if necessary.
3. Additional DTS Package Functionality
   A. Transaction support: uses Microsoft Distributed Transaction Coordinator (MS DTC) so that all tasks within a package can be part of a single transaction
   B. Message Queue task: can send messages between DTS packages on different computers, even when one of the computers is offline
   C. Send Mail task: can notify an operator about the progress of a package and can attach a dynamically updated file
   D. Programming templates: sample code on the SQL Server 2000 compact disc for specific solutions, such as lookup queries

Chapter 7, Lesson 4 Working with DTS Packages

1. Storage in the msdb Database
   A. DTS packages can be saved in the local msdb database or in the msdb database in a central storage location.
   B. Each version of a DTS package is saved, so earlier versions remain accessible and the development history is preserved.
   C. DTS packages saved to the msdb database can be secured with an owner password and a user password. This allows users to run a package without being able to edit it.
2. Storage Using Meta Data Services
   A. DTS packages can be saved to SQL Server Meta Data Services on any SQL Server instance.
   B. Allows tracking of package versions, meta data, and data lineage
   C. Saved meta data includes data transformations and data sources.
   D. Data lineage allows you to track the source of each row of data in a data destination.
   E. You must enable data lineage tracking and create a column in the data destination to hold the tracking information.
   F. You can browse Meta Data Services to identify every DTS package that might have used a particular data source.
3. Storage Using a Structured Storage File
   A. DTS packages can be saved to an operating system file.
   B. DTS packages saved to a structured storage file can be secured with an owner password and a user password. This allows users to run a package without being able to edit it.
   C. Allows the DTS package to be moved, copied, or mailed across the network
   D. Open these files in SQL Server Enterprise Manager, or use command-prompt utilities to execute them.
4. DTS Package Execution Utilities
   A. DTS Run utility (Dtsrunui.exe)
      1. Interactive utility, started from a command prompt, that uses dialog boxes to execute a DTS package
      2. Allows scheduling of execution, enabling of the event log, and specifying of global variables for the execution
      3. Allows the generation of text for the Dtsrun command, with optional encryption
   B. Dtsrun command
      1. Noninteractive command-prompt utility used to execute a DTS package
      2. Requires command switches, which can be generated by the DTS Run utility
   C. Packages run in the security context of the logged-in user.
   D. Scheduled packages run in the security context of SQL Server Agent.
5. Additional Options When Working with DTS Packages
   A. Use DTS package logs to record information about the success or failure of a DTS package, including
      1. Start and stop times, including execution time
      2. Steps that did not run
   B. Use DTS exception logs to record error information about rows that were not copied.
   C. Disconnected edits are possible when one or more of the data sources or data destinations is unavailable. Normally, a connection is required to verify that settings are valid.
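The Dtsrun switches that the DTS Run utility generates can also be assembled by a script, for example when a batch process launches a package. The following Python sketch uses only these SQL Server 2000 `dtsrun` switches: /S (server), /E (trusted connection), /U and /P (SQL Server login), /F (structured storage file), /N (package name), /M (package password), and /L (log file). All other switches are deliberately omitted. The function name `build_dtsrun_args` is hypothetical. The sketch builds only the argument list. Running it requires a computer with the SQL Server 2000 client tools installed.

```python
def build_dtsrun_args(package_name, server=None, file_name=None, user=None,
                      password=None, package_password=None, log_file=None):
    """Return an argument list for dtsrun (SQL Server 2000).

    This sketch treats the two storage locations as alternatives: pass
    `server` for a package saved in msdb, or `file_name` for a package
    saved as a structured storage file.
    """
    if (server is None) == (file_name is None):
        raise ValueError("pass exactly one of server or file_name")
    args = ["dtsrun"]
    if server is not None:
        args += ["/S", server]
        if user is None:
            args.append("/E")  # trusted connection (Windows Authentication)
        else:
            args += ["/U", user]  # SQL Server login
            if password is not None:
                args += ["/P", password]
    else:
        args += ["/F", file_name]
    args += ["/N", package_name]
    if package_password is not None:
        args += ["/M", package_password]
    if log_file is not None:
        args += ["/L", log_file]
    return args


if __name__ == "__main__":
    print(build_dtsrun_args("Load Orders", server="SQL01"))
```

Passing the list to `subprocess.run` instead of building one string avoids hand-quoting package names that contain spaces. A package started this way runs in the security context of the user who runs the script, as item 4.C describes.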
Chapter 7, Lesson 5 Using the Bulk Copy Program (Bcp) and the BULK INSERT Transact-SQL Statement

1. Bulk Copying of Data Using Text Files
   A. Use either the Bcp command-prompt utility or the BULK INSERT Transact-SQL statement.
   B. Both are used to import large quantities of data at high speed with minimal transformation.
   C. Format files can be used to specify the format of the data being imported.
   D. DTS can be used with the Bulk Insert task to handle formatting issues.
   E. Format files from earlier versions of SQL Server can be used.
2. Optimization of Bulk Copy Operations
   A. Bulk transfers of large amounts of data can generate substantial transaction log activity. Consider switching to the Bulk-Logged recovery model during the import.
   B. Use the TABLOCK hint and a large batch size when importing a large amount of data from a single client into an empty table.
   C. If a table has nonclustered indexes, it is generally faster to drop them before importing large amounts of data and then re-create them.
   D. If a table has a clustered index, it is generally faster to sort the data in the text file on the clustered index key, if possible, and then specify the ORDER hint.
   E. The greater the percentage of new data relative to the data already in the table, the greater the benefit from dropping and then re-creating indexes.

Chapter Summary
   A. Analyze your data before importing it to determine which transformations, if any, are required.
   B. Determine the complexity of the DTS package you require.
   C. Use the DTS Import/Export wizard for simple transformations and for copying database objects.
   D. Use DTS Designer to create DTS packages with complex transformations and workflow constraints.
   E. Choose a DTS package storage format based on how the DTS packages will be used.
   F. Use the Bulk Insert task in DTS to optimize and simplify the import of large amounts of text data, rather than creating format files manually.
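The bulk-copy optimizations in item 2 of Lesson 5 correspond to options of the BULK INSERT statement and switches of the bcp utility. The following Python sketch generates only the command text and does not connect to SQL Server. The function names are hypothetical. For BULK INSERT, the sketch uses the options FIELDTERMINATOR, ROWTERMINATOR, FORMATFILE, BATCHSIZE, TABLOCK, and ORDER. For bcp, it uses the switches -c, -t, -T, -S, -b, and -h. Table and column names are assumed to be trusted, already-valid identifiers.

```python
def sql_literal(value):
    """Quote a string as a Transact-SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_bulk_insert(table, data_file, field_terminator=",", row_terminator=r"\n",
                      format_file=None, batch_size=None, tablock=False, order=None):
    """Return a BULK INSERT statement for a character-format data file.

    order: optional list of (column, "ASC" or "DESC") pairs; only worth
    specifying when the file is already sorted on the clustered index key.
    """
    options = []
    if format_file is not None:
        # The format file describes the fields, so no terminators are given here.
        options.append(f"FORMATFILE = {sql_literal(format_file)}")
    else:
        # The two characters \n are passed through as-is; BULK INSERT interprets them.
        options.append(f"FIELDTERMINATOR = {sql_literal(field_terminator)}")
        options.append(f"ROWTERMINATOR = {sql_literal(row_terminator)}")
    if batch_size is not None:
        options.append(f"BATCHSIZE = {int(batch_size)}")
    if tablock:
        options.append("TABLOCK")
    if order:
        columns = ", ".join(f"{column} {direction}" for column, direction in order)
        options.append(f"ORDER ({columns})")
    return f"BULK INSERT {table} FROM {sql_literal(data_file)} WITH ({', '.join(options)})"


def build_bcp_args(table, data_file, server, batch_size=None, hints=None):
    """Return a bcp argument list that imports a comma-delimited character file.

    Fixed choices for this sketch: -c (character format), -t, (comma field
    terminator), and -T (trusted connection).
    """
    args = ["bcp", table, "in", data_file, "-c", "-t,", "-T", "-S", server]
    if batch_size is not None:
        args += ["-b", str(int(batch_size))]
    if hints:
        args += ["-h", ", ".join(hints)]
    return args


if __name__ == "__main__":
    print(build_bulk_insert("Sales.dbo.Orders", r"C:\data\orders.txt",
                            batch_size=50000, tablock=True,
                            order=[("OrderID", "ASC")]))
    print(build_bcp_args("Sales.dbo.Orders", r"C:\data\orders.txt", "SQL01",
                         batch_size=50000, hints=["TABLOCK", "ORDER (OrderID ASC)"]))
```

The sketch emits whatever options it is given. Whether TABLOCK and a large batch size actually help depends on the conditions in item 2.B: a single client loading an empty table.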