CHAPTER 3 - DATA FLOW

© Copyright 2016

3.1 Sources, Transforms and Destinations

Data flows through an SSIS data flow task just like marbles through a marble run:

The Source specifies where marbles (or data records) come from.

Transforms manipulate marbles (or data records): adding extra columns, perhaps, or (as here) sending them down different paths.

For any data flow task, you can have one or more Destinations, specifying where marbles (or data records) will end up.

Our Example

We’ll write a package to export films to a text file, then export this into another table:

The source for the data flow will be this SQL Server table.

The destination will be a flat file.

3.2 Creating a Project Connection

It’s likely in our example that we’ll reuse the same connection (i.e. the same link to a SQL Server or other database) many times, so it makes sense to define it at project level:

a) Right-click on Connection Managers and choose to add a new one.
b) You can link to almost every common type of data. If you’re using SQL Server, it makes sense to use an OLE DB connection.
c) In the dialog box which appears, click at the bottom right to create a new connection:
d) Type the name of your server (here we’re using a named instance called sql2008r2 on the current computer).
e) Choose the database you want to link to from the drop-down list.
f) Click OK to create the connection.
g) Give the connection a better name (we’ll call this one Movies):

3.3 Data Flow Tasks

To accomplish our task, we need a single data flow task (this can be as complex as you like; many applications in SSIS will consist of one data flow task only).

Creating Data Flow Tasks

You can create a data flow task in the same way as for any other task:

a) Either drag this task onto the Control Flow window, or just click on it and press .
b) It makes sense to rename this data flow task to give it a better description.
Wise Owl’s Hint

Another way to create a data flow task: click on the Data Flow tab, then click on this link which appears:

Switching to the Data Flow Tab

There is a separate tab for editing data flow tasks:

Either double-click on the data flow task icon to edit it …

… or switch to the Data Flow tab, click on the drop-down arrow and choose which data flow task you want to edit.

3.4 Creating the Source

Default Connections and Connection Scope

As soon as you go to the new data flow task, you’ll see that it already has one connection listed:

This is the connection manager we created for the entire project. It will be available to all of the packages in the project, and so is listed here.

Connections can have project scope or package scope (in the latter case, they are only visible to, and available for use in, a single package).

Creating Sources

You can create sources either using the assistant (basically a wizard) or the “hard way”:

This assistant will help you add sources to a data flow task (see hint below).

Possible sources for data flows are:

Source      Notes
ADO.NET     An alternative to the faster OLE DB source, required by some data providers.
CDC         Links to a Change Data Capture data source, retrieving only rows which have changed.
Excel       Any table contained in an Excel worksheet. If you’re running a 64-bit copy of Excel, you will need to install an additional driver.
Flat File   CSV or other text files (covered in a separate chapter).
ODBC        Uses any ODBC provider (useful when an application doesn’t support OLE DB).
OLE DB      Any OLE DB compliant data source, such as Access, SQL Server, Oracle or DB2.
Raw File    A fast but inflexible file format, usually used to hold intermediate data stages at checkpoints.
XML         Connects to an XML file, possibly on a remote website.

Wise Owl’s Hint
The advantage of using the source assistant is that SSIS will then only list sources for which you have drivers installed on your computer (although this manual will show creating sources the “long way”).

Creating our OLE DB Source

Whichever method you choose for adding a data source will lead you to the same place, so we’ll use the OLE DB Source tool:

a) Double-click on this tool to add it to your Data Flow window.
b) Optionally, rename the data source to something easier to recognise:

Double-click on your new data source and set the following details:

What                        Notes
OLE DB connection manager   If you’ve created one for your project (as we’ve done), you can just use this; otherwise, click on the New… button to create one specific to this package.
Table or view               The name of a table or view in your database (although it’s better practice to use a SQL command, as described under Specifying a SQL Command, to pick out only those columns you want to work with).

Specifying a SQL Command

You should usually build a SQL query to pull down only those columns you need to work with:

a) Choose to work with a SQL command.
b) Click here to build your query (or just paste it into the SQL command text window).
c) Generate the SQL command that you want to use as shown on the right.

Use this tool to run your query to see the results.

Use this tool to add a table to your query.

This query builder is almost identical to the ones used in Access, Management Studio, Reporting Services and other Microsoft applications.

Choosing Output Columns

Having chosen the connection to use, it’s time to choose the output columns:

a) Click to specify which columns the data source should output.
b) For each column, you can effectively rename it here.
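To see what a SQL command that names only the columns you need looks like, here is a minimal Python sketch. It is only an illustration, not part of the SSIS package: it uses an in-memory SQLite database in place of the Movies SQL Server database, and the table name tblFilm and the columns FilmID, FilmName and FilmSynopsis are assumptions (only FilmOscarWins appears in this chapter).

```python
import sqlite3

# In-memory SQLite database standing in for the Movies SQL Server database.
# The table name tblFilm and every column except FilmOscarWins are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblFilm ("
    "FilmID INTEGER PRIMARY KEY, FilmName TEXT, "
    "FilmOscarWins INTEGER, FilmSynopsis TEXT)"
)
conn.executemany(
    "INSERT INTO tblFilm (FilmName, FilmOscarWins, FilmSynopsis) VALUES (?, ?, ?)",
    [
        ("Example Film A", 3, "A long synopsis we do not need to export."),
        ("Example Film B", 0, "Another long synopsis."),
    ],
)

# The SQL command: name only the columns the data flow needs,
# rather than pulling down the whole table.
SQL_COMMAND = "SELECT FilmName, FilmOscarWins FROM tblFilm ORDER BY FilmName"

cursor = conn.execute(SQL_COMMAND)
output_columns = [d[0] for d in cursor.description]  # column names returned
rows = cursor.fetchall()
conn.close()

print(output_columns)
print(rows)
```

The unwanted FilmSynopsis column never leaves the database, which is the point of preferring a SQL command over choosing a whole table or view.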
3.5 Creating the Flat File Connection Manager

Before we can create the destination for our data flow, we first need a connection manager for a flat file. Here’s how to create one!

Step 1 – Ensure you have a Flat File Template

For the purposes of this chapter’s example, you first need to create a new file in Notepad:

We’ll export two bits of information: the name of each film, and how many Oscars it won. We’re using the vertical bar as a delimiter.

Step 2 – Starting to Create a Connection to the Flat File

You can create a connection while you create a destination, but it might be easier to do the two things separately, as here.

a) Right-click in the Connection Managers section of the data flow task, and choose to create a new connection specific to this task.

Wise Owl’s Hint

If you had several packages using the same flat file, you might choose instead to define a connection manager at project level.

Step 3 – Configuring the New Connection Manager (General)

You can now point this connection manager at your flat file:

a) Give the connection manager a sensible name (it can be an idea to include a reference to the type of connection in this name, as here).
b) The description is optional!
c) Usually the rows will be separated by carriage returns, and the first row will contain column names.
d) Click on this button to find and select the flat file you want to connect to.

Step 4 – Configuring the Connection Manager’s Columns

You should now say how your flat file’s columns are separated:

a) Choose to show Columns.
b) Choose the column separator (ours is a vertical bar character).
c) Click on this button (disabled here) to bring the preview shown up to date.
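The settings chosen in Steps 1 to 4 can be sketched outside SSIS to show what they mean for the file itself. The following Python snippet is an illustration only: it assumes the Notepad template holds a single header row, and the column name Title is hypothetical (only Oscars is named in this chapter).

```python
import csv
import io

# Hypothetical contents of the Notepad template: one header row naming the
# two columns, with a vertical bar as the delimiter. "Title" is an assumed
# name; only "Oscars" appears in the text.
template_text = "Title|Oscars\r\n"

# Mirror the connection manager settings: rows end with a carriage return and
# line feed, columns are separated by "|", and the first row holds column names.
reader = csv.reader(io.StringIO(template_text, newline=""), delimiter="|")
column_names = next(reader)  # the first row contains the column names
data_rows = list(reader)     # the template has no data rows yet

print(column_names)
print(data_rows)
```

If the wrong delimiter were chosen, the header would come back as a single column, which is roughly what the preview in the Columns page would show you in SSIS.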
Step 5 – Choose Advanced Settings for the Connection Manager

To get our particular data transformation to work, you need to make one more change:

a) Show advanced settings for this connection manager.
b) Change the column width to a higher number, to accommodate longer film names.

3.6 Creating and Configuring the Destination

Again, you could use the Destination Assistant to add a destination, but we’ll do things the (slightly) harder way.

Step 1 – Creating the Destination

To add a flat file destination to the data flow:

a) Choose to add a flat file destination.
b) Click where you want it to go.
c) SSIS adds the destination. You can now rename this if you like:

Step 2 – Connecting the Source to the Destination

It’s a good idea to do this next, so that you can map columns in the destination:

a) Click on the Movies database source, then click on the blue arrow emanating from it (this represents successful data; the red arrow is the flow of failed data).
b) Drag this onto the destination. When you release the mouse, the source and destination will be joined.

Step 3 – Assigning a Connection Manager to the Destination

It’s time now to tell SSIS which connection manager the destination will use:

a) Double-click on this icon to edit the destination (the error message and red circle both show that there’s a problem with the destination, which we’re about to remedy!).
b) SSIS automatically takes you to this tab.
c) SSIS guesses that you want to assign the flat file connection manager to this flat file destination: a not unreasonable guess!

Step 4 – Mapping Columns

Finally for the destination, choose which columns from the source map onto which columns in the destination:

a) Choose this tab.
b) Click on this drop-down arrow and choose to map the FilmOscarWins column from the source onto the Oscars column in the destination.
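The column mapping in Step 4 can be pictured as a simple lookup from source column names to destination column names. The Python sketch below is an illustration under stated assumptions, not what SSIS does internally: the source column FilmName, the destination column Title and the helper write_flat_file are all hypothetical, while the FilmOscarWins to Oscars mapping follows the text.

```python
import csv
import io

# Rows as they might arrive from the source; FilmName is an assumed column name.
source_rows = [
    {"FilmName": "Example Film A", "FilmOscarWins": 3},
    {"FilmName": "A Much Longer Example Film Title", "FilmOscarWins": 0},
]

# Column mapping: source column -> destination column.
# "Title" is hypothetical; FilmOscarWins -> Oscars follows the text.
column_mapping = {"FilmName": "Title", "FilmOscarWins": "Oscars"}


def write_flat_file(rows, mapping, out):
    """Write rows to out as a vertical-bar-delimited file with a header row."""
    writer = csv.writer(out, delimiter="|", lineterminator="\r\n")
    writer.writerow(mapping.values())  # destination column names
    for row in rows:
        # Pick out each mapped source column, in mapping order.
        writer.writerow([row[src] for src in mapping])


buffer = io.StringIO(newline="")  # in-memory stand-in for the flat file
write_flat_file(source_rows, column_mapping, buffer)
flat_file_text = buffer.getvalue()
print(flat_file_text)
```

Each source row becomes one delimited line in the destination, with the values appearing under the destination’s column names rather than the source’s.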
3.7 Executing the Data Flow Package

You can now test to see if your package works:

a) Right-click in the Data Flow window to execute the task, or right-click on the package in Solution Explorer to execute the entire package:
b) Things are going well: every part of the data flow has a tick next to it!
c) When you look at the text file, you should see the films listed.

Running in 32-bit Mode

If you have the 64-bit SSIS runtime installed, you may encounter an issue when executing a package. If you see this symbol next to an Excel data source, a likely cause is that the package is attempting to execute in 64-bit mode. To solve this, you can tell the project not to use the 64-bit runtime:

a) Right-click on the project in Solution Explorer and choose Properties.
b) Expand the Configuration Properties node and select the Debugging option, then change the Run64BitRuntime property to False.