data flow tasks

Chapter 3 - Data Flow
3.1 Sources, Transforms and Destinations
Data flows through an SSIS data flow task just
like marbles through a marble run:
The Source specifies where marbles (or data
records) come from.
Transforms manipulate marbles (or data records):
adding extra columns, perhaps, or (as here)
sending them down different paths.
For any data flow task, you can have one or more
Destinations, specifying where marbles (or data
records) will end up.
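The marble run can be sketched in a few lines of Python. This is only an analogy, not how SSIS is implemented: the sample films, the FilmName and IsWinner column names and the function names are all invented for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, Any]


def source() -> Iterator[Record]:
    """Source: where the records come from (invented sample data)."""
    yield {"FilmName": "Film A", "FilmOscarWins": 11}
    yield {"FilmName": "Film B", "FilmOscarWins": 0}
    yield {"FilmName": "Film C", "FilmOscarWins": 3}


def add_winner_flag(records: Iterable[Record]) -> Iterator[Record]:
    """Transform: add an extra column to every record."""
    for record in records:
        yield {**record, "IsWinner": record["FilmOscarWins"] > 0}


def split(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Transform: send each record down one of two paths."""
    winners: List[Record] = []
    others: List[Record] = []
    for record in records:
        (winners if record["IsWinner"] else others).append(record)
    return winners, others


# Destinations: here just two lists, one at the end of each path.
winners, others = split(add_winner_flag(source()))
print(len(winners), "winners;", len(others), "others")
```

Each record passes through every stage in turn, just as each marble rolls through every part of the run.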
Our Example
We’ll write a package to export films to a text file, then import this into another table:
The source for the data flow will
be this SQL Server table.
© Copyright 2016
The destination will
be a flat file.
3.2 Creating a Project Connection
It’s likely in our example that we’ll reuse the same connection (i.e. the same link to a SQL Server
or other database) many times, so it makes sense to define it at project level:
a) Right-click on Connection Managers and choose
to add a new one.
b) You can link to almost every common type of
data. If you’re using SQL Server, it makes
sense to use an OLEDB connection.
c) Click at the bottom right of the
dialog box which appears to create a
new connection:
d) Type the name of your server (here
we’re using a named instance called
sql2008r2 on the current computer).
e) Choose the database you want to
link to from the drop down list.
f) Click OK to create a connection.
g) Give the connection a better name
(we’ll call this one Movies):
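Behind the dialog box, a connection manager of this type stores a connection string. The sketch below builds one of roughly the shape you might expect. The provider name SQLNCLI11, the use of Windows authentication and the database name Movies are assumptions (the text names the connection, not the database), so check the string which SSIS generates on your own machine.

```python
def build_oledb_connection_string(
    server: str,
    database: str,
    provider: str = "SQLNCLI11",  # assumed provider; yours may differ
) -> str:
    """Join the usual OLE DB keywords into a single connection string."""
    parts = [
        ("Provider", provider),
        ("Data Source", server),
        ("Initial Catalog", database),
        ("Integrated Security", "SSPI"),  # assumes Windows authentication
    ]
    return ";".join(f"{key}={value}" for key, value in parts) + ";"


# ".\sql2008r2" means the named instance sql2008r2 on the current computer.
connection_string = build_oledb_connection_string(r".\sql2008r2", "Movies")
print(connection_string)
```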
3.3 Data Flow Tasks
To accomplish our task, we need a single data flow task (this can be as complex as you like –
many applications in SSIS will consist of one data flow task only).
Creating Data Flow Tasks
You can create a data flow task in the same way as for any other task:
a) Either drag this task onto the Control
Flow window, or just click on it and
press
.
b) It makes sense to rename this data
flow task to give it a better description.
Wise Owl’s Hint
Another way to create a data flow task: click on the Data Flow tab,
then click on the link which appears.
Switching to the Data Flow Tab
There is a separate tab for editing data flow tasks:
Either double-click on the data flow
task icon to edit it …
… or switch to the Data Flow tab,
click on the dropdown arrow and
choose which data flow task you want
to edit.
3.4 Creating the Source
Default Connections and Connection Scope
As soon as you go to the new data flow task, you’ll
see that it already has one connection listed:
This is the connection manager we created for the entire
project – it will be available to all of the packages in the
project, and so is listed here. Connections can have project
scope or package scope (in the latter case, they are only
visible to and available for use in a single package).
Creating Sources
You can create sources either using the assistant (basically a wizard) or the “hard way”:
This assistant will help you
add sources to a data flow
task (see hint below).
Possible sources for data flows are:

Source      Notes
ADO.NET     An alternative to the faster OLEDB source, required by
            some data providers.
CDC         Links to a Change Data Capture data source, retrieving only
            rows which have changed.
Excel       Any table contained in an Excel worksheet. If you’re
            running a 64-bit copy of Excel, you will need to install an
            additional driver.
Flat File   Importing CSV or other text files (covered in a separate
            chapter).
ODBC        Uses any ODBC provider (useful when an application
            doesn’t support OLEDB).
OLE DB      Any OLEDB compliant data source, such as Access, SQL
            Server, Oracle or DB2.
Raw File    A fast but inflexible file format usually used to hold
            intermediate data stages at checkpoints.
XML         Connect to an XML file, possibly on a remote website.

Wise Owl’s Hint
The advantage of using the source assistants is that SSIS will then list
only those sources for which you have drivers installed on your computer
(although this manual will show creating sources the “long way”).
Creating our OLEDB Source
Whichever method you choose for adding a data source will lead you to the same place, so we’ll
use the OLEDB Source tool:
a) Double-click on this tool to add it to
your Data Flow window.
b) Optionally, rename the data source to
something easier to recognise:
Double-click on your new data source and set the following details:
What                        Notes
OLE DB connection manager   If you’ve created one for your project (as we’ve done),
                            you can just use this; otherwise, click on the New…
                            button to create one specific to this package.
Table or view               The name of a table or view in your database (although
                            it’s better practice to use an SQL command, as described
                            in the next section, to pick out only those columns you
                            want to work with).
Specifying a SQL Command
You should usually build a SQL query to pull down only those columns you need to work with:
a) Choose to work
with an SQL
command.
b) Click here to
build your query
(or just paste it
in to the SQL
command text
window).
c) Generate the SQL command that you want to
use as shown on the right.
Use this tool to run your query to see the results.
Use this tool to add a table to your query. This
query builder is almost identical to the ones used in
Access, Management Studio, Reporting Services
and other Microsoft applications.
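To see why a SQL command is usually better than a whole table, here is a small sketch using Python’s built-in sqlite3 module as a stand-in for SQL Server. The table name tblFilm and the columns FilmName and FilmSynopsis are hypothetical; only FilmOscarWins appears in this chapter.

```python
import sqlite3

# An in-memory database standing in for the Movies database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblFilm ("
    "FilmId INTEGER PRIMARY KEY, FilmName TEXT, "
    "FilmOscarWins INTEGER, FilmSynopsis TEXT)"
)
conn.executemany(
    "INSERT INTO tblFilm (FilmName, FilmOscarWins, FilmSynopsis) VALUES (?, ?, ?)",
    [("Film A", 11, "long text"), ("Film B", 0, "long text")],
)

# The SQL command names only the two columns the data flow needs,
# so the wide FilmSynopsis column never enters the pipeline.
SQL_COMMAND = "SELECT FilmName, FilmOscarWins FROM tblFilm ORDER BY FilmName"
cursor = conn.execute(SQL_COMMAND)
column_names = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(column_names)
print(rows)
```

The fewer columns a source outputs, the less data every later step in the data flow has to carry.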
Choosing Output Columns
Having chosen the connection to use, it’s time to choose the output columns:
a) Click to specify which
columns the data
source should output.
b) For each column, you
can effectively rename
it here.
3.5 Creating the Flat File Connection Manager
Before we can create the destination for our data flow, we need a connection manager which
points to a flat file. Here’s how to create one!
Step 1 – Ensure you have a Flat File Template
For the purposes of this chapter’s example, you first need to create a new file in Notepad:
We’ll export two bits of information: the name of each film, and how many
Oscars it won. We’re using the vertical bar as a delimiter.
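If you would rather not use Notepad, a template like this could also be created with a short script. This sketch assumes the template contains nothing but a header row; the column name Title and the file name films.txt are hypothetical, while Oscars is the destination column used later in this chapter.

```python
import os
import tempfile

DELIMITER = "|"
HEADER_COLUMNS = ["Title", "Oscars"]  # "Title" is an assumed column name


def create_template(path: str) -> None:
    """Write a header-only text file, ending the row with CR/LF."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(DELIMITER.join(HEADER_COLUMNS) + "\r\n")


# A temporary folder is used here; choose your own folder in practice.
template_path = os.path.join(tempfile.mkdtemp(), "films.txt")
create_template(template_path)

with open(template_path, encoding="utf-8", newline="") as handle:
    template_text = handle.read()
print(repr(template_text))
```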
Step 2 – Starting to Create a Connection to the Flat File
You can create a connection while you create a
destination, but it might be easier to do the two
things separately, as here.
a) Right-click in the Connection Managers
section of the data flow task, and choose to
create a new connection specific to this package.
Wise Owl’s Hint
If you had several packages using the same flat file, you might choose
instead to define a connection manager at project level.
Step 3 – Configuring the New Connection Manager (General)
You can now point this connection manager at your flat file:
a) Give the connection manager a sensible name (it can be an idea to include
a reference to the type of connection in this name, as here).
b) The description is optional!
c) Usually the rows will be separated by
carriage returns, and the first row will
contain column names.
d) Click on this button to find and
select the flat file you want to
connect to.
Step 4 – Configuring the Connection Manager’s Columns
You should now say how your flat file’s columns are separated:
a) Choose to show
Columns.
b) Choose the
column separator
(ours is a vertical
bar character).
c) Click on this
button (disabled
here) to bring the
preview shown up
to date.
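The preview shows what you get when each row is split at the column separator. A rough equivalent in Python, using the standard csv module, is sketched below; the sample rows and the Title column name are invented for illustration.

```python
import csv
import io

# Sample file contents: a header row, then one row per film.
sample_text = "Title|Oscars\r\nFilm A|11\r\nFilm B|0\r\n"

# delimiter="|" plays the part of the column separator chosen in the dialog box.
reader = csv.DictReader(io.StringIO(sample_text, newline=""), delimiter="|")
preview_rows = list(reader)

for row in preview_rows:
    print(row["Title"], row["Oscars"])
```

Choosing the wrong separator (a comma, say) would leave each row as a single column, which is the sort of mistake the preview makes easy to spot.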
Step 5 – Choose Advanced Settings for the Connection Manager
To get our particular data transformation to work, you need to make one more change:
a) Show advanced
settings for this
connection manager.
b) Change the column
width to a higher
number, to
accommodate longer
film names.
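One way to decide how wide the column needs to be is to measure the longest value you expect to export. The sketch below assumes a default width of 50 characters (check the figure shown in your own dialog box); the function names are invented for illustration.

```python
from typing import Iterable, List

DEFAULT_WIDTH = 50  # assumed default column width


def required_width(values: Iterable[str], current_width: int = DEFAULT_WIDTH) -> int:
    """Return a width large enough for every value, never below the current one."""
    longest = max((len(value) for value in values), default=0)
    return max(longest, current_width)


def values_too_long(values: Iterable[str], width: int) -> List[str]:
    """List the values which would not fit in a column of the given width."""
    return [value for value in values if len(value) > width]


film_names = ["Film A", "A film with a very long name " * 3]
print(required_width(film_names))
print(values_too_long(film_names, DEFAULT_WIDTH))
```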
3.6 Creating and Configuring the Destination
Again, you could use the Destination Assistant to add a destination, but we’ll do things the
(slightly) harder way.
Step 1 - Creating the Destination
To add a flat file destination to the data flow:
a) Choose to add a flat file
destination.
b) Click where you
want it to go.
c) SSIS adds the destination – you
can now rename this if you like:
Step 2 - Connecting the Source to the Destination
It’s a good idea to do this next, so that you can map columns in the destination:
a) Click on the Movies database source,
then click on the blue arrow emanating
from it (this represents successful data;
the red arrow is the flow of failed data).
b) Drag this on to the destination – when
you release the mouse, the source and
destination will be joined.
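The idea of two arrows, one for rows which succeed and one for rows which fail, can be illustrated with a loose Python analogy. This is not how SSIS works internally (and in SSIS a component’s error output normally has to be configured before failed rows are redirected); the function name, the FilmName column and the sample rows are invented.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def route_rows(rows: Iterable[Row]) -> Tuple[List[Row], List[Row]]:
    """Send rows which convert cleanly one way, and the rest another way."""
    succeeded: List[Row] = []
    failed: List[Row] = []
    for row in rows:
        try:
            converted = {**row, "FilmOscarWins": int(row["FilmOscarWins"])}
        except (TypeError, ValueError):
            failed.append(row)  # the "red arrow" path
        else:
            succeeded.append(converted)  # the "blue arrow" path
    return succeeded, failed


sample_rows = [
    {"FilmName": "Film A", "FilmOscarWins": "11"},
    {"FilmName": "Film B", "FilmOscarWins": "n/a"},
]
good_rows, bad_rows = route_rows(sample_rows)
print(len(good_rows), "succeeded;", len(bad_rows), "failed")
```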
Step 3 – Assigning a Connection Manager to the Destination
It’s time now to tell SSIS which connection manager the destination will use:
a) Double-click on this icon to edit the destination (the
error message and red circle both show that there’s a
problem with the destination, which we’re about to
remedy!).
b) SSIS automatically
takes you to this tab.
c) SSIS guesses that you want to assign the flat file connection manager to
this flat file destination (a not unreasonable guess!).
Step 4 – Mapping Columns
Finally for the destination, choose which columns from the source map onto which columns in
the destination:
a) Choose this tab.
b) Click on this drop
arrow and choose
to map the
FilmOscarWins
column from the
source on to the
Oscars column in
the destination.
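A column mapping is really just a lookup from source column names to destination column names. The sketch below shows the idea in Python. The FilmOscarWins to Oscars pair comes from the step above, while the FilmName to Title pair and the function names are hypothetical.

```python
from typing import Any, Dict, Sequence

COLUMN_MAPPING = {
    "FilmName": "Title",        # assumed names for the film name columns
    "FilmOscarWins": "Oscars",  # the mapping made in the step above
}


def map_row(source_row: Dict[str, Any],
            mapping: Dict[str, str] = COLUMN_MAPPING) -> Dict[str, Any]:
    """Keep only the mapped columns, renamed to their destination names."""
    return {dest: source_row[src] for src, dest in mapping.items()}


def to_line(dest_row: Dict[str, Any],
            columns: Sequence[str] = ("Title", "Oscars"),
            delimiter: str = "|") -> str:
    """Format one destination row as a line of the flat file."""
    return delimiter.join(str(dest_row[column]) for column in columns)


mapped = map_row({"FilmId": 1, "FilmName": "Film A", "FilmOscarWins": 11})
print(to_line(mapped))
```

Any source column which isn’t mapped (FilmId here) simply doesn’t reach the destination.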
3.7 Executing the Data Flow Package
You can now test to see if your package works:
a) Right-click in the Data Flow window to execute the
task, or right-click on the package in Solution
Explorer to execute the entire package:
b) Things are going well – every part
of the data flow has a tick next to it!
c) When you look at the text file, you
should see the films listed.
Running in 32-bit Mode
If you have the 64-bit SSIS runtime installed, you may encounter an issue when executing the
package.
If you see this symbol next to the Excel data source then
a likely cause is that the package is attempting to execute
in 64-bit mode.
To solve this issue, you can tell the project not to use the 64-bit runtime.
a) Right-click on the project in the Solution Explorer and click
the Properties option.
b) Expand the Configuration Properties node and select the
Debugging option, then change the Run64BitRuntime property to False.