Thursday, July 23, 2015

Pivoting in SSIS with multiple measures and multiple row fields

Pivoted data is very useful and easy to understand the data and analyse it.  Excel Pivot functionality is very popular.

SQL Server and SSIS supports the pivot functionality. Both has its own limitations.
  • SQL Server supports only one measure(aggregate or Pivot value) allowed per query to pivot, however, SSIS supports multiple measures(pivot value).
  • Aggregation can happen within  SQL Server query but in SSIS we have to apply aggregation to input data before pivoting.
Now, we will see how to use SSIS Pivot transformation to pivot the little complex input data.

First we will see how to configure  Pivot transformation for simple pivoting.
For this exercise, I will use AdventureWorksDW2014 database and Visual Studio 2010 (SSDT). you can download this database from here .

Pivot table structure has mainly 3 sections, SSIS terms are mentioned in braces. Section positions are shown in diagram (1).
  • Row Field Area (Set Key)
  • Column Field Area ( Pivot Key)
  • Data Field Area (Pivot Value)
IMAGE (1)

It is important to decide what we would like to display as columns.

Now we will target to get the following output to flat File.
IMAGE(2)

We need Years in the column field. Product and product group in row field. But from database we will get the result as follows.
IMAGE(3)


If we compare the database result and expected target we can easily notice that year is converted to Column field. We can't achieve this output directly from SQL query because we need multiple measures(TOTALPRODUCTCOST, SALESAMOUNT) in  pivoted output. To get this output from SQL query we may need to use temp tables or complex query union query and this may be expensive if data is large.

We will use SSIS Pivot transformation to achieve the target result.

SSIS assigned value for each section called PivotUsage. Understanding this is important to configure the Pivot transformation.

Section Name
PivotUsage(Just for more information)
Description
Pass columns
0
The column is passed through unchanged to the output. Because many input rows can result only in one output row, the transformation copies only the first input value for the column
Set Key
1
The column acts as the key or part of the key that identifies a set of records
Pivot Key
2
The column defines the pivot. The values in this column are associated with columns in the pivoted dataset
Pivot Value
3
The column contains values that are placed in the columns that the pivot creates

courtesy pasfu

We will create Data flow task and OLE DB Source to get the fetch the data from database or we can use any other sources like flat file or Excel to get the data shown in IMAGE (3). Don't worry about the query, this is to just get desired output.
SELECT ENGLISHPRODUCTSUBCATEGORYNAME AS PRODUCT
 ,ENGLISHPRODUCTCATEGORYNAME AS PRODDUCT_GROUP
 ,sum(TOTALPRODUCTCOST) AS TOTALPRODUCTCOST
 ,sum(SALESAMOUNT) SALESAMOUNT
 ,YEAR(CONVERT(DATE, CONVERT(VARCHAR(8), SHIPDATEKEY))) YR
FROM [DBO].[FACTINTERNETSALES] A
JOIN [DIMSALESTERRITORY] B ON A.SALESTERRITORYKEY = B.SALESTERRITORYKEY
JOIN [DBO].[DIMPRODUCT] PROD ON PROD.PRODUCTKEY = A.PRODUCTKEY
JOIN [DBO].[DIMPRODUCTSUBCATEGORY] SUB ON SUB.PRODUCTCATEGORYKEY = PROD.PRODUCTSUBCATEGORYKEY
JOIN [DBO].[DIMPRODUCTCATEGORY] CAT ON CAT.PRODUCTCATEGORYKEY = SUB.PRODUCTCATEGORYKEY
WHERE YEAR(CONVERT(DATE, CONVERT(VARCHAR(8), SHIPDATEKEY))) IN (
  '2011'
  ,'2012'
  )
GROUP BY SUB.ENGLISHPRODUCTSUBCATEGORYNAME
 ,CAT.ENGLISHPRODUCTCATEGORYNAME
 ,YEAR(CONVERT(DATE, CONVERT(VARCHAR(8), SHIPDATEKEY)))

Now drop Pivot transformation to data flow space and connect the output of OLE DB source.


Pivot Key : Select the field which you like to see as column, here We will select YR field.
Set Key : Select the field you like to see in row section, here we will select PRODUCT field.
Pivot Value : Select the column which should be part of calculated or measure (numeric) field, Here we will select TOTALPRODUCTCOST.

Final and tricky
Pivoted column : In this section, we have to list all distinct values of Pivot Key (YR) with comma separated and press 'Generate columns now' button , we may skip any value which we are not interested, ex: if we have 2013 value in Pivot Key field and we don't want it we can just ignore that. each value we listed in this section is converted to columns in output. 

After this change PIVOT edit should look like this.
IMAGE(4)

Connect to Flat file or any other destination to get the pivoted result. I connected to Flat file destination and got this result.

PRODUCT
C_2011_TOTALPRODUCTCOST
C_2012_TOTALPRODUCTCOST
Bottom Brackets
3430430.019
2205065.308
Brakes
3430430.019
2205065.308
Chains
3430430.019
2205065.308
Cranksets
3430430.019
2205065.308
Derailleurs
3430430.019
2205065.308
Forks
3430430.019
2205065.308
Handlebars
3430430.019
2205065.308
Headsets
3430430.019
2205065.308
Mountain Bikes
742829.796
1187543.213
Mountain Frames
3430430.019
2205065.308
Pedals
3430430.019
2205065.308
Road Bikes
742829.796
1187543.213
Road Frames
3430430.019
2205065.308
Saddles
3430430.019
2205065.308
Touring Bikes
742829.796
1187543.213
Touring Frames
3430430.019
2205065.308
Wheels
3430430.019
2205065.308

Cool! we got pivoted data but not our target result (IMAGE(2)) , according to that we need PRODUCT GROUP and SALESAMOUNT fields data. 
If you revisit to Pivot edit window, you don't have option to select these fields,  Additional configuration needs to be done in Pivot Advanced window.


  • Right click on Pivot transformation and select Show Advanced Editor window. 
  • Select PRODUCT GROUP and SALESAMOUNT in Input Columns tab.
  • Next  in Input and Output properties tab expand 'Pivot Default Input' and select  SALESAMOUNT .
  • In the right side change the Pivot Usage property to 3 (change this field as Pivot Value) and note down the lineageID of the SALESAMOUNT.  shown in IMAGE(5)
  • Create C_2011_SALESAMOUNT column in 'Pivot Default output' section  and change the
    PivotKeyValue : 2011 (value to be mapped )
    SourceColumn : 41 (LineageID of Imput column) IMAGE (6)
  • Similarly create C_2012_SALESAMOUNT
    PivotKeyValue : 2012 (value to be mapped )
    SourceColumn : 41 (LineageID of Imput column)
  • Create PRODDUCT_GROUP column in 'Pivot Default output' section and set SourceColumn to LineageID PRODDUCT_GROUP column in 'Pivot Default Input'
IMAGE (5)
IMAGE (6)

NOTE :
1. Pivot transformation does not do any aggregation, if we need we have to use aggregate the metrics and feed to Pivot transformation.
2. Field mentioned in the Set Key section assume to be unique for each record, If multiple fields contributes to uniqueness of the record, all those fields should be marked as PivotUsage =1

No comments:

Post a Comment