Wednesday, March 2, 2016

Limitations of BCP command in SQL Server

Though BCP is a very popular command-line option for fetching data from SQL Server, it has its own limitations.

I will try to list a few of them:


  1. The input query must be mentioned directly in the command line; we cannot save the SQL query in an external file and execute it. This limitation can be serious if you have a long and complex query to use.
         However, we can write the complex query in a stored procedure and execute it from BCP.
  2. We cannot use the LIKE operator with the % wildcard character in a BCP SQL query; surprisingly, we don't get any error for this.
        However, we can overcome this by using %% in place of the % character in the query.
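For illustration, here is how the two workarounds might look on an actual bcp command line. The server name, stored procedure name, and output file names below are hypothetical placeholders; the script only builds and prints the commands rather than running bcp.

```python
# Workaround 1: put the long/complex query in a stored procedure and EXEC it from bcp.
# (dbo.GetSubcategoryDetails and MYSERVER are made-up names for this sketch.)
proc_cmd = 'bcp "EXEC dbo.GetSubcategoryDetails" queryout subcat.txt -c -t, -S MYSERVER -T'

# Workaround 2: double the % so the wildcard survives, i.e. write %% instead of %.
query = ("SELECT * FROM dbo.DimProductSubcategory "
         "WHERE EnglishProductSubcategoryName LIKE 'Ch%%'")
wild_cmd = 'bcp "{}" queryout chains.txt -c -t, -S MYSERVER -T'.format(query)

print(proc_cmd)
print(wild_cmd)
```

The commands printed above would then be run from a Windows command prompt against your own server.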


Hope this helps you decide whether or not to use BCP for your scenario.

Happy Coding :)

Tuesday, January 19, 2016

SQL Server T-SQL IN clause with parameter!!

It is a common scenario: like any other parameterized SQL query, we expect the IN clause to work with a parameter too, but it is not that straightforward.
For Example :
                       if you would like to fetch the details of only the subcategories "Chains" and "Forks", your query would look like this:



SELECT *
FROM ADVENTUREWORKSDW2014.[DBO].[DIMPRODUCTSUBCATEGORY]
WHERE ENGLISHPRODUCTSUBCATEGORYNAME IN (
  'CHAINS'
  ,'FORKS'
  )


Well, this works great!
What if this code is part of a stored procedure and the number of subcategories is unknown?
We might simply write the stored procedure like this:


CREATE PROCEDURE GET_SUBCAT_DETAILS @SUB_CAT VARCHAR(250) = 'CHAINS,FORKS'
AS
BEGIN
 SELECT *
 FROM ADVENTUREWORKSDW2014.[DBO].[DIMPRODUCTSUBCATEGORY]
 WHERE ENGLISHPRODUCTSUBCATEGORYNAME IN (@SUB_CAT)
END

EXEC GET_SUBCAT_DETAILS 

Output : zero rows :(

Yes, we cannot just use the parameter directly like this; internally, SQL Server treats the whole 'CHAINS,FORKS' string as one single value, so nothing matches.

The workaround is to use a dynamic query like this:

CREATE PROCEDURE GET_SUBCAT_DETAILS @SUB_CAT VARCHAR(250) = 'CHAINS,FORKS'
AS
BEGIN
 DECLARE @SQL NVARCHAR(1000) = 'SELECT * FROM ADVENTUREWORKSDW2014.[DBO].[DIMPRODUCTSUBCATEGORY]
  WHERE ENGLISHPRODUCTSUBCATEGORYNAME IN (''' + REPLACE(@SUB_CAT, ',', ''',''') + ''')'

 EXEC SP_EXECUTESQL @SQL
END
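To see why the dynamic version works, here is a quick sketch (plain Python, purely for illustration, not part of the procedure) of the string the REPLACE call builds before SP_EXECUTESQL runs it:

```python
sub_cat = "CHAINS,FORKS"

# Mirror of REPLACE(@SUB_CAT, ',', ''','''): every comma becomes ','
# so each value ends up as its own quoted string literal.
in_list = "'" + sub_cat.replace(",", "','") + "'"

sql = ("SELECT * FROM ADVENTUREWORKSDW2014.[DBO].[DIMPRODUCTSUBCATEGORY] "
       "WHERE ENGLISHPRODUCTSUBCATEGORYNAME IN (" + in_list + ")")

print(in_list)  # the IN list is now two separate literals: 'CHAINS','FORKS'
print(sql)
```

One caution on this design: concatenating caller-supplied text into dynamic SQL is an injection risk, so validate @SUB_CAT if the procedure can be called with untrusted input.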


Monday, November 23, 2015

Quick guide to setting up a git repository on Windows.

1.      Install git or portable git on your machine.
2.      Create a directory on your local machine or on a network server to use as the repository. If you are using the local machine for the repository, skip steps 3 & 4.
Ex. D:\CodeRepo

3.      If you would like to use a remote server (or any machine other than your PC), create a shared directory on that server.
4.      Map a network drive on your local machine pointing to the server share.


Now we can access the server using drive T:\
Create the directory CodeRepo in drive T.
5.      Launch Git-Bash application
6.      Traverse to T:\CodeRepo using the command
  cd /T/CodeRepo/

7.      Make this directory a git repository:
git init --bare
      --bare indicates we would like to use CodeRepo as just a repository, not as a working directory (a working directory is where we directly create and modify code files). Now CodeRepo acts as the central repository.
8.       Create a working directory on the local machine; I am using the C drive here.
 mkdir /c/CodeWork

9.      Take a copy of your repository using the clone command. This will create a copy of CodeRepo inside the CodeWork directory.
git clone /T/CodeRepo/

10.   If you created the working directory manually instead of cloning, make it a git directory
git init

11.   and mark CodeRepo as your remote repository using the remote add command
git remote add origin /T/CodeRepo
      (If you used git clone in step 9, these two steps are unnecessary: clone already initializes the directory and sets origin for you.)


12.   Now our repository and working directory are ready.


Happy coding :)

Tuesday, August 25, 2015

Loading the flat file data to SQL Server temporarily for reading and analyzing the data

In many situations we would like to read a flat file or a simple text file, but it is not easy to analyze the data by opening the file in Notepad or another text editor.
As we know, if the data is in SQL Server it is very easy to query and analyze.
Using a temp table and the BULK INSERT functionality, we can analyze the file data in a SQL environment.

To use this, we have to know the following information about the flat file:
1.      Number of columns
2.      Column separator
And the following access is required:
1.      Access to Flat file location
2.      Database access to run BULK INSERT command
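If you are not sure of the column count or separator, a small script can peek at the file first. A minimal sketch (the inline sample below stands in for the real file; in practice you would open 'C:\Customer.csv'):

```python
import csv
import io

# Stand-in for the real flat file; replace with open(r'C:\Customer.csv') in practice.
sample = io.StringIO(
    "CustomerKey,FirstName,LastName,EmailAddress\n"
    "1,John,Doe,john@example.com\n"
)

reader = csv.reader(sample)  # default separator: comma
header = next(reader)        # first line gives the column names

print(len(header))           # number of columns to create in the temp table
print(header)
```

The column count and names printed here tell you how many columns the temp table needs and what the FIELDTERMINATOR is.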
Now we will walk through a sample file load, using a customer detail data file with the following columns:
CustomerKey
FirstName
LastName
EmailAddress
First we will create a temporary table in SQL Server:
    CREATE TABLE #TEMPCUSTOMER
  (
       CUSTOMERKEY INT,
       FIRSTNAME VARCHAR(40),
       LASTNAME VARCHAR(40),
       EMAILADDRESS VARCHAR(250)
  )

Run the above statement to create the temp table.

Now we will write a BULK INSERT statement to load the flat file data into the temp table.
Mention the full path of the customer file.
The column separator character should be passed to the FIELDTERMINATOR parameter,
the new-line character to the ROWTERMINATOR parameter,
and the starting row number to the FIRSTROW parameter.

BULK INSERT #TEMPCUSTOMER FROM 'C:\Customer.csv' WITH (
 FIELDTERMINATOR = ',',
 ROWTERMINATOR = '\n',
 FIRSTROW = 2
)

On execution of the above statement, the file data will be loaded into the temp table.
Now it is easy to query the data and understand it better.

Happy coding.





Thursday, July 23, 2015

Pivoting in SSIS with multiple measures and multiple row fields

Pivoted data is very useful for understanding and analyzing data; Excel's pivot functionality is very popular.

SQL Server and SSIS both support pivot functionality, and each has its own limitations.
  • SQL Server allows only one measure (aggregate or pivot value) per query to pivot; SSIS supports multiple measures (pivot values).
  • Aggregation can happen within the SQL Server query, but in SSIS we have to aggregate the input data before pivoting.
Now we will see how to use the SSIS Pivot transformation to pivot somewhat complex input data.

First we will see how to configure the Pivot transformation for simple pivoting.
For this exercise, I will use the AdventureWorksDW2014 database and Visual Studio 2010 (SSDT). You can download the database from here.

A pivot table structure has mainly 3 sections; the SSIS terms are mentioned in parentheses. Section positions are shown in diagram (1).
  • Row Field Area (Set Key)
  • Column Field Area ( Pivot Key)
  • Data Field Area (Pivot Value)
IMAGE (1)

It is important to decide what we would like to display as columns.

Now we will target to get the following output to flat File.
IMAGE(2)

We need years in the column field, and product and product group in the row field. But from the database we will get the result as follows.
IMAGE(3)


If we compare the database result and the expected target, we can easily notice that year has been converted to a column field. We can't achieve this output directly from a SQL query because we need multiple measures (TOTALPRODUCTCOST, SALESAMOUNT) in the pivoted output. To get this output from a SQL query we might need temp tables or a complex UNION query, and this may be expensive if the data is large.

We will use SSIS Pivot transformation to achieve the target result.

SSIS assigns a value to each section, called PivotUsage. Understanding this is important for configuring the Pivot transformation.

Section Name | PivotUsage | Description
Pass columns | 0 | The column is passed through unchanged to the output. Because many input rows can result in only one output row, the transformation copies only the first input value for the column.
Set Key      | 1 | The column acts as the key, or part of the key, that identifies a set of records.
Pivot Key    | 2 | The column defines the pivot. The values in this column are associated with columns in the pivoted dataset.
Pivot Value  | 3 | The column contains values that are placed in the columns that the pivot creates.

courtesy pasfu

We will create a Data Flow task and an OLE DB source to fetch the data from the database (or we can use any other source, like a flat file or Excel, to get the data shown in IMAGE (3)). Don't worry about the query; it is just there to get the desired output.
SELECT ENGLISHPRODUCTSUBCATEGORYNAME AS PRODUCT
 ,ENGLISHPRODUCTCATEGORYNAME AS PRODDUCT_GROUP
 ,sum(TOTALPRODUCTCOST) AS TOTALPRODUCTCOST
 ,sum(SALESAMOUNT) SALESAMOUNT
 ,YEAR(CONVERT(DATE, CONVERT(VARCHAR(8), SHIPDATEKEY))) YR
FROM [DBO].[FACTINTERNETSALES] A
JOIN [DIMSALESTERRITORY] B ON A.SALESTERRITORYKEY = B.SALESTERRITORYKEY
JOIN [DBO].[DIMPRODUCT] PROD ON PROD.PRODUCTKEY = A.PRODUCTKEY
JOIN [DBO].[DIMPRODUCTSUBCATEGORY] SUB ON SUB.PRODUCTSUBCATEGORYKEY = PROD.PRODUCTSUBCATEGORYKEY
JOIN [DBO].[DIMPRODUCTCATEGORY] CAT ON CAT.PRODUCTCATEGORYKEY = SUB.PRODUCTCATEGORYKEY
WHERE YEAR(CONVERT(DATE, CONVERT(VARCHAR(8), SHIPDATEKEY))) IN (
  '2011'
  ,'2012'
  )
GROUP BY SUB.ENGLISHPRODUCTSUBCATEGORYNAME
 ,CAT.ENGLISHPRODUCTCATEGORYNAME
 ,YEAR(CONVERT(DATE, CONVERT(VARCHAR(8), SHIPDATEKEY)))

Now drop Pivot transformation to data flow space and connect the output of OLE DB source.


Pivot Key : select the field you would like to see as columns; here we will select the YR field.
Set Key : select the field you would like to see in the row section; here we will select the PRODUCT field.
Pivot Value : select the column that should be the calculated or measure (numeric) field; here we will select TOTALPRODUCTCOST.

Finally, the tricky part.
Pivoted columns : in this section, we have to list all distinct values of the Pivot Key (YR), comma separated, and press the 'Generate columns now' button. We may skip any value we are not interested in; for example, if the Pivot Key field contains the value 2013 and we don't want it, we can simply leave it out. Each value listed in this section is converted to a column in the output.

After this change, the Pivot editor should look like this.
IMAGE(4)

Connect a Flat File (or any other) destination to get the pivoted result. I connected a Flat File destination and got this result:

PRODUCT         | C_2011_TOTALPRODUCTCOST | C_2012_TOTALPRODUCTCOST
Bottom Brackets | 3430430.019 | 2205065.308
Brakes          | 3430430.019 | 2205065.308
Chains          | 3430430.019 | 2205065.308
Cranksets       | 3430430.019 | 2205065.308
Derailleurs     | 3430430.019 | 2205065.308
Forks           | 3430430.019 | 2205065.308
Handlebars      | 3430430.019 | 2205065.308
Headsets        | 3430430.019 | 2205065.308
Mountain Bikes  | 742829.796  | 1187543.213
Mountain Frames | 3430430.019 | 2205065.308
Pedals          | 3430430.019 | 2205065.308
Road Bikes      | 742829.796  | 1187543.213
Road Frames     | 3430430.019 | 2205065.308
Saddles         | 3430430.019 | 2205065.308
Touring Bikes   | 742829.796  | 1187543.213
Touring Frames  | 3430430.019 | 2205065.308
Wheels          | 3430430.019 | 2205065.308

Cool! We got pivoted data, but not our target result (IMAGE(2)); according to that, we also need the PRODUCT GROUP and SALESAMOUNT fields.
If you revisit the Pivot edit window, you don't have an option to select these fields; additional configuration needs to be done in the Pivot advanced editor.


  • Right-click on the Pivot transformation and select 'Show Advanced Editor'.
  • Select PRODUCT GROUP and SALESAMOUNT in the Input Columns tab.
  • Next, in the Input and Output Properties tab, expand 'Pivot Default Input' and select SALESAMOUNT.
  • On the right side, change the PivotUsage property to 3 (making this field a Pivot Value) and note down the LineageID of SALESAMOUNT, as shown in IMAGE(5).
  • Create a C_2011_SALESAMOUNT column in the 'Pivot Default Output' section and set
    PivotKeyValue : 2011 (value to be mapped)
    SourceColumn : 41 (LineageID of the input column) IMAGE (6)
  • Similarly, create C_2012_SALESAMOUNT with
    PivotKeyValue : 2012 (value to be mapped)
    SourceColumn : 41 (LineageID of the input column)
  • Create a PRODDUCT_GROUP column in the 'Pivot Default Output' section and set its SourceColumn to the LineageID of the PRODDUCT_GROUP column in 'Pivot Default Input'.
IMAGE (5)
IMAGE (6)

NOTE :
1. The Pivot transformation does not do any aggregation; if we need it, we have to aggregate the metrics before feeding them to the Pivot transformation.
2. The field mentioned in the Set Key section is assumed to be unique for each record. If multiple fields contribute to the uniqueness of a record, all of those fields should be marked with PivotUsage = 1.
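The effect of the transformation can be sketched in plain Python (a pure illustration with made-up numbers, not part of the SSIS package): rows are grouped by the Set Key, pass-through columns are copied once, and each (Pivot Key, Pivot Value) pair is spread into its own output column.

```python
# Input rows shaped like the query output:
# (PRODUCT, PRODDUCT_GROUP, YR, TOTALPRODUCTCOST, SALESAMOUNT) -- values are invented.
rows = [
    ("Chains", "Components", 2011, 100.0, 150.0),
    ("Chains", "Components", 2012, 110.0, 160.0),
    ("Forks",  "Components", 2011, 200.0, 250.0),
]

pivoted = {}
for product, group, yr, cost, sales in rows:
    # Set Key (PRODUCT) identifies the output row; pass-through column copied once.
    rec = pivoted.setdefault(product, {"PRODDUCT_GROUP": group})
    # Each Pivot Key value (YR) becomes a column per Pivot Value measure.
    rec[f"C_{yr}_TOTALPRODUCTCOST"] = cost
    rec[f"C_{yr}_SALESAMOUNT"] = sales

print(pivoted["Chains"])
print(sorted(pivoted["Forks"].keys()))
```

Note how "Forks" only has 2011 columns, mirroring how the SSIS output leaves cells empty when a Set Key has no row for some Pivot Key value.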

Tuesday, June 30, 2015

SSIS Bulk Load Task vs OLE DB Fast Load vs SQL Bulk Insert command

There are different ways to load data from a flat file to SQL Server, and whenever we have more options we want to know which performs better. The following may help you choose one for your scenario.

Among the many options, I am taking the following three for analysis:
1. SSIS Bulk Load Task
2. OLE DB Fast Load
3. SQL Bulk Insert command

I used the following environment settings with each technique for this experiment:
1. Input flat file size: 2.3 GB
2. Total number of rows: 35,470,898
3. Number of columns: 12
4. Destination: SQL Server 2014 database
5. Flat file and database on the same machine

The time taken by each data load is listed in the following table.

Data Load Type                 | Time taken to load (minutes)
SSIS Bulk Load Task            | 7.51
Fast Load in OLE DB            | 8.42
SQL Server Bulk Insert Command | 10.41

This data shows no significant difference between OLE DB Fast Load and the SSIS Bulk Load task, while the BULK INSERT command was somewhat slower in this setup.

Sunday, November 24, 2013

SSIS : Cannot Convert between Unicode and Non Unicode String Data Type

This error occurs when the source and destination table/file columns have different string types: one side is Unicode (NVARCHAR, DT_WSTR) and the other non-Unicode (VARCHAR, DT_STR).

It can be easily fixed by using the Data Conversion transformation in the data flow task.


Happy Coding :)