Thursday, April 14, 2016

SSIS – Running single SSIS packages in Parallel to get faster performance for multiple data loads

Brief description of the project

I had to move the data from a central database to different databases (all databases with the same schema)  and I wanted to achieve this trough parallel processing instead of sequential through  a FOR loop

I created a parent package and added a sequence container in it .
I added a for loop inside the sequence container and added a call ( any number that I want to ruin in parallel ) to the child packages.

Created a parent package .


I created a package level variable varLoopQuitParent = 0 and this will be used to exit the for loop after getting updated to 1 in the EXIT FOR  script in the child package later.




 
 
The Execute SQL Task: Update Processing List comes is used to reset the different databases active processing status once the data is moved.
 
Now I created a child package
 
 
( The child package is called the following way from the parent package using the Execute package Task)


 
Child package is the  real place where everything happens.
I have a table where I have a list of clients ( databases) and the connection string that I read and put into a variable ( child package level) and use that to create the dynamic connection string to the multiple databases.
 
The Execute SQL task : Get Connection string into variables on the child package calls a stored procedure (proc_ResultProcessing_List) that returns the client code and the connection string to the databases and assigns to the variable varChildConnString and varClientCode.
 
Here is the code in the sp

 
 
 

-Section 1

Declare @ConnString varchar(100)

DEclare @clientCode int



 
 
--Section 2
 
 

BEGIN TRAN TRAN1




--Section 3
 
 



 
 
SET Rowcount 1




--Section 4
 
 

 
select @ConnString = Connstring, @ClientCodeCode = clientCode from clientsProcessList where ds_id = 2 and Dataprocessing = 2



 
 
-- 1 is active processing

--2 is ready

--print @ConnString + space(5) + cast(@hospCode as varchar)
 
 




--section 5
 
 

UPDATE clientsProcessList

WITH (TABLOCK)

SET Dataprocessing = 1

WHERE clientCode = @clientCode

and ds_id = 2 and Dataprocessing = 2



 

 
 
--section 6
 
 



 
 
SET Rowcount 0



 
 
--section 7
 
 



 
 
SELECT isnull(@ConnString,'empty') as ConnectionString, isnull(@HospCode,0) as ClientCode




--section 9
 
 

COMMIT TRAN TRAN1



 

 
 
END TRY



 
 
BEGIN CATCH



 

 
 
print 'error here'



 

 
 
SELECT

ERROR_NUMBER() AS ErrorNumber

,ERROR_SEVERITY() AS ErrorSeverity

,ERROR_STATE() AS ErrorState

,ERROR_PROCEDURE() AS ErrorProcedure

,ERROR_LINE() AS ErrorLine

,ERROR_MESSAGE() AS ErrorMessage

,ERROR_PROCEDURE() AS ErrorProcedure

,ERROR_LINE() AS ErrorLine;

print 'rollback'



 

 
 
ROLLBACK TRANSACTION TRAN1 ;




 
END CATCH




 

 
Section 1
The first thing that we do is declare our @ConnString and  @clientCode
 Section 2
Next we start a Begin Tran
1.       NOTE: The reason for this is so that we can actually lock the table whilst this transaction is happening.
2.       NOTE 2: This is so that when the SSIS Packages are running in parallel, we know for a fact that each package will only get a valid ConnString   and clientCode when trying to get another value. If it tries whilst another SSIS Package is busy it will wait until the transaction has been          completed.
Section 3
We then set the RowCount to 1, this is so that we will only get one row back.
1.       NOTE: we are doing this because we only ever want to get back one ConnString  and clientcode

Section 4
Next is our TSQL Select statement
1.       Here we are getting any ConnString, clientcode from our table, where the DataProcessing is set to 2.( ready to process)
2.       We are then putting this into our @ConnString variabl and @ClientCode
Section 5
Next we are updating our table using the TABLOCK hint so that we can once again lock the entire table, to ensure that for our specific clientcode row it gets updated from a 2 to 1.
1.       NOTE: This then means that this data is and has been processed when the next package comes along looking for data to process.
 Section 6
We then set the RowCount back to 0 (zero)

 Section 7
 Then we run a TSQL Select statement so that we can then pass our  @ConnString and  @clientCode
( is null value to pass if reached EOR which is later used in the EXIT  LOOP precedence)
1.       NOTE: This is because we are using the varChildConnString and varClientCode passing this into a variable in our Execute SQL Task.
2.       NOTE 2: We also set the isNull to 0 (zero) so that when there are no more rows to process it defaults to zero, which is explained laterin the Exit Loop.
Section 8
Finally we are then committing our transaction.
Within our Execute SQL Task we also map the result set to our variable called:
 

 
 
 
Now the last thing to configure in this SSIS package is the Precedence Constraint which can be seen in the picture below

 We execute the  Execute SQL Task: Execute SPs while there is a  value in the variable ( while the sp is returning a row)
 
 We exit once the varHospitalCode is 0

 
The last step is to set the parent variable varLoopQuitParent = 1 in the EXIT loop by binding the parent variable in the child package .
 
Click on this LINK on how to bind (Passing Values To and From Child Packages)
 
The script for the EXIT LOOP
 
public void Main()
 
 

{

 

 
           
Dts.Variables["varLoopQuitParent"].Value = 1;
Dts.TaskResult = (int)ScriptResults.Success;
 
 

}


Make the varLoopQuitParent as ReadWriteVaroable.

 
 
 
 
 
 


No comments:

Post a Comment