Tuesday, 12 January 2016

ASP.NET Tips #68 - Use the SQLBulkCopy class to load data into SQL Server from .NET

Using SQLBulkCopy can dramatically decrease the time it takes to load data into SQL Server. A test using SQL Server 2012 on a local machine loading a 100,000 row file had the following results:

  • Using a stored procedure: 37 seconds
  • Using concatenated inline SQL: 45 seconds
  • Using Entity Framework: 45 minutes
  • Using the SQLBulkCopy class: 4.5 seconds

Let's say you need to load a web server log into SQL Server. You would still need to load a file, read the file, parse the file, and load the data into objects. Then you would create a DataTable (you could also use the DataReader or an array of DataRow too):

DataTable table = new DataTable();
table.TableName = "LogBulkLoad";

table.Columns.Add("IpAddress", typeof(string));
table.Columns.Add("Identd", typeof(string));
table.Columns.Add("RemoteUser", typeof(string));
table.Columns.Add("LogDateTime", typeof(System.DateTimeOffset));
table.Columns.Add("Method", typeof(string));
table.Columns.Add("Resource", typeof(string));
table.Columns.Add("Protocol", typeof(string));
table.Columns.Add("QueryString", typeof(string));
table.Columns.Add("StatusCode", typeof(int));
table.Columns.Add("Size", typeof(long));
table.Columns.Add("Referer", typeof(string));
table.Columns.Add("UserAgent", typeof(string));

Next step would be to load the DataTable with data that you've parsed:

foreach (var log in logData)
   DataRow row = table.NewRow();
   row["IpAddress"] = log.IpAddress;
   row["Identd"] = log.Identd;
   row["RemoteUser"] = log.RemoteUser;
   row["LogDateTime"] = log.LogDateTime;
   row["Method"] = log.Method;
   row["Resource"] = log.Resource;
   row["Protocol"] = log.Protocol;
   row["QueryString"] = log.QueryString;
   row["StatusCode"] = log.StatusCode;
   row["Size"] = log.Size;
   row["Referer"] = log.Referer;
   row["UserAgent"] = log.UserAgent;

Now you're ready to use the SqlBulkCopy class. You will need an open SqlConnection object (this example pulls the connection string from the config file). Once you've created the SqlBulkCopy object you need to do two things: set the destination table name (the name of the table you will be loading); and call the WriteToServer function passing the DataTable.

This example also provides the column mappings from the DataTable to the table in SQL Server. If your DataTable columns and SQL server columns are in the same positions, then there will be no need to provide the mapping, but in this case the SQL Server table has an ID column and the DataTable does not need to explicitly map them:

using (SqlConnection conn = new SqlConnection(ConfigurationManager.ConnectionStrings["LogParserContext"].ConnectionString))
   using (SqlBulkCopy s = new SqlBulkCopy(conn))
      s.DestinationTableName = "LogBulkLoad";
      s.ColumnMappings.Add("IpAddress", "IpAddress");
      s.ColumnMappings.Add("Identd", "Identd");
      s.ColumnMappings.Add("RemoteUser", "RemoteUser");
      s.ColumnMappings.Add("LogDateTime", "LogDateTime");
      s.ColumnMappings.Add("Method", "Method");
      s.ColumnMappings.Add("Resource", "Resource");
      s.ColumnMappings.Add("Protocol", "Protocol");
      s.ColumnMappings.Add("QueryString", "QueryString");
      s.ColumnMappings.Add("StatusCode", "StatusCode");
      s.ColumnMappings.Add("Size", "Size");
      s.ColumnMappings.Add("Referer", "Referer");
      s.ColumnMappings.Add("UserAgent", "UserAgent");

There are other features of the SqlBulkCopy class that are useful. The BatchSize property can control the number of rows in each batch sent to the server, and the NotifyAfter property allows an event to be fired after a specified number of rows, which is useful for updating the user on the progress of the load.

No comments :

Post a comment