Friday 29 January 2016

ASP.NET Tips #76 - Preallocate sizes on things if you can

Many collection types such as MemoryStream, List&lt;T&gt;, and Dictionary&lt;TKey, TValue&gt; double their internal capacity as they fill up, causing wasteful copying and allocations. If you know your list will hold around 100,000 items, initialize it with that capacity up front and avoid the repeated resizing entirely.
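
A minimal sketch, using the 100,000-item figure from above (List&lt;T&gt; and Dictionary&lt;,&gt; are in System.Collections.Generic, MemoryStream in System.IO):

// Without an initial capacity, the backing store is reallocated and copied
// every time the collection outgrows it; with one, it's a single allocation.
var items  = new List<int>(100000);
var lookup = new Dictionary<string, int>(100000);
var buffer = new MemoryStream(100000);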

Thursday 28 January 2016

ASP.NET Tips #75 - Learn about the .NET Garbage Collector (GC) and when it can cause pauses that slow your application down

Over time the .NET GC has become more advanced (most notably the Background Server GC Mode in .NET 4.5), but there are still situations where the GC can have an adverse effect on your application's performance.

Understanding how to detect these situations and more importantly how to fix them is a useful skill. For example, in many applications there are some actions which are more performance-critical than others, and it would be preferable for Garbage Collection to run during the less critical periods. Setting a GCLatencyMode is a useful way of asking the Garbage Collector to be more conservative about choosing to run during these times.
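
A minimal sketch of that approach, assuming a DoPerformanceCriticalWork method standing in for your own critical code:

// GCSettings and GCLatencyMode are in the System.Runtime namespace.
GCLatencyMode oldMode = GCSettings.LatencyMode;
try
{
   // SustainedLowLatency (added in .NET 4.5) asks the GC to avoid blocking
   // generation 2 collections while the critical work runs.
   GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
   DoPerformanceCriticalWork();   // placeholder for your own performance-critical code
}
finally
{
   GCSettings.LatencyMode = oldMode;
}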

Thursday 21 January 2016

ASP.NET Tips #74 - Don't call GC.Collect() explicitly

The Garbage Collector is very good at working out appropriate times to run, influenced by factors like memory usage in the application and OS memory pressure. It's almost never necessary to call it explicitly.

Worse, running a Garbage Collection has an impact on application performance. The performance hit is proportional to the number of objects in memory which survive Garbage Collection, so running the Garbage Collector earlier or more frequently than necessary can seriously harm performance.

Wednesday 20 January 2016

ASP.NET Tips #73 - Use lists instead of arrays when the size is not known in advance

When you need to add or remove data and don't know the final size up front, use lists instead of arrays. A List&lt;T&gt; grows its capacity for you, whereas resizing an array yourself means allocating a new array and copying every element across each time. If you do have a good idea of how the collection will grow and be accessed, combine this with the previous tip and set an initial capacity.
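
A minimal sketch of the difference (the item count of 1,000 is arbitrary):

// Growing an array by hand: each Array.Resize allocates a new array
// and copies every existing element across.
int[] values = new int[0];
for (int i = 0; i < 1000; i++)
{
   Array.Resize(ref values, values.Length + 1);
   values[values.Length - 1] = i;
}

// A List<T> manages its capacity internally and only reallocates occasionally.
var list = new List<int>();
for (int i = 0; i < 1000; i++)
{
   list.Add(i);
}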

Monday 18 January 2016

ASP.NET Tips #72 - If you don't need all the columns from your table, don't select them

Entity Framework is a great productivity booster for developers. Instead of writing tedious SQL statements, you just write code like this:

var products = db.Products.AsNoTracking().ToList();

A line like this is fine if you only have a few columns in your products table, or if you don't care much about performance or memory consumption. With this code, Entity Framework selects every column in the products table; if the table has 25 columns and you only need two of them, the database, the network, and the web server all do unnecessary work.

Do your users a favor and retrieve only the columns you need. If, for example, you just need the 'Id' and 'Name' fields, you could rewrite your statement like this:

var products = db.Products.Select(p => new { p.Id, p.Name }).AsNoTracking().ToList();

Friday 15 January 2016

ASP.NET Tips #71 - Use caching to reduce load on the database

Accessing the database is often the slowest part of an application because the data has to be read from disk (database query caching notwithstanding). Designing your application with an efficient caching mechanism in mind relieves the database of requests it doesn't need to serve and lets it devote its time where it's actually required.

Simple things like caching reference data, or data that changes very infrequently, can make easy gains and reduce load on your database. As you cache more and more, it's important to invalidate cached data when the underlying data changes (for example by using a common key scheme), and this needs to be factored into the design.
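
For example, here is a minimal sketch of caching reference data with System.Runtime.Caching.MemoryCache (the "countries" key, the one-hour expiry, and the Country/LoadCountriesFromDatabase names are placeholders):

// MemoryCache is in System.Runtime.Caching (reference the System.Runtime.Caching assembly).
private static readonly MemoryCache Cache = MemoryCache.Default;

public IList<Country> GetCountries()
{
   var countries = Cache.Get("countries") as IList<Country>;
   if (countries == null)
   {
      countries = LoadCountriesFromDatabase();   // the expensive database call
      var policy = new CacheItemPolicy { AbsoluteExpiration = DateTimeOffset.Now.AddHours(1) };
      Cache.Set("countries", countries, policy);
   }
   return countries;
}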

For a head start on caching and supporting multiple cache engines easily, try this library: https://bitbucket.org/glav/cacheadapter. It allows you to support the ASP.NET web cache, memcached, Redis, and the now defunct AppFabric on the same codebase via configuration only.

Thursday 14 January 2016

ASP.NET Tips #70 - Indexing tables is not an exact science

Indexing tables requires some trial and error, combined with lots of testing, to get things right. Even then, performance characteristics change over time as more data is added or existing data ages.

When you're using SQL Server, it's a good idea to regularly run and analyze the standard reports SQL Server provides that show index usage (such as top queries by total IO, top queries by average IO, etc). This can highlight unused indexes, and can also show queries that are using excessive IO which may require further indexing.

For a deeper analysis of queries, you can also use a tool like ANTS Performance Profiler or Telerik JustTrace.

Wednesday 13 January 2016

ASP.NET Tips #69 - Use AsNoTracking when retrieving data for reading with Entity Framework

In a lot of cases, data retrieved by Entity Framework will not be modified within the scope of the same DbContext. Typical examples of this are ASP.NET MVC or ASP.NET Web API action methods that just answer a GET request. However, if you're using the default way of retrieving data, Entity Framework prepares everything needed to detect changes on your retrieved entities in order to persist those changes to the database later. This not only adds a performance penalty, it costs memory too.

A typical method of retrieving data is:

var products = db.Products.Where(p => p.InStock).ToList();

A better way uses the AsNoTracking extension method from the System.Data.Entity namespace:

var products = db.Products.Where(p => p.InStock).AsNoTracking().ToList();

Tuesday 12 January 2016

ASP.NET Tips #68 - Use the SqlBulkCopy class to load data into SQL Server from .NET

Using SqlBulkCopy can dramatically decrease the time it takes to load data into SQL Server. A test using SQL Server 2012 on a local machine, loading a 100,000-row file, gave the following results:

  • Using a stored procedure: 37 seconds
  • Using concatenated inline SQL: 45 seconds
  • Using Entity Framework: 45 minutes
  • Using the SqlBulkCopy class: 4.5 seconds

Let's say you need to load a web server log into SQL Server. You would still need to load the file, read and parse it, and populate objects with the data. Then you would create a DataTable (you could also use a DataReader or an array of DataRows):

// DataTable and DataRow are in the System.Data namespace.
DataTable table = new DataTable("LogBulkLoad");

table.Columns.Add("IpAddress", typeof(string));
table.Columns.Add("Identd", typeof(string));
table.Columns.Add("RemoteUser", typeof(string));
table.Columns.Add("LogDateTime", typeof(System.DateTimeOffset));
table.Columns.Add("Method", typeof(string));
table.Columns.Add("Resource", typeof(string));
table.Columns.Add("Protocol", typeof(string));
table.Columns.Add("QueryString", typeof(string));
table.Columns.Add("StatusCode", typeof(int));
table.Columns.Add("Size", typeof(long));
table.Columns.Add("Referer", typeof(string));
table.Columns.Add("UserAgent", typeof(string));

The next step is to load the DataTable with the data you've parsed:

foreach (var log in logData)
{
   DataRow row = table.NewRow();
   row["IpAddress"] = log.IpAddress;
   row["Identd"] = log.Identd;
   row["RemoteUser"] = log.RemoteUser;
   row["LogDateTime"] = log.LogDateTime;
   row["Method"] = log.Method;
   row["Resource"] = log.Resource;
   row["Protocol"] = log.Protocol;
   row["QueryString"] = log.QueryString;
   row["StatusCode"] = log.StatusCode;
   row["Size"] = log.Size;
   row["Referer"] = log.Referer;
   row["UserAgent"] = log.UserAgent;
   table.Rows.Add(row);
}

Now you're ready to use the SqlBulkCopy class. You will need an open SqlConnection object (this example pulls the connection string from the config file). Once you've created the SqlBulkCopy object you need to do two things: set the destination table name (the name of the table you will be loading), and call the WriteToServer method, passing the DataTable.

This example also provides the column mappings from the DataTable to the table in SQL Server. If your DataTable columns and SQL Server columns are in the same positions, there is no need to provide the mappings, but in this case the SQL Server table has an ID column that the DataTable does not, so the columns are mapped explicitly:

using (SqlConnection conn = new SqlConnection(ConfigurationManager.ConnectionStrings["LogParserContext"].ConnectionString))
{
   conn.Open();
   using (SqlBulkCopy s = new SqlBulkCopy(conn))
   {
      s.DestinationTableName = "LogBulkLoad";
      s.ColumnMappings.Add("IpAddress", "IpAddress");
      s.ColumnMappings.Add("Identd", "Identd");
      s.ColumnMappings.Add("RemoteUser", "RemoteUser");
      s.ColumnMappings.Add("LogDateTime", "LogDateTime");
      s.ColumnMappings.Add("Method", "Method");
      s.ColumnMappings.Add("Resource", "Resource");
      s.ColumnMappings.Add("Protocol", "Protocol");
      s.ColumnMappings.Add("QueryString", "QueryString");
      s.ColumnMappings.Add("StatusCode", "StatusCode");
      s.ColumnMappings.Add("Size", "Size");
      s.ColumnMappings.Add("Referer", "Referer");
      s.ColumnMappings.Add("UserAgent", "UserAgent");
      s.WriteToServer(table);
   }
}

There are other features of the SqlBulkCopy class that are useful. The BatchSize property can control the number of rows in each batch sent to the server, and the NotifyAfter property allows an event to be fired after a specified number of rows, which is useful for updating the user on the progress of the load.
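
As a sketch, those properties would be set on the SqlBulkCopy object before calling WriteToServer (the batch and notification sizes below are arbitrary):

s.BatchSize = 5000;      // send rows to the server in batches of 5,000
s.NotifyAfter = 10000;   // raise SqlRowsCopied after every 10,000 rows
s.SqlRowsCopied += (sender, e) =>
   Console.WriteLine("{0} rows copied so far...", e.RowsCopied);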

Monday 11 January 2016

ASP.NET Tips #67 - Use LINQ's 'let' keyword to tune emitted SQL

The way you navigate to related entities in a projection has a big impact on the generated SQL. For example, here is an original query that reaches through s.School.Leader three separate times:

from s in Scholars
where s.ID == 2764
select new
{
   s.School.Address1,
   s.School.Address2,
   s.School.City,
   s.School.State,
   s.School.ZIP,
   s.School.PhoneNo,
   s.School.Email,
   Principal_FirstName = s.School.Leader.FirstName,
   Principal_LastName = s.School.Leader.LastName,
   Principal_Email = s.School.Leader.Email
}

This generates the following SQL:

SELECT
1 AS [C1],
[Extent2].[Address1] AS [Address1],
[Extent2].[Address2] AS [Address2],
[Extent2].[City] AS [City],
[Extent2].[State] AS [State],
[Extent2].[ZIP] AS [ZIP],
[Extent2].[PhoneNo] AS [PhoneNo],
[Extent2].[Email] AS [Email],
[Join2].[FirstName] AS [FirstName],
[Join4].[LastName] AS [LastName],
[Join6].[Email] AS [Email1]
   FROM [dbo].[Scholar] AS [Extent1]
   LEFT OUTER JOIN [dbo].[School] AS [Extent2]
      ON [Extent1].[SchoolID] = [Extent2].[ID]
   LEFT OUTER JOIN (SELECT [Extent3].[ID] AS [ID1], [Extent4].[FirstName] AS [FirstName]
      FROM [dbo].[Staff] AS [Extent3]
      INNER JOIN [dbo].[Person] AS [Extent4] ON [Extent3].[ID] = [Extent4].[ID]) AS [Join2]
      ON [Extent2].[LeaderStaffID] = [Join2].[ID1]
   LEFT OUTER JOIN (SELECT [Extent5].[ID] AS [ID2], [Extent6].[LastName] AS [LastName]
      FROM [dbo].[Staff] AS [Extent5]
      INNER JOIN [dbo].[Person] AS [Extent6] ON [Extent5].[ID] = [Extent6].[ID]) AS [Join4]
      ON [Extent2].[LeaderStaffID] = [Join4].[ID2]
   LEFT OUTER JOIN (SELECT [Extent7].[ID] AS [ID3], [Extent8].[Email] AS [Email]
      FROM [dbo].[Staff] AS [Extent7]
      INNER JOIN [dbo].[Person] AS [Extent8] ON [Extent7].[ID] = [Extent8].[ID]) AS [Join6]
      ON [Extent2].[LeaderStaffID] = [Join6].[ID3]
WHERE 2764 = [Extent1].[ID]

Using the 'let' keyword allows us to define the navigation as an alias:

from s in Scholars
where s.ID == 2764
let leader = s.School.Leader
select new
{
   s.School.Address1,
   s.School.Address2,
   s.School.City,
   s.School.State,
   s.School.ZIP,
   s.School.PhoneNo,
   s.School.Email,
   Principal = new {
      leader.FirstName,
      leader.LastName,
      leader.Email
   }
}

This results in a much smaller query:

SELECT
1 AS [C1],
[Extent2].[Address1] AS [Address1],
[Extent2].[Address2] AS [Address2],
[Extent2].[City] AS [City],
[Extent2].[State] AS [State],
[Extent2].[ZIP] AS [ZIP],
[Extent2].[PhoneNo] AS [PhoneNo],
[Extent2].[Email] AS [Email],
[Join2].[FirstName] AS [FirstName],
[Join2].[LastName] AS [LastName],
[Join2].[Email] AS [Email1]
   FROM [dbo].[Scholar] AS [Extent1]
   LEFT OUTER JOIN [dbo].[School] AS [Extent2]
      ON [Extent1].[SchoolID] = [Extent2].[ID]
   LEFT OUTER JOIN (SELECT [Extent3].[ID] AS [ID1], [Extent4].[FirstName] AS [FirstName],
         [Extent4].[LastName] AS [LastName], [Extent4].[Email] AS [Email]
      FROM [dbo].[Staff] AS [Extent3]
      INNER JOIN [dbo].[Person] AS [Extent4] ON [Extent3].[ID] = [Extent4].[ID]) AS [Join2]
      ON [Extent2].[LeaderStaffID] = [Join2].[ID1]
WHERE 2764 = [Extent1].[ID]

Looking at the query execution plan in SSMS, the first query is roughly twice as expensive as the second, so not only is the query cleaner, but it performs and scales better as well.

Tools like Telerik JustTrace and ANTS Performance Profiler can help you discover improvements like this, because they let you see the SQL generated by a LINQ to SQL or Entity Framework query.

Saturday 9 January 2016

ASP.NET Tips #66 - Beware hidden cursors

If you're using an ORM like Entity Framework, when you declare an entity that has foreign key relationships to other tables, you automatically get navigation properties referencing those related entities. Unfortunately, there is a significant hidden cost in accessing them: a separate query can be run to retrieve the details of each referenced row. This is commonly called the n+1 select problem.

For example, consider the following code where we fetch a list of schools from a database then filter that list. On line 1, a query is run to retrieve a list of n schools. On line 2, for every item in the schools list, a query is run to retrieve the number of pupils at that school, so in total n+1 queries are run.

   List<School> schools = context.Schools.ToList();
   List<School> filteredSchools = schools.Where(s => s.Pupils.Count > 1000).ToList();

Consider using Eager Loading to avoid this scenario if you need to access properties of Pupils later:

   List<School> schools = context.Schools.Include(s => s.Pupils).ToList();

Or in this scenario, simply replace line 1 with:

   List<School> schools = context.Schools.Where(s => s.Pupils.Count > 1000).ToList();

Friday 8 January 2016

ASP.NET Tips #65 - Don't overlook 'WHERE IN' style LINQ to SQL Queries

Entity Framework is smart enough to convert the Contains() operator in LINQ queries to a WHERE IN (...) clause in SQL. But there is a hidden problem: passing a set of more than around 10,000 values to a WHERE IN (...) clause significantly degrades both query generation and query execution:

var ids = new int[]{0,1, 2,3,4,5,6,7,8,9,10........99995, 99996,99997,99998,99999};

var matches = (from person in people
               where ids.Contains(person.Id)
               select person).ToArray();

The above statement generates the following fat SQL query:

SELECT * FROM PERSON WHERE ID IN
   (0,1,2,3,4,5,6,7,8,9,10.....,99995,99996,99997,99998,99999)

It is therefore advisable to send the data in batches; 500 records per batch, for example, can yield a significant improvement in performance, but you should benchmark to find what works best for you.
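
A minimal sketch of that batching approach, reusing the ids array and people query from above (the batch size of 500 and the Person entity name are just examples):

const int batchSize = 500;
var matches = new List<Person>();

for (int i = 0; i < ids.Length; i += batchSize)
{
   // Each iteration generates a WHERE IN (...) clause with at most 500 values.
   var batch = ids.Skip(i).Take(batchSize).ToArray();
   matches.AddRange(people.Where(p => batch.Contains(p.Id)));
}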

Thursday 7 January 2016

ASP.NET Tips #64 - Avoid issues with string queries

Instructing Entity Framework to use the right kind of string when it generates queries can resolve datatype conversion issues. A simple solution is to add a Column annotation:

public class MyTable
{
   // The [Column] attribute lives in System.ComponentModel.DataAnnotations.Schema.
   [Column(TypeName="varchar")]
   public string Property1 { get; set; }
}

This is particularly relevant for string datatypes: .NET strings are Unicode by default, so Entity Framework maps them to nvarchar, which does not match a varchar column in SQL Server and can force implicit conversions in the generated queries.

For more information, see: http://www.simple-talk.com/dotnet/.net-tools/catching-performance-issues-in-development/

Wednesday 6 January 2016

ASP.NET Tips #63 - Listen to generated SQL

Entity Framework greatly simplifies database access for the developer, but it can also introduce problems when you write more complex LINQ queries.

These may generate poorly performing SQL, so inspecting the generated SQL with tools like SQL Server Profiler or ANTS Performance Profiler is very important.
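
If you're on Entity Framework 6, one lightweight way to see the generated SQL without an external profiler is the built-in Database.Log hook. A minimal sketch, assuming a MyContext DbContext with the Products set used in the earlier tips:

using (var db = new MyContext())
{
   // EF6 invokes this delegate with every SQL command it sends to the database.
   db.Database.Log = Console.Write;

   var products = db.Products.Where(p => p.InStock).AsNoTracking().ToList();
}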