
Welcome to Software Development on Codidact!


How to write database-friendly code when using an ORM?

+2
−0

There are a lot of articles and presentations that show little love for ORMs.

This is mainly because some queries are so complex and heavy on the database that they lead to significant issues in production.

After quickly reading the aforementioned articles, I understand that there are many incompatibilities between how databases (tables) work and ORMs, which are typically tied to an OOP language, and that one should not expect them to work together very well.

I am interested in a short guide that shows how to try to get the best of both worlds, as much as possible.

To narrow down the request, I will focus on Entity Framework.


1 answer

+2
−0

Note: this is mostly based on personal experience rather than benchmarks. The examples focus on using EF with SQL Server, but some points might apply to other ORMs and relational databases as well.

The provided references explain why ORMs and relational databases are quite incompatible, but they do not offer techniques to minimize the impact of this incompatibility.

A programmer can minimize it by acknowledging that, although the ORM tries its best to generate decent SQL, it is still their job to write LINQ queries that do not translate into overly complicated SQL.

General tips

  • basic familiarity with SQL - any programmer should acquire a minimal knowledge of how SQL works. By this I mean the ability to write fairly complex SELECT statements (JOINs, grouping, window functions).

This is helpful to understand how a good query should look and why some ORM generated queries are so bad for performance.
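As an illustration, this is roughly the level of query I have in mind: a JOIN, a GROUP BY and a window function in one statement (the Orders/Customers schema is invented for the example):

```sql
-- hypothetical schema: Orders(Id, CustomerId, Total), Customers(Id, Name)
SELECT o.CustomerId, c.Name,
       COUNT(*)     AS OrderCount,
       SUM(o.Total) AS TotalSpent,
       -- window function: rank customers by their total spending
       RANK() OVER (ORDER BY SUM(o.Total) DESC) AS SpendRank
FROM Orders AS o
JOIN Customers AS c ON c.Id = o.CustomerId
GROUP BY o.CustomerId, c.Name;
```

If you can read and write statements like this, you can also judge whether what the ORM generates is reasonable.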

  • using a profiler - for SQL Server, we have SQL Profiler, which can provide, with fairly little effort, great insight into how your application is hitting the database.

    1. For most scenarios ("queries" or "commands") one should see very small values for Reads (that is, 8K pages), CPU and duration, especially in a reduced development database.
    2. Most (if not all) of the SELECT statements should be readable. If you have trouble grasping what a query is doing, it is most likely too complex and might lead to performance issues. However, it is expected that saving changes leads to big and hard-to-read queries (especially the MERGEs), as EF tries to squeeze as much change logic as possible into a single query.
  • writing code that resembles the way you would write a query - using lots of Includes and ThenIncludes seems very convenient, as it allows selecting an entire entity hierarchy at once. However, the resulting SELECT queries are very inefficient since they repeat a lot of data. Instead, consider SELECTing and joining in a natural SQL way:

   var rawData = (
      from f in _context.Set<Foo>()
      join b in _context.Set<Bar>() on f.Id equals b.FooId
      select new { f.Col1, f.Col2, b.Col3 }
   ).ToList();

Clearly this requires more code than simply chaining Includes and ThenIncludes, but it generates a very good query, since it is virtually the exact query one would write against the database server.

  • minimize the joins - in a normalized relational database there will be a significant number of "list of values" or dictionary tables (i.e. tables with infrequent changes and a relatively small number of records). These can be cached by the app as maps (dictionaries), thus avoiding some joins:
// this is a very basic list-of-values provider with some caching
public class ListOfValuesProviderService : IListOfValuesProviderService
{
	private readonly IAppCache _cache;
	private readonly IApplicationDbContext _context;

	public ListOfValuesProviderService(IAppCache cache, IApplicationDbContext context)
	{
		_cache = cache;
		_context = context;
	}

	public async Task<IList<TEnt>> GetAll<TEnt>(CancellationToken token)
		where TEnt : class
	{
		var ret = await _cache.GetOrAddAsync(CacheKey<TEnt>(),
			() => GetAllInner<TEnt>(token), Constants.Cache.DefaultReferentialsOptions);

		return ret;
	}

	private async Task<IList<TEnt>> GetAllInner<TEnt>(CancellationToken token = default)
		where TEnt : class
	{
		var dbItems = await _context.ReadSet<TEnt>().ToListAsync(token);
		return dbItems;
	}

	public async Task<IReadOnlyDictionary<int, TEnt>> GetEntityMap<TEnt>(CancellationToken token = default)
		where TEnt : class, IIdentifier
	{
		var all = await GetAll<TEnt>(token);
		var map = all.ToDictionary(e => e.Id, e => e);
		return map;
	}

	public void InvalidateCache<TEnt>()
		where TEnt : class
	{
		_cache.Remove(CacheKey<TEnt>());
	}

	private string CacheKey<TEnt>() where TEnt : class
	{
		return $"ListOfValues-{typeof(TEnt).Name}";
	}
}


// example usage
var smallItemMap = await _lovProvider.GetEntityMap<SmallItem>(token);
var dtos = new List<ResultDto>();
foreach (var item in rawData)
{
    var dto = MapRawDataToDto(item);
    dto.SmallItemVal = smallItemMap.GetValueOrDefault(item.SmallItemId);
    dtos.Add(dto);
}

I know this is quite verbose, but in some real-life examples with 4-5 such joins the actually executed query becomes much slimmer.

  • split the queries - if the fetch logic is quite complicated, consider splitting the query. There is no one-size-fits-all advice here; it depends on the cost of assembling extra data on the web server vs. the cost of running a heavy query on the DB server.

One example often met in practice is loading a page of data. The split can be done by first getting the ids and some quick data for the page, then issuing extra queries with INs (List<>.Contains()) to fetch the rest of the data.
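A minimal sketch of this split, assuming hypothetical Order and OrderLine entities (ReadSet is the no-tracking accessor described later in this answer):

```csharp
// 1. cheap paging query: only the ids and the sort key for the requested page
var pageIds = await _context.ReadSet<Order>()
    .OrderByDescending(o => o.DateCreation)
    .Skip(pageIndex * pageSize)
    .Take(pageSize)
    .Select(o => o.Id)
    .ToListAsync(token);

// 2. heavier related data fetched only for those ids;
//    EF translates Contains over a small list into WHERE [OrderId] IN (...)
var lines = await _context.ReadSet<OrderLine>()
    .Where(l => pageIds.Contains(l.OrderId))
    .ToListAsync(token);
```

Each of the two statements stays simple and index-friendly, at the price of one extra round trip.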

  • avoid TransactionScope (from Derek Elkins‭'s comment), because it might silently enroll steps that should not be part of the transaction, or get promoted to a distributed transaction without you wanting this.
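As a hedged sketch, an explicit EF Core transaction scoped to a single DbContext avoids both problems (the Foo entity and the context exposing Database are assumptions for the example):

```csharp
// explicit transaction on one DbContext: nothing else can be silently
// enlisted, and it cannot escalate to a distributed transaction
await using var tx = await _context.Database.BeginTransactionAsync(token);

_context.Set<Foo>().Add(new Foo { Col1 = "value" });
await _context.SaveChangesAsync(token);

await tx.CommitAsync(token);
```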

  • use no tracking - if you are only reading (as happens most of the time), use no tracking. This does not influence the generated queries, but it reduces the resources used at the application layer. Example:

// this is part of the database context and exposed in an interface
public IQueryable<TEnt> ReadSet<TEnt>() where TEnt : class
{
    return Set<TEnt>().AsNoTracking();
}

// example usage
var res = await _context.ReadSet<Foo>().Where(f => f.IsActive).ToListAsync(token);

Real-life example

This is fetched from an older presentation of mine for a legacy application running on .NET Core 3.1, so some of this information might be obsolete in .NET 5+.

Note: there is a normalization issue in ApplicationUserRole, but the comparison is still pretty valid.

Classic "quick and dirty" LINQ code

var dbUser = await _genericContextOperationService.GetById(impersonatorId,
	new List<Expression<Func<ApplicationUser, object>>>
	{
		u => u.ApplicationUserRole,
		u => u.ApplicationUserPurchaseCategory,
		u => u.ApplicationUserLocation,
		u => u.ApplicationUserDepartment
	});

var impersonatedUserData = new ImpersonationSavedApplicationUser
{
	ApplicationUserId = impersonatorId,
	Email = dbUser.Email,
	RoleId = dbUser.ApplicationUserRole.First().RoleId,
	ImpersonatedUserId = impersonatedId,
	ImpersonationSavedApplicationUserBusinessUnit = dbUser.ApplicationUserRole
		.Select(ur => new ImpersonationSavedApplicationUserBusinessUnit { BusinessUnitId = ur.BuId })
		.Distinct()
		.ToList(),
	ImpersonationSavedApplicationUserPurchaseCategory = dbUser.ApplicationUserPurchaseCategory
		.Select(upc => new ImpersonationSavedApplicationUserPurchaseCategory { PurchaseCategoryId = upc.PurchaseCategoryId })
		.ToList(),
	ImpersonationSavedApplicationUserLocation = dbUser.ApplicationUserLocation
		.Select(ul => new ImpersonationSavedApplicationUserLocation { LocationId = ul.RefUserLocationId })
		.ToList(),
	ImpersonationSavedApplicationUserDepartment = dbUser.ApplicationUserDepartment
		.Select(ud => new ImpersonationSavedApplicationUserDepartment { DepartmentId = ud.RefUserDepartmentId })
		.ToList()
};

The generated query looks like this:

exec sp_executesql N'SELECT [t].[Id], [t].[Archive], [t].[CultureName], [t].[DateCreation], [t].[DateModification], [t].[Email], [t].[EmployeeId], [t].[FirstName], [t].[LastName], [t].[Login], [t].[User], [a0].[Id], [a0].[BuId], [a0].[DateCreation], [a0].[DateModification], [a0].[RoleId], [a0].[User], [a0].[UserId], [a1].[Id], [a1].[DateCreation], [a1].[DateModification], [a1].[PurchaseCategoryId], [a1].[User], [a1].[UserId], [a2].[Id], [a2].[DateCreation], [a2].[DateModification], [a2].[RefUserLocationId], [a2].[User], [a2].[UserId], [a3].[Id], [a3].[DateCreation], [a3].[DateModification], [a3].[RefUserDepartmentId], [a3].[User], [a3].[UserId]
FROM (
    SELECT TOP(1) [a].[Id], [a].[Archive], [a].[CultureName], [a].[DateCreation], [a].[DateModification], [a].[Email], [a].[EmployeeId], [a].[FirstName], [a].[LastName], [a].[Login], [a].[User]
    FROM [ApplicationUser] AS [a]
    WHERE [a].[Id] = @__id_0
) AS [t]
LEFT JOIN [ApplicationUserRole] AS [a0] ON [t].[Id] = [a0].[UserId]
LEFT JOIN [ApplicationUserPurchaseCategory] AS [a1] ON [t].[Id] = [a1].[UserId]
LEFT JOIN [ApplicationUserLocation] AS [a2] ON [t].[Id] = [a2].[UserId]
LEFT JOIN [ApplicationUserDepartment] AS [a3] ON [t].[Id] = [a3].[UserId]
ORDER BY [t].[Id], [a0].[Id], [a1].[Id], [a2].[Id]',N'@__id_0 int',@__id_0=1

This big query consistently performs more than 300 reads, even after running it multiple times (it might not be cached).

Optimized LINQ code

One way to optimize is to split the fetching. Most likely a compromise between splitting and using Includes is the best approach.

public async Task<ApplicationUserFullIdsOnlyModel> GetApplicationUserFullIdsOnlyModel(int userId, CancellationToken cancellationToken)
{
	// SelectEx is a custom extension (essentially Where + Select + ToListAsync)
	var buIds = await _context.ReadSet<ApplicationUserRole>()
		.SelectEx(ur => ur.UserId == userId, ur => new { ur.RoleId, ur.BuId }, cancellationToken);
	if (!buIds.Any())
		throw new ArgumentException($"No role found for user {userId}");

	var pcIds = await _context.ReadSet<ApplicationUserPurchaseCategory>()
		.SelectEx(upc => upc.UserId == userId, upc => upc.PurchaseCategoryId, cancellationToken);
	var locationIds = await _context.ReadSet<ApplicationUserLocation>()
		.SelectEx(ul => ul.UserId == userId, ul => ul.RefUserLocationId, cancellationToken);
	var departmentIds = await _context.ReadSet<ApplicationUserDepartment>()
		.SelectEx(ul => ul.UserId == userId, ul => ul.RefUserDepartmentId, cancellationToken);

	var ret = new ApplicationUserFullIdsOnlyModel
	{
		RoleId = buIds.First().RoleId,
		BusinessUnitIds = buIds.Select(ur => ur.BuId).Distinct().ToList(),
		PurchaseCategoryIds = pcIds,
		UserLocationIds = locationIds,
		UserDepartmentIds = departmentIds
	};
	return ret;

The generated queries look like this:

exec sp_executesql N'SELECT TOP(1) [a].[Email]
FROM [ApplicationUser] AS [a]
WHERE [a].[Id] = @__impersonatorId_0',N'@__impersonatorId_0 int',@__impersonatorId_0=1

exec sp_executesql N'SELECT [a].[RoleId], [a].[BuId]
FROM [ApplicationUserRole] AS [a]
WHERE [a].[UserId] = @__userId_0',N'@__userId_0 int',@__userId_0=1

exec sp_executesql N'SELECT [a].[PurchaseCategoryId]
FROM [ApplicationUserPurchaseCategory] AS [a]
WHERE [a].[UserId] = @__userId_0',N'@__userId_0 int',@__userId_0=1

exec sp_executesql N'SELECT [a].[RefUserLocationId]
FROM [ApplicationUserLocation] AS [a]
WHERE [a].[UserId] = @__userId_0',N'@__userId_0 int',@__userId_0=1

exec sp_executesql N'SELECT [a].[RefUserDepartmentId]
FROM [ApplicationUserDepartment] AS [a]
WHERE [a].[UserId] = @__userId_0',N'@__userId_0 int',@__userId_0=1

These consistently generate fewer than 30 reads in total, and subsequent runs even generate none (everything seems to be cached).

What the profiler does not show, and might be an issue, is the round-trip overhead (these queries are issued by the app and processed by the database server one by one).

From my experience these techniques helped achieve very good performance in most cases, and the DBA appreciated our team's effort to reduce the strain on the DB.
