Communities

Writing
Writing
Codidact Meta
Codidact Meta
The Great Outdoors
The Great Outdoors
Photography & Video
Photography & Video
Scientific Speculation
Scientific Speculation
Cooking
Cooking
Electrical Engineering
Electrical Engineering
Judaism
Judaism
Languages & Linguistics
Languages & Linguistics
Software Development
Software Development
Mathematics
Mathematics
Christianity
Christianity
Code Golf
Code Golf
Music
Music
Physics
Physics
Linux Systems
Linux Systems
Power Users
Power Users
Tabletop RPGs
Tabletop RPGs
Community Proposals
Community Proposals
tag:snake search within a tag
answers:0 unanswered questions
user:xxxx search by author id
score:0.5 posts with 0.5+ score
"snake oil" exact phrase
votes:4 posts with 4+ votes
created:<1w created < 1 week ago
post_type:xxxx type of post
Search help
Notifications
Mark all as read See all your notifications »
Q&A

Welcome to Software Development on Codidact!

Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.

How to write database friendly code when using an ORM?

+2
−0

There are a lot of articles and presentations that show little love for ORMs.

This is mainly because some queries are so complex and heavy on the database that they lead to significant issues in production.

After quickly reading the aforementioned article I understand that there are many incompatibilities between how databases (tables) work and ORM which is typically tied to an OOP language and that one should not expect them to work together very well.

I am interested in a short guide that shows how to try to get the best of both worlds, as much as possible.

For narrowing down the request, I will focus on the Entity Framework.

History
Why does this post require attention from curators or moderators?
You might want to add some details to your flag.
Why should this post be closed?

0 comment threads

1 answer

+2
−0

Note: this is mostly based on personal experience rather than benchmarks. The examples would focus on using EF with SQL Server, but some points might apply to other ORMs and relational databases

The provided references explain why ORM and relational databases are quite incompatible but do not provide some techniques to minimize the impact of this incompatibility.

A programmer can minimize this by acknowledging that despite the ORM trying its best to generate decent, it is their job to write the LINQ queries to avoid complicated queries.

General tips

  • basic familiarity with SQL - any programmer should get a minimal knowledge of how the SQL works. I mean the ability to write fairly complex SELECT statements (JOINs, Grouping, windows functions).

This is helpful to understand how a good query should look and why some ORM generated queries are so bad for performance.

  • using a profiler - for SQL Server, we have SQL Profiler that can provide great insight with fairly little knowledge about how your application is hitting the database.

    1. For most scenarios ("queries" or "commands") one should see very small values for Reads (that is 8K pages), CPU and duration, especially in a reduced development database.
    2. Most (if not all) of the SELECT statements should be readable. If you have trouble grasping what a query is doing, it is most likely too complex and might lead to performance issues. However, it is expected that saving changes lead to big and hard-to-read queries (especially the MERGEs), as EF is trying to squeeze as much change logic in a single query.
  • writing code that resembles the way you would write a query - using lots Includes and ThenIncludes seems very cool as it allows to select of an entire entity hierarchy at once. However, the SELECT queries are very inefficient since they are repeating a lot of data. Instead, consider SELECTing and joining in a SQL natural way:

   var rawData = (
      from f in _context.Set<Foo>()
      join b in _context.Set<Bar>() on f.Id equals b.FooId
      select f.Col1, f.Col2, b.Col3
   ).ToList();

Clearly this requires more code than simply chaining Includes and ThenIncludes, but this generates a very good query since it is virtually the exact query one would write against the database server.

  • minimize the joins - in a normalized relational database, there will be a significant number of "list of values" or dictionaries (i.e. tables with less frequent changes and a relatively small amount of records). These could be cached by the app as maps (dictionaries), thus avoiding some joins:
// this is a very basic list-of-values provider with some caching
public class ListOfValuesProviderService : IListOfValuesProviderService
{
	private readonly IAppCache _cache;
	private readonly IApplicationDbContext _context;
	private readonly IPtbTranslationService _translationService;

	///
	public ListOfValuesProviderService(IAppCache cache, IApplicationDbContext context)
	{
		_cache = cache;
		_context = context;
	}
	
	public async Task<IList<TEnt>> GetAll<TEnt>(CancellationToken token)
		where TEnt : class
	{
		var ret = await _cache.GetOrAddAsync(CacheKey<TEnt>(),
			() => GetAllInner<TEnt>(token), Constants.Cache.DefaultReferentialsOptions);

		return ret;
	}	
	
	private async Task<IList<TEnt>> GetAllInner<TEnt>(CancellationToken token = default)
		where TEnt : class
	{
		var dbItems = await _context.ReadSet<TEnt>().ToListAsync(token);
		return dbItems;
	}	
	
	public async Task<IReadOnlyDictionary<int, TEnt>> GetEntityMap<TEnt>(CancellationToken token = default)
		where TEnt : class, IIdentifier
	{
		var all = await GetAll<TEnt>(token);
		var map = all.ToDictionary(e => e.Id, e => e);
		return map;
	}
	
	public void InvalidateCache<TEnt>()
		where TEnt : class, IStandardListItem
	{
		_cache.Remove(CacheKey<TEnt>());
	}	
	
	private string CacheKey<TEnt>() where TEnt : class
	{
		return $"ListOfValues-{typeof(TEnt).Name}";
	}	
}


// example usage
var smallItemMap = _lovProvider.GetMap<SmallItem>();
var dtos = new List<ResultDto>();
foreach (var item in rawData)
{
    var dto = MapRawDataToDto(rawData);
        dto.SmallItemVal = smallItemMap.GetValueOrDefault(rawData.SmallItemId);
    }

I know that is quite verbose, but for some real-life examples with 4-5 such joins the actually executed query becomes much slimmer.

  • split the queries - if the fetch logic is quite complicated, consider splitting the query. There is no one size fits all advice and this depends on the cost of getting extra data on the webserver vs. the cost of having to run a costly query on the DB server.

One often met in practice example is loading a page of data. The split can be done by getting the ids and some quick data for a page and having extra queries with INs (List<>.Contains()) to get the rest of the data.

  • avoid TransactionScope (from Derek Elkins‭' comment) because it might include steps that are not part of the transaction or be promoted to a distributed transaction without wanting this.

  • use no tracking - if you are only reading (as it happens in most of the time), use no tracking. This does not have an influence over the generated queries but optimizes the used resources at the application layer. Example:

// this is part of the database context and exposed in an interface
public IQueryable<TEnt> ReadSet<TEnt>() where TEnt : class
{
    return Set<TEnt>().AsNoTracking();
}

// example usage
var res = _context.ReadSet<Foo>().Where(f => f.IsActive).ToListAsync(token);

Real-life example

This is fetched from an older presentation of mine for a legacy application running on .NET Core 3.1, so some of this information might be obsolete in .NET 5+.

Note: there is a normalization issue in ApplicationUserRole, but the comparison is still pretty valid.

Classic "quick and dirty" ​LINQ code

var dbUser = await _genericContextOperationService.GetById(impersonatorId,
	new List<Expression<Func<ApplicationUser, object>>>
	{
		u => u.ApplicationUserRole,
		u => u.ApplicationUserPurchaseCategory,
		u => u.ApplicationUserLocation,
		u => u.ApplicationUserDepartment
	});

var impersonatedUserData = new ImpersonationSavedApplicationUser
{
	ApplicationUserId = impersonatorId,
	Email = dbUser.Email,
	RoleId = dbUser.ApplicationUserRole.First().RoleId,
	ImpersonatedUserId = impersonatedId,
	ImpersonationSavedApplicationUserBusinessUnit = dbUser.ApplicationUserRole
		.Select(ur => new ImpersonationSavedApplicationUserBusinessUnit { BusinessUnitId = ur.BuId })
		.Distinct()
		.ToList(),
	ImpersonationSavedApplicationUserPurchaseCategory = dbUser.ApplicationUserPurchaseCategory
		.Select(upc => new ImpersonationSavedApplicationUserPurchaseCategory { PurchaseCategoryId = upc.PurchaseCategoryId })
		.ToList(),
	ImpersonationSavedApplicationUserLocation = dbUser.ApplicationUserLocation
		.Select(ul => new ImpersonationSavedApplicationUserLocation { LocationId = ul.RefUserLocationId })
		.ToList(),
	ImpersonationSavedApplicationUserDepartment = dbUser.ApplicationUserDepartment
		.Select(ud => new ImpersonationSavedApplicationUserDepartment { DepartmentId = ud.RefUserDepartmentId })
		.ToList()
}

The generated query looks like this:

exec sp_executesql N'SELECT [t].[Id], [t].[Archive], [t].[CultureName], [t].[DateCreation], [t].[DateModification], [t].[Email], [t].[EmployeeId], [t].[FirstName], [t].[LastName], [t].[Login], [t].[User], [a0].[Id], [a0].[BuId], [a0].[DateCreation], [a0].[DateModification], [a0].[RoleId], [a0].[User], [a0].[UserId], [a1].[Id], [a1].[DateCreation], [a1].[DateModification], [a1].[PurchaseCategoryId], [a1].[User], [a1].[UserId], [a2].[Id], [a2].[DateCreation], [a2].[DateModification], [a2].[RefUserLocationId], [a2].[User], [a2].[UserId], [a3].[Id], [a3].[DateCreation], [a3].[DateModification], [a3].[RefUserDepartmentId], [a3].[User], [a3].[UserId]
FROM (
    SELECT TOP(1) [a].[Id], [a].[Archive], [a].[CultureName], [a].[DateCreation], [a].[DateModification], [a].[Email], [a].[EmployeeId], [a].[FirstName], [a].[LastName], [a].[Login], [a].[User]
    FROM [ApplicationUser] AS [a]
    WHERE [a].[Id] = @__id_0
) AS [t]
LEFT JOIN [ApplicationUserRole] AS [a0] ON [t].[Id] = [a0].[UserId]
LEFT JOIN [ApplicationUserPurchaseCategory] AS [a1] ON [t].[Id] = [a1].[UserId]
LEFT JOIN [ApplicationUserLocation] AS [a2] ON [t].[Id] = [a2].[UserId]
LEFT JOIN [ApplicationUserDepartment] AS [a3] ON [t].[Id] = [a3].[UserId]
ORDER BY [t].[Id], [a0].[Id], [a1].[Id], [a2].[Id]',N'@__id_0 int',@__id_0=1

This big query consistently has more than 300 reads, even after running it for multiple types (might not be cached).

Optimized ​LINQ code

One way to optimize is to split the fetching. Most likely a compromise between split and using Includes is the best approach.

public async Task<ApplicationUserFullIdsOnlyModel> GetApplicationUserFullIdsOnlyModel(int userId, CancellationToken cancellationToken)
{
	var buIds = await _context.ReadSet<ApplicationUserRole>()
		.SelectEx(ur => ur.UserId == userId, ur => new { ur.RoleId, ur.BuId }, cancellationToken);
	if (!buIds.Any())
		throw new ArgumentException($"No role found for user {userId}");

	var pcIds = await _context.ReadSet<ApplicationUserPurchaseCategory>()
		.SelectEx(upc => upc.UserId == userId, upc => upc.PurchaseCategoryId, cancellationToken);
	var locationIds = await _context.ReadSet<ApplicationUserLocation>()
		.SelectEx(ul => ul.UserId == userId, ul => ul.RefUserLocationId, cancellationToken);
	var departmentIds = await _context.ReadSet<ApplicationUserDepartment>()
		.SelectEx(ul => ul.UserId == userId, ul => ul.RefUserDepartmentId, cancellationToken);

	var ret = new ApplicationUserFullIdsOnlyModel
	{
		RoleId = buIds.First().RoleId,
		BusinessUnitIds = buIds.Select(ur => ur.BuId).Distinct().ToList(),
		PurchaseCategoryIds = pcIds,
		UserLocationIds = locationIds,
		UserDepartmentIds = departmentIds
	};
	return ret;

The generated queries look like this:

exec sp_executesql N'SELECT TOP(1) [a].[Email]
FROM [ApplicationUser] AS [a]
WHERE [a].[Id] = @__impersonatorId_0',N'@__impersonatorId_0 int',@__impersonatorId_0=1

exec sp_executesql N'SELECT [a].[RoleId], [a].[BuId]
FROM [ApplicationUserRole] AS [a]
WHERE [a].[UserId] = @__userId_0',N'@__userId_0 int',@__userId_0=1

exec sp_executesql N'SELECT [a].[PurchaseCategoryId]
FROM [ApplicationUserPurchaseCategory] AS [a]
WHERE [a].[UserId] = @__userId_0',N'@__userId_0 int',@__userId_0=1

exec sp_executesql N'SELECT [a].[RefUserLocationId]
FROM [ApplicationUserLocation] AS [a]
WHERE [a].[UserId] = @__userId_0',N'@__userId_0 int',@__userId_0=1

exec sp_executesql N'SELECT [a].[RefUserDepartmentId]
FROM [ApplicationUserDepartment] AS [a]
WHERE [a].[UserId] = @__userId_0',N'@__userId_0 int',@__userId_0=1

These consistently generate less than 30 reads in total and subsequent runs even generate none (everything seems to be cached).

What the profiler is not showing and might be an issue is the roundtrips overhead (these queries are issued by the app and processed by the database server one by one).

From my experience these techniques helped getting very good performance in most cases and the DBA appreciated our team's effort to reduce the strain on the DB.

History
Why does this post require attention from curators or moderators?
You might want to add some details to your flag.

1 comment thread

TransactionScope (2 comments)

Sign up to answer this question »