Communities

Writing
Writing
Codidact Meta
Codidact Meta
The Great Outdoors
The Great Outdoors
Photography & Video
Photography & Video
Scientific Speculation
Scientific Speculation
Cooking
Cooking
Electrical Engineering
Electrical Engineering
Judaism
Judaism
Languages & Linguistics
Languages & Linguistics
Software Development
Software Development
Mathematics
Mathematics
Christianity
Christianity
Code Golf
Code Golf
Music
Music
Physics
Physics

Dashboard
Notifications
Mark all as read
Q&A

What are the factors to consider when deciding to split ORM queries or use eager loading?

+2
−0

I have extensive working experience with Entity Framework ORM and have noticed two major ways of writing LINQ (LINQ2SQL):

  • lot of eager loading: the most prevalent, uses (lots) of Includes to eager load children entities
  • split in multiple queries: rarely using Include, splits the loading in multiple queries

In the beginning, I have used the first choice, but after becoming knowledgeable also in SQL development and profiling the applications, I have seen the horrible queries generated by this and switched to the second way of writing LINQ.

To illustrate what I am saying I have written the same functionality using Include and loading of data using distinct queries. The resulting queries look like this:

Eager load

exec sp_executesql N'SELECT [t].[Id], [t].[Archive], [t].[CultureName], [t].[DateCreation], [t].[DateModification], [t].[Email], [t].[EmployeeId], [t].[FirstName], [t].[LastName], [t].[Login], [t].[User], [a0].[Id], [a0].[BuId], [a0].[DateCreation], [a0].[DateModification], [a0].[RoleId], [a0].[User], [a0].[UserId], [a1].[Id], [a1].[DateCreation], [a1].[DateModification], [a1].[PurchaseCategoryId], [a1].[User], [a1].[UserId], [a2].[Id], [a2].[DateCreation], [a2].[DateModification], [a2].[RefUserLocationId], [a2].[User], [a2].[UserId]
FROM (
	SELECT TOP(1) [a].[Id], [a].[Archive], [a].[CultureName], [a].[DateCreation], [a].[DateModification], [a].[Email], [a].[EmployeeId], [a].[FirstName], [a].[LastName], [a].[Login], [a].[User]
	FROM [ApplicationUser] AS [a]
	WHERE [a].[Id] = @__id_0
) AS [t]
LEFT JOIN [ApplicationUserRole] AS [a0] ON [t].[Id] = [a0].[UserId]
LEFT JOIN [ApplicationUserPurchaseCategory] AS [a1] ON [t].[Id] = [a1].[UserId]
LEFT JOIN [ApplicationUserLocation] AS [a2] ON [t].[Id] = [a2].[UserId]
ORDER BY [t].[Id], [a0].[Id], [a1].[Id], [a2].[Id]',N'@__id_0 int',@__id_0=617

Separate queries

exec sp_executesql N'SELECT TOP(1) [a].[Email]
FROM [ApplicationUser] AS [a]
WHERE [a].[Id] = @__impersonatorId_0',N'@__impersonatorId_0 int',@__impersonatorId_0=617

exec sp_executesql N'SELECT [a].[RoleId], [a].[BuId]
FROM [ApplicationUserRole] AS [a]
WHERE [a].[UserId] = @__userId_0',N'@__userId_0 int',@__userId_0=617

exec sp_executesql N'SELECT [a].[PurchaseCategoryId]
FROM [ApplicationUserPurchaseCategory] AS [a]
WHERE [a].[UserId] = @__userId_0',N'@__userId_0 int',@__userId_0=617

exec sp_executesql N'SELECT [a].[RefUserLocationId]
FROM [ApplicationUserLocation] AS [a]
WHERE [a].[UserId] = @__userId_0',N'@__userId_0 int',@__userId_0=617

exec sp_executesql N'SELECT [a].[RefUserDepartmentId]
FROM [ApplicationUserDepartment] AS [a]
WHERE [a].[UserId] = @__userId_0',N'@__userId_0 int',@__userId_0=617

By running the queries in SSMS and monitoring with SQL profiler I see the following differences:

  • way less loaded data for the second case
  • no matter how many times I run the queries for several IDs, the big one does not seem to use a cache (the same number of page reads). The split ones seem to be cached (even if I change the IDs) and they generate a tiny amount of reads and CPU

I am trying to understand why so many developers are always using Include if the profiler indicates that SQL Server activity is far from being optimal:

  • C# code is very easy to write for eager loading everything. Just a DbSet<> followed by several Includes. However, EF creates a "denormalized" query that selects all columns from all involved tables.
  • C# is rather convoluted for the multiple LINQs approach. This reduces the data paylod retrieved from the server and also allow for some queries to be replaced with getting data from some cache (e.g. for data that rarely changes). However, this clearly generates multiple round-trips to the SQL server.

I am wondering if assuming current infrastructure (data centers), is having multiple round-trips an issue in these cases? Are there any other factors to consider when deciding which approach is fit for a particular case?

Why does this post require moderator attention?
You might want to add some details to your flag.
Why should this post be closed?

0 comments

0 answers

Sign up to answer this question »