Communities

Writing

Codidact Meta

The Great Outdoors

Photography & Video

Scientific Speculation

Cooking

Electrical Engineering

Judaism

Languages & Linguistics

$Mathematics$

tag:snake search within a tag

answers:0 unanswered questions

user:xxxx search by author id

score:0.5 posts with 0.5+ score

"snake oil" exact phrase

votes:4 posts with 4+ votes

created:<1w created < 1 week ago

post_type:xxxx type of post

Search help

Notifications

Mark all as read See all your notifications »

Q&A Code Reviews Meta

Q&A

Posts Tags Edits

Ask Question

Welcome to Software Development on Codidact!

Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.

What are the factors to consider when deciding to split ORM queries or use eager loading?

−0

I have extensive working experience with Entity Framework ORM and have noticed two major ways of writing LINQ (LINQ2SQL):

lot of eager loading: the most prevalent, uses (lots) of Includes to eager load children entities
split in multiple queries: rarely using Include, splits the loading in multiple queries

In the beginning, I have used the first choice, but after becoming knowledgeable also in SQL development and profiling the applications, I have seen the horrible queries generated by this and switched to the second way of writing LINQ.

To illustrate what I am saying I have written the same functionality using Include and loading of data using distinct queries. The resulting queries look like this:

Eager load

exec sp_executesql N'SELECT [t].[Id], [t].[Archive], [t].[CultureName], [t].[DateCreation], [t].[DateModification], [t].[Email], [t].[EmployeeId], [t].[FirstName], [t].[LastName], [t].[Login], [t].[User], [a0].[Id], [a0].[BuId], [a0].[DateCreation], [a0].[DateModification], [a0].[RoleId], [a0].[User], [a0].[UserId], [a1].[Id], [a1].[DateCreation], [a1].[DateModification], [a1].[PurchaseCategoryId], [a1].[User], [a1].[UserId], [a2].[Id], [a2].[DateCreation], [a2].[DateModification], [a2].[RefUserLocationId], [a2].[User], [a2].[UserId]
FROM (
	SELECT TOP(1) [a].[Id], [a].[Archive], [a].[CultureName], [a].[DateCreation], [a].[DateModification], [a].[Email], [a].[EmployeeId], [a].[FirstName], [a].[LastName], [a].[Login], [a].[User]
	FROM [ApplicationUser] AS [a]
	WHERE [a].[Id] = @__id_0
) AS [t]
LEFT JOIN [ApplicationUserRole] AS [a0] ON [t].[Id] = [a0].[UserId]
LEFT JOIN [ApplicationUserPurchaseCategory] AS [a1] ON [t].[Id] = [a1].[UserId]
LEFT JOIN [ApplicationUserLocation] AS [a2] ON [t].[Id] = [a2].[UserId]
ORDER BY [t].[Id], [a0].[Id], [a1].[Id], [a2].[Id]',N'@__id_0 int',@__id_0=617

Separate queries

exec sp_executesql N'SELECT TOP(1) [a].[Email]
FROM [ApplicationUser] AS [a]
WHERE [a].[Id] = @__impersonatorId_0',N'@__impersonatorId_0 int',@__impersonatorId_0=617

exec sp_executesql N'SELECT [a].[RoleId], [a].[BuId]
FROM [ApplicationUserRole] AS [a]
WHERE [a].[UserId] = @__userId_0',N'@__userId_0 int',@__userId_0=617

exec sp_executesql N'SELECT [a].[PurchaseCategoryId]
FROM [ApplicationUserPurchaseCategory] AS [a]
WHERE [a].[UserId] = @__userId_0',N'@__userId_0 int',@__userId_0=617

exec sp_executesql N'SELECT [a].[RefUserLocationId]
FROM [ApplicationUserLocation] AS [a]
WHERE [a].[UserId] = @__userId_0',N'@__userId_0 int',@__userId_0=617

exec sp_executesql N'SELECT [a].[RefUserDepartmentId]
FROM [ApplicationUserDepartment] AS [a]
WHERE [a].[UserId] = @__userId_0',N'@__userId_0 int',@__userId_0=617

By running the queries in SSMS and monitoring with SQL profiler I see the following differences:

way less loaded data for the second case
no matter how many times I run the queries for several IDs, the big one does not seem to use a cache (the same number of page reads). The split ones seem to be cached (even if I change the IDs) and they generate a tiny amount of reads and CPU

I am trying to understand why so many developers are always using Include if the profiler indicates that SQL Server activity is far from being optimal:

C# code is very easy to write for eager loading everything. Just a DbSet<> followed by several Includes. However, EF creates a "denormalized" query that selects all columns from all involved tables.
C# is rather convoluted for the multiple LINQs approach. This reduces the data paylod retrieved from the server and also allow for some queries to be replaced with getting data from some cache (e.g. for data that rarely changes). However, this clearly generates multiple round-trips to the SQL server.

I am wondering if assuming current infrastructure (data centers), is having multiple round-trips an issue in these cases? Are there any other factors to consider when deciding which approach is fit for a particular case?

posted over 4 years ago

CC BY-SA 4.0

Alexei‭

5082 reputation 115 102 702 499

Raw

Markdown

History

is a duplicate

This question has been asked before and has already been answered. It should be marked as a duplicate.

Please enter the URL of the proposed duplicate in the details field below.

not constructive

This question cannot be answered in a way that is helpful to anyone. It's not possible to learn something from possible answers, except for the solution for the specific problem of the asker.

0 comment threads

0 answers

Score Active Age