Welcome to Software Development on Codidact!
Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.
Post History
I have extensive working experience with Entity Framework ORM and have noticed two major ways of writing LINQ (LINQ2SQL): lot of eager loading: the most prevalent, uses (lots) of Includes to eag...
#1: Initial revision
What are the factors to consider when deciding to split ORM queries or use eager loading?
I have extensive working experience with Entity Framework ORM and have noticed two major ways of writing LINQ (LINQ2SQL): - lot of eager loading: the most prevalent, uses (lots) of Includes to eager load children entities - split in multiple queries: rarely using Include, splits the loading in multiple queries In the beginning, I have used the first choice, but after becoming knowledgeable also in SQL development and profiling the applications, I have seen the horrible queries generated by this and switched to the second way of writing LINQ. To illustrate what I am saying I have written the same functionality using `Include` and loading of data using distinct queries. The resulting queries look like this: ### Eager load exec sp_executesql N'SELECT [t].[Id], [t].[Archive], [t].[CultureName], [t].[DateCreation], [t].[DateModification], [t].[Email], [t].[EmployeeId], [t].[FirstName], [t].[LastName], [t].[Login], [t].[User], [a0].[Id], [a0].[BuId], [a0].[DateCreation], [a0].[DateModification], [a0].[RoleId], [a0].[User], [a0].[UserId], [a1].[Id], [a1].[DateCreation], [a1].[DateModification], [a1].[PurchaseCategoryId], [a1].[User], [a1].[UserId], [a2].[Id], [a2].[DateCreation], [a2].[DateModification], [a2].[RefUserLocationId], [a2].[User], [a2].[UserId] FROM ( SELECT TOP(1) [a].[Id], [a].[Archive], [a].[CultureName], [a].[DateCreation], [a].[DateModification], [a].[Email], [a].[EmployeeId], [a].[FirstName], [a].[LastName], [a].[Login], [a].[User] FROM [ApplicationUser] AS [a] WHERE [a].[Id] = @__id_0 ) AS [t] LEFT JOIN [ApplicationUserRole] AS [a0] ON [t].[Id] = [a0].[UserId] LEFT JOIN [ApplicationUserPurchaseCategory] AS [a1] ON [t].[Id] = [a1].[UserId] LEFT JOIN [ApplicationUserLocation] AS [a2] ON [t].[Id] = [a2].[UserId] ORDER BY [t].[Id], [a0].[Id], [a1].[Id], [a2].[Id]',N'@__id_0 int',@__id_0=617 ### Separate queries exec sp_executesql N'SELECT TOP(1) [a].[Email] FROM [ApplicationUser] AS [a] WHERE [a].[Id] = @__impersonatorId_0',N'@__impersonatorId_0 int',@__impersonatorId_0=617 exec sp_executesql N'SELECT [a].[RoleId], [a].[BuId] FROM [ApplicationUserRole] AS [a] WHERE [a].[UserId] = @__userId_0',N'@__userId_0 int',@__userId_0=617 exec sp_executesql N'SELECT [a].[PurchaseCategoryId] FROM [ApplicationUserPurchaseCategory] AS [a] WHERE [a].[UserId] = @__userId_0',N'@__userId_0 int',@__userId_0=617 exec sp_executesql N'SELECT [a].[RefUserLocationId] FROM [ApplicationUserLocation] AS [a] WHERE [a].[UserId] = @__userId_0',N'@__userId_0 int',@__userId_0=617 exec sp_executesql N'SELECT [a].[RefUserDepartmentId] FROM [ApplicationUserDepartment] AS [a] WHERE [a].[UserId] = @__userId_0',N'@__userId_0 int',@__userId_0=617 By running the queries in SSMS and monitoring with SQL profiler I see the following differences: - way less loaded data for the second case - no matter how many times I run the queries for several IDs, the big one does not seem to use a cache (the same number of page reads). The split ones seem to be cached (even if I change the IDs) and they generate a tiny amount of reads and CPU I am trying to understand why so many developers are always using Include if the profiler indicates that SQL Server activity is far from being optimal: - C# code is very easy to write for eager loading everything. Just a DbSet<> followed by several Includes. However, EF creates a "denormalized" query that selects all columns from all involved tables. - C# is rather convoluted for the multiple LINQs approach. This reduces the data paylod retrieved from the server and also allow for some queries to be replaced with getting data from some cache (e.g. for data that rarely changes). However, this clearly generates multiple round-trips to the SQL server. I am wondering if assuming current infrastructure (data centers), is having multiple round-trips an issue in these cases? Are there any other factors to consider when deciding which approach is fit for a particular case?