Welcome to Software Development on Codidact!
Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.
Health checks with caching in ASP.NET Core
Context
I noticed that an application was flooding the database with simple SELECTs. The investigation revealed some bugs in the health check which theoretically implemented caching (to avoid querying external sources too often), but it was not working properly.
The application health check includes checking all the important dependencies' health status (e.g. a simple select against the main database, a simple GET against an API).
The code
Health check base class
This abstract class provides a common lifecycle for all health checks, allowing for caching:
public abstract class CachedHealthCheckBase: IHealthCheck
{
private readonly int _checkInterval;
/// <summary>
///
/// </summary>
protected HealthCheckResult LastHealthCheckResult { get; set; } = HealthCheckResult.Healthy();
private static readonly SemaphoreSlim Semaphore = new(1);
/// <summary>
///
/// </summary>
/// <param name="cacheInterval"></param>
protected CachedHealthCheckBase(int cacheInterval)
{
_checkInterval = cacheInterval;
}
/// <summary>
///
/// </summary>
/// <param name="context"></param>
/// <param name="cancellationToken"></param>
/// <returns></returns>
public async Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context,
CancellationToken cancellationToken = default)
{
if (!IsCacheExpired())
return LastHealthCheckResult;
try
{
await Semaphore.WaitAsync(cancellationToken);
if (IsCacheExpired())
{
LastCheckTime = DateTime.UtcNow;
await CheckService();
}
}
finally
{
Semaphore.Release();
}
return LastHealthCheckResult;
}
private bool IsCacheExpired()
{
return (_checkInterval == 0 || LastCheckTime.AddSeconds(_checkInterval) <= DateTime.UtcNow);
}
/// <summary>
///
/// </summary>
/// <returns></returns>
protected abstract Task CheckService();
protected abstract DateTime LastCheckTime { get; set; }
}
Database check (one of the health checks)
This is extending the class above and adds the actual health check code (health check specific depending on the checked target):
public class DatabasePlatformHealthCheck : CachedHealthCheckBase
{
private readonly IApplicationDbContext _ctx;
private readonly ILogger<DatabasePlatformHealthCheck> _logger;
///
public DatabasePlatformHealthCheck(IApplicationDbContext ctx, IOptions<HealthCheckSettings> options,
ILogger<DatabasePlatformHealthCheck> logger)
: base(options.Value.HealthCheckInterval)
{
_ctx = ctx;
_logger = logger;
}
/// using a static member because the health check is instantiated for each health check (scoped)
private static DateTime _lastCheckTime = DateTime.MinValue;
protected override DateTime LastCheckTime
{
get => _lastCheckTime;
set => _lastCheckTime = value;
}
///
protected override async Task CheckService()
{
LastHealthCheckResult = HealthCheckResult.Healthy();
try
{
var response = await _ctx.ReadSet<ApplicationUser>().FirstOrDefaultAsync();
if (response != null)
{
LastHealthCheckResult = HealthCheckResult.Healthy();
}
else
{
_logger.LogError("Unable to contact the main database");
LastHealthCheckResult = HealthCheckResult.Unhealthy("Unable to contact the main database");
}
}
catch (Exception ex)
{
_logger.LogError(ex, $"Unable to contact SQL Server - Body: {ex.Message}");
LastHealthCheckResult = HealthCheckResult.Unhealthy(exception: ex);
}
}
}
LastCheckTime
was initially placed in the base class, but since the health checks implementation classes are instantiated for each call, it was reset (part of the bug I have fixed). So, I have declared it as a static property for each health check.
The health check setup is shown below (Program.cs):
services.AddScoped<DatabasePlatformHealthCheck>();
services.AddHealthChecks().Add(new HealthCheckRegistration("databasePlatform",
provider => provider.GetService<DatabasePlatformHealthCheck>()!, HealthStatus.Unhealthy, null));
return services;
What I don't like is the way I am handling the last check datetime, which must be added as a static property within each implementing class. Any ideas about how to improve this part?
Based on the received feedback I have almost completely rewritten the health check:
/// <summary>
/// base class for cached health check results
/// </summary>
public class CachedHealthCheckService<THealthCheck> where THealthCheck: class, IHealthCheck
{
private readonly TimeSpan _checkInterval;
private readonly static Dictionary<string, (DateTime LastCheckTime, HealthCheckResult Result)> _checkResults = new();
private static readonly SemaphoreSlim Semaphore = new(1);
private static string HealthCheckKey => typeof(THealthCheck).Name;
///
public CachedHealthCheckService(TimeSpan cacheInterval)
{
_checkInterval = cacheInterval;
}
///
public async Task<HealthCheckResult> GetCachedHealthCheck(Func<CancellationToken,
Task<HealthCheckResult>> checkHealthFunc,
CancellationToken token = default)
{
try
{
await Semaphore.WaitAsync(token);
if (IsCacheExpired())
{
HealthCheckResult result = await checkHealthFunc(token);
_checkResults[HealthCheckKey] = (LastCheckTime: DateTime.UtcNow, Result: result);
}
return _checkResults[HealthCheckKey].Result;
}
finally
{
Semaphore.Release();
}
}
private bool IsCacheExpired()
{
if (!_checkResults.ContainsKey(HealthCheckKey))
return true;
DateTime lastCheckTime = _checkResults[HealthCheckKey].LastCheckTime;
return DateTime.UtcNow - lastCheckTime >= _checkInterval;
}
}
public class DatabasePlatformHealthCheck : IHealthCheck
{
private readonly IApplicationDbContext _ctx;
private readonly ILogger<DatabasePlatformHealthCheck> _logger;
private readonly CachedHealthCheckService<DatabasePlatformHealthCheck> _cachedHealthCheckService;
///
public DatabasePlatformHealthCheck(IApplicationDbContext ctx, IOptions<HealthCheckSettings> options,
ILogger<DatabasePlatformHealthCheck> logger)
: base()
{
_ctx = ctx;
_logger = logger;
_cachedHealthCheckService = new CachedHealthCheckService<DatabasePlatformHealthCheck>(TimeSpan.FromSeconds(options.Value.HealthCheckInterval));
}
public async Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context, CancellationToken token = default)
{
var result = await _cachedHealthCheckService.GetCachedHealthCheck(CheckService, token);
return result;
}
///
protected async Task<HealthCheckResult> CheckService(CancellationToken token)
{
try
{
var response = await _ctx.ReadSet<ApplicationUser>().FirstOrDefaultAsync(token);
if (response != null)
return HealthCheckResult.Healthy();
_logger.LogError("Unable to contact the main database");
return HealthCheckResult.Unhealthy("Unable to contact the main database");
}
catch (Exception ex)
{
_logger.LogError(ex, $"Unable to contact SQL Server - Body: {ex.Message}");
return HealthCheckResult.Unhealthy(exception: ex);
}
}
}
1 answer
private readonly int _checkInterval;
What are the units? I have to search for usages to find out. I strongly favour using TimeSpan
instead of int
.
private static readonly SemaphoreSlim Semaphore = new(1);
Is it essential that the semaphore be shared between all health checks? That conditions possible solutions to the problem raised in the question.
public async Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context, CancellationToken cancellationToken = default) { if (!IsCacheExpired()) return LastHealthCheckResult; try { await Semaphore.WaitAsync(cancellationToken); if (IsCacheExpired()) { LastCheckTime = DateTime.UtcNow; await CheckService(); } } finally { Semaphore.Release(); } return LastHealthCheckResult; }
I assume this is the method defined in IHealthCheck
, and there are two things about it which I find questionable.
- Is there a reason for choosing inheritance over composition? I.e. instead of calling an abstract
CheckService
, why not callCheckHealthAsync
on a wrappedIHealthService
? (I recognise that the answer might be "constraints of the IoC library"). - Is the double-checked locking with both
return
s outside the critical block actually safe? (It may be, but I'm wary and would like to see a comment justifying it).
protected abstract Task CheckService();
Not CheckServiceAsync
?
protected HealthCheckResult LastHealthCheckResult { get; set; } = HealthCheckResult.Healthy(); ... protected abstract DateTime LastCheckTime { get; set; }
LastCheckTime
is done like this because otherwise it doesn't get persisted. But that surely means that LastHealthCheckResult
doesn't get persisted, so if the test against LastCheckTime
determines that it's not necessary to perform the test again it will always return Healthy()
. I'm not sure whether the bug is in the implementation or the use of Cached in the name, but one of them is wrong.
What I don't like is the way I am handling the last check datetime, which must be added as a static property within each implementing class. Any ideas about how to improve this part?
Sure: move the static property up to CachedHealthCheckBase
. I see multiple options, although their elegance may be debatable:
-
If you want to stick to inheritance, use F-bounded polymorphism:
CachedHealthCheckBase<T> where T : CachedHealthCheckBase<T>
. Now each concrete implementation has a different base class. I concede that it's somewhat inelegant to create a type parameter which is then never used in the source code. -
Favour composition over inheritance and use a static
Dictionary<TKey, (DateTime, HealthCheckResult)
whereTKey
is either a string passed in to the constructor, a type passed into the constructor, orGetType()
called on the wrapped check. This would force you not to use double-checked locking and to move everything into the critical region so that access to the dictionary is thread-safe. -
Hybrid: use composition with polymorphism:
public class CachedHealthCheck<THealthCheck> : IHealthCheck where THealthCheck : IHealthCheck { ... public CachedHealthCheck(THealthCheck uncachedCheck, TimeSpan cacheInterval)
0 comment threads