Communities

Writing
Writing
Codidact Meta
Codidact Meta
The Great Outdoors
The Great Outdoors
Photography & Video
Photography & Video
Scientific Speculation
Scientific Speculation
Cooking
Cooking
Electrical Engineering
Electrical Engineering
Judaism
Judaism
Languages & Linguistics
Languages & Linguistics
Software Development
Software Development
Mathematics
Mathematics
Christianity
Christianity
Code Golf
Code Golf
Music
Music
Physics
Physics

Dashboard
Notifications
Mark all as read
Q&A

How to tackle database migrations failure during application initialization on production systems?

+2
−0

Context

My ASP.NET Core application got stuck in Production with a 5xx error. By inspecting the logs, I have noticed that a database error occurred during application initialization, namely a SQL Server access error.

This happened because the application restarted and at that very moment, it tried to apply the database migrations (which basically meant just checking __EFMigrationsHistory since everything was already up to date) the SQL Server was not accessible.

This is a very unlikely event (happened once in about 1 month), but it still bothers me because I do not have access to the production myself and those who do are not paying much attention to the health monitoring alerts.

Trying to fix the problem

One quick step that I can take in order to minimize the likelihood of this failure happening is to add connection resiliency, which basically retries the SQL commands for certain errors (e.g. server unavailable, timeouts).

I have also realized that I have never deeply dived into applying migration best practices, so I did this time. The aforementioned article says that migrations should be applied manually in production (and I doing the exact opposite):

It's possible for the application itself to apply migrations programmatically, typically during startup. While productive for local development and testing of migrations, this approach is inappropriate for managing production databases, for the following reasons:

  • If multiple instances of your application are running, both applications could attempt to apply the migration concurrently and fail (or worse, cause data corruption).
  • Similarly, if an application is accessing the database while another application migrates it, this can cause severe issues.
  • The application must have elevated access to modify the database schema. It's generally good practice to limit the application's database permissions in production.
  • It's important to be able to roll back an applied migration in case of an issue. The other strategies provide this easily and out of the box.
  • The SQL commands are applied directly by the program, without giving the developer a chance to inspect or modify them. This can be dangerous in a production environment.

The main reason I chose to rely on automatic database migration for all environments is that there two teams managing a web server and database respectively and upgrading the code and the database at the same time never happens.

I have analyzed the reasons provided by Microsoft for manually applying the migrations:

  • currently, I have a single instance, so collisions cannot occur.
  • elevated rights - I have created a separate login and the migrations use their own connection string
  • migrations are applied as transactions. The first failure rolls back and stops. I do not understand how this point is an issue
  • no inspection - my current setup involves a pre-production environment restored each night. PreProd = Copied Production + anonymizations. This means I get the latest schema from Production each day. If I can apply migrations on PreProd, I can apply them on Prod. So far, I have never had any issues due to schema when applying migrations.

As I see it, I have to make a choice between the following:

  • manually apply migrations for production (more manual work which is something we want to minimize)
  • keep the automatic database migrations run, add the resilience and accept that the application restart has a very small change (maybe once in 1-2 months) of getting stuck (but still know a couple of minutes later due to health check monitoring)
  • keep the automatic database migrations run, but allow the application to run even if a migration fails (and maybe sent an e-mail to dev team to analyze the context). The only risk here would be that the app might run with an outdated schema (very unlikely)

How should I approach this?

Why does this post require moderator attention?
You might want to add some details to your flag.
Why should this post be closed?

2 comments

The issues that might prompt rollback aren't just "the migration failed to complete successfully". Commonly, after deploying (especially database migrations), it's common to do at least some kind of smoke testing to make sure the production environment is behaving as expected. If these tests fail, rolling back is usually the easiest and safest option. For example, incorrect paths/credentials accidentally being hardcoded might be valid for PreProd but not Prod and completely bring down Prod. Derek Elkins‭ 7 days ago

@DerekElkins Yes, this can also happen and a smoke is complementary to the deployment using migrations. Also, production has a health monitoring in place which also checks that SELECTs can be performed against the database. Alexei‭ 7 days ago

1 answer

+3
−0

manually apply migrations for production (more manual work which is something we want to minimize)

Do not do this. We are in an age where DevOps is king. Adding more manual steps to your deployment just means pulling your back from doing what is more important.

Maybe I can provide a third option which is:

  • Use CI to test whether your migrations are valid - don't deploy if they aren't. What I mean by that is build a database from scratch to see if the migration scripts all work as a CI stage.

  • Use CD to migrate via the command line dotnet ef database update. If it succeeds great if it doesn't - then don't deploy. EF will rollback anyway. So you don't have a problem since the new code hasn't been deployed and the migration hasn't been applied.

EF also provides down migrations as as well as up so if you had only a partial migration you can roll it back to the original state before its starting.

Why does this post require moderator attention?
You might want to add some details to your flag.

0 comments

Sign up to answer this question »