Communities

Writing
Writing
Codidact Meta
Codidact Meta
The Great Outdoors
The Great Outdoors
Photography & Video
Photography & Video
Scientific Speculation
Scientific Speculation
Cooking
Cooking
Electrical Engineering
Electrical Engineering
Judaism
Judaism
Languages & Linguistics
Languages & Linguistics
Software Development
Software Development
Mathematics
Mathematics
Christianity
Christianity
Code Golf
Code Golf
Music
Music
Physics
Physics
Linux Systems
Linux Systems
Power Users
Power Users
Tabletop RPGs
Tabletop RPGs
Community Proposals
Community Proposals
tag:snake search within a tag
answers:0 unanswered questions
user:xxxx search by author id
score:0.5 posts with 0.5+ score
"snake oil" exact phrase
votes:4 posts with 4+ votes
created:<1w created < 1 week ago
post_type:xxxx type of post
Search help
Notifications
Mark all as read See all your notifications »
Q&A

Welcome to Software Development on Codidact!

Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.

How to tackle database migrations failure during application initialization on production systems?

+3
−0

Context

My ASP.NET Core application got stuck in Production with a 5xx error. By inspecting the logs, I have noticed that a database error occurred during application initialization, namely a SQL Server access error.

This happened because the application restarted and at that very moment, it tried to apply the database migrations (which basically meant just checking __EFMigrationsHistory since everything was already up to date) the SQL Server was not accessible.

This is a very unlikely event (happened once in about 1 month), but it still bothers me because I do not have access to the production myself and those who do are not paying much attention to the health monitoring alerts.

Trying to fix the problem

One quick step that I can take in order to minimize the likelihood of this failure happening is to add connection resiliency, which basically retries the SQL commands for certain errors (e.g. server unavailable, timeouts).

I have also realized that I have never deeply dived into applying migration best practices, so I did this time. The aforementioned article says that migrations should be applied manually in production (and I doing the exact opposite):

It's possible for the application itself to apply migrations programmatically, typically during startup. While productive for local development and testing of migrations, this approach is inappropriate for managing production databases, for the following reasons:

  • If multiple instances of your application are running, both applications could attempt to apply the migration concurrently and fail (or worse, cause data corruption).
  • Similarly, if an application is accessing the database while another application migrates it, this can cause severe issues.
  • The application must have elevated access to modify the database schema. It's generally good practice to limit the application's database permissions in production.
  • It's important to be able to roll back an applied migration in case of an issue. The other strategies provide this easily and out of the box.
  • The SQL commands are applied directly by the program, without giving the developer a chance to inspect or modify them. This can be dangerous in a production environment.

The main reason I chose to rely on automatic database migration for all environments is that there two teams managing a web server and database respectively and upgrading the code and the database at the same time never happens.

I have analyzed the reasons provided by Microsoft for manually applying the migrations:

  • currently, I have a single instance, so collisions cannot occur.
  • elevated rights - I have created a separate login and the migrations use their own connection string
  • migrations are applied as transactions. The first failure rolls back and stops. I do not understand how this point is an issue
  • no inspection - my current setup involves a pre-production environment restored each night. PreProd = Copied Production + anonymizations. This means I get the latest schema from Production each day. If I can apply migrations on PreProd, I can apply them on Prod. So far, I have never had any issues due to schema when applying migrations.

As I see it, I have to make a choice between the following:

  • manually apply migrations for production (more manual work which is something we want to minimize)
  • keep the automatic database migrations run, add the resilience and accept that the application restart has a very small change (maybe once in 1-2 months) of getting stuck (but still know a couple of minutes later due to health check monitoring)
  • keep the automatic database migrations run, but allow the application to run even if a migration fails (and maybe sent an e-mail to dev team to analyze the context). The only risk here would be that the app might run with an outdated schema (very unlikely)

How should I approach this?

History
Why does this post require attention from curators or moderators?
You might want to add some details to your flag.
Why should this post be closed?

1 comment thread

General comments (2 comments)

1 answer

+3
−0

manually apply migrations for production (more manual work which is something we want to minimize)

Do not do this. We are in an age where DevOps is king. Adding more manual steps to your deployment just means pulling your back from doing what is more important.

Maybe I can provide a third option which is:

  • Use CI to test whether your migrations are valid - don't deploy if they aren't. What I mean by that is build a database from scratch to see if the migration scripts all work as a CI stage.

  • Use CD to migrate via the command line dotnet ef database update. If it succeeds great if it doesn't - then don't deploy. EF will rollback anyway. So you don't have a problem since the new code hasn't been deployed and the migration hasn't been applied.

EF also provides down migrations as as well as up so if you had only a partial migration you can roll it back to the original state before its starting.

History
Why does this post require attention from curators or moderators?
You might want to add some details to your flag.

0 comment threads

Sign up to answer this question »