
Transferring files from a legacy project to an existing one as varbinary

+3
−0

Our team is currently transferring all functionality (plus some changes) from a small and very old project A (almost at code freeze) to project B (actively developed). As part of the data migration, there are about 10 GB of files to be transferred.

Currently, project B stores files as varbinary in the database (not OK; it will be migrated in a couple of months to internal storage that provides an API). The file migration therefore boils down to migrating project A's physical files to varbinary in project B.

The projects reside on different web servers, each accessing its own SQL Server instance. What I know for sure is that a linked server is configured between the two SQL Server instances.

Projects A and B will run side by side for a little while, so I need to support differential migration of data (though real-time sync is not required).

Project A's files reside on the web server, and developers are not allowed to access any production server except through the applications' APIs.

Installing a new application in production is a major headache (all apps are deployed through an old-school pipeline), so the solution should use the existing applications.

What I have in mind for the migration:

  • add an endpoint to application A that syncs project A's physical files into its database
  • add an endpoint to application B that syncs the files into its own database over the existing linked server

An alternative that should be easier to maintain would be for application B to read files directly from web server A, but I am not sure the security rules allow it.
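Whichever endpoint layout is chosen, the differential part of the plan comes down to tracking a watermark (a timestamp or monotonic id) and copying only files created after it. A minimal sketch in Python, with in-memory dictionaries standing in for the two databases (all names here are illustrative, not from the question):

```python
# Watermark-based differential sync (hypothetical stores).
# source_files simulates project A's file metadata; dest_files simulates
# project B's varbinary column.

def differential_sync(source_files, dest_files, last_synced):
    """Copy files created after `last_synced` from source to dest.

    source_files: dict id -> (created_at, data)
    dest_files:   dict id -> data
    Returns the new watermark to persist for the next run.
    """
    watermark = last_synced
    # Process in creation order so a partial run still leaves a valid watermark.
    for file_id, (created_at, data) in sorted(
            source_files.items(), key=lambda kv: kv[1][0]):
        if created_at > last_synced:
            dest_files[file_id] = data  # in reality: an INSERT into varbinary
            watermark = max(watermark, created_at)
    return watermark
```

Re-running the function with the returned watermark copies only files added since the previous pass, which is exactly the "differential migration" requirement.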

Any ideas about how to improve this process, or possible pitfalls? The only thing that concerns me is transferring a large volume of data over the linked server.


1 comment thread

Do the files in Project A get updated in place or are they immutable once they are uploaded/created e... (2 comments)

1 answer

+3
−0

The approach I would take, given the constraints you've stated, is to make much simpler and safer changes to Project A. Namely: 1) provide an API endpoint for fetching a file, and 2) provide an API endpoint, if needed, that takes some kind of "timestamp" (real or logical) and returns a list of files added since that "timestamp" (optionally with some kind of checksum, e.g. an MD5 hash), including any other metadata that's required.

As you've likely already gathered, Project B can then just fetch the list and then fetch files on its own schedule. It can refetch the list whenever it reaches the end of the current list or at any other time, e.g. if the Project B application is restarted.
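The two endpoints and Project B's pull loop can be sketched in Python, with plain functions and an in-memory store standing in for the HTTP layer (all names are illustrative, and the MD5 checksum is the optional integrity check mentioned above):

```python
import hashlib

# Project A side (hypothetical): an immutable file store keyed by id.
FILES_A = {}  # id -> (timestamp, bytes)

def list_files_since(ts):
    """Endpoint 1: metadata (and checksum) for files added after `ts`."""
    return [
        {"id": fid, "ts": t, "md5": hashlib.md5(data).hexdigest()}
        for fid, (t, data) in sorted(FILES_A.items(), key=lambda kv: kv[1][0])
        if t > ts
    ]

def fetch_file(fid):
    """Endpoint 2: raw bytes of a single file."""
    return FILES_A[fid][1]

# Project B side: pull files on its own schedule, verifying each checksum.
def pull(ts, store_b):
    for meta in list_files_since(ts):
        data = fetch_file(meta["id"])
        if hashlib.md5(data).hexdigest() != meta["md5"]:
            continue  # corrupted transfer; it will be retried on the next pass
        store_b[meta["id"]] = data
        ts = max(ts, meta["ts"])
    return ts  # persisted watermark for the next pull
```

Because both calls are read-only against Project A, Project B is free to throttle, retry, or restart the loop at any point without risking Project A's data.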

The benefits of this approach are that it doesn't require any schema changes to Project A (assuming there's already something that can serve as a "timestamp"), and it's purely read-only from Project A. This avoids bloating Project A's database with file data that Project A itself doesn't use. It also lessens the risk of corrupting or breaking Project A, or of needing downtime for it. It may also simplify throttling, if necessary, to keep the "syncing" from being disruptive.

On Project B's side, this arguably means less coupling to Project A. While the file list endpoint is specialized to Project A, the file data endpoint can be generic and useful for other purposes. Even for syncing with Project A, it may be handy to be able to pull individual files on demand outside of some background "syncing" process. Indeed, you could possibly drop the file list endpoint entirely and do the file copying entirely on demand.

This approach is basically the "read files directly from Project A's server" alternative, just mediated by Project A's application.

These endpoints may have different security requirements than the rest of the application. You might be able to use something like IP address filtering, since only Project B's web server needs access to them, though you could certainly also reuse the same authorization mechanism as the other endpoints with a dedicated "Project B" user or similar.

This approach relies on the fact that the files are immutable. More coordination and complexity would be needed otherwise.


0 comment threads
