Scheduling an rsync twice to avoid a publishing problem

A couple of weeks ago someone pointed out that we had a very subtle problem with publishing updated content to the live Web server. It has led us to reorganise how we manage publishing and has, in some cases, actually duplicated work.

Publishing workflow

Our publishing workflow is as follows:

[Figure: Publishing workflow. Site Manager publishes to a local folder, which is then copied over to the live server.]

  1. Publish locally
    At 30 minutes past each hour, Site Manager publishes the university Web site to a local directory; let’s call it the staging server.
  2. Transfer changed files
    On the hour, rsync copies the changed files from the staging server to the live server. (We were using Transfer Manager until a problem prevented us from using it.)
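To make the timing concrete, here is a small Python sketch (an illustration only, not part of our actual setup) that works out when the next publish and the next sync will run from a given moment. The two minute values come from the steps above; the function names and the sample date are hypothetical.

```python
from datetime import datetime, timedelta

# Minutes past the hour, taken from the workflow described above.
PUBLISH_MINUTE = 30  # Site Manager publishes to the staging directory
SYNC_MINUTE = 0      # rsync copies changed files to the live server


def next_run(now: datetime, minute: int) -> datetime:
    """Return the first time at or after `now` whose minute is `minute` (seconds zeroed)."""
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate < now:
        # That minute has already passed this hour, so use the next hour.
        candidate += timedelta(hours=1)
    return candidate


def timeline(now: datetime) -> list[tuple[datetime, str]]:
    """Return the next publish and the next sync after `now`, earliest first."""
    events = [
        (next_run(now, PUBLISH_MINUTE), "publish to staging"),
        (next_run(now, SYNC_MINUTE), "rsync staging -> live"),
    ]
    return sorted(events)


if __name__ == "__main__":
    for when, what in timeline(datetime(2000, 1, 1, 10, 15)):
        print(when.strftime("%H:%M"), what)
```

Run at 10:15, it prints the 10:30 publish followed by the 11:00 sync, so each sync picks up whatever Site Manager published half an hour earlier, assuming the publish has finished by then.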

The problem

The problem, however, is that we have no control over which files are copied first. As a result, we repeatedly received helpdesk calls from users complaining that images on the homepage were missing, downloadable documents were not present, or RSS feed links led to a 404 page instead of the full news story.

It turned out that the issue was the order in which files were copied from the staging server to the live server: some Web pages were being copied over, sometimes minutes before any related or supporting files.
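The effect is easy to reproduce on paper. This Python sketch replays a copy one file at a time and reports, for each page, which of its supporting files were not yet on the live server when that page arrived. The file names and the page-to-file dependency map are made up purely for illustration.

```python
# Hypothetical pages and the supporting files each one links to.
DEPENDENCIES = {
    "index.html": ["images/banner.jpg", "docs/prospectus.pdf"],
    "news/story.html": ["images/story.jpg"],
}


def broken_pages(transfer_order, dependencies, already_live=()):
    """Replay a copy one file at a time.

    transfer_order: paths in the order the copy delivers them to the live server.
    dependencies:   maps a page to the files it links to.
    already_live:   paths present on the live server before the copy starts.

    Returns {page: [files missing when that page arrived]} for every page that
    went live before all of its supporting files.
    """
    live = set(already_live)
    problems = {}
    for path in transfer_order:
        live.add(path)
        missing = [dep for dep in dependencies.get(path, []) if dep not in live]
        if missing:
            problems[path] = missing
    return problems


if __name__ == "__main__":
    pages_first = [
        "index.html",
        "news/story.html",
        "images/banner.jpg",
        "images/story.jpg",
        "docs/prospectus.pdf",
    ]
    print(broken_pages(pages_first, DEPENDENCIES))
```

With a pages-first order both pages are reported as broken until their images and PDF catch up. If the media files are delivered first, or are passed in as `already_live` (which is what a separate media library pass achieves), the result is empty.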

The solution

Our solution was therefore to schedule two rsync synchronisations. About 10 minutes before the whole-site synchronisation takes place, we run a first sync on just the contents of the media library (images, documents, videos, etc.).

Ten minutes later we run the full sync on the whole site. By then the supporting media library items are already on the live server, so the order in which files are copied matters much less.
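Here is a minimal Python sketch of the two passes, building rsync commands with the standard `-a` (archive) option. The staging path, live host and media library folder name are assumptions, since they aren't given here, and the script only prints the commands unless `execute=True` is passed.

```python
import shlex
import subprocess

# Hypothetical locations: not the real paths or host.
STAGING_ROOT = "/srv/staging/site"
LIVE_ROOT = "publisher@live.example.ac.uk:/var/www/site"
MEDIA_SUBDIR = "media"


def rsync_cmd(src: str, dest: str) -> list[str]:
    # -a recurses and preserves modification times, so later runs skip unchanged files.
    # The trailing slash on src copies the directory's contents into dest.
    return ["rsync", "-a", src.rstrip("/") + "/", dest.rstrip("/") + "/"]


def two_pass_commands(staging_root: str, live_root: str, media_subdir: str) -> list[list[str]]:
    """Pass 1 copies only the media library; pass 2 copies the whole site."""
    media_src = f"{staging_root.rstrip('/')}/{media_subdir}"
    media_dest = f"{live_root.rstrip('/')}/{media_subdir}"
    return [rsync_cmd(media_src, media_dest), rsync_cmd(staging_root, live_root)]


def run(commands: list[list[str]], execute: bool = False) -> None:
    """Run the passes in order; print them instead when execute is False."""
    for cmd in commands:
        if not execute:
            print(shlex.join(cmd))
            continue
        # check=True raises CalledProcessError, so a failed media pass stops the full sync.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(two_pass_commands(STAGING_ROOT, LIVE_ROOT, MEDIA_SUBDIR))
```

In our setup the two passes are scheduled as separate jobs ten minutes apart. Running them one after the other from a single script, as this sketch does, is an alternative that guarantees the media pass has finished before the full sync starts, rather than relying on it completing within ten minutes.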

Conclusion

To be honest, I don’t like this workaround terribly much, as it means that the contents of the media library are synchronised twice (although rsync skips files that haven’t changed since the first pass, so the second pass should mostly just re-check them). But it has worked, and the calls we had been receiving about missing content have stopped completely (so far).

Gareth @ St Andrews


About Gareth J M Saunders

Hi, I'm Gareth J M Saunders, 6' 4", married to Jane, father of 3 boys (twins and singleton), I'm a priest in the Scottish Episcopal Church, employed as the Web Architect at the University of St Andrews. My main interests are HTML, CSS (inc. frameworks), jQuery, Information Architecture and Agile development.
This entry was posted in TransferManager.

One Response to Scheduling an rsync twice to avoid a publishing problem

  1. The problem here seems to be the time that rsync takes to copy the data. Could you rsync the data to a separate folder on the Web server and then switch the folder the Web server publishes (simply by switching a symbolic link)? You may encounter problems with processes still accessing the old data. You may also need to use copy/rsync between folders on the machine itself.
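The symbolic-link suggestion in this comment can be sketched in Python as follows, assuming a POSIX server whose document root is a symbolic link (called `live` here) pointing at the current copy of the site. rsync would fill a fresh release folder first, and only then is the link switched. All paths and names are hypothetical.

```python
import os
import tempfile


def publish_release(release_dir: str, current_link: str) -> None:
    """Point `current_link` at `release_dir` in a single atomic step.

    A temporary symlink is created beside the live one and then renamed over
    it; on POSIX, rename() replaces the old link atomically, so readers see
    either the old release or the new one, never a mix.
    """
    link_dir = os.path.dirname(os.path.abspath(current_link))
    tmp_link = os.path.join(link_dir, f".{os.path.basename(current_link)}.tmp")
    if os.path.lexists(tmp_link):
        # Clean up a leftover temporary link from an interrupted run.
        os.remove(tmp_link)
    os.symlink(release_dir, tmp_link)
    os.replace(tmp_link, current_link)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for name in ("release-1", "release-2"):
            os.mkdir(os.path.join(root, name))
        live = os.path.join(root, "live")
        publish_release(os.path.join(root, "release-1"), live)
        publish_release(os.path.join(root, "release-2"), live)
        print(os.readlink(live))
```

Because the switch is a single rename, visitors get either the old copy or the new one rather than a half-copied site. As the commenter notes, processes that already have files open from the old folder may keep using them, so old release folders should only be removed once nothing is reading them.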
