A couple of weeks ago someone pointed out that we had a very subtle problem with publishing updated content to the live Web server. Fixing it has led to us reorganising how we manage publishing, and in some cases it means that work is now duplicated.
Publishing workflow
Our publishing workflow is as follows:
- Publish locally
At 30 minutes past each hour Site Manager publishes the university Web site to a local directory, let’s call it the staging server.
- Transfer changed files
On the hour rsync (we were using Transfer Manager until a problem prevented us from using it) copies over the changed files from the staging server to the live server.
The problem
The problem is, however, that we have no control over which files are copied first. So we would repeatedly receive helpdesk calls from users complaining that images on the homepage were missing, downloadable documents were not present, or RSS feed links led to a 404 page instead of the full news story.
It turned out that the issue was to do with the order in which files were copied over from the staging server to live: some web pages were being copied over before their related or supporting files, sometimes by several minutes.
The solution
Our solution was therefore to schedule two rsync synchronisations. About 10 minutes before the whole site synchronisation takes place we run a first sync on just the content of the media library (images, documents, videos, etc.).
Then 10 minutes later we run the full sync on the whole site, which means that the order in which the files are copied over doesn’t matter so much: the supporting media library items are already on the server.
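As a rough illustration of the two-phase idea, here is a minimal Python sketch that builds the rsync command for each phase. The paths, the host name, the "media/" directory and the function names are all made up for the example and are not our real setup. The only rsync option used is -a (archive mode). Each phase would still be started by whatever scheduler you use at the appropriate time.

```python
import subprocess

# Hypothetical locations, for illustration only.
STAGING_ROOT = "/var/staging/site/"
LIVE_ROOT = "live.example.org:/var/www/site/"
MEDIA_DIR = "media/"  # assumed location of the media library inside the site


def build_sync_command(phase):
    """Return the rsync argument list for the given phase.

    "media" syncs only the media library; "full" syncs the whole site.
    """
    if phase == "media":
        return ["rsync", "-a", STAGING_ROOT + MEDIA_DIR, LIVE_ROOT + MEDIA_DIR]
    if phase == "full":
        return ["rsync", "-a", STAGING_ROOT, LIVE_ROOT]
    raise ValueError("unknown phase: " + phase)


def run_phase(phase):
    """Run one phase and raise an error if rsync exits with a failure code."""
    subprocess.run(build_sync_command(phase), check=True)
```

A scheduler would call run_phase("media") first and run_phase("full") about 10 minutes later, so that pages never arrive before the files they refer to.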
Conclusion
I don’t like that workaround terribly much, to be honest, as it means that the contents of the media library are being copied over twice, but it has worked: the number of calls that we’ve received about missing content has dropped away completely (so far).
Gareth @ St Andrews

The problem here sounds like the time that rsync is taking to copy the data. Are you able to rsync the data to a separate folder on the web server, and then change which folder is published by the web server (simply by switching a symbolic link)? You may encounter problems with processes still accessing the old data. You may also need to use copy/rsync between folders on the machine itself.
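For what it’s worth, the symbolic link switch could look something like the Python sketch below. It assumes a POSIX system, and the function name and the ".tmp" suffix are just names picked for this example. The new link is created under a temporary name and then renamed over the old one, so the published path always points at one complete copy of the site.

```python
import os


def switch_symlink(link_path, new_target):
    """Repoint link_path at new_target by renaming a temporary link over it."""
    tmp_link = link_path + ".tmp"  # hypothetical temporary name
    # Clear out a leftover temporary link from an earlier, interrupted run.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(new_target, tmp_link)
    # On POSIX the rename replaces the old link in a single step.
    os.replace(tmp_link, link_path)
```

You would rsync into a fresh folder first, then call switch_symlink with the path the web server publishes and the folder you just filled.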