An upgrade to the server's operating system has gone badly wrong.
Incident Report for pandammonium
Postmortem

We are sorry that pandammonium.org was down for so long (from 3 January 2023 until 5 February 2023).

We upgraded our Digital Ocean (DO) droplet’s operating system (OS) from Ubuntu 20.04 to Ubuntu 22.04, which turned out to be a really bad idea: WordPress (WP), which pandammonium.org is built on, stopped working.

We have learnt never to upgrade the OS without first checking it’s a good idea, and always to make a backup, even if we think it’s going to be a straightforward update.

We now understand that backups are worth having, even if they cost real money. Paying for a backup is cheaper than the time and stress it takes to fix everything without one.

We undertook the upgrade in the first place because WordPress Site Health Status suggested we use a persistent object cache (POC) to improve site performance. To do this, we were going to follow a tutorial on DO’s website, which we thought would probably work, even though it was for Ubuntu 14.04. On logging in to the server, Ubuntu’s message of the day said: “New release '22.04.2 LTS' available. Run 'do-release-upgrade' to upgrade to it.” So we did, and that’s how we ended up in this pickle.

Stack Exchange said it would be too difficult a task to roll back Ubuntu to a previous version, so we got in touch with DO, who said it would be easiest to create a new droplet and transfer everything from the old droplet to the new one. Creating a new droplet was easy, but the transfer of data didn’t go particularly smoothly; we even had to create another new droplet after messing up again.

This was when we learnt our lesson about taking backups, so we turned on weekly backups and took a snapshot of the droplet before every major step. These snapshots turned out to be pretty useful!

Next time we even think about updating packages on our droplet, we’ll take a snapshot first. It’s far easier to roll back to a snapshot taken immediately before changes that break the site than it is to recreate the whole droplet and get everything just so.
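The snapshot-first habit could be scripted along these lines. This is only a sketch, not what we actually ran: it assumes DO’s `doctl` command-line tool is installed and authenticated, and the droplet ID and the `pre-upgrade` label are made-up placeholders.

```python
import subprocess
from datetime import datetime, timezone


def snapshot_command(droplet_id: str, label: str, now: datetime) -> list[str]:
    """Build the doctl command that snapshots a droplet under a timestamped name."""
    name = f"{label}-{now:%Y%m%d-%H%M%S}"
    return [
        "doctl", "compute", "droplet-action", "snapshot", droplet_id,
        "--snapshot-name", name,
        "--wait",  # block until the snapshot action has finished
    ]


def take_snapshot(droplet_id: str, label: str) -> None:
    """Run the snapshot command; raises CalledProcessError if doctl fails."""
    cmd = snapshot_command(droplet_id, label, datetime.now(timezone.utc))
    subprocess.run(cmd, check=True)


# Hypothetical usage (placeholder droplet ID), to run before any package update:
# take_snapshot("123456789", "pre-upgrade")
```

Because `check=True` makes a failed snapshot raise an error, whatever risky step comes next in a script would not run unless the snapshot succeeded.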

We’ve added a note to the Ubuntu message of the day to remind us not to upgrade the OS.
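One way to add such a reminder is sketched below; it is not necessarily how ours is set up. On Ubuntu, executable scripts in `/etc/update-motd.d/` are run at login and their output becomes part of the message of the day. The file name `99-no-release-upgrade` and the wording of the reminder are made up for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical /etc/update-motd.d/99-no-release-upgrade script.

The file name and reminder text are illustrative assumptions.
"""

REMINDER_LINES = [
    "DO NOT run 'do-release-upgrade' on this droplet.",
    "Take a snapshot before updating any packages.",
]


def build_reminder(lines: list[str] = REMINDER_LINES) -> str:
    """Return the reminder text framed by a border so it stands out at login."""
    border = "*" * max(len(line) for line in lines)
    return "\n".join([border, *lines, border])


if __name__ == "__main__":
    print(build_reminder())
```

The script needs to be executable (for example, `chmod +x`) and its name should have no file extension, otherwise it is skipped when the message of the day is assembled.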

As for the POC, we installed a plugin to do that, and WP has never complained about it since. We are learning to stop shaving yaks.

Thank you for your patience with this long-running problem.

Posted Mar 24, 2023 - 23:31 GMT

Resolved
An upgrade to the server's operating system, Ubuntu 20.04, went badly wrong on 3 January 2023. We worked with our hosting provider, Digital Ocean, to fix it. This took a while (until 5 February 2023).
Posted Feb 05, 2023 - 14:00 GMT