2023 September 16
Made a mirror backup to Juno.
Made a backup AMI:
Scanalyst Backup 2023-09-16 ami-0abbdc019aed923b4
/ snap-063a73568642f7234
/server snap-02a20e8959480a284
The system had been up for 37 days.
Installed 62 update packages, 30 for security including a new kernel.
super
yum update
Stopped Discourse.
cd ~/discourse/image
./launcher stop app
Rebooted.
The system came up promptly after the reboot. We are now running on
kernel 4.14.322-246.539.amzn2.x86_64.
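A minimal sketch of how the post-reboot check can be done from the shell. The expected release string is the one recorded above; the function name running_kernel is just an illustrative helper, not something from the original procedure.

```shell
# Print the release string of the kernel that is currently running.
running_kernel() {
    uname -r
}

# Release recorded in this log entry after the reboot.
expected="4.14.322-246.539.amzn2.x86_64"

if [ "$(running_kernel)" = "$expected" ]; then
    echo "Running the expected kernel: $expected"
else
    echo "Running $(running_kernel), not $expected"
fi

# Show how long the system has been up since the reboot.
uptime
```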
On the Discourse Upgrade manager page:
https://scanalyst.fourmilab.ch/admin/upgrade
I upgraded:
docker_manager (This upgrade must be run first: the other upgrade
buttons are disabled until it has been updated.)
The upgrade page, running in Chromium, hung while displaying the
progress of this update. I finally gave up and reloaded the page,
which reported docker_manager updated successfully.
With docker_manager updated, I now proceeded to update:
discourse-spoiler-alert
discourse-chatbot
With the update of discourse-chatbot, a series of failures began which
ended with the site completely down and unable to be rebuilt. This
began with an out-of-disc-space condition during the rebuild. I went through
the usual procedure of purging obsolete copies of the container to
recover space, which completed normally and freed up abundant space
for the update.
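The exact purge procedure is not spelled out in this entry. As a hedged sketch, one way to check free space before a rebuild and then reclaim it is shown below. The helper names free_mb and purge_old_containers and the 5120 MB threshold are assumptions for illustration; "./launcher cleanup" is the launcher's command for removing long-stopped containers, and "docker image prune" removes dangling images.

```shell
# Report free space, in megabytes, on the filesystem holding a path.
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Reclaim space from obsolete containers and dangling images.
# Defined here but not called automatically; run it by hand when needed.
purge_old_containers() {
    cd ~/discourse/image || return 1
    ./launcher cleanup
    docker image prune -f
}

# Hypothetical threshold: warn if less than this much is free on /.
need_mb=5120

if [ "$(free_mb /)" -lt "$need_mb" ]; then
    echo "Only $(free_mb /) MB free on /; consider purge_old_containers"
else
    echo "$(free_mb /) MB free on /"
fi
```

Checking the space before starting the rebuild avoids discovering the shortage part-way through it.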
I then retried the build of discourse-chatbot, which failed. I was
still able to get to the admin/upgrade page at this point, which
reported discourse ready to update. I started this update, which
crashed with failures during the rebuild of discourse-chatbot.
Next, I fell back to rebuilding the entire application from the ground
up with:
cd ~/discourse/image
./launcher stop app
git pull
./launcher rebuild app
This crashed, which led me to try running:
./discourse-doctor
which promptly crashed as well.
At this point I was out of altitude, out of airspeed, and out of ideas.
I was about to give up and restore the site from the snapshots I made
at the start of the update process, but before doing that I decided
to try disabling discourse-chatbot, as it is a non-standard plug-in
and its failure to update was the first sign of the impending collapse.
I edited ~/discourse/image/containers/app.yml and commented out the
statement that includes discourse-chatbot:
# - git clone https://github.com/merefield/discourse-chatbot.git
and rebuilt. This ran to completion and the site came up. So, it
appears discourse-chatbot is at the root of the problem.
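The edit to app.yml was made by hand. A minimal sketch of doing the same thing with sed follows, assuming the plug-in is included by a line of the form "- git clone ...NAME.git". The function name disable_plugin is hypothetical, and the original file is kept with a .bak suffix.

```shell
# Comment out the "git clone" line for one plug-in in a container
# definition file. Lines that are already commented out are left alone.
#   $1 = path to the YAML file, $2 = plug-in repository name
disable_plugin() {
    file=$1
    plugin=$2
    sed -i.bak \
        "/^[[:space:]]*- git clone .*${plugin}\.git/ s/- git clone/# - git clone/" \
        "$file"
}

# Example (not run here):
#   disable_plugin ~/discourse/image/containers/app.yml discourse-chatbot
#   cd ~/discourse/image && ./launcher rebuild app
```

Re-enabling the plug-in is then a matter of restoring the .bak copy or removing the "# " by hand before rebuilding.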
Just to confirm this, I then re-enabled discourse-chatbot and performed
a "./launcher rebuild app", which failed as before. Disabling the
chatbot again and rebuilding restored the site to working order.
The Admin page now reports we're running on 3.2.0.beta2-dev.
Verified that spoilers are working after being updated.
Verified that MathJax is working after being updated.
Shalmaneser is now out of action until and unless discourse-chatbot is
fixed.