Development Log 2023-09-16

2023 September 16

Made a mirror backup to Juno.

Made a backup AMI:
    Scanalyst Backup 2023-09-16 ami-0abbdc019aed923b4
        /           snap-063a73568642f7234
        /server     snap-02a20e8959480a284

The system had been up for 37 days.

Installed 62 package updates, 30 of them for security, including a new
kernel.
    super
    yum update

Stopped Discourse.
    cd ~/discourse/image
    ./launcher stop app

Rebooted.

The system came up promptly after the reboot.  We are now running on
kernel 4.14.322-246.539.amzn2.x86_64.

On the Discourse Upgrade manager page:
    https://scanalyst.fourmilab.ch/admin/upgrade

I upgraded:
    docker_manager      (This upgrade must be run first:
                        the other upgrade buttons are
                        disabled until it has been updated.)
The upgrade page, running in Chromium, hung while displaying the
progress of this update.  I finally gave up and reloaded the page,
which reported docker_manager updated successfully.

With docker_manager updated, I now proceeded to update:
    discourse-spoiler-alert
    discourse-chatbot
With the update of discourse-chatbot, a series of failures began which
ended with the site completely down and unable to rebuild.  The first
was an out-of-disc-space condition during the rebuild.  I went through
the usual procedure of purging obsolete copies of the container to
recover space, which completed normally and freed up abundant space
for the update.
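
As a rough sketch of checking for this condition before a rebuild, the
helpers below report free space and compare it to a threshold.  The
function names and the 5 GB threshold are assumptions made for
illustration, and the log does not record which commands the usual
purge procedure consists of; "./launcher cleanup" is shown only as one
launcher subcommand that removes stopped containers.

```shell
#!/bin/sh
# Hypothetical helpers; names and threshold are illustrative only.

# free_kb PATH: print the kilobytes available on the filesystem
# that holds PATH (-P forces one output line per filesystem).
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# needs_cleanup AVAIL_KB MIN_KB: succeed when the available space
# is below the minimum, meaning a purge should run first.
needs_cleanup() {
    [ "$1" -lt "$2" ]
}

# Assumed threshold: 5 GB, expressed in kilobytes.
MIN_KB=$((5 * 1024 * 1024))

# Possible use before a rebuild (not run here):
#   if needs_cleanup "$(free_kb /)" "$MIN_KB"; then
#       cd ~/discourse/image && ./launcher cleanup
#   fi
```

Running a check like this before starting an upgrade would at least
surface the shortage before the rebuild is under way, rather than
part-way through it.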

I then retried the build of discourse-chatbot, which failed.  I was
still able to get to the admin/upgrade page at this point, which
reported discourse ready to update.  I started this update, which
crashed with failures during the rebuild of discourse-chatbot.

Next, I fell back to rebuilding the entire application from the ground
up with:
    cd ~/discourse/image
    ./launcher stop app
    git pull
    ./launcher rebuild app

This crashed and sent me to try running:
    ./discourse-doctor
which promptly crashed as well.

At this point I was out of altitude, out of airspeed, and out of ideas.
I was about to give up and restore the site from the snapshots I made
at the start of the update process, but before doing that I decided
to try disabling discourse-chatbot, as it is a non-standard plug-in
and its failure to update was the first sign of the impending collapse.
I edited ~/discourse/image/containers/app.yml and commented out the
statement that includes discourse-chatbot:
    #   - git clone https://github.com/merefield/discourse-chatbot.git
and rebuilt.  This ran to completion and the site came up.  So, it
appears discourse-chatbot is at the root of the problem.
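
The edit to app.yml can also be scripted.  What follows is a minimal
sketch, not the procedure I used: disable_plugin is a hypothetical
helper which comments out the "- git clone" line for a named plug-in
and leaves a .bak copy of the original file.  It assumes the plug-in
is included by a single line ending in "/NAME.git".

```shell
#!/bin/sh
# disable_plugin FILE PLUGIN (hypothetical helper)
# Comment out the "- git clone .../PLUGIN.git" line in FILE,
# preserving its indentation.  A line that is already commented
# out does not match, so running this twice is harmless.
# A backup of the original is left in FILE.bak.
disable_plugin() {
    sed -i.bak -E \
        "s|^([[:space:]]*)(- git clone [^[:space:]]*/$2\.git)[[:space:]]*\$|\1# \2|" \
        "$1"
}

# Possible use, followed by a rebuild (not run here):
#   disable_plugin ~/discourse/image/containers/app.yml discourse-chatbot
#   cd ~/discourse/image && ./launcher rebuild app
```

Restoring the plug-in is then a matter of copying the .bak file back
or removing the "# " by hand before rebuilding.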

Just to confirm this, I then re-enabled discourse-chatbot and performed
a "./launcher rebuild app", which failed as before.  Disabling the
chatbot again and rebuilding restored the site to working.

The Admin page now reports we're running on 3.2.0.beta2-dev.

Verified that spoilers are working after being updated.

Verified that MathJax is working after being updated.

Shalmaneser is now out of action until and unless discourse-chatbot is
fixed.