2023 September 16
Made a mirror backup to Juno.

Made a backup AMI:
    Scanalyst Backup 2023-09-16  ami-0abbdc019aed923b4
        /        snap-063a73568642f7234
        /server  snap-02a20e8959480a284

The system had been up for 37 days.

Installed 62 update packages, 30 for security including a new kernel.
    super
    yum update

Stopped Discourse.
    cd ~/discourse/image
    ./launcher stop app

Rebooted.

The system came up promptly after the reboot. We are now running on kernel 4.14.322-246.539.amzn2.x86_64.

On the Discourse Upgrade manager page:
    https://scanalyst.fourmilab.ch/admin/upgrade
I upgraded:
    docker_manager
(This upgrade must be run first: the other upgrade buttons are disabled until it has been updated.) The upgrade page, running in Chromium, hung displaying the progress in this update. I finally gave up, reloaded the page, and it reported docker-manager updated successfully.

With docker_manager updated, I now proceeded to update:
    discourse-spoiler-alert
    discourse-chatbot

With the update of discourse-chatbot, a series of failures began which ended up with the site completely down and unable to rebuild. This began with an out of disc condition during the rebuild. I went through the usual procedure of purging obsolete copies of the container to recover space, which completed normally and freed up abundant space for the update. I then retried the build of discourse-chatbot, which failed. I was still able to get to the admin/upgrade page at this point, which reported discourse ready to update. I started this update, which crashed with failures during the rebuild of discourse-chatbot.

Next, I fell back to rebuilding the entire application from the ground up with:
    cd ~/discourse/image
    ./launcher stop app
    git pull
    ./launcher rebuild app
This crashed and sent me to try running:
    ./discourse-doctor
which promptly crashed as well. At this point I was out of altitude, out of airspeed, and out of ideas.

I was about to give up and restore the site from the snapshots I made at the start of the update process, but before doing that I decided to try disabling discourse-chatbot, as it is a non-standard plug-in and its failure to update was the first sign of the impending collapse. I edited:
    ~/discourse/image/containers/app.yml
and commented out the statement that includes discourse-chatbot:
    # - git clone https://github.com/merefield/discourse-chatbot.git
and rebuilt. This ran to completion and the site came up. So, it appears discourse-chatbot is at the root of the problem.

Just to confirm this, I then re-enabled discourse-chatbot and performed a "./launcher rebuild app", which failed as before. Disabling the chatbot again and rebuilding restored the site to working.

The Admin page now reports we're running on 3.2.0.beta2-dev.

Verified that spoilers are working after being updated.

Verified that MathJax is working after being updated.

Shalmaneser is now out of action until and unless discourse-chatbot is fixed.
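
The plug-in disabling step described above was done by hand in an editor. As a rough sketch only, the same edit could be scripted as follows. The function name disable_plugin is hypothetical and not part of Discourse or its launcher, and the sample file contents are an illustrative assumption rather than a copy of the real app.yml. The demonstration runs against a scratch file so that nothing on the server is touched.

```shell
#!/bin/sh
# Hypothetical helper (not part of Discourse): comment out the
# "git clone" line for one plug-in in a container definition file.
disable_plugin() {
    yml="$1"      # path to app.yml, or a copy of it
    plugin="$2"   # plug-in repository name, e.g. discourse-chatbot
    # Keep a backup, then prefix the matching uncommented line with "# ".
    # A line that is already commented out does not match and is left alone.
    cp "$yml" "$yml.bak" &&
    sed "s|^\([[:space:]]*\)\(- git clone .*/$plugin\.git\)|\1# \2|" \
        "$yml.bak" > "$yml"
}

# Demonstration on a scratch file with an assumed, simplified layout.
demo=$(mktemp)
cat > "$demo" <<'EOF'
hooks:
  after_code:
    - exec:
        cd: $home/plugins
        cmd:
          - git clone https://github.com/discourse/docker_manager.git
          - git clone https://github.com/merefield/discourse-chatbot.git
EOF

disable_plugin "$demo" discourse-chatbot
cat "$demo"
```

Against the real installation this would presumably be called as disable_plugin ~/discourse/image/containers/app.yml discourse-chatbot, followed by the same "./launcher rebuild app" used above. The backup copy with the .bak suffix makes it easy to re-enable the plug-in later by restoring the original file.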