2023 September 27
Made a mirror backup to the Fourmilab_Mirror external USB drive.

Made a backup AMI:
    Scanalyst Backup 2023-09-27  ami-00dfeaa6905e18327
        /           snap-0bcc2cd67d245512b
        /server     snap-03cf4c2d70db7f538

The system had been up for 10 days. Installed 10 update packages, 8 for security including a new kernel.
    super
    yum update

Stopped Discourse.
    cd ~/discourse/image
    ./launcher stop app

Rebooted. The system came up promptly after the reboot. We are running on the same kernel as before, 4.14.322-246.539.amzn2.x86_64.

It appears that what led to the disaster rebuilding the site on 2023-09-16, necessitating the removal of Shalmaneser (the ChatGPT bot integrated using the Chatbot plug-in) was the fact that when I tried to rebuild the site to apply official updates from Discourse, the build process pulled in a version of Chatbot, under development, which was in an unstable state and this caused the rebuild of the site to collapse in a manner which did not directly point back to the root cause.

This is a vulnerability in the way plug-ins in Discourse are integrated. A plug-in is added to the build by inserting a statement in the ~/discourse/image/containers/app.yml file that clones the GitHub repository in which the plug-in is implemented. This runs the risk that if the site happens to be rebuilt while the plug-in developer is in the process of updating the repository and it is in a temporarily unstable state, the partially baked code will be sucked into the build with potentially disastrous results.

Today, I checked the Chatbot repository's home page:
    https://github.com/merefield/discourse-chatbot
to see if a new version had been released that might remedy the problems I experienced and discovered that a new major update had been released. This version required additional code be added to the app.yml file, without which the site would not build.
I don't know whether it was the absence of this code which caused the problems the last time or the repository being in an unstable state as these changes were being made, but in any case it looked like it was worth a try re-integrating Chatbot following the new directions.

This points to a problem working with plug-ins in Discourse. Since the latest version from GitHub is automatically incorporated into a build, if changes have been made which require modifications to app.yml or other settings on the site, there is no obvious way for the site administrator to know this without manually reviewing the GitHub documentation for each plug-in, whether or not that plug-in was deliberately being updated in the site rebuild.

Anyway, I incorporated the required additional code in app.yml, re-enabled the Chatbot plug-in, and rebuilt the site. The first several attempts failed because the app.yml file is exquisitely sensitive to indentation, and some of the new code did not conform precisely to the indentation of the existing file (if you get a message like:
    did not find expected key while parsing a block mapping at line 95 column 3
it's probably a pissy indentation discrepancy around that line). There is a special place in Hell for people who make white space syntactically significant. It's probably in one of the pits on the Seventh Circle where sinners are eternally constricted by pythons.

After fixing the indentation, the site rebuilt and restarted correctly, and checking the Settings revealed that even though the site had been rebuilt without Chatbot, most of its settings had persisted and did not need to be re-entered. I had to disable the floating "chatbot quick access talk button", which setting it had apparently forgotten.
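
Both hazards described above, the plug-in repositories a rebuild will clone and indentation slips in app.yml, can be inspected before starting a rebuild. What follows is a minimal shell sketch and not part of the procedure actually used: the function names list_plugin_repos and show_indent are hypothetical, and it assumes plug-ins are added with "- git clone https://..." lines, as in the standard Discourse app.yml template.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting app.yml before a rebuild.

# Print the URL from every uncommented "- git clone <url>" line in
# the file given as $1. Assumes the repositories use https:// URLs.
list_plugin_repos() {
    grep -E '^[[:space:]]*-[[:space:]]+git clone' "$1" |
        grep -oE 'https://[^[:space:]]+'
}

# Show the lines around a line number reported by the YAML parser,
# one per line, in the form
#   line:indent:text
# where indent is the count of leading spaces, or TAB if a tab occurs
# in the indentation (YAML does not allow tabs there).
#   $1 = file, $2 = line number, $3 = lines of context (default 3)
show_indent() {
    awk -v target="$2" -v ctx="${3:-3}" '
        NR >= target - ctx && NR <= target + ctx {
            lead = $0
            sub(/[^ ].*$/, "", lead)       # keep only the leading spaces
            n = length(lead)
            if ($0 ~ /^ *\t/) n = "TAB"
            printf "%d:%s:%s\n", NR, n, $0
        }' "$1"
}
```

For the error message quoted above, show_indent containers/app.yml 95 would print lines 92 to 98 with the indentation of each, which makes a one-column discrepancy easy to spot. To see what a rebuild would pull, each URL printed by list_plugin_repos can be passed to git ls-remote <url> HEAD, which reports the commit the repository's default branch currently points to (assuming the clone line names no other branch).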

This release of Chatbot has a major new feature: it can be set to operate in "agent" mode. In "normal" mode it purely submits prompts to the large language model and displays the replies; as an agent it can perform queries to external data sources and incorporate that information in the prompts it sends to the language model. Three data sources are supported by agent mode, and I obtained API keys (in free mode) for each and enabled them for Chatbot.

Registered for newsapi.org API key:
    Registration:   https://newsapi.org/
    First name:     John
    Email address:  REDACTED
    Password:       REDACTED
    API key:        REDACTED
This account allows 1000 requests per "period". The account dashboard is:
    https://newsapi.org/account

Registered for SerpApi (Google search API):
    Registration:   https://serpapi.com/
    Signed in with Google account REDACTED
    Verified E-mail
    Verified mobile phone
    API key:        REDACTED
This allows 100 searches per month. For more, you have to subscribe to a paying plan. Account dashboard is:
    https://serpapi.com/dashboard

Registered for Marketstack (closing stock price) free API:
    Registration:   https://marketstack.com/
    E-mail:         REDACTED
    Password:       REDACTED
    API key:        REDACTED
This API allows 1000 accesses per month. The account control panel is:
    https://marketstack.com/dashboard

Entered API keys in the Chatbot Settings items. The OpenAI model used by Shalmaneser remains GPT-4.

Once the site was rebuilt and running, I built the initial embedding database which fine-tunes Shalmaneser's prompts to be aware of the content of the Scanalyst site. This is performed from an SSH login to the site with:
    super
    cd ~/discourse/image
    ./launcher enter app
    rake chatbot:refresh_embeddings[1]
This takes about an hour to process the roughly 20,000 posts and comments made on the site. Embeddings are created only for posts which are visible to a user at Trust Level 1, and hence restricted posts such as meet-up access codes are not compromised by or accessible through embedding.
Once initialised, new content on the site should be automatically added to the embedding database.

Tested Shalmaneser in chat mode with queries requiring knowledge of posts on Scanalyst, current stock market quotes, and events in the news of recent days. Shalmaneser is no longer confined to a box with information only for events before September 2021!

Note that if Shalmaneser must be disabled or removed again, before removing the code in app.yml that includes it, you must:
    super
    cd ~/discourse/image
    ./launcher enter app
    rake db:migrate:down VERSION=20230826010103   # reverses an index rename
    rake db:migrate:down VERSION=20230826010101   # reverses table name change
    rake db:migrate:down VERSION=20230820010105   # drops the index
    exit
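
Since the three migrations are reversed in the order listed, there is little point in continuing if one of them fails. A minimal sketch of a wrapper that stops at the first failure, intended to be run inside the container after ./launcher enter app; the function name chatbot_migrate_down and the RAKE override variable are hypothetical, introduced only so the sequence can be dry-run:

```shell
#!/bin/sh
# Hypothetical wrapper: reverse the Chatbot migrations in the order
# listed in the log, stopping at the first failure.
# RAKE may be overridden for a dry run, e.g. RAKE=echo.
chatbot_migrate_down() {
    for version in 20230826010103 20230826010101 20230820010105; do
        ${RAKE:-rake} db:migrate:down "VERSION=$version" || {
            echo "db:migrate:down failed for VERSION=$version" >&2
            return 1
        }
    done
}
```

Running RAKE=echo chatbot_migrate_down prints the arguments of the three rake invocations without executing anything, which is a cheap way to check the sequence before running it for real.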