Development Log 2023-09-27

2023 September 27

Made a mirror backup to the Fourmilab_Mirror external USB drive.

Made a backup AMI:
    Scanalyst Backup 2023-09-27 ami-00dfeaa6905e18327
        /           snap-0bcc2cd67d245512b
        /server     snap-03cf4c2d70db7f538

The system had been up for 10 days.

Installed 10 update packages, 8 for security including a new kernel.
    super
    yum update

Stopped Discourse.
    cd ~/discourse/image
    ./launcher stop app

Rebooted.

The system came up promptly after the reboot.  We are running on the
same kernel as before, 4.14.322-246.539.amzn2.x86_64.

It appears that what led to the disaster rebuilding the site on
2023-09-16, necessitating the removal of Shalmaneser (the ChatGPT bot
integrated using the Chatbot plug-in), was that when I tried to rebuild
the site to apply official updates from Discourse, the build process
pulled in a version of Chatbot, under development, which was in an
unstable state.  This caused the rebuild of the site to collapse in a
manner which did not directly point back to the root cause.

This is a vulnerability in the way plug-ins in Discourse are
integrated.  A plug-in is added to the build by inserting a statement
in the ~/discourse/image/containers/app.yml file that clones the
GitHub repository in which the plug-in is implemented.  This runs the
risk that if the site happens to be rebuilt while the plug-in developer
is in the process of updating the repository and it is in a temporarily
unstable state, the partially baked code will be sucked into the build
with potentially disastrous results.

Today, I checked the Chatbot repository's home page:
    https://github.com/merefield/discourse-chatbot
to see if a new version had been released that might remedy the
problems I experienced and discovered that a new major update had
been released.  This version required additional code be added to the
app.yml file, without which the site would not build.  I don't know
whether it was the absence of this code which caused the problems the
last time or the repository being in an unstable state as these changes
were being made, but in any case it looked like it was worth a try
re-integrating Chatbot following the new directions.

This points to a problem working with plug-ins in Discourse.  Since the
latest version from GitHub is automatically incorporated into a build,
if changes have been made which require modifications to app.yml or
other settings on the site, there isn't an obvious way for the site
administrator to know this without manually reviewing the GitHub
documentation for each plug-in, regardless of whether the plug-in was
deliberately being updated in the site rebuild.
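
One way to at least learn that a plug-in has changed upstream before
starting a rebuild is to compare the current head of its repository
with the commit recorded at the last successful build.  The following
is only a sketch, not part of the Discourse tooling: the function names
plugin_head and plugin_changed and the variable last_good are
hypothetical, and it assumes the plug-in is built from the head of the
repository's default branch.

```shell
# plugin_head REPO
# Print the commit hash that HEAD of REPO (a URL or local path) points to.
plugin_head() {
    git ls-remote "$1" HEAD | cut -f1
}

# plugin_changed REPO RECORDED_HASH
# Exit status: 0 if the head of REPO differs from RECORDED_HASH,
#              1 if it is unchanged,
#              2 if the head could not be determined.
plugin_changed() {
    current=$(plugin_head "$1")
    [ -n "$current" ] || return 2
    [ "$current" != "$2" ]
}

# Example (last_good is a hypothetical variable holding the hash noted
# at the last successful rebuild):
#   plugin_changed https://github.com/merefield/discourse-chatbot "$last_good" &&
#       echo "Chatbot has changed; read its README before rebuilding"
```

This only reports that something changed, not whether the change
requires edits to app.yml; the plug-in's documentation still has to be
read by hand.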

Anyway, I incorporated the required additional code in app.yml,
re-enabled the Chatbot plug-in, and rebuilt the site.  The first
several attempts failed because the app.yml file is exquisitely
sensitive to indentation, and some of the new code did not conform
precisely to the indentation of the existing file (if you get a
message like:
    did not find expected key while parsing a block mapping at line 95 column 3
it's probably a pissy indentation discrepancy around that line).  There
is a special place in Hell for people who make white space
syntactically significant.  It's probably in one of the pits on the
Seventh Circle where sinners are eternally constricted by pythons.
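
A crude check along the following lines might catch the grossest
indentation errors before committing to a rebuild.  This is only a
sketch: the function name check_yaml_indent is made up for
illustration, and it assumes app.yml is indented in steps of two spaces
with no tabs.  It does not parse YAML, and will miss errors that happen
to land on an even column.

```shell
# check_yaml_indent FILE
# Print "LINE: reason" for each non-blank, non-comment line whose leading
# white space contains a tab or is not a multiple of two spaces.
# (The two-space step is an assumption about how the file is laid out.)
check_yaml_indent() {
    awk '
        /^[ \t]*$/ { next }                  # skip blank lines
        /^[ \t]*#/ { next }                  # skip comment lines
        {
            lead = $0
            sub(/[^ \t].*$/, "", lead)       # keep only leading white space
            n = length(lead)
            if (index(lead, "\t") > 0)
                printf "%d: tab in indentation\n", NR
            else if (n % 2 != 0)
                printf "%d: odd indentation (%d spaces)\n", NR, n
        }
    ' "$1"
}
```

Running check_yaml_indent on ~/discourse/image/containers/app.yml
after editing it prints nothing if no line trips either test; any line
numbers it does print are worth comparing against the neighbouring
lines of the file.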

After fixing the indentation, the site rebuilt and restarted correctly,
and checking the Settings revealed that even though the site had been
rebuilt without Chatbot, most of its settings had persisted and did
not need to be re-entered.  I had to disable the floating "chatbot
quick access talk button", which setting it had apparently forgotten.

This release of Chatbot has a major new feature: it can be set to
operate in "agent" mode.  In "normal" mode it purely submits prompts to
the large language model and displays the replies, while as an agent it
can perform queries to external data sources and incorporate that
information in the prompts it sends to the language model.  Three data
sources are supported by agent mode, and
I obtained API keys (in free mode) for each and enabled them for
Chatbot.

Registered for a newsapi.org API key:
    Registration:   https://newsapi.org/
    First name:     John
    Email address:  REDACTED
    Password:       REDACTED
    API key:        REDACTED
This account allows 1000 requests per "period".  The account dashboard
is:
    https://newsapi.org/account

Registered for SerpApi (Google search API):
    Registration:   https://serpapi.com/
    Signed in with Google account REDACTED
    Verified E-mail
    Verified mobile phone
    API key:    REDACTED
This allows 100 searches per month.  For more, you have to subscribe to
a paying plan.  Account dashboard is:
    https://serpapi.com/dashboard

Registered for Marketstack (closing stock price) free API:
    Registration: https://marketstack.com/
    E-mail:     REDACTED
    Password:   REDACTED
    API key:    REDACTED
This API allows 1000 accesses per month.  The account control panel is:
    https://marketstack.com/dashboard

Entered API keys in the Chatbot Settings items.

The OpenAI model used by Shalmaneser remains GPT-4.

Once the site was rebuilt and running, I built the initial embedding
database which fine-tunes Shalmaneser's prompts to be aware of the
content of the Scanalyst site.  This is performed from an SSH login to
the site with:
    super
    cd ~/discourse/image
    ./launcher enter app
    rake chatbot:refresh_embeddings[1]
This takes about an hour to process the roughly 20,000 posts and
comments made on the site.  Embeddings are created only for posts which
are visible to a user at Trust Level 1, and hence restricted posts such
as meet-up access codes are not compromised by or accessible through
embedding.  Once initialised, new content on the site should be
automatically added to the embedding database.

Tested Shalmaneser in chat mode with queries requiring knowledge of
posts on Scanalyst, current stock market quotes, and events in the news
of recent days.  Shalmaneser is no longer confined to a box with
information only for events before September 2021!

Note that if Shalmaneser must be disabled or removed again, before
removing the code in app.yml that includes it, you must:
    super
    cd ~/discourse/image
    ./launcher enter app
    rake db:migrate:down VERSION=20230826010103 # reverses an index rename
    rake db:migrate:down VERSION=20230826010101 # reverses table name change
    rake db:migrate:down VERSION=20230820010105 # drops the index
    exit