Fourmilab Web Server Amazon AWS Linux 2023 Migration System Narrative

Here we go again. After migrating the agora.fourmilab.server and the Scanalyst server to the recently-released Amazon Web Services (AWS) Linux 2023 platform, it’s time for the Big Gulp: migration of the main Fourmilab server, with more than 90 gigabytes of data and heritage that goes back to its first step onto the Web back at the end of 1993. There is a lot of “legacy” stuff here, and that means it’s prone to being torpedoed by “new and improved” innovations in the underlying platform.

Let’s see how it goes.

2023 October 18

Began the campaign to migrate the server to Amazon AWS Linux 2023.
The general outline of the procedure will be the same as that for
Agora, which served as the pathfinder for migration from Linux 2 to
2023.

Cycled HTTP log file.  The most recent archived log is cycle 460.

Server file systems at the start of the migration are:
    /dev/nvme0n1p1  8.0G  4.9G  3.2G  62% /
    /dev/nvme1n1    126G   91G   31G  75% /server

Made a mirror backup to Juno.

Made a backup AMI:
    Fourmilab Backup 2023-10-18  ami-0dbb034bbccd8c086
        /           snap-0c6bf35e3b5f5f3c1
        /server     snap-042f0ac5270f04a2e

Applied all updates.  This installed 90 updates, 66 for
security, including a new kernel.  Rebooted.  The system had
been up for 68 days.

The system came up normally after the reboot.  We are now
running on kernel 4.14.326-245.539.amzn2.x86_64.

Made volumes from the snapshots of the previously made backup AMI:
        /           snap-0c6bf35e3b5f5f3c1  vol-0d89a00d65ca7e5a2
        /server     snap-042f0ac5270f04a2e  vol-062b0b1ab9533aa2d
These were all created in Availability Zone eu-central-1b.  The volume
type of the old root file system was made as gp2, as before, but I
upgraded the /server volume to the new gp3 storage type.  I increased
the size of the /server volume from 128 Gb to 192 Gb to accommodate
growth in the site, which currently occupies 91 Gb, filling around 75%
of the /server file system with a freshly-trimmed log file.

Named the volumes as follows:
    vol-0d89a00d65ca7e5a2   Fourmilab old root L2023
    vol-062b0b1ab9533aa2d   Fourmilab server L2023

Now it's time to begin the migration in earnest.

Created a new instance with:
    AMI:            Amazon Linux 2023 AMI 2023.2.20231016.0 x86_64 HVM kernel-6.1
                    ami-0fb820135757d28fd
Clicked "Launch instance from AMI":
    Name:           Fourmilab L2023
    Instance type:  t3.medium
    Key pair name:  AmazonAWS1
    Instance details:
        Network:    (default)
        Subnet:     eu-central-1b
        Auto-assign IPv6 IP: Enable
        Firewall: Existing security group, launch-wizard-1, sg-05763c6c
        All other: (default)
    Storage:
        Root    /dev/xvda   snap-03a9fa3ce1ae4ee80  8 Gb
        (Other volumes will be attached after the instance is created.)
    Tags:
        Name:   Fourmilab L2023
Selected Launch.

New instance was created as i-001f3b098ccbd4966 with:
    IPv4 public address:    3.72.109.21
    IPv6 address:           2a05:d014:d43:3101:94aa:a276:e035:6a2a
    /dev/xvda   vol-0ec1381fa1d17a317       Fourmilab root L2023

Made an /etc/hosts entry on Hayek:
    3.72.109.21     aws2
to reduce the amount of typing in system configuration.

Logged in with:
    $ ssh -i AmazonAWS1.pem ec2-user@aws2
    X11 forwarding request failed on channel 0
       ,     #_
       ~\_  ####_        Amazon Linux 2023
      ~~  \_#####\
      ~~     \###|
      ~~       \#/ ___   https://aws.amazon.com/linux/amazon-linux-2023
       ~~       V~' '->
        ~~~         /
          ~~._.   _/
             _/ _/
           _/m/'

Ran:
    sudo su
    dnf update
which reported nothing to update.

uname -a reports:
    Linux ip-172-31-25-145.eu-central-1.compute.internal 6.1.55-75.123.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Sep 26 20:06:16 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Attached volumes to instance  i-001f3b098ccbd4966:
    vol-062b0b1ab9533aa2d   Fourmilab Server L2023      /dev/sdb
    vol-0d89a00d65ca7e5a2   Fourmilab old root L2023    /dev/sdc

Created mount points and mounted /dev/sdb and /dev/sdc1.
    sudo su
    mkdir /server
    fsck -f /dev/sdb
    mount /dev/sdb /server

    mkdir /o
    mount /dev/sdc1 /o
    umount /o
    #   Mounting and unmounting applies the journal before
    #   running the consistency check.
    xfs_repair -n /dev/sdc1
    mount -o ro /dev/sdc1 /o

We now see:
    df -h
    /dev/nvme0n1p1    8.0G  1.5G  6.5G  19% /
    /dev/nvme1n1      126G   91G   31G  75% /server
    /dev/nvme2n1p1    8.0G  4.9G  3.2G  61% /o
    grep nvme /etc/mtab
    /dev/nvme0n1p1 / xfs rw,seclabel,noatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=1024,swidth=1024,noquota 0 0
    /dev/nvme1n1 /server ext4 rw,seclabel,relatime 0 0
    /dev/nvme2n1p1 /o xfs ro,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0

This allows us to access the root file system of the old system by
prefixing the path name with /o, for example cat /o/etc/hosts.

Added:
    /dev/sdb    /server     ext4    defaults        1   2
    /dev/sdc1   /o          xfs     ro              0   2
to /etc/fstab.  The /server file system is ext4, inherited from the
old system.
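For reference, an fstab entry is six whitespace-separated fields: device, mount point, filesystem type, options, dump flag, and fsck pass number. A quick shell sketch pulls apart the /server line above:

```shell
# Split an fstab entry into its six whitespace-separated fields:
# device, mount point, filesystem type, options, dump flag, fsck pass.
line='/dev/sdb    /server     ext4    defaults        1   2'
set -- $line        # word-split into $1..$6
echo "device=$1 type=$3 dump=$5 pass=$6"
# prints: device=/dev/sdb type=ext4 dump=1 pass=2
```

Pass number 2 means the volume is checked after the root filesystem (pass 1), and the dump flag of 0 on the read-only /o entry tells dump(8) not to back it up.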

Set /etc/hostname to "fourmilab".

Resized the /server file system to occupy the entire larger storage we
allocated to it.
    umount /server
    e2fsck -f /dev/sdb
    resize2fs /dev/sdb
    mount /server
Note that resize2fs detects the size of the device and uses it all
without having to be told.  We now see in df -h:
    /dev/nvme1n1      189G   91G   91G  50% /server
If it had been an xfs file system, we'd have used xfs_growfs instead of
resize2fs.
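That decision can be keyed off the filesystem type; here's a minimal sketch (grow_cmd is a hypothetical helper, not part of either tool; the real type can be read with lsblk -no FSTYPE /dev/sdb):

```shell
# Hypothetical helper: map a filesystem type to the tool that grows it.
# resize2fs grows an unmounted ext2/3/4 device to fill it; xfs_growfs,
# by contrast, must be run on a *mounted* xfs filesystem.
grow_cmd() {
    case "$1" in
        ext2|ext3|ext4) echo resize2fs ;;
        xfs)            echo xfs_growfs ;;
        *)              echo "don't know how to grow '$1'" >&2; return 1 ;;
    esac
}

grow_cmd ext4    # prints: resize2fs
grow_cmd xfs     # prints: xfs_growfs
```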

Rebooted to make sure it was re-mounted.  It was.

The system came up with the hostname changed and /server and /o mounted.

Added accounts to /etc/passwd:
    REDACTED
...and to /etc/shadow:
    REDACTED

Added corresponding entries to /etc/group:
    REDACTED
and /etc/gshadow:
   REDACTED

Now I can:
    su - kelvin
and get to my /server/home/kelvin directory.

And, now I can log into my account from Hayek with ssh, since my
ssh keys are present in the home directory on /server.

Added kelvin to /etc/sudoers.d/90-cloud-init-users to permit
sudo without a password.  Note that this file used to be called
cloud-init on the old Linux AMI.
    # User rules for kelvin
    kelvin   ALL=(ALL) NOPASSWD:ALL

Installed our magic /bin/super utility.  I simply copied:
    cp -p /o/bin/super /bin
Tested it and it works.

Transferred the /root/.ssh/authorized_keys file from the AWS Linux 2
system, saving the original as authorized_keys_ORIGINAL.

Edited /etc/ssh/sshd_config and set:
    PermitRootLogin yes

Restarted:
    systemctl restart sshd

Now I can log in as root from local machines without a
password.  Verified that regular user logins continue to work.
This will allow a mirror backup from Juno.

Rebooted to confirm that all of the configuration and
permission changes so far persist.

Installed:
    dnf install xauth
in the hope this will allow X11 forwarding on SSH logins.  After
logging out and back in, it re-created ~/.Xauthority and now X11
tunnelling works.

Installed:
    dnf install gtk3-devel
    # This installed 147 dependencies.
    dnf install gcc
    dnf install gcc-c++
    dnf install intltool

This permitted building and installing Geany, which had been downloaded
from:
    https://www.geany.org/download/releases/
into ~/linuxtools/geany-1.38 and I re-built with:
    ./configure
    make
    super
    make install
This installs in the /usr/local directory tree.  Since Geany is
not available as an AWS package, this locally-built version will
not be automatically updated by the dnf package manager
(although all of its dependencies will be).  But then Geany is a
stable package which changes only very slowly.  After the install,
Geany now works, demonstrating that X11 tunnelling is also working.

Installed:
    super
    dnf install "perl(JSON)"
    dnf install "perl(CGI)"
This is required by "credit".  Copied over, from Hayek:
    scp -p sc:~/bin/aws_stats.pl aws2:~/bin/
which was patched to fix a torpedo under the water line by Linux 2023,
and now "credit" works.

Finally, we're ready to start installing actual components of the
server.

Installed:
    super
    dnf install httpd mariadb105-server

Started the httpd:
    systemctl start httpd

The HTTPD test page came up properly at:
    http://aws2/

Set httpd to start at boot time:
    systemctl enable httpd
    systemctl is-enabled httpd

Linked /var/www to our Web home directory on /server:
    cd /var/www
    mv html html.ORIG
    ln -s /server/pub/www.fourmilab.ch/web html
    chown root:apache html

Now we can get to the static content of the site.
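The move-and-link swap above is safely reversible, since the original html directory survives as html.ORIG. A scratch-directory rehearsal of the pattern (every path here is a hypothetical stand-in for /var/www and /server):

```shell
# Rehearse the move-and-link swap in a scratch tree; the paths are
# hypothetical stand-ins for /var/www and /server.
rm -rf /tmp/linkdemo
mkdir -p /tmp/linkdemo/var-www/html /tmp/linkdemo/server-web
cd /tmp/linkdemo/var-www
mv html html.ORIG                    # keep the original directory around
ln -s /tmp/linkdemo/server-web html
readlink html    # prints: /tmp/linkdemo/server-web
```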

Changed ownership:
    chown kelvin:apache /server/pub/www.fourmilab.ch/web
I've been meaning to do this for a long time.  There's
no reason you need to be root to create a new top level
directory under ~/web.  It keeps getting set back to root
by tar extracts containing root material.

Created a ~/web/.htaccess file on aws2 to keep Web crawlers
from poking around in our clone of the production server
database.  Access is restricted to the Fourmilab LAN.

Set MariaDB to use our database directory on /server by editing
/etc/my.cnf and setting:
    [mysqld]
    datadir=/server/var/mysql
at the start of the file.

Started MariaDB with:
    systemctl start mariadb
and
    systemctl status mariadb
reports it running.

And, at this point, we fell into a multi-hour Hell of incompatibility,
where MariaDB doesn't like anything about our trivially simple
configuration where we direct the location of the database directory
to our /server/var/mysql directory.  I will not burden you with the
many blind alleys and misdirections in online tutorials, but simply,
at this point, remind readers that if something wants you to install
MySQL or MariaDB on your server, get a good night's sleep, then wake up
refreshed and say "Hell, no!".

I am going to get a good night's sleep.

Somebody is living dangerously. If you want convenient access to root functionality, you should add your SSH public key to /root/.ssh/authorized_keys and connect directly through SSH. That is,

ssh -l root some.domain.example.com

It is much more secure than allowing password-less sudo or using a hacky privileged binary.

The only way to log on to this system is via ssh with a private key on my local machine. It does not support telnet or login by password for any account: only private key. If my local machine is compromised, the attacker would be able to obtain not only my private key but also the root login private key used for automated administration.

Given that, I don’t consider no password sudo an added risk. Note also that every AWS Linux distribution has a built-in user, ec2-user, which is by default set for no password sudo. I doubt many people bother to turn that off, so that’s the most obvious means of attack if somebody gets into such a system.

Eww!

Point taken.

Do you not use ssh-agent to moderate access, so that the on-disk form of your private key is always encrypted?

(Then your local machine has to be compromised while you are using it, and compromised within the user session that has access to the agent.)

No. The firewall here blocks all external access via telnet and allows inbound ssh only based upon a separate authentication token to open the port to a specific machine here. I rely upon the firewall rather than trying to secure every machine behind it independently.

2023 October 19

Decided to set aside the assault on MariaDB (which is used only by
Movable Type for Fourmilog and the original Scanalyzer) and proceed
with bringing up other site infrastructure.

    dnf install netpbm
    dnf install ImageMagick-devel
    dnf install ImageMagick-perl
    dnf install perl-CPAN

Configured Perl CPAN installer with:
    perl -MCPAN -e "shell"
then performed:
        o conf init pushy_https
        Do you want to turn the pushy_https behaviour on? [yes] yes
        o conf commit
        exit

    perl -MCPAN -e "shell"
then performed:
        install Term::ReadLine::Perl
        install Bundle::CPAN
        reload cpan
reconfigured, followed by:
        o conf commit
        exit

Installed Perl module packages:
    dnf install \
        > perl-Compress-Zlib \      # Already installed
        > perl-Convert-ASN1 \
        > perl-DateManip \
        > perl-DBD-MySQL \
        > perl-DBI \                # Already installed
        > perl-HTML-Parser \        # Already installed
        > perl-HTML-Tagset \        # Installed by perl-HTML-Parser
        > perl-libwww-perl \
        > perl-libxml-perl \
        > perl-SGMLSpm \
        > perl-URI \                # Already installed
        > perl-XML-Dumper \
        > perl-XML-NamespaceSupport \
        > perl-XML-Parser \         # Installed by perl-libxml-perl
        > perl-XML-SAX

Installed prerequisites for Perl packages:
    dnf install gd                  # Already installed
    dnf install gd-devel

Installed dependencies for assorted Web services:
    dnf install ncurses-devel
    dnf install perl-GD
    dnf install libxml2-devel       # Already installed
    dnf install openssl-devel
    dnf install flex
    dnf install mailx
    dnf install aspell
    dnf install gnuplot
#   dnf install rdist               # Removed.  Use rsync instead
    dnf install pigz
    dnf install sendmail-cf
    dnf install gcc-gfortran

Installed support for HTTPS access:
    dnf install mod_ssl

Deleted ~/bin/nedit.  The OpenMotif libraries it requires are no longer
supported by standard AWS, and the pain in building them from scratch
isn't worth it to run this vintage editor.

Installed Perl modules accessible with dnf:
    dnf install "perl(LWP)"         # Already installed
    dnf install "perl(GD)"          # Already installed
    dnf install "perl(Digest::SHA1)" # Already installed
    dnf install "perl(Time::Local)" # Already installed
    dnf install "perl(XML::LibXML)"
    dnf install "perl(Sys::Syslog)" # Already installed
    dnf install "perl(CGI)"         # Already installed
    dnf install "perl(Getopt::Long)" # Already installed
    dnf install "perl(Encode)"      # Already installed
    dnf install "perl(Digest::MD5)" # Already installed

Installed Perl modules accessible only through CPAN:
    perl -MCPAN -e "shell"
        install Crypt::OpenSSL::AES
        install Crypt::CBC
        install Crypt::SSLeay       # Not clear we need this any more
                                    # due to changes in LWP.

At this point, I believe we have all of the dependencies
installed.

Copied over the existing:
    /etc/httpd
        conf/httpd.conf
        conf.d/
            fourmilab_0.conf
            fourmilab_Aliases.conf
            fourmilab_Hosting.conf
            ssl.conf
renaming pre-existing files _ORIGINAL.  Restarted HTTPD.
HTTPD crashed because the certificate files cited in ssl.conf
aren't present.  We have something of a chicken and egg problem
because we haven't yet installed Let's Encrypt, which we can't
do until we're running under the domain name for which we're
issuing the certificate.

The installation procedure for Certbot has changed substantially:
    https://certbot.eff.org/instructions?ws=apache&os=pip
We start with:
    dnf install python3             # Already installed
    dnf install augeas-libs
Install Python virtual environment:
    python3 -m venv /opt/certbot/
    /opt/certbot/bin/pip install --upgrade pip
Install Certbot in virtual environment:
    /opt/certbot/bin/pip install certbot certbot-apache
Link certbot in virtual environment to /usr/bin:
    ln -s /opt/certbot/bin/certbot /usr/bin/certbot
Run certbot in certificate-only mode:
    certbot certonly --apache

And now the chicken starts pecking us from within the egg:
    Error while running apachectl configtest.
    AH00526: Syntax error on line 129 of /etc/httpd/conf.d/ssl.conf:
    SSLCertificateFile: file '/etc/letsencrypt/live/fourmilab.ch/fullchain.pem' does not exist or is empty
    Error while running systemctl restart httpd.
    Job for httpd.service failed because the control process exited with error code.
    See "systemctl status httpd.service" and "journalctl -xeu httpd.service" for details.
    The apache plugin is not working; there may be problems with your existing configuration.
Apache won't start because the letsencrypt certificate directory is
missing, and certbot won't run because it can't restart Apache.

To try to Band-Aid this to permit further testing, I just bodily copied
over the entire /etc/letsencrypt directory from the production server
and installed it here.
    cp -pRv /o/etc/letsencrypt /etc
Now I can:
    systemctl start httpd
and it starts successfully, reporting:
    Server configured, listening on: port 443, port 80
I can still access static site content via:
    http://aws2/
Attempting:
    https://aws2/
puts up a warning:
    NET::ERR_CERT_COMMON_NAME_INVALID
which is correct since we're not running within the domain for which
the certificate was issued.  If I click through the warning, I can
then get to the site.

We still require a symbolic link from the cgi-bin
directory in /var/www.  I added:
    super
    cd /var/www
    mv cgi-bin cgi-bin.ORIG
    ln -s /server/bin/httpd/cgi-bin cgi-bin
and now Earth and Moon Viewer works.  These symbolic links
are really tacky, and were only intended as a stopgap when
we were bringing up the site.  We should modify the Apache
configuration files to point directly to the directories
in /server, eliminating the need for them.  But I'll defer
that until we're running stably on the new platform.

After copying over the /etc/letsencrypt directory tree, I can run:
    super
    certbot renew
and it runs normally and reports:
    The following certificates are not due for renewal yet:
      /etc/letsencrypt/live/fourmilab.ch/fullchain.pem expires on 2024-01-13 (skipped)
    No renewals were attempted.
It appears everything is properly configured for Certbot.

Festival (text to speech) failed because it was linked to an obsolete
shared library.  I rebuilt it with:
    cd /server/src/festival-2.4
    cd speech-tools
    ./configure
    make
    cd ../festival
    ./configure
    make
The CGI scripts run these programs directly from the bin directories
within this source build tree.  After the rebuild, the:
    https://aws2/cgi-bin/SayJulian
    https://aws2/cgi-bin/SayTime
utilities worked.

For SayQuote, installed:
    super
    perl -MCPAN -eshell
    install Finance::YahooQuote
Now SayQuote works mechanically through Festival, but doesn't include
the actual quotes because of another gratuitous change by Yahoo.  But
this fails the same way on the production server, so it's not a
migration issue.

Tested cursorily to verify there are no problems that require
recompiling CGI binaries, and no Perl library problems.
    Earth and Moon Viewer
    Solar System Live
    Your Sky (Map, Horizon, and Telescope)
    Bombcalc

Installed:
    dnf install cronie
This provides the user-side crontab tools, which used to be installed
out of the box on Amazon Linux 2.

Installed my crontab.

Installed root crontab.

When starting the local servers, bacula and waisserver failed due to
references to obsolete shared libraries.

Rebuilt waisserver:
    cd /server/src/wais/freeWAIS-sf-2.2.14_Fourmilab_a
    ./BuildFourmilab
The build failed with dozens of errors:
    /usr/bin/ld: ../lib/libwais.a(stoplist.o):(.bss+0x28): multiple definition of `use_both_stoplist'; waisindex.o:(.bss+0x10): first defined here
    /usr/bin/ld: ../lib/libwais.a(irhash.o):(.bss+0x18): multiple definition of `use_both_stoplist'; waisindex.o:(.bss+0x10): first defined here
Patched:
    /server/src/wais/freeWAIS-sf-2.2.14_Fourmilab_a/freeWAIS-sf-2.2.14/lib/ir/stoplist.h
    to declare: extern boolean use_both_stoplist;
and added to:
    /server/src/wais/freeWAIS-sf-2.2.14_Fourmilab_a/freeWAIS-sf-2.2.14/lib/ir/stoplist.c
    declaration of: boolean use_both_stoplist = true;
Now all of the modules built successfully.  Tried:
    /server/init/wais start
and it says the server started successfully.

Test of WAIS search in the Internal Revenue Code database failed with:
    /var/www/cgi-bin/TaxSearch: dump() must be written as CORE::dump()
    as of Perl 5.30 at /var/www/cgi-bin/TaxSearch line 2221.,
    referer: https://aws2/ustax/TaxSearch.html
Fixed that and now it dies with:
     Can't locate getopts.pl in @INC at /var/www/cgi-bin/TaxSearch line 2003
Well, it turns out that starting with Perl 5.26, the current directory
is no longer on the module include search path.  I added a statement to
the start of /server/bin/httpd/cgi-bin/TaxSearch:
    use lib '/server/bin/httpd/cgi-bin';
and now the thing can find its getopts.pl and the search works.

Of course, the same two fixes have to be made to ~/cgi/T8Search as
well.  After the patches, it also works.

The same Perl library search was what torpedoed ~/cgi/HackDiet below
the water line.  I added a statement:
    use lib "/server/bin/httpd/cgi-bin";
before the first reference to an HDiet:: module and it's now working.

Attempted to rebuild the Bacula file daemon to fix the library
incompatibility in the binary we copied from the production system:
    cd /server/src/bacula/bacula-5.2.10
    ./BuildFourmilabClient
This collapsed in a hideous efflorescence of error messages compiling
the file src/lib/crypto.c, with G++'s trademark incomprehensibility
such as:
    error: expected constructor, destructor, or type conversion before 'IMPLEMENT_STACK_OF'
    error: field 'ctx' has incomplete type 'EVP_MD_CTX' {aka 'evp_md_ctx_st'}
Now, this was very puzzling, since a few days ago I had built a Bacula
file daemon for the Scanalyst server from precisely the same source code
with no problems at all.  I eventually resorted to saving transcripts of
the build on both the Scanalyst and aws2 sites and diffing them, and
discovered the clue:
    195,196c201
    < checking for OpenSSL... yes
    < checking for EVP_PKEY_encrypt_old in -lcrypto... yes
    ---
    > checking for OpenSSL... no
So, on aws2 we have OpenSSL installed, since it's required to build
several Perl modules we use for various things, but on the Scanalyst
site, where most of the functionality is embedded in the Docker image
of Discourse, we had no reason to install it.  The mere presence of
OpenSSL on the system, however, was enough to cause Bacula to try to
compile features in the File Daemon to support encrypting the
transmission of backups, and the code to handle this wasn't up to date
with the incessant changes in the OpenSSL programming interface in the
interest of purity of essence and other goals which transcend mere
compatibility, stability, and protection of users' investment.  The fix
was to add the option:
    --without-openssl
to the ./configure run in:
    /server/src/bacula/bacula-5.2.10/BuildFourmilabClient
which excludes OpenSSL from the build even if it happens to be installed
on the system performing the build.  With this specified, it worked
just fine, installed, and now I'm able to start the Bacula file
daemon with no problems.

HotBits requests work, so the HotBits proxy server doesn't
need to be rebuilt.  But, just to prevent library incompatibilities in
the future and be sure we can, I rebuilt it anyway with no problems.

Started local servers with:
    /server/init/servers start
They all appear to be running.

Rebuilt Webalizer, after verifying this was the current version, with:
    cd /server/src/webalizer/webalizer-2.23-08
    ./BuildFourmilab
    ./InstallFourmilab
and now it runs.  Note that this will have to be re-done on the
production /server once we cut over.  I'm doing it now to be sure we
have everything installed we need to rebuild it.  Deleted the
now-obsolete webalizer/webalizer-2.21-02 from /server/src and
/server/bin.

Ran a webalizer job for the main site and a hosted site to
confirm they work.
    /server/bin/webalizer/current/DailyUpdate
    /server/pub/hosting/fondation.lignieres.org/statistics/UpdateStats
Both jobs ran OK.

Rebuilt units:
    cd /server/src/units/units-2.19
    ./BuildFourmilab
to fix a shared library incompatibility.

Rebuilt Your Sky:
    cd /server/src/yoursky/yoursky-2.6
    ./BuildFourmilab
to verify library compatibility and buy time until the next torpedo.

Rebuilt UnCGI:
    cd /server/src/uncgi/uncgi-1.11
    ./BuildFourmilab
to verify library compatibility and buy time until the next torpedo.

Rebuilt Terranova:
    cd /server/src/terranova/terranova-2.1
    ./BuildFourmilab
Tested with:
    /server/cron/TerraNova
Terranova worked, but the job failed with:
    pnmtopng: error while loading shared libraries: libpng15.so.15: cannot open shared object file: No such file or directory
...shared library Hell again.  It turns out this failed because of a
shared library problem in the local copy of Netpbm we maintain due to
submerged magnetic mines in earlier distribution versions of NetPBM.
Let's rebuild *that*:
    cd /server/src/netpbm/netpbm-10.73.20
    ./BuildFourmilab
    #   Note that you'll have to answer a lot of questions here.  Most
    #   can accept defaults, but you should specify static linking to
    #   avoid library path Hell when these are run from CGI tasks or
    #   cron jobs, and you should install in the corresponding bin
    #   directory, in this case /server/bin/netpbm/netpbm-10.73.20.
And now, the Terranova cron job runs to completion and the output is
correct.

Deleted the obsolete:
    /server/src/netpbm-10.35.97
    /server/bin/netpbm-10.35.97
directories.

Installed the system netpbm programs:
    dnf install netpbm-progs
The "netpbm" package installs only libraries, not the executables
in /usr/bin.  This also installs Ghostscript.

In the quest for library compatibility, attempted to rebuild:
    cd /server/src/barcode
    ./BuildFourmilab
which crashed due to the latest GNU fad in anti-compatibility munitions,
banning multiple C exports of the same name being treated like Fortran
COMMON BLOCKs and being mapped to the same memory location.  This has
been a feature of C since the early 1970s, but now it has been
proscribed by the high priests of sacred wildebeest, and all must now
comply with their fat-headed fatwas.  So, I patched:
    /server/src/barcode/barcode-0.99
        barcode.h       Declare "streaming" as extern
        library.c       Declare exported "int streaming"
and, Bob's your (creepy) uncle, it builds and works.  I verified that:
    http://aws2/cgi-bin/ISBNquest?isbn=9780804139298&delim=-&assoc=fourmilabwwwfour&asite=www.amazon.com
can generate a bar code for the book it's investigating.
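This is the same breakage class as the WAIS use_both_stoplist fix above: GCC 10 changed the default from -fcommon to -fno-common, so tentative definitions of the same global in two translation units no longer merge at link time. A minimal reproduction, assuming gcc is installed (file names hypothetical):

```shell
# Two translation units each carry a tentative definition of the same
# global.  With -fcommon (the pre-GCC-10 default) they merge like
# Fortran COMMON blocks; with -fno-common (the GCC 10+ default) the
# link fails with "multiple definition".
cd "$(mktemp -d)"
cat > a.c <<'EOF'
int streaming;
int main(void) { return streaming; }
EOF
cat > b.c <<'EOF'
int streaming;
EOF
gcc -fcommon    a.c b.c -o demo && echo "merged (old behaviour)"
gcc -fno-common a.c b.c -o demo 2>&1 | grep -o 'multiple definition' | head -1
```

The extern-declaration-in-header, single-definition-in-one-.c patch is the portable fix; building with -fcommon merely restores the old behaviour.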

Changed permissions:
    chown kelvin:wheel /server/bin/httpd/cgi-bin
Something (probably tar extracts) keeps setting it back to root:root, which
is intensely irritating.

Rebuilt Earth and Moon Viewer:
    cd /server/src/earthview/earthview-3.0
    ./BuildFourmilab
This was in the interest of library compatibility.

Rebuilt Solar System Live:
    cd /server/src/solar/solar-2.4
    ./BuildFourmilab
It works.

Deleted the following directories containing obsolete versions of
server components imported from the production server.
    /server/src/earthview
        earthview-2.8
        earthview-2.7
        earthview-2.6
        earthview-2.5
        earthview-2.4
    /server/src/hotbits
        hotbits-3.11
        hotbits-3.9
        hotbits-3.8
        hotbits-3.7
        hotbits-3.6
        hotbits-3.5
        hotbits-3.4
        hotbits-3.3
    /server/bin/netpbm
        netpbm-10.35.97
    /server/src/solar
        solar-2.3

I shall defer the next assault on the citadel of MariaDB until "no earlier than" to-morrow.

2023 October 20

The Stratified Bible wasn't working because we'd neglected to install
the Mersenne Twister Perl modules it requires.

Installed:
    perl -MCPAN -e "shell"
    install Math::Random::MT
    install Math::Random::MT::Auto
    exit

After these fixes, kaboom again:
    [Fri Oct 20 01:35:42.144191 2023] [cgid:error] [pid 193286:tid 193451]
    [client 193.8.230.147:52830] AH01215: stderr from /var/www/cgi-bin/BibleStrat:
    Experimental unshift on scalar is now forbidden at
    /var/www/cgi-bin/BibleStrat line 167, near ""KJV")", referer:
    http://aws2/etexts/www/BibleStrat/
    etc.
Fixed Perl quibbles in ~/cgi/BibleStrat, adding @ sigils to the
array-reference arguments of its push() and unshift() calls.  Now it
works.  Has the G++ priesthood taken over syntactic enforcement in
Perl?

Integrated the local code for IPv6 tunnelling to the instance's private
IPv4 address and MASQUERADE_AS into /etc/mail/sendmail.mc.
Rebuilt with:
    ./make
and restarted:
    systemctl restart sendmail
There were no error messages in /var/log/maillog.

With the sendmail configuration in place, mail from aws2 is delivered
but dropped into the spam bucket by Gmail on receipt, presumably
because we're not sending from an SPF-approved IP address.  Once we
attach the Fourmilab Elastic IP address mail delivery should work
correctly.

Well, it wasn't enough that systemd completely messed up the starting
and stopping of system services, so now they've gang raped system log
files as well, turning them into something resembling the Windows 3.1
registry called "journalctl".  Here's the /var/log/README explanation.
    You are running a systemd-based OS where traditional syslog has
    been replaced with the Journal. The journal stores the same (and
    more) information as classic syslog. To make use of the journal and
    access the collected log data simply invoke "journalctl", which
    will output the logs in the identical text-based format the syslog
    files in /var/log used to be. For further details, please refer to
    journalctl(1).

Performed an audit of the entire development log for the server to see
if I'd missed any Perl modules.  The only one I found was:
    perl -MCPAN -e"shell"
    install URI::Encode

Digging back into the mystery of why MariaDB won't let the movabletype
user log in, further excavation reveals that despite my setting:
    datadir=/server/var/mysql
in /etc/my.cnf, it's still using the original directory of
/var/lib/mysql which, of course, has nothing other than a blank
configuration.  This can be seen by:

    mysql -u root -p
    Enter password: REDACTED
    select @@datadir;
    +-----------------+
    | @@datadir       |
    +-----------------+
    | /var/lib/mysql/ |
    +-----------------+
    1 row in set (0.000 sec)

To see if MariaDB was reading my.cnf or possibly getting its
configuration from somewhere else, I added a gibberish statement to it
and when I started MariaDB, it failed with:
    mariadb.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
so it looks like it is using that file.
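A quick cross-check on what the [mysqld] section actually contains, independent of what the daemon reports, is to pull the key out directly; a sketch against a scratch copy of the file (the scratch path is a hypothetical stand-in for /etc/my.cnf):

```shell
# Extract datadir from the [mysqld] section of a my.cnf-style file.
cat > /tmp/my.cnf.demo <<'EOF'
[mysqld]
datadir=/server/var/mysql
[client]
socket=/var/lib/mysql/mysql.sock
EOF
awk -F= '/^\[/ { sect = $0 }
         sect == "[mysqld]" && $1 == "datadir" { print $2 }' /tmp/my.cnf.demo
# prints: /server/var/mysql
```

On the server itself, MariaDB's bundled my_print_defaults mysqld utility shows the option values the daemon will actually read, which would have exposed the datadir mismatch directly.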

According to this document:
    https://www.tecmint.com/change-default-mysql-mariadb-data-directory-in-linux/
on systems running SELinux, you must also do:
    semanage fcontext -a -t mysqld_db_t "/server/var/mysql(/.*)?"
    restorecon -R /server/var/mysql
Both of these commands executed with no errors.

Started MariaDB and, guess what, it's still using /var/lib/mysql/.
Checked the systemd service configuration file:
    /usr/lib/systemd/system/mariadb.service
and it says nothing about the datadir.  The systemctl status output
says the daemon command line is:
    /usr/libexec/mariadbd --basedir=/usr
Tried manually starting MariaDB with the datadir specified on the
command line:
    super
    su - mysql
    /usr/libexec/mariadbd --basedir=/usr --datadir=/server/var/mysql
and now it appears to be using /server/var/mysql, as log and
data files in that directory were updated to the current time.  When I
try to log in to either the root or movabletype accounts, however, I
get:
    ERROR 1275 (HY000): Server is running in --secure-auth mode, but
    'root'@'localhost' has a password in the old format; please change
    the password to the new format
Well, isn't that special.

Next, I tried restarting the daemon with:
    /usr/libexec/mariadbd --basedir=/usr --datadir=/server/var/mysql \
        --old-passwords=TRUE --secure-auth=FALSE &
and now when I try to log in, I get:
    ERROR 1045 (28000): Access denied for user 'root'@'localhost'
        (using password: YES)

Then I tried starting the daemon with:
    /usr/libexec/mariadbd --basedir=/usr --datadir=/server/var/mysql \
        --old-passwords=TRUE --secure-auth=FALSE --skip-grant-tables=TRUE &
This started OK, and now I can log in with:
    mysql -u root -p
    Enter password: [RETURN]
and:
    show databases;
gives us:
    +--------------------+
    | Database           |
    +--------------------+
    | information_schema |
    | movabletype        |
    | mysql              |
    | test               |
    +--------------------+
    4 rows in set (0.001 sec)
which indicates we're in the right datadir.  Now try:
    SET PASSWORD FOR 'root'@'localhost' = PASSWORD('');
and get:
    ERROR 1290 (HY000): The MariaDB server is running with the
        --skip-grant-tables option so it cannot execute this statement
Glorious.  Now we end up at this document:
    https://www.digitalocean.com/community/tutorials/how-to-reset-your-mysql-or-mariadb-root-password
and try:
    FLUSH PRIVILEGES;
which brings forth:
    ERROR 1146 (42S02): Table 'mysql.servers' doesn't exist
This just gets better by the minute.  Now let's try the "new" command:
    ALTER USER 'root'@'localhost' IDENTIFIED BY '[root password]';
and it says:
    Query OK, 0 rows affected (0.000 sec)
Let's try setting the movabletype password while we're up.
    ALTER USER 'movabletype'@'localhost' IDENTIFIED BY '[mt password]';
That one was OK as well.  Log out of mysql.

Shut down the daemon with --skip-grant-tables and restarted without it.

Now I can log in with root and the conventional root password.

I can also log in with movabletype and:
    use movabletype;
    show tables;
and confirm we're indeed in the database.

Stopped the daemon and restarted without the old password stuff:
    /usr/libexec/mariadbd --basedir=/usr --datadir=/server/var/mysql &

Mother of babbling God.  Trying to log in gets us:
    ERROR 1275 (HY000): Server is running in --secure-auth mode, but
        'root'@'localhost' has a password in the old format; please
        change the password to the new format
Want to bet that starting the server with --skip-grant-tables and
--old-passwords=TRUE caused it to store the new passwords we set in
the old format?  Let's do it all over again without that.  Kill the
server and start with:
    /usr/libexec/mariadbd --basedir=/usr --datadir=/server/var/mysql \
        --skip-grant-tables=TRUE &
repeating the two ALTER commands above and exiting.  Kill this server.

Start again with just:
    /usr/libexec/mariadbd --basedir=/usr --datadir=/server/var/mysql &
and now, at last, it lets me log in as either root or movabletype with
the corresponding passwords.
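The old and new password formats the server was complaining about can be
told apart by hash length: pre-4.1 "old" hashes are 16 hex characters,
while the current format is a 41-character string beginning with "*".  A
query along these lines shows which accounts still carry old-format
hashes (a sketch; in MariaDB 10.4+ mysql.user is a view and, depending
on version, the hash may live in authentication_string instead):

```sql
-- len = 16  ->  pre-4.1 "old" format, rejected under --secure-auth
-- len = 41  ->  current format (leading '*')
SELECT User, Host, LENGTH(Password) AS len
  FROM mysql.user;
```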

With the Mariadb server running, albeit manually started and our having
no idea why it's ignoring the datadir specification in /etc/my.cnf,
let's see if we can run the Movable Type administration application:
    http://aws2/wl/mt.cgi
I could have guessed:
    Got an error: Attempt to reload MT/Template/ContextHandlers.pm aborted.
    Compilation failed in require
OK, try:
    cd /server/bin/weblog
    find . -name ContextHandlers.pm -ls
which points us to:
    ./MTOS-4.23-en/lib/MT/Template/ContextHandlers.pm
so try:
    perl -c MTOS-4.23-en/lib/MT/Template/ContextHandlers.pm
and we get ye olde:
    Can't locate MT.pm in @INC (you may need to install the MT module)
        (@INC contains: /usr/local/lib64/perl5/5.32 /usr/local/share/perl5/5.32
        /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl
        /usr/lib64/perl5 /usr/share/perl5) at
        MTOS-4.23-en/lib/MT/Template/ContextHandlers.pm line 11.
which I'm pretty sure means we've been torpedoed again by Perl's removal
of the current directory from the module search path.  The statement on
which it died was:
    use MT;
which refers to MT.pm at the top of the directory tree in which Movable
Type's code is kept.

Tried syntax checking this file with an explicit force of the library
directory:
    perl -I /server/bin/weblog/MTOS-4.23-en/lib -c \
        MTOS-4.23-en/lib/MT/Template/ContextHandlers.pm
and it now dies with:
    Base class package "Data::ObjectDriver::BaseObject" is empty
    (blither, blither, blither)
so maybe that Perl module isn't installed.  We've never had to install
it before, but let's give it a try.
    perl -MCPAN -e "shell"
    install Data::ObjectDriver::BaseObject
and it installed a whole bunch of stuff.  Now we try the syntax check
again and get:
    mutiple [sic] trigger registration in one add_trigger() call is
    deprecated. at /server/bin/weblog/MTOS-4.23-en/lib/MT/Entry.pm line 282.
Aren't you glad Movable Type is written in a scripting language that
respects its users' investment in developing programs in it and doesn't
open a trapdoor under them with every update, Microsoft-style?  All
right, let's try patching that.
    cp -p /server/bin/weblog/MTOS-4.23-en/lib/MT/Entry.pm \
        /server/bin/weblog/MTOS-4.23-en/lib/MT/Entry.pm_ORIGINAL
Patched the add_trigger() code to make two separate calls and tried
again.
    syntax error at MTOS-4.23-en/lib/MT/Template/ContextHandlers.pm
        line 6382, near "$f qw( min_score max_score min_rate max_rate
        min_count max_count scored_by )"
    Global symbol "$f" requires explicit package name (did you forget
        to declare "my $f"?) at MTOS-4.23-en/lib/MT/Template/ContextHandlers.pm
        line 6383.
Let's look at that one.
    for my $f (qw( min_score max_score min_rate max_rate min_count max_count scored_by )) {
And after another quarter hour goes down the rathole, we discover that
Perl now wants explicit parentheses around the for list, even though
qw is supposed to return a list.  Patch that.  Try again.
    syntax error at MTOS-4.23-en/lib/MT/Template/ContextHandlers.pm
        line 8000, near "$f qw( min_score max_score min_rate max_rate
        min_count max_count scored_by )"
Patch that one.  Try again.  Another one at line 10268.  Patch it.  Try
again.
Yet another at line 15936.  Patch it.  Try again.
    MTOS-4.23-en/lib/MT/Template/ContextHandlers.pm syntax OK

Now, will it work?  Of course not!
    An error occurred
    Connection error: Access denied for user 'movabletype'@'localhost'
        (using password: YES)

After further review, it appears I set the Mariadb "movabletype"
password incorrectly.  Reset it to the proper value (my regular login
password).
    mysql -u root -p
    ALTER USER 'movabletype'@'localhost' IDENTIFIED BY '[login password]';

Now we get to the Movable Type administration log in page from:
    http://aws2/wl/mt.cgi
Log in with:
    User name: John Walker
    Password: 
and we get to the administration home page.  Tried various things:
    Manage/Blogs
    Enter Blog
    Manage Entries
    Preferences/General
    Tools/Activity Log
    Create/Entry        (Didn't actually try creating an entry)
    Create/Upload File  (Didn't actually upload a file)
So, at first glance, it appears to be working.

Note that at this point we're still running mariadbd from a command line
login explicitly setting "--datadir=".  We still haven't figured out why
it ignores the setting in its configuration file.  I may just patch
the systemd service definition to hammer in the --datadir option rather
than pour any more time down this rat hole.

I hear the talking heads are babbling about World War III.  After today,
I'm inclined to say, "Bring it on."
2 Likes

I find your narratives highly entertaining. (:

Note to self: Avoid AWS.

2 Likes

By the way, AWS just announced that starting on 2024-02-01 they’re going to charge a ha’penny (US$ 0.005) per hour for every IPv4 address you have allocated.

Previously, you could have either the automatically assigned IPv4 address or one permanently allocated Elastic IP address assigned to an instance for free. If you bring your own IPv4 range to AWS, they remain free to use.

2 Likes

2023 October 21

Since the datadir= statement in /etc/my.cnf doesn't seem to do anything,
I decided to look into adding it to the command line in the service
definition, which is in:
    /usr/lib/systemd/system/mariadb.service
That file, which advises you not to edit it lest it be silently
overwritten by an update installation, contains the following:
    # MYSQLD_OPTS here is for users to set in /etc/systemd/system/mariadb@.service.d/MY_SPECIAL.conf
    ExecStart=/usr/libexec/mariadbd --basedir=/usr $MYSQLD_OPTS $_WSREP_NEW_CLUSTER
What's this mariadb@.service.d directory all about?  It doesn't exist,
so I created such a directory and file containing:
    --datadir=/server/var/mysql
And, guess what, it does nothing: still uses /var/lib/mysql.

On a guess, I renamed the directory /etc/systemd/system/mariadb.service.d.
Now, when I start the service, it says:
    Drop-In: /etc/systemd/system/mariadb.service.d
             - MY_SPECIAL.conf
    /etc/systemd/system/mariadb.service.d/MY_SPECIAL.conf:1:
        Assignment outside of section. Ignoring.
So, it found our file, but doesn't like it.  Let's try changing the
MY_SPECIAL.conf file to contain:
    MYSQLD_OPTS=--datadir=/server/var/mysql
On "systemctl stop mysqld" it says:
    Warning: The unit file, source configuration file or drop-ins of
        mariadb.service changed on disk.
        Run 'systemctl daemon-reload' to reload units.
so, I did:
    systemctl daemon-reload
now the stop works, and status confirms it's stopped.  Tried the start
again and got the "Assignment outside of section. Ignoring." bullshit
once more.  Let's try this in MY_SPECIAL.conf:
    [Service]
    MYSQLD_OPTS=--datadir=/server/var/mysql
Now, there's no error message, but it doesn't do a damned thing.
According to:
    https://mariadb.com/kb/en/systemd/
and:
    https://www.flatcar.org/docs/latest/setup/systemd/environment-variables/
you're supposed to use something like:
    [Service]
    Environment=MYSQLD_OPTS=--datadir=/server/var/mysql
OK, that changed the configuration statement to:
    /usr/libexec/mariadbd --basedir=/usr --datadir=/server/var/mysql
and we do indeed seem to be using that directory and can get into
the root and movabletype users.  But at startup, it still says:
    Database MariaDB is probably initialized in /var/lib/mysql
        already, nothing is done.
Cleared out /var/lib/mysql:
    cd /var/lib/mysql
    rm -rf *
Tried to start mariadb.  Now it crashes with:
    Database MariaDB is not initialized, but the directory
    /var/lib/mysql is not empty, so initialization cannot be done.
All right, it looks like the rm -rf * missed two leading dot files.
    rm .*
Now the start works, and it's filled up /var/lib/mysql with a virgin
database again, but it is connecting to /server/var/mysql from the
daemon.
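The glob behaviour that bit us can be demonstrated in a scratch
directory: "*" never matches names beginning with a dot, so "rm -rf *"
leaves dot-files behind.  One way (of several) to catch them:

```shell
dir=$(mktemp -d)            # scratch directory, so nothing real is harmed
cd "$dir"
touch visible .hidden1 .hidden2
rm -rf *                    # "*" skips dot-files: only "visible" goes
ls -A                       # .hidden1 and .hidden2 are still there
# Patterns that also match dot-files (but never "." or ".."):
rm -rf -- * .[!.]* ..?* 2>/dev/null || true
ls -A                       # now empty
cd / && rm -rf "$dir"
```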

Cleaned up cruft from /etc/my.cnf, stopped and restarted daemon, and
other than the blither about /var/lib/mysql, it looks OK.

Set MariaDB to start automatically at boot time:
    super
    systemctl enable mariadb
    systemctl is-enabled mariadb

Rebooted.  Mariadb started automatically.  Movable Type access to its
database appears to work.

Installed:
    dnf install whois

Made a backup AMI:
    Fourmilab Backup 2023-10-21 (L2023)  ami-0b12c898e1e054b1c
        /           snap-06073e5bc08237ff3
        /server     snap-03beb94f6ce388d38
This is the first backup of this system, so I made it in reboot mode to
make it super squeaky clean.

Installed crontab for root.  This is the same as on the production
server.

Checklist for putting Linux 2023 into production.
    1.  Shut down HTTPD on production Fourmilab instance.
    2.  Copy /server/log directory contents to aws2.  Note that this
        includes the Webalizer database (/server/log/webalizer).
    3.  Copy ~/web/serverstats/summary directory to aws2.  This captures
        recent changes to the access statistics pages.
    4.  Copy /server/pub/hackdiet directory to aws2.
    5.  Copy /server/pub/hosting directory to aws2.  This is important
        to capture updates to statistics even if the content has not
        changed.
    6.  Copy /server/var/hbproxy directory to aws2.
    7.  Stop production Fourmilab instance.
    8.  Disassociate Elastic IP 52.28.236.0 from production instance.
    9.  Associate Elastic IP to aws2.
   10.  Change fourmilab.ch IPv6 AAAA record to
        2a05:d014:d43:3101:94aa:a276:e035:6a2a.
   11.  Reboot aws2 to associate Elastic IP address with all server
        processes.
   12.  Test sending mail from new server.
   13.  Test Bacula connection from Pallas, run full backup.
   14.  Detach /o file system and comment out in /etc/fstab.

To recursively compare directories with rsync:
    rsync -rvnc --delete ${SOURCE}/ ${DEST}
Where:
    -r : recurse into directories
    -v : list the files
    -n : do not change anything
    -c : compare the contents via checksum, not mod-time & size (use -a otherwise)
    --delete : look for a symmetrical, not a uni-directional difference
Source:
    https://unix.stackexchange.com/questions/57305/rsync-compare-directories

Maybe tomorrow we'll try cutting over to the new server.

Couple things:

  • Your long lines are clipping because a style class is applying overflow-wrap: break-word and <pre> can’t do that. You don’t appear to be using markdown’s code blocks (using ``` marker lines) for your text, so I’m not sure how to fix that. Or maybe you should be using code blocks.

  • You were struggling with section names for a mysql/mariadb config file. Not sure what possessed you to use systemd’s [Service], but you probably should have used [mysqld] or one of the MariaDB documented variants.

1 Like

2023 October 22

Installed the /etc/rc.d/rc.local from the production server.  This will
be used to start our local daemons at boot time.

Copied the /etc/systemd/system/rc-local.service file from the
Scanalyst site to the same location on aws2.  This file was created to
allow enabling the rc-local service.  Let's try it.
    systemctl enable rc-local
    Created symlink /etc/systemd/system/multi-user.target.wants/rc-local.service
        /etc/systemd/system/rc-local.service.
and "systemctl status rc-local" reports:
     Loaded: loaded (/etc/systemd/system/rc-local.service; enabled; preset: disabled)
enabled ... disabled.  What the Hell, it's systemd.

Tried starting rc-local manually:
    systemctl start rc-local
and it appears to have worked:
    /server/init/servers status
    hotbits (pid 77411) is running...
    bacula-fd (pid 77426) is running...
    waisserver (pid 77436) is running...

Patched /server/init/functions to get around the idiot:
    warning: consoletype is now deprecated, and will be removed in the near future!
message.

Rebooted to see if our local daemons start at boot time.

They did.  It looks like rc-local is working.

Now it's time for the big gulp: migrating the production site to the
new platform.  This will proceed as planned in the checklist in
yesterday's log.

1.  Shut down HTTPD on production Fourmilab instance.
        systemctl stop httpd    #   Stopped at 2023-10-22 13:35 UTC
        Stopped Flogtail log monitor.
    
2.  Copy /server/log directory contents to aws2.  Note that this
    includes the Webalizer database (/server/log/webalizer).
        #   On aws
        super
        cd /server/log
        tar cfvJ ~/tmp/log.tar.xz .  # This takes a long time
        md5 ~/tmp/log.tar.xz
            731F0F4A460BA20D081EF68E00892E48  ~/tmp/log.tar.xz
        du -sh .
            1.3G	.
        #   On Hayek
        scp -p aws:~/tmp/log.tar.xz aws2:~/tmp
        #   On aws2
        super
        systemctl stop httpd
        /server/init/servers stop
        md5 ~/tmp/log.tar.xz
        731F0F4A460BA20D081EF68E00892E48 ~/tmp/log.tar.xz
        cd /server/log
        rm -rf *
        tar xfvJ ~/tmp/log.tar.xz
        du -sh .
            1.3G	.
        rm ~/tmp/log.tar.xz
        #   On aws
        super
        rm ~/tmp/log.tar.xz
        
3.  Copy ~/web/serverstats/summary directory to aws2.  This captures
    recent changes to the access statistics pages.
        #   On aws
        cd ~/web/serverstats/summary/
        tar cfvJ ~/tmp/sstat.tar.xz .
        md5 ~/tmp/sstat.tar.xz
            6CD074F962676EE5E9E9C3253AFBF596 ~/tmp/sstat.tar.xz
        du -sh .
            35M	.
        #   On Hayek
        scp -p aws:~/tmp/sstat.tar.xz aws2:~/tmp
        #   On aws2
        cd ~/web/serverstats/summary/
        rm *
        md5 ~/tmp/sstat.tar.xz
            6CD074F962676EE5E9E9C3253AFBF596  ~/tmp/sstat.tar.xz
        tar xfvJ ~/tmp/sstat.tar.xz
        du -sh .
            35M	.
        rm ~/tmp/sstat.tar.xz
        #   On aws
        rm ~/tmp/sstat.tar.xz
        
4.  Copy /server/pub/hackdiet directory to aws2.
        #   On aws
        super
        cd /server/pub/hackdiet
        tar cfvJ ~/tmp/hdiet.tar.xz .
        md5 ~/tmp/hdiet.tar.xz
            F1D0425F72A4C3DD833B6B2400E8A59A  ~/tmp/hdiet.tar.xz
        du -sh .
            3.3G	.
        #   On Hayek
        scp -p aws:~/tmp/hdiet.tar.xz aws2:~/tmp
        #   On aws2
        super
        cd /server/pub/hackdiet
        rm -rf *
        md5 ~/tmp/hdiet.tar.xz
            F1D0425F72A4C3DD833B6B2400E8A59A  ~/tmp/hdiet.tar.xz
        tar xfvJ ~/tmp/hdiet.tar.xz
        du -sh .
            3.3G	.
        rm ~/tmp/hdiet.tar.xz
        #   On aws
        rm ~/tmp/hdiet.tar.xz

5.  Copy /server/pub/hosting directory to aws2.  This is important
    to capture updates to statistics even if the content has not
    changed.
        #   On aws
        super
        cd /server/pub/hosting
        #   No compression due to large image and video content
        tar cfv ~/tmp/hosting.tar.xz .
        md5 ~/tmp/hosting.tar.xz
            08EEADA8C3DB5DBBBCB1FD00C61E5D9D  ~/tmp/hosting.tar.xz
        du -sh .
            13G	.
        #   On Hayek
        scp -p aws:~/tmp/hosting.tar.xz aws2:~/tmp
        #   On aws2
        super
        cd /server/pub/hosting
        rm -rf *
        md5 ~/tmp/hosting.tar.xz
            08EEADA8C3DB5DBBBCB1FD00C61E5D9D  /server/home/kelvin/tmp/hosting.tar.xz
        tar xfv ~/tmp/hosting.tar.xz
        du -sh .
            13G	.
        rm ~/tmp/hosting.tar.xz
        #   On aws
        rm ~/tmp/hosting.tar.xz
    This took way too long.  See comment below about how to do it the 
    next time.

6.  Copy /server/var/hbproxy directory to aws2.
        #   On aws
        cd /server/var/hbproxy
        tar cfv ~/tmp/hbproxy.tar .
        md5 ~/tmp/hbproxy.tar
            F8E1DF80AE7BDD0E51E562EB3863178A  ~/tmp/hbproxy.tar
        du -sh .
            8.1M	.
        #   On Hayek
        scp -p aws:~/tmp/hbproxy.tar aws2:~/tmp
        #   On aws2
        cd /server/var/hbproxy
        rm *
        md5 ~/tmp/hbproxy.tar
            F8E1DF80AE7BDD0E51E562EB3863178A  ~/tmp/hbproxy.tar
        tar xfv ~/tmp/hbproxy.tar
        du -sh .
            8.1M	.
        rm ~/tmp/hbproxy.tar
        #   On aws
        rm ~/tmp/hbproxy.tar
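The copy-and-verify pattern repeated in steps 2 through 6 can be wrapped
in a small helper.  A sketch (gzip here instead of the xz used above,
purely so the example has no dependency beyond tar and coreutils;
function name and paths are illustrative):

```shell
# pack_verify SRC ARCHIVE: archive SRC's contents and print a checksum
# to compare on the receiving side (md5sum here; the log uses "md5").
pack_verify() {
    tar -C "$1" -czf "$2" .
    md5sum "$2"
}

# Round-trip demonstration against scratch directories.
src=$(mktemp -d); dst=$(mktemp -d)
echo "log data" > "$src/access_log"
pack_verify "$src" "$src.tar.gz"
tar -C "$dst" -xzf "$src.tar.gz"
cmp "$src/access_log" "$dst/access_log" && echo "round trip OK"
rm -rf "$src" "$dst" "$src.tar.gz"
```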

7.  Stop production Fourmilab instance.
        super
        shutdown -h now
            Status in Instances panel changed to Stopping.
            Status in Instances panel changed to Stopped.
        
8.  Disassociate Elastic IP 52.28.236.0 from production instance.
        On AWS Elastic IPs panel, selected Fourmilab: eipalloc-a5925fcc
            and performed Actions/Disassociate Elastic IP address.

9.  Associate Elastic IP to aws2.
        Logged out from aws2 instance on its automatic IP address.
        On Elastic IPs panel, selected Fourmilab again and did
            Actions/Associate Elastic IP address, selecting the
            Fourmilab L2023 instance, i-001f3b098ccbd4966.
        Back on the Elastic IPs panel, it shows the Fourmilab IP address
            so associated.
        On the Instances panel, after a refresh it shows 52.28.236.0
            as the public IP address.

10. Change fourmilab.ch IPv6 AAAA record to
    2a05:d014:d43:3101:94aa:a276:e035:6a2a.
        Changed for fourmilab.ch, ipv6.fourmilab.ch, server.fourmilab.ch,
            www.fourmilab.ch in AWS Route 53.  I actually did this while
            the big copy in step 6 was underway to save time since
            with the HTTP servers' being down on both sites, no
            outside queries will be misdirected.
        On Hayek, after accepting the new host key, "ssh aws" now
            gets to the new server.
    
11. Reboot aws2 to associate Elastic IP address with all server
    processes.
        super
        shutdown -r now
        Came up normally after the reboot.
        Home page access to https://www.fourmilab.ch/ works.
        Log in to Movable Type works.
        Secure access to HotBits works.
        Flogtail from Hayek shows 
        Access to hosted sites tested.  All work.
    
12. Test sending mail from new server.
        Mail -v REDACTED
            Sent mail from server.
        Mail didn't arrive.  Discovered Sendmail wasn't enabled and
        running.  Performed:
            super
            systemctl enable sendmail
            systemctl start sendmail
            systemctl status sendmail
                fourmilab systemd[1]: Starting sendmail.service - Sendmail Mail Transport Agent...
                fourmilab sendmail[6366]: My unqualified host name (fourmilab) unknown; sleeping for retry
                fourmilab sendmail[6366]: unable to qualify my own domain name (fourmilab) -- using short name
                fourmilab sendmail[6575]: starting daemon (8.17.1): SMTP+queueing@01:00:00
                fourmilab systemd[1]: sendmail.service: Can't open PID file /run/sendmail.pid (yet?) after start: No su>
                fourmilab systemd[1]: Started sendmail.service - Sendmail Mail Transport Agent.
            Newly sent mail arrives, and did not go into Spam bucket.

13. Test Bacula connection from Pallas, run full backup.
        In Bacula console on Pallas, tried status/client/AWS and it
            communicated OK with the file daemon
        Started incremental backup at 20:49 UTC.  It seems to be
            communicating OK.
        Backup completed at 22:53 UTC.
        Because we have a full backup scheduled for early next month,
            I decided an incremental was sufficient in this case.  The
            backup was so large because I forgot to delete the
            ~/tmp/hosting.tar.xz file, which was 13 Gb.  It's gone
            now.
        
14. Detach /o file system and comment out in /etc/fstab.
        I've decided to defer this until later in case we may need
        something from it to fix something on the new server.

In /server/init/functions, replaced two calls on "usleep 100000" with
"sleep 0.1" to get rid of deprecation warning about usleep.

Hacker's Diet style sheet appears to be screwed up.

OK, I forgot to remove the ~/web/.htaccess intended to block
non-Fourmilab access during the migration.  Now I'm seeing lots of 200
status in the Flogtail output and the style sheet problem cleared up.
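For reference, such a blocking .htaccess can be as simple as the
following hypothetical reconstruction (the actual file's contents aren't
recorded in this log; Apache 2.4 syntax, with a documentation-range
address standing in for the real workstation address):

```
# Allow only the migration workstation; everyone else gets 403.
Require ip 203.0.113.7
```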

We're back up and successfully serving user requests, both IPv4 and
IPv6, at 18:31 UTC.  The process of updating the hosting site files
took way too long.  Next time we need to be smarter about it, either
using rsync between the old and new machines or mounting a snapshot of
the old machine's hosting directory on the new and copying directly
from the mounted old filesystem to the new hosting directory.

Changed the instance name of the Linux 2 server to "Fourmilab (Linux 2
Legacy)" and the new server, now in production, to "Fourmilab".

The symbolic link from /home/kelvin to /server/home/kelvin was missing.
Added:
    super
    ln -s /server/home/kelvin /home/kelvin
Some cron jobs and other items count on this link to find things in
my ~/bin directory.

The /server/cron/RPKPcontrol.pl job was failing because it used the
hostname command as the hostname and source of the internal IPv4
address for RPKP control runs.  On the old server, this was the
AWS local hostname (like ip-172-31-25-145.eu-central-1.compute.internal),
but we'd set it to "aws" on the new server, so this broke.  I replaced
the hostname command with:
    ec2-metadata --local-hostname
which returns the local hostname of the current instance.  You can also
get the local IPv4 address with --local-ipv4, but I just left in the
code that digs it out of the local hostname.  Tested control run
generation and it works.

The /server/var/units/units_cur_local Python program, run by the cron
job /server/cron/Units_Currencies was failing because its "#!" line
at the top specified /usr/bin/python, which is absent on this system,
with only /usr/bin/python3 installed.  Fortunately, the program is
compatible with both Python 2 and 3, so I just changed the interpreter
path and now it's working.

The /server/cron/HotBitsCheck.pl job was failing because it was using
the hostname as the domain name for the HotBits server it was querying.
I changed it to get the server name from: "ec2-metadata --public-hostname".

The /server/cron/RPKP_Daily_Reports cron job was failing because two of
the Perl programs it invokes:
    /server/log/rpkp/analysis/rp.pl
    ~/cgiexe/RPKPreport.pl
ran afoul of Perl's no longer looking for library modules in the
current directory.  I added:
    use lib ".";
statements in the header of both programs.  I ran the job and both 
reports it produced were correct.  I also tested the custom log report 
query page, and it's working OK.

The /server/cron/yoursky_elements/UpdateYourskyJPLOrbitalElements cron
job was crashing with a shared library problem when trying to run the
FORTRAN program that extracts data from the JPL database, requiring me
to recompile the contents of the /server/cron/yoursky_elements/Translate_JPL
directory.  This was torpedoed by compile errors:
    Rank mismatch between actual argument at (1) and actual argument
        at (2) (rank-1 and scalar)
It looks like the GCC "purity of essence" Kode Kiddies have gotten
loose on gFortran now, and are spewing errors for sloppy FORTRAN
practices that have worked for more than sixty years.  To get rid of
this, you have to specify:
    -fallow-argument-mismatch
on the gfortran command line.  I added it to FFLAGS in the Makefile,
only to discover that it never uses that setting on the actual command
line.  Once I fixed that, it recompiled correctly.  While I was at it,
I added -Bstatic to the command line so the executables will be static
linked and more likely to survive shared library torpedoes in future
updates.

Manually reran the translate and HTML generation steps for asteroids
and they worked correctly.  Then I moved on to the periodic comet
data, and found that
/server/cron/yoursky_elements/GenHTML/Yoursky_elements_pcomets.pl
fails with:
    Use of uninitialized value $1 in uc at
        ../GenHTML/Yoursky_elements_pcomets.pl line 129, <> line 3896.
It turns out this is due to a comet, the enigmatic "471P/" with a
blank name in the database.  I fixed the code to treat such comets as
having an index letter of " ", which sorts them at the top of the
Object Catalogue table.
1 Like

If you use a ``` code block, long items get put into their own little scrolling box within a scrolling page. I find this ugly and confusing and more difficult to read
since you can only see a small part of the document at a time. I’ll just try to manually break lines where the truncation might lose something significant.

The “documented” way to specify the location of the database file via the datadir= statement in the [mysqld] section of my.cnf does not do a damned thing. I am sure that it is reading that file, because other settings in it do take effect. The only way I could find to make it use my database directory location was via a command line option on mariadbd, and the only way to make that happen was to pass in an Environment= setting in Systemd. I am coming to detest Systemd with a loathing approaching that I have for WordPress.

1 Like

Blaming configuration woes with MariaDB/MySQL config file BS on SystemD is simply ridiculous. MySQL/MariaDB config files are complicated piles of crap with a bunch of section override rules that try to offer “outs” to every possible legacy configuration. (Yes, the [mysqld] section can be overridden, and probably is in your case.)

As it happens, I rather like SystemD’s config file directory hierarchy and override architecture, so that you can upgrade services, and get any new baseline config, without disturbing overrides in /etc/. Vastly superior to the MariaDB junk. You might want to read the SystemD docs on how it prioritizes and applies configuration, since you are now going to rely on it. (You are whining, John. Not a good look for an über-geek.)

1 Like

Fair enough.

Well, maybe, but I’m hardly alone. I just did a Google search for “hate systemd” and got “About 364’000 results”. (Hit counts are about all I use Google for, as Brave search doesn’t give you a count.)

Oh, yes. But those people go to great lengths to use distributions (like Devuan) that excise systemd completely. Sadly for them, there don’t appear to be any cloud providers in that camp.

1 Like