State of Project Segfault: August 2023

Welp, this was a fun month. We started with IN Node on its last legs, went through important changes like the US Node migration and the Status VPS decommission, and ended with a disk failure on Soleil Levant. However, we still managed to bring some nice improvements to the project as usual!

Server News

Soleil Levant's drive failure and resulting downtime

Basically, we had a drive failure on the night of the 17th, got a replacement on the 18th and the services have been running since.

Most services on Soleil survived with a few databases on DatabaseVM having non-important tables corrupted.

Large parts of the Synapse DB had severe corruption, requiring us to restore from the backup we had from the 13th (the restore was done on the 22nd of August, so the backup was a little over a week old).

For more details, see the blog post we made about the downtime.

Once again, we are extremely disappointed at the magnitude and length of the downtime, and will start implementing preventive measures after the dust settles.

The Status VPS has been decommissioned

The Status VPS has been the VPS used to host our uptime-kuma page and simplelogin instance ever since hebergnity died.

However, we felt it was pretty much useless since it served basically no purpose for the money we paid for it (4.2EUR).

This, alongside the funds we needed for our US node migration, made us cancel it in favour of moving the status page to the homeserver of a friend of Midou and Arya (at no cost). We also migrated the simplelogin instance to Pizza1, our EU VPS.

The US Node saga

TL;DR: We are on racknerd with 4.5GB RAM, 100 GB SSD, 3 cores now.

Since the beginning of the month, we had been planning the US Node migration, originally thinking of moving it to Hetzner.

However, around the second week, when we were ready to do the migration, Hetzner refused to accept the KYC proof from either Arya or MrLeRien.

In haste, thanks to devrand's pestering, MrLeRien got a BuyVM 1024-slice (1GB RAM, 20GB Storage, 1 Core), forgetting that US Node was struggling even with its current specs (2GB RAM, 40GB Storage, 1 "premium AMD" core).

Thankfully, we realized this early and started thinking about just running it for a month and then moving to another provider. Then we realized midou had paid for 6 whole months, and that BuyVM can't do refunds.

Luckily, after a quick support ticket, BuyVM confirmed that cancelling the VPS would return the funds to the account (so that we could use them as balance to pay for other slices on the service, like Pizza1).

The $21 we got as credit will last for around 3 months.

Now, after all this, we were still unsure which provider to go with, considering how expensive the VPS space is in the USA.

Thankfully, after some asking and browsing, we stumbled across Racknerd and the ever-useful racknerdtracker.com.

We decided to go with the 4.5GB KVM VPS (4.5GB RAM, 100GB SSD, 3 cores of a Xeon slightly newer than that of Soleil Levant), which we paid for a full year upfront (the only option).

This cost us 44.5EUR, or around 3.7EUR a month, which is a really good deal if you ask me :)

Thankfully, unlike the others we had this month, the migration was surprisingly smooth and had little downtime.

Average IN Node moving systems 3 times in a week :)

TL;DR: We are now on an Acer Aspire 7 A715-75G with a 9th gen i5, 12 GB RAM, 256 GB SSD.

As the month started, Arya noticed an issue with his personal laptop, wherein the fans literally didn't run, and wanted to use the MacBook for personal use till he could get a new laptop to replace it.

For this, IN Node was migrated to the old HP machine that used to run the services before the MacBook.

This was frankly a pretty hasty and really slow migration, and it was almost immediately followed by CPU lockups and random reboots.

At this point, Arya tried a lot of stuff to stop the VM from using as many resources as it did, but to no avail.

At some point, however, he just bit the bullet and moved it to his old laptop, an Acer Aspire 7 A715-75G with an i5 9300H and 12 GB RAM, moving the old server's SSD over to prevent issues.

At this point there were still some issues, which were fixed by shutting down one of Arya's personal LXC containers that was acting weird.

In the end, however, we had to move some non-privacy-frontend services, which had been hosted on IN back when it had a lot more RAM, to Soleil and other nodes. These are detailed under the Service News category as usual.

Rsync.net issues

Barely a few days before the disk failure, we realized that the backups for DatabaseVM to rsync.net were broken. The first issue was caused by the Debian 12 upgrade, which enabled PEP 668 by default, breaking the libraries used for borgmatic.
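For anyone who hasn't run into PEP 668 yet: the distro drops a marker file named EXTERNALLY-MANAGED next to the Python standard library, and pip then refuses by default to install into that environment unless it is a virtual environment. Below is a minimal Python sketch of that check. It assumes nothing about how borgmatic was installed on DatabaseVM, and the function names are made up for illustration.

```python
import sys
import sysconfig
from pathlib import Path


def marker_path() -> Path:
    """Location PEP 668 specifies for the marker file: the stdlib directory."""
    return Path(sysconfig.get_path("stdlib")) / "EXTERNALLY-MANAGED"


def in_virtualenv() -> bool:
    """True inside a venv, where the PEP 668 restriction does not apply."""
    return sys.prefix != sys.base_prefix


def is_externally_managed() -> bool:
    """True when installers like pip should refuse to touch this environment."""
    return not in_virtualenv() and marker_path().is_file()


if __name__ == "__main__":
    print("marker:", marker_path())
    print("externally managed:", is_externally_managed())
```

Running this with the system Python on a distro that ships the marker should report the environment as externally managed; running it from inside a virtualenv should not.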

After this was fixed, however, we had another issue. When running under a cronjob, the rsync.net backups got randomly killed by the remote. This was due to ssh timeouts killing the connection (borgbackup talks to the remote over ssh) before the backup could finish.

We filed an issue about it on the 15th, and by the time Rsync.net responded, Soleil already had the DB failure, which resulted in us having to use the initial backup Arya recreated on the 13th.
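One common way to keep an idle ssh session from being dropped is to turn on client-side keepalives. The sketch below is only an illustration of that idea, not a record of what was changed here: it builds an environment where borg's BORG_RSH variable points at ssh with the ServerAliveInterval and ServerAliveCountMax options set. The helper name and the default values (60 seconds, 30 probes) are assumptions.

```python
import os


def keepalive_env(interval: int = 60, count_max: int = 30) -> dict:
    """Copy of the current environment with BORG_RSH set to an ssh
    command that sends a keepalive every `interval` seconds and gives
    up after `count_max` unanswered ones."""
    env = dict(os.environ)
    env["BORG_RSH"] = (
        f"ssh -o ServerAliveInterval={interval} "
        f"-o ServerAliveCountMax={count_max}"
    )
    return env


if __name__ == "__main__":
    print(keepalive_env()["BORG_RSH"])
```

The returned dict can be passed as env= to subprocess.run when the cronjob launches the backup command, so the setting only applies to that one process.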

IPv6 stuff

Pizza1 now uses two IPv6 addresses: one for invidious/piped/other internal uses, and another that is published in DNS and that users connect to directly.

This is so that changing the IPv6 address when Google blocks our IP does not affect people who access our services over IPv6 and whose DNS resolver is slow to pick up the change we make on our authoritative nameserver.
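The split can be sketched in Python roughly like this. Everything here is hypothetical: the addresses come from the 2001:db8::/32 documentation range, and the function names and rotation rule are made up for illustration. The point is only that the outgoing (egress) address can be swapped for another one in the prefix while the address in DNS never changes.

```python
import ipaddress
import socket


def next_egress(prefix: str, current: str, public: str) -> ipaddress.IPv6Address:
    """Pick the address after `current` inside `prefix`, never the public one."""
    net = ipaddress.IPv6Network(prefix)
    pub = ipaddress.IPv6Address(public)
    candidate = ipaddress.IPv6Address(current) + 1
    for _ in range(3):
        if candidate not in net:
            # Ran off the end of the prefix: wrap around to its start.
            candidate = net.network_address + 1
        if candidate != pub:
            return candidate
        candidate += 1
    raise ValueError("prefix too small to rotate in")


def connect_from(egress: ipaddress.IPv6Address, host: str, port: int) -> socket.socket:
    """Open a TCP connection whose source address is the egress IP.
    The address must already be configured on a local interface."""
    return socket.create_connection(
        (host, port), timeout=10, source_address=(str(egress), 0)
    )


if __name__ == "__main__":
    print(next_egress("2001:db8::/64", "2001:db8::2", "2001:db8::1"))
```

Since only the egress address is ever rotated, nothing a user has cached from DNS goes stale.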

Service News

Akkoma now has open registrations

After a long time of manual approval on Akkoma, we now allow people to register on their own.

We hope this'll make our Akkoma instance more active and lively :)

Kbin messages being in the future in Lemmy has been fixed

We were informed that posts from our Kbin instance were popping up as "from the future" on Lemmy instances, which resulted in them always being pegged to the top.

While Arya did follow the instructions provided in the issue regarding it, he completely forgot that changing the timezone in just postgres won't change a thing since TZ was set to CEST in postgresql.conf. I discovered this while fixing the database after the Soleil disk failure.
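As a general illustration of how a timezone mismatch produces posts "from the future" (the exact path through Kbin and Lemmy is an assumption here): if a timestamp is stored as CEST wall-clock time and the receiving side reads it as UTC, it lands two hours ahead of when the post was actually made. A minimal Python sketch, with a hypothetical function name:

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+2 offset standing in for CEST.
CEST = timezone(timedelta(hours=2), "CEST")


def apparent_skew(posted_at_utc: datetime) -> timedelta:
    """How far in the future a post appears when its CEST wall-clock
    time is stored without an offset and then read back as UTC."""
    # What the server writes: local wall-clock time, offset dropped.
    wall_clock = posted_at_utc.astimezone(CEST).replace(tzinfo=None)
    # What the remote instance assumes: the same digits, but in UTC.
    misread = wall_clock.replace(tzinfo=timezone.utc)
    return misread - posted_at_utc


if __name__ == "__main__":
    posted = datetime(2023, 8, 20, 12, 0, tzinfo=timezone.utc)
    print(apparent_skew(posted))
```

With the server timezone set to UTC, the wall-clock time and the UTC time are the same digits, so the skew disappears.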

Libretranslate has been moved to soleil

After IN Node's migration and the resulting lack of resources on the server, we have moved libretranslate to Soleil Levant.

Simplelogin is now on version 4.65 from 3.10

While migrating simplelogin from Status VPS to Pizza1, Arya took the opportunity to migrate simplelogin to the latest beta release (there aren't any stable 4.x releases, I think, and 3.x is terribly outdated).

You can now make use of the new features that were added to simplelogin in the past year or so!

Mediawiki registrations are now authentik-only

Why we did not do this earlier, I cannot answer, but due to extreme amounts of spam on our Mediawiki instance, we have deleted the users and restricted signup to users of the pubnix authentik system.

That's about it for this month. I know basically every topic was like a thesis unto itself, but eh, we did a lot this month, and it can't be covered in a short article :)

Sorry for all the downtime, and see you next month :D

-Arya