Webchat outage November 6th, 2021 to November 8th, 2021

Before I explain what happened, let me say this: had literally anyone contacted a moderator (who could have contacted me), or contacted me directly, this outage would have lasted hours at most instead of days.

I really don’t understand why you would see that something has been down for an extended period and not contact the person who could fix it, or at least tell you why!

Now, that being said, here’s what happened:

The TLS certificates used by our sites are issued by Let’s Encrypt and must be renewed every few months. This is an automated process for the websites.

For the chat daemon and Kiwi’s internal web server, this process is a little different. certbot has to run a special script to restart these services, and because certbot has certain requirements for such scripts, the script can’t do much checking to ensure everything went okay.

The script first sends a signal to the chat daemon that tells it to reload the TLS certificates. This part went fine or you wouldn’t be reading this. 🙂

Next, it stops the Kiwi process, because the restart function in Kiwi does not work properly when called from a script running via sudo. This part ran as it should.

Finally, the script starts Kiwi after a two-second delay. This is what failed. It turns out certbot killed the script because stopping Kiwi takes too long, and that necessary two-second delay pushed the script over its time limit.
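The three steps above can be sketched roughly as follows. This is only an illustration, not the actual hook script: the command lines (`pkill -HUP ircd`, `systemctl stop kiwi`, `systemctl start kiwi`) and the function names are hypothetical stand-ins, and the command runner and sleep function are injected so the sequence can be exercised without touching real services.

```python
import subprocess
import time
from typing import Callable, List, Sequence

# Hypothetical commands; the real service names and tools are not given here.
RELOAD_IRCD = ["pkill", "-HUP", "ircd"]
STOP_KIWI = ["systemctl", "stop", "kiwi"]
START_KIWI = ["systemctl", "start", "kiwi"]


def run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(cmd), check=False).returncode


def renewal_hook(
    run: Callable[[Sequence[str]], int] = run_command,
    sleep: Callable[[float], None] = time.sleep,
    delay: float = 2.0,
) -> List[str]:
    """Run the three steps in order; return the names of those that succeeded."""
    done: List[str] = []
    if run(RELOAD_IRCD) == 0:
        done.append("reload-ircd")
    if run(STOP_KIWI) == 0:
        done.append("stop-kiwi")
    # If the hook is killed during this pause, Kiwi stays stopped.
    sleep(delay)
    if run(START_KIWI) == 0:
        done.append("start-kiwi")
    return done
```

Everything between the stop and the start is a window in which a killed hook leaves Kiwi down, so in a layout like this the service only comes back if something else notices it is missing.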

Now, normally this wouldn’t be an issue, because there is a script that is supposed to ensure Kiwi is always running by checking for it every few seconds. On the 6th, however, I was attempting to build the updated version of Kiwi that allows the web interface to live separately from the websocket proxy it requires to translate between the HTTP protocol and the IRC protocol.

And this is where the massive fail happened: that build is still ongoing because I’ve been making several customizations to it, and the automated test script it creates is called kiwi. That is the same process name the auto-restart script looks for, so the auto-restart script believed Kiwi was running when it actually wasn’t.
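To illustrate the name collision, here is a small sketch comparing a check that matches on process name alone with one that matches on the full command line. It is not the real auto-restart script: the `Proc` type, both function names, and both command lines are hypothetical, and the process list is passed in as plain data rather than read from the system.

```python
from typing import Iterable, NamedTuple


class Proc(NamedTuple):
    name: str     # short process name, which is all a name-based check sees
    cmdline: str  # full command line of the process


# Hypothetical command lines for the real server and the build's test script.
KIWI_CMDLINE = "/opt/kiwi/kiwi --serve"
BUILD_TEST_CMDLINE = "/home/build/kiwi-src/kiwi --test"


def running_by_name(procs: Iterable[Proc], name: str = "kiwi") -> bool:
    """Name-only check: any process called `name` counts as Kiwi."""
    return any(p.name == name for p in procs)


def running_by_cmdline(procs: Iterable[Proc], expected: str = KIWI_CMDLINE) -> bool:
    """Stricter check: only the expected command line counts as Kiwi."""
    return any(p.cmdline == expected for p in procs)


# Situation during the outage: only the build's test script is running.
during_outage = [Proc("kiwi", BUILD_TEST_CMDLINE)]
name_check_says_up = running_by_name(during_outage)
cmdline_check_says_up = running_by_cmdline(during_outage)
```

With only the test script running, the name-only check reports Kiwi as up, while the command-line check reports it as down. Matching on a PID file or the full command line is one way a watchdog could avoid this kind of false positive.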

This shouldn’t happen again on the next renewal, but to be honest, I’m half-tempted to just throw Kiwi in the unsupported clients bin at this point as it causes more problems than it solves.

For those wondering, there are in fact only 3 clients on the “not supported” list at the time I’m writing this:

| Client | Reason |
| --- | --- |
| AdiIRC | Does not properly close connections when receiving a Terminate Now command from Windows during the shutdown process. This causes TLS errors and memory leaks in the IRCd that force occasional restarts to keep it from crashing. |
| The Lounge | A self-hosted web client similar to Kiwi. If misconfigured, it keeps connections open when it shouldn’t, or worse, becomes a public interface, and we neither need nor want unofficial public web interfaces, period. |
| Mibbit | Using it properly requires special configuration on the IRCd’s side, and it’s not worth the hassle. Plus, see the previous comment regarding unofficial web interfaces. |

I’m not saying you can’t use these clients, but if you have issues with them no effort will be made to assist you. You’re on your own.