Incident 26 January 2024 - postmortem

Yesterday the service was down for about 25 minutes. As this kind of incident has not happened before, I would like to share what happened.

Yesterday at around 18:20 UTC all clients in the gameroom got disconnected from the server, including me.
The server still replied to ping, but I could not ssh into the VPS anymore.
The TransIP control panel worked (I could even ssh into the server through it), but the most interesting thing I saw was network traffic suddenly dropping to 0 (see picture attached). As expected with no traffic, CPU usage and server load were also close to 0.

So the initial verdict was that something was wrong with the service provider's (TransIP) network.
I started to go through different logs on the server and found nothing.
When the incident had lasted for about 20 minutes, I decided to restart the VPS... and it worked! The service was back online.

So I started to dig through the logs again and got frustrated because there was nothing useful there.
Eventually, I wrote to TransIP support and got an answer (1 hour after the incident started!):

Dear customer, Due to an incident with the VPS firewall, your VPS is currently less available. Our engineers are doing everything they can to resolve the problem as quickly as possible. If you do not want to wait for that, you can turn the VPS firewall off and on again in your TransIP control panel as a workaround. We would also like to keep you informed via Our apologies for the inconvenience. Yours sincerely, TransIP

So it was a problem on their side. I don't recall an incident this big on their side in the past 7 years or so; they have actually been a decent hosting partner. Nevertheless, if these kinds of incidents continue to happen, we might need to find a new partner.

Best Regards,
Marten Meikop, founder
