[Resolved] Network instability

Monitoring has alerted me to some short network outages, the first of which lasted from around 22:06 to 22:12 this evening. This appears to have been caused by a problem on the BGP link between the Servology network and our primary upstream transit provider. The connection broke with a “Hold Timer Expired” notification sent by our router, which generally means the BGP process at the far end of the link was not responding for some reason.
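
For anyone unfamiliar with the hold timer, here is a minimal sketch of the check a BGP speaker makes. It assumes a 90-second hold time, which is the value RFC 4271 suggests; the value actually negotiated on this link is not stated here. The function name and timestamps are made up for illustration and this is not our router's implementation.

```python
from datetime import datetime, timedelta

# Assumption: 90 seconds is the RFC 4271 suggested hold time,
# not necessarily the value negotiated on this session.
HOLD_TIME = timedelta(seconds=90)


def hold_timer_expired(last_message_at, now, hold_time=HOLD_TIME):
    """Return True if no KEEPALIVE or UPDATE has arrived within hold_time.

    last_message_at: when the last KEEPALIVE or UPDATE was received.
    now: the current time.
    """
    return now - last_message_at >= hold_time


# Hypothetical timestamps, for illustration only.
last_seen = datetime(2000, 1, 1, 12, 0, 0)
still_up = not hold_timer_expired(last_seen, last_seen + timedelta(seconds=30))
torn_down = hold_timer_expired(last_seen, last_seen + timedelta(seconds=90))
```

When the check returns True, the router sends the "Hold Timer Expired" notification and closes the session, which is what happened here.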

As there has been more than one such event, I have temporarily shut down our IPv4 session with that transit provider and all IPv4 connectivity is via our backup transit provider for now. I will keep an eye on the situation and reestablish our primary IPv4 BGP session when it seems safe to do so.

[02:00] This incident appears to have been caused by outages of the transit provider’s router. My Nagios monitoring installation regularly pings that router’s loopback IP address (i.e. not the interface address we use to connect to it) from a third-party hosting provider’s network, and saw two periods when that address did not answer pings, corresponding to the two outages we saw. Nagios has now gone several hours without detecting any further outages, so I have reestablished the BGP session, returning the network configuration to normal.
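
The correlation between ping failures and session drops can be sketched roughly as below. This is only an illustration: the function name, the one-minute slack, the date and the ping window are all hypothetical. Only the 22:06 to 22:12 session outage comes from the report above.

```python
from datetime import datetime, timedelta


def windows_overlap(a, b, slack=timedelta(minutes=1)):
    """Return True if time windows a and b overlap, allowing some slack.

    Each window is a (start, end) tuple of datetimes. The slack covers
    the gap between ping checks and small clock differences.
    """
    a_start, a_end = a
    b_start, b_end = b
    return a_start <= b_end + slack and b_start <= a_end + slack


# The date is arbitrary; the times of the first BGP outage are from the report.
bgp_outage = (datetime(2000, 1, 1, 22, 6), datetime(2000, 1, 1, 22, 12))

# Hypothetical ping failure window, invented for this example.
ping_outage = (datetime(2000, 1, 1, 22, 5), datetime(2000, 1, 1, 22, 13))

matched = windows_overlap(bgp_outage, ping_outage)
```

A match only shows that the two events coincided. Because the pings come from a third-party network and target the loopback address, a match does suggest the fault was on the provider's router and not on our link to it.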