Reason for outage 2020.06.29 – Rack D08

Customer impact: Servers hard down for customers in Rack D08

Outage Detail: Early Monday morning we experienced a power failure in Rack D08, which resulted in all servers within this rack going offline. The failure was caused by a faulty Automatic Transfer Switch (ATS), the device that provides power redundancy within a rack. The root cause of the ATS failure is still being investigated.

Actions taken:
Reset the ATS and restarted the failed machines.

Possible improvements:
The ATS consists of two separate 16A power banks. Consider moving half of the machines to the other power bank, both to reduce the risk of a failure and to limit the impact if one occurs.
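
As a rough illustration of the rebalancing suggested above, the sketch below splits a set of machines across the two banks so that the load on each stays roughly equal and below the 16A rating. The machine names and current draws are hypothetical, not measurements from Rack D08, and the greedy largest-first assignment is only one simple way to do the split.

```python
BANK_LIMIT_AMPS = 16.0  # per-bank rating of the ATS, as stated above


def split_across_banks(draws_amps):
    """Assign each machine, largest draw first, to the less-loaded bank."""
    banks = ([], [])
    loads = [0.0, 0.0]
    ordered = sorted(draws_amps.items(), key=lambda kv: kv[1], reverse=True)
    for name, amps in ordered:
        i = 0 if loads[0] <= loads[1] else 1  # pick the emptier bank
        banks[i].append(name)
        loads[i] += amps
    return banks, loads


# Hypothetical per-machine draws in amps (assumed values for illustration).
example = {
    "srv-a": 2.5, "srv-b": 2.0, "srv-c": 1.5,
    "srv-d": 1.5, "srv-e": 1.0, "srv-f": 0.5,
}

banks, loads = split_across_banks(example)
for bank, load in zip(banks, loads):
    status = "ok" if load <= BANK_LIMIT_AMPS else "OVER LIMIT"
    print(f"{bank}: {load:.1f}A ({status})")
```

With real measured draws in place of the example values, the same check shows how much headroom each bank keeps and which machines would be affected if a single bank failed.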

Important switch upgrades

Due to urgent maintenance, we will be upgrading the software on the majority of our switches.
The process will begin at 00:10 on Friday, 12 June. The upgrade process is expected to last about two hours.

When the switch your server is connected to is upgraded, we expect about 5 minutes of downtime.

This post will be updated as necessary after the upgrade.