Reason for outage 2020.06.29 – Rack D08

Customer impact: All customer servers in Rack D08 were hard down.

Outage Detail: Early Monday morning we experienced a power failure in Rack D08, which took all servers in this rack offline. The failure was caused by a faulty Automatic Transfer Switch (ATS), the device that provides power redundancy within the rack. The root cause of the ATS failure is still being investigated.

Actions taken:
Reset the ATS and restarted the affected machines.

Possible improvements:
The ATS consists of two separate 16 A power banks. Consider moving half of the machines to the other power bank, to reduce both the risk of a failure and its impact if one occurs.
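
As a rough planning aid, the split across the two banks could be sketched as below. This is only an illustration: the machine names and per-machine current draws are hypothetical placeholders, not measurements from Rack D08, and the greedy balancing approach is an assumption rather than an agreed procedure. The only figure taken from this report is the 16 A rating per bank.

```python
# Per-bank rating taken from the report; everything else here is hypothetical.
BANK_LIMIT_AMPS = 16.0


def split_across_banks(draw_by_machine, bank_count=2):
    """Greedily assign machines to banks, largest draw first,
    always placing the next machine on the least-loaded bank."""
    banks = [{"machines": [], "load": 0.0} for _ in range(bank_count)]
    ordered = sorted(draw_by_machine.items(), key=lambda kv: kv[1], reverse=True)
    for name, amps in ordered:
        target = min(banks, key=lambda bank: bank["load"])
        target["machines"].append(name)
        target["load"] += amps
    return banks


# Hypothetical example data: machine name -> estimated draw in amps.
example_draw = {
    "srv-01": 2.5,
    "srv-02": 1.5,
    "srv-03": 3.0,
    "srv-04": 2.0,
    "srv-05": 1.0,
    "srv-06": 2.0,
}

banks = split_across_banks(example_draw)
for index, bank in enumerate(banks, start=1):
    status = "OK" if bank["load"] <= BANK_LIMIT_AMPS else "OVER LIMIT"
    print(f"Bank {index}: {bank['machines']} load={bank['load']:.1f} A ({status})")
```

Real draw figures would need to be measured per machine before any move is planned, and a bank should normally be kept well below its rating rather than filled up to it.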