Note, not all guests on mantle were rebooted. Downtime was approximately 15 minutes. The cause was (my own) human error: I plugged the keyboard into the wrong server when I started the reboot. We were able to cancel the reboot before all guests went down, then we brought the guests back up.
September 2010 Archives
So, according to some metrics over the last two days we had 3 hours of downtime. But it was spread over two days, so it really should count for more.
So, here's what SVTIX said about the matter:
In consideration of the downtime experienced in our SVTIX data center on September 13 and 14, I am crediting your account for three days of service. This will be applied to your current invoice.
Now, this seems to be how most of my competitors do it, too. At best, they give you a symbolic apology.
The thing is that if I had taken the SLA payout from my last network outage and, instead of giving those credits, spent the money on a new router and a secondary, redundant upstream, this problem would not have been a big deal at all. Customers would not have experienced downtime.
So yeah, while an SLA is a good way of estimating the cost of a problem and aligning the interests of the owner with the interests of the customers wrt. downtime, I think that when the company is in 'full growth' mode like prgmr.com is, it might hurt more than it helps, by removing some of the working capital that would have otherwise paid for infrastructure upgrades.
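To put rough numbers on the trade-off above, here is a minimal sketch of the arithmetic using the figures from this post (3 hours of downtime, 3 days of service credited). The 30-day billing month and the function names are assumptions for illustration; the post does not state the billing period or how the credit is actually calculated.

```python
# Figures from the post: 3 hours of downtime, 3 days of service credited.
# ASSUMPTION: a 30-day billing month (not stated in the post).
HOURS_PER_DAY = 24
BILLING_DAYS = 30  # assumed, for illustration only


def availability(downtime_hours, period_days=BILLING_DAYS):
    """Fraction of the billing period during which the service was up."""
    return 1 - downtime_hours / (period_days * HOURS_PER_DAY)


def credit_fraction(credit_days, period_days=BILLING_DAYS):
    """Share of one period's invoice handed back as a credit."""
    return credit_days / period_days


def credit_multiple(credit_days, downtime_hours):
    """How many times the downtime itself the credited time covers."""
    return credit_days * HOURS_PER_DAY / downtime_hours


downtime_hours = 3
credit_days = 3

print(f"availability for the month: {availability(downtime_hours):.4%}")
print(f"credit as share of invoice: {credit_fraction(credit_days):.1%}")
print(f"credit vs. downtime: {credit_multiple(credit_days, downtime_hours):.0f}x")
```

Under that assumption, the credit is a much larger slice of the month's revenue than the downtime is of the month's hours, which is the money that could otherwise go toward a router or a second upstream.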
We apologize for any inconvenience this might have caused. We will update this post again when we have more information.
There seems to be a problem with one of the XO devices that one of our transport lines into SVTIX is connected to.
XO moved our port to another line card yesterday as it was rebooting itself.
The new line card we were allocated froze today and needed a restart, which is the reason for today's outage.
We have been working very closely with XO in the past two days and they think that today's issue is not related to yesterday's issue. They think that a card reload was needed in order to push all the recent configurations to the router, and they say they believe we should not be seeing this anymore. We have, however, asked XO to escalate today's event to their tier 3 support for further investigation.
This is the most up-to-date information we have on the situation. We certainly hope that XO have truly fixed the issue.
They should be getting back to us with a final confirmation later today or early tomorrow.
Please be advised that these issues are affecting every carrier in SVTIX that is connected through this XO device; they are not isolated to EGI's network.