At the end of my last post, I exclaimed "Vive la Fiber!" In light of this morning's second major fiber failure, that comment is now extremely ironic. The fiber connection did not "vive" (live long). It did not "vive" much at all.
Compounding the problem, our cell phone alerts didn't go off for various reasons - thus we didn't discover the problem until after 9:00am Eastern (6:00am local time).
After spending 30 minutes ensuring that it was indeed a problem with the fiber connection, we had our ISP manually move our web traffic back onto our four older T3 circuits. That process finished around 10:20am Eastern.
We are now working on four different issues to get back to the level of reliability that you expect from us.
1.) We are waiting for the cable operator - AboveNet - to find the problem and fix it. We assume there was another break in the cable somewhere, but we are waiting for confirmation.
2.) We will never put all of our traffic on just one AboveNet fiber link again. Despite the obvious benefits of increased speed and bandwidth, the reliability is not there and reliability trumps everything. This is a case of "Fool me twice." Once the problem is found and fixed, we will work with our ISP to configure our connection so that if the fiber link fails again, our traffic automatically switches over to the T3 circuits immediately. The older T3 circuits are very expensive, but we are now going to keep them around to ensure reliability.
3.) We are expanding the number of people who get cell phone alerts whenever there's a problem with our website. We will also be testing those cell phone alerts more frequently.
4.) We are making changes to ensure that our Status blog is always available even when the rest of the website is offline. You should be able to visit http://blogs.stockcharts.com/status at any time and see our site status.
As you can see from our Pingdom Monitoring report, we've been down 14 hours 38 minutes this month. This is our worst month by far for outages. Despite all evidence to the contrary, we are working hard to regain your trust on these issues. I will post additional information as it becomes available.