Server Alert
We had an email problem overnight on 13th June, which we were aware of and which has now been rectified. It was caused by problems with the DNS resolvers in the datacenter, and we are now using different resolvers. We apologize for another problem caused by our provider, "The Planet". Please be advised that we are currently copying all customer sites to a new provider (http://www.eApps.com). If your site has been copied, it will be visible at http://mirror.yourdomain.com. We will switch to the new hosting provider as soon as all sites are copied and validated, which should resolve the problems we have had with our current datacenter. Unfortunately we cannot copy passwords across to the new server, so we will have to contact users individually to provide passwords for the sites and email. Please email chris@karisto.co.uk with any queries. Thanks.
Statement from Karisto Limited:
As you are aware from reading our web site, there was an explosion in the datacenter electrical room that provides power to our server and to the distribution network to which it is attached. We were aware of this as soon as the server went offline, and we posted our server information page at www.karisto.co.uk as soon as was practically possible, along with a link to the information being released by the datacenter management company. In essence, this was all the information that we had at our disposal.
I am aware that many customers left messages on Monday, but unfortunately the volume of requests meant that it was not possible for us to respond individually. Since all the information we had was on our server alert site, and by Monday morning on our main site as well, we put our effort into keeping this page updated. We also, unfortunately, had no estimates of the time to restore power. As you can imagine, following an explosion, more damage was uncovered as repairs were underway. Then the backup generator that did bring servers back overnight on the first day failed when its breakers tripped, and a replacement generator had to be found.
The cause of the explosion is currently unknown, but new components are being acquired now to rebuild the electrical room. This will take over a week to complete, after which there will be a short downtime when the new electrical room is installed. We would have liked to put information up on all customers' sites to alert customers and their site visitors to the situation, but we were unable to access the server. Since domain names point to specific IP addresses on the network, it is not simple to redirect sites, particularly when the domain nameservers for many domains reside in the same datacenter. It took 24 hours just to put up the status page on our main .com site.
I am now attempting to get quotes from other providers to put together more robust and/or backup systems. Our pricing structure does not give us much scope for additional costs, but it is our aim to have some sort of contingency in place for any future event of this type. However, our experience is that each server outage has its own unique causes, and it is not possible to guarantee 100% uptime without having a complete duplicate network and sophisticated monitoring software. Our position as a low-cost but professional hosting service does not allow us to fund such a sophisticated response at present.
Again, apologies for the downtime,
Regards
Chris Howland
Karisto Limited
Datacenter Updates:
June 4, 2008, 12:12am
"Testing of the H1 Phase 1 generator went remarkably well and faster than expected. We are now bringing customer servers online in batches."
June 3, 2008, 10:34pm
"We are continuing to test the new H1 Phase 1 generator. We will post additional information as testing progresses."
June 3, 2008, 9:30pm
"The new H1 Generator has arrived on site. For the next hour we will be pumping fuel out of the old generator and into the new one and performing tests on the new generator. More news will be posted once testing is complete."
June 3, 2008, 6:54pm
"Fixing the faulty breaker on the generator powering H1 Phase 1 was not successful. We have located a second generator that is currently being delivered to the facility. It is expected to arrive this afternoon and we will provide additional information regarding the new generator at that time."
June 3, 2008, 2:13pm
"Because the transfer switch and distribution panel were damaged beyond repair, we are running H1 Phase I from a temporary generator, while Phase II is being powered by our permanent generator. We tested the temporary generator extensively prior to bringing it into service, and we did not find any indication of the faulty breaker.
Our facilities group is working with the generator contractors to repair the faulty breaker as soon as possible."
June 3, 2008, 12:48pm
"Around 2:20 AM CDT, the backup generator being used to power H1 Phase I experienced an electrical issue resulting in service loss for Phase I; Phase II remains unaffected at this time. Our data center operations and facilities teams immediately began investigating the cause of the failure to restore power to the Computer Room Air Conditioner (CRAC) units and Power Distribution Units (PDUs) for Phase I.
The staff successfully tested the 2 megawatt generator without load, so they began powering up the CRAC units and PDUs to restore service to Phase I. While working through this power restoration, the generator's breakers were tripped by their internal electronics. The generator is rated to handle more than the load required to power the phase, and the generator itself is fully functional, but the breaker system must be replaced to guarantee stable power distribution.
We have attempted to locate a replacement generator and are evaluating the time necessary to repair the breakers on the current generator so we can restore power as quickly as possible. We do not have an ETA for power restoration, but we will be updating you hourly with our current status or sooner, as developments warrant."
June 3, 2008, 11:47am
"CRAC units are back online. The facilities and data center operations teams are verifying the stability of the generator, and they will restore power to the PDUs as quickly as possible."
June 3, 2008, 8:55am
"Due to an issue with one of our backup generators, we've noted inconsistent power distribution to our CRACs (air conditioning units) and PDUs. Because these key components are fundamental to server racks, customers may note some downtime currently.
We have our data center operations and facilities teams checking the generators, CRACs, PDUs and racks to restore connectivity."
June 2, 2008, 11:38pm
Dear Customer,
Late last night, I told you we hoped to have power to the 6,000 servers in Phase 2 of our H1 data center by midnight, with all servers up by early morning. I am glad to say we came close, just a few hours after sunrise. At this time, 100% of our servers in Phase 2 have power, and our technicians are working with customers on any remaining server issues. We are confident all remaining issues will be resolved shortly.
I also explained the significant challenge we faced in the other phase where the actual explosion occurred. Our team came up with a creative way to restore power quicker than the 4-5 day outage. We decided not to wait for equipment for the electrical room completely, opting instead for a temporary solution to get power to the 3,000 servers. That solution involves using generator power for the next 10-12 days until all the new equipment arrives to rebuild the electrical room for Phase 1. I explained that we expected to have a temporary solution in place by midnight tonight, with servers powered up tomorrow. The good news is that as you read this letter, the power is restored, and the temporary solution is in effect. Within the next two hours, the remaining 3,000 servers will have power. We have overstaffed our data centers again to help during this initial power up.
This now leaves us facing step two of this process, which requires getting all of the equipment delivered and then rebuilding the electrical room to its original standard. To make the cutover to the rebuilt electrical room, the operations group believed it would take a maintenance outage of 24-48 hours. I have good news on that front. It's not perfect, but at present we now believe the maintenance window will be just 4-6 hours. That's still too long, and we will continue this week to find ways to reduce the time. Given that there will be some outage for the cutover, we will execute this step at midnight on a Saturday, either June 7 or June 14. We want to pick the most appropriate time to minimize impact to you.
I must admit that I am amazed. We are almost 18 hours ahead of schedule with this phase, thanks to our great suppliers and of course the great folks working here at The Planet. This could never have happened without the help of both, and I want to thank all of them.
There is still more work to do, but the progress is terrific. We will continue to work any and all customer issues, and we face the challenge of putting the permanent power fix in place for Phase 1. Nonetheless, there is still good news based on what I told you last night.
As each hour passes, we learn more and more. Please give us the time to continue our planning. We will provide you with information as we have it.
Until tonight's update ...
Douglas J. Erwin
Chairman & Chief Executive Officer
June 2, 2008, 5:15am
Dear Valued Customers:
As previously committed, I would like to provide an update on where we stand following yesterday's explosion in our H1 data center. First, I would like to extend my sincere thanks for your patience during the past 28 hours. We are acutely aware that uptime is critical to your business, and you have my personal commitment that The Planet team will continue to work around the clock to restore your service.
As you have read, we have begun receiving some of the equipment required to start repairs. While no customer servers have been damaged or lost, we have new information that damage to our H1 data center is worse than initially expected. Three walls of the electrical equipment room on the first floor blew several feet from their original position, and the underground cabling that powers the first floor of H1 was destroyed.
There is some good news, however. We have found a way to get power to Phase 2 (upstairs, second floor) of the data center and to restore network connectivity. We will be powering up the air conditioning system and other necessary equipment within the next few hours. Once these systems are tested, we will begin bringing the 6,000 servers online. It will take four to five hours to get them all running.
We have brought in additional support from Dallas to have more hands and eyes on site to help with any servers that may experience problems. The call center has also brought in double staff to handle the increase in tickets we're expecting. Hopefully by sunrise tomorrow Phase 2 will be well on its way to full production.
Let me next address Phase 1 (first floor) of the data center and the affected 3,000 servers. The news is not as good, and we were not as lucky. The damage there was far more extensive, and we have a bigger challenge that will require a two-step process. For the first step, we have designed a temporary method that we believe will bring power back to those servers sometime tomorrow evening, but the solution will be temporary. We will use a generator to supply power through next weekend when the necessary gear will be delivered to permanently restore normal utility power and our battery backup system. During the upcoming week, we will be working with those customers to resolve issues.
We know this may not be a satisfactory solution for you and your business but at this time, it is the best we can do.
We understand that you will be due service credits based on our Service Level Agreement. We will proactively begin providing those following the restoration of service, which is our number one priority, so please bear with us until this has been completed.
I recognize that this is not all good news. I can only assure you we will continue to utilize every means possible to fully restore service.
I plan to have an audio update tomorrow evening.
Until then,
Douglas J. Erwin
Chairman & Chief Executive Officer
June 1, 2008, 5:02am
Dear Valued Customers:
This evening at 4:55 in our H1 data center, electrical gear shorted, creating an explosion and fire that knocked down three walls surrounding our electrical equipment room. Thankfully, no one was injured. In addition, no customer servers were damaged or lost.
We have just been allowed into the building to physically inspect the damage. Early indications are that the short was in a high-volume wire conduit. We were not allowed to activate our backup generator plan based on instructions from the fire department.
This is a significant outage, impacting approximately 9,000 servers and 7,500 customers. All members of our support team are in, and all vendors who supply us with data center equipment are on site. Our initial assessment, although early, points to being able to have some service restored by mid-afternoon on Sunday. Rest assured we are working around the clock.
We are in the process of communicating with all affected customers. We are planning to post updates every hour via our forum and in our customer portal. Our interactive voice response system is updating customers as well.
There is no impact in any of our other five data centers.
I am sorry that this accident has occurred and apologize for the impact.
Sincerely,
Douglas J. Erwin
Chairman & Chief Executive Officer