Our datacenter had a few problems last night.
This is what happened:
Quote:
Update March 20th 5:55am: Servers and services are coming back online as fast as we can possibly get them up. We do need to tend to certain machines that do not come back on reboot, but so far we are able to get service restored. The majority of our team has been on this issue since the first data center power outage, so we are pleased we are able to restore service much faster. Dedicated machines are next on our list to get back up. We have already started on those so they should be coming up as we move forward.
Update March 20th 5:15am: We are in a much better status than we were with the initial outage. We are able to tend to machines individually to get service restored. We are seeing services being restored slowly but surely, though it is a bit of a process.
Update March 20th 5:00am: Power has been restored once again, and we are currently running on our backup generators as we work to bring services back online. We understand the frustrations this is causing, as we are in the same boat as you. Please hang in there, and support us as we work to get everything up from this second data center power outage. We sincerely apologize for the inconvenience.
Update March 20th 4:30am: We are experiencing another power disruption in our Irvine datacenter. Our teams are currently investigating; further details to come.
Update March 20th 3:50am: It looks like all services affected by the outage should be restored by now. If you are still experiencing downtime, please contact support, as it is most likely not related to this issue. Thanks once again for your patience while we’ve worked to resolve this!
Update March 20th 3:00am: We are still clearing up a few of the remaining issues that were caused as a result of the outage, and continue to grow closer to resolving this completely. Fortunately, most services and sites should be up and running by now. We will post a final update once we’re fully back online.
Update March 20th 2:16am: Most sites and services should be back up now, save for a few remaining issues we are still working on clearing up. Thanks for your patience!
Update March 20th 1:35am: The issues with the rack mentioned in the previous update have been resolved, and services should be back up in the next few moments. We are continuing to work toward a final resolution of this outage, and will update again shortly.
Update March 20th 12:19am: Services are still slowly but surely coming online. We’ve had one rack of machines giving us a particularly tough time, and are working on that now. We should have more fixed in the next few moments.
Update March 19th 11:41pm: Services are slowly coming up, and we are addressing them one at a time. Also, our data center has extended the UPS maintenance window to 3am PDT.
Update March 19th 10:56pm: Services are still coming up; however, some hosts require manual intervention. We’re seeing some sporadic outages as well, and are working to get those fixed ASAP!
Update March 19th 9:35pm: Ok, we see a lot more sites and servers coming online, so this is good news! We’re still monitoring things, to ensure they come back online smoothly.
Update March 19th 9:15pm: We’re still working away on this, and are now seeing some services restored. You may see some intermittent outages for a bit while we work out the kinks, but services are coming back online slowly.
Update March 19th 8:30pm: We’re having some success resurrecting part of the problem areas of our network. This does not mean hosts are up just yet, but it does mean we’re getting closer.
Update March 19th 7:55pm: We are still getting the new network device in place and configured. We’re doing everything we can to work as fast as possible and restore service ASAP.
Update March 19th 7:19pm: Our data center provider is also initiating emergency maintenance on the UPS system that caused the initial power outage. This maintenance is scheduled to be complete at 11:55pm Pacific time. We don’t anticipate any problems stemming from this, but are bracing for issues should they pop up.
Update March 19th 6:30pm: As of now, two network devices that handle traffic internal to the data center are down: one main and one backup. Our vendor has said that BOTH devices were fried during the power outage, and is suggesting we RMA BOTH of them. We are in the process of deploying a spare to that location, and we estimate that will take 1-2 hours. We will continue to update as we get more information.
Update March 19th 5:55pm: We are still working on this problem; however, it’s proving fairly difficult to resolve. We do have power in our datacenter and most of it is up; however, the lingering issue is a routing table we’ve been unable to restore. We’ll continue to update here.
Update March 19th 5:34pm: At this time, we’re still working on the problem and are going through the many troubleshooting steps needed; however, these steps take a bit of time to complete. We’re working to get things sorted out ASAP.
Update March 19th 5:10pm: While we’re continuing to work on bringing all the services back online, we’re happy to report that about 90% of DreamObjects are now available again, and we confirmed that there is no data loss. We appreciate your continued patience while we are working on the rest.
Update March 19th 4:58pm: Despite the diligent work from our technical teams, we are still not fully back up with our services in this location. We are very sorry, and will update this post again with any news we may receive.
Update March 19th 4:28pm: I’m afraid we are still not fully back up at the moment but continue working on it. We are very sorry for the inconvenience this delay is causing and appreciate your patience.
Update March 19th 4:08pm: We are slowly but surely bringing all services back online, including DreamObjects. We are very sorry for the inconvenience this delay is causing.
Update March 19th 3:47pm: We are still working on bringing back all affected services. You can check if you are affected by this power disruption by looking up where your services are hosted in the DreamHost Web Panel. Please note, DreamObjects are also located in this data center. Again, we apologize for the inconvenience.
Update March 19th 3:27pm: After the power came back up, we were able to bring back part of our services. We are slowly but surely bringing all of them back up, so your sites may take a little while longer to start working again. We appreciate your patience during this process.
Update March 19th 3:07pm: We now have power back in the data center. We are investigating the cause, as it affected the whole facility and not just us. We are also working on making sure that all services are coming back online as expected. Your sites may take a little while longer to come back up.
You can check if you are affected by this power disruption by looking up where your services are hosted in the DreamHost Web Panel. Please note, DreamObjects are also located in this data center. Again, we apologize for the inconvenience.
http://www.dreamhoststatus.com/2013/03/ ... irvine-ca/