Event Store logo

sss https://eventstore.org Menu

Developer Blog

Murphy

  |   Written by: Greg Young   |   News

As all of us who have released production software before know, if something totally rare and unseen will happen it almost always happens the day of release. In our shared career histories, we’ve personally seen many such examples: routers going bad, disks failing, servers with bad memory, admins blocking the ports you use on the firewalls, admins switching from two-phase to three-phase commit on the DTC (why did we increase in latency by 40%?!), and of course we all love dealing with multiple levels of managed providers. Literally, we have seen all sorts of fun stuff (and we’d love to hear your stories on Twitter – @eventstore)!.

Yesterday we pushed the source for Event Store 2.0.0 so some early adopters can start looking through/testing with/using. We are currently working on making sure the binaries are stable before releasing them along with the NuGet packages. We also released eventstore.org. As we discussed yesterday this is one of our clusters that we have automated for power failures/network partitions. It does about 1,000 power failures per day to help make sure you don’t have a problem when/if you happen to lose power.

At about 0300 local time last night we ended up losing a node. The data hard drive actually went bad – who would have thought such a thing would happen on the day you release?! Windows was not so happy either - the box entered a zombie state. Once we were able to get it to boot up about all we could get it to do was…

…telling us that the disk is reporting device errors. Uh oh…

However the cluster continued running throughout the night (it actually processed another 3GB of writes before we got to it). The power outage test stopped pulling power because it would break the quorum which is what the replication model is based on. We have other tests that actually break the quorum on purpose.

This is the reason that we have the clustering. A single catastrophic failure did not bring down the cluster. A five node cluster can handle two catastrophic failures. Remember that you will never run out of things that can go wrong!

When we came in the morning, we replaced the hard drive and turned the appliance back on. It took about 15–20 minutes to get fully synchronized again, but then happily joined the cluster again the and the power off test resumed.

As for the busted hard drive, we have a feeling that pulling the power on it a few hundred times a day for weeks on end may not have been the intended use but luckily since we’re using an Event Store appliance, the send home warranty covers it. Even more luckily, it does not have far to travel!


Subscribe to the Event Store blog

Get the latest news and tutorials when they are released

You might also like

    Event Store mentioned in 3 Gartner Reports as Event Driven Architecture vendor

      |   Written by: Dave Remy   |   News

    For the past few years now Gartner Group has been mentioning the critical importance of Event Driven Architecture to the modern enterprise (one of the top 10 trends). Event Store are pleased to be listed in 3 recent reports by Gartner as one of the vendors, along with Amazon Web Services, Confluent, Google, IBM, and others who are offering Event Driven Architecture enabling technology. The three Gartner reports are (Note viewing the full reports require...

    Read article

    Video of Ouro’s 2nd Birthday

      |   Written by: Dan Leech   |   News

    In September we held a party in London to celebrate the company—and more importantly our mascot, Ouro—turning two years old. At the party we showed off version 3.0.0 of Event Store. We also gave the stage to four of our community members to share their experience of developing their projects with Event Store. The video of this event is no longer available.

    Read article

    Event Store 3.0.0 - Configuration Changes

      |   Written by: James Nugent   |   News

    As we gear up to launch version 3.0.0 of Event Store at our annual birthday party (which you should totally come to!), we decided it would be a good idea to run a short series of articles describing some of the new features and changes for those who haven’t seen them. If there are things you want to know about in particular, please get in touch on Twitter, @eventstore ##Configuration Changes Since version 1 of...

    Read article