High availability

Event Store allows you to run more than one node as a cluster. There are two modes available for clustering: with manager nodes and database nodes, or with database nodes only. This document covers setting up Event Store with manager nodes and database nodes.
Each physical (or virtual) machine in an Event Store cluster typically runs one manager node and one database node. It is also possible to have multiple database nodes per physical machine, running under one manager node.
Manager nodes have a number of responsibilities: they start and supervise the database nodes on their machine (as defined by the watchdog configuration described below), and they take part in cluster gossip over both their internal and external interfaces.
Each database or manager node can draw its configuration from a variety of sources. Each source has a priority; a node determines its running configuration by evaluating every source and applying each option from the source with the highest priority.
From lowest to highest priority, the sources of configuration are: default values, the configuration file, environment variables, and command line arguments.
You can determine the configuration of a node by passing the -WhatIf flag to the process.
Event Store clusters follow a “shared nothing” philosophy: no shared disks are required for clustering to work. Instead, several database nodes store copies of your data to ensure it isn’t lost in the case of a drive failure or a node crashing.
Event Store uses a quorum-based replication model, in which a majority of nodes in the cluster must acknowledge that a write has been committed to disk before acknowledging the write to the client. This means that to be able to tolerate the failure of n nodes, the cluster must be of size (2n + 1). A three-database-node cluster can continue to accept writes in the case that one node is unavailable, a five-database-node cluster can continue to accept writes in the case that two nodes are unavailable, and so forth.
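The quorum arithmetic above can be sketched with a quick shell loop (the cluster sizes here are illustrative, not configuration):

```shell
# For a cluster of size 2n + 1, a majority of nodes must acknowledge
# each write, so up to n node failures can be tolerated.
for size in 3 5 7; do
  quorum=$(( size / 2 + 1 ))
  tolerated=$(( (size - 1) / 2 ))
  echo "cluster of $size: quorum $quorum, tolerates $tolerated failure(s)"
done
# cluster of 3: quorum 2, tolerates 1 failure(s)
# cluster of 5: quorum 3, tolerates 2 failure(s)
# cluster of 7: quorum 4, tolerates 3 failure(s)
```

Note that an even-sized cluster gains no additional fault tolerance over the next smaller odd size, which is why sizes of 3 and 5 are typical.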
A typical deployment topology consists of three physical machines, each running one manager node and one database node, as shown in figure 1. Each of the physical machines may have two network interfaces, one for communicating with other cluster members, and one for serving clients. Although it may be preferable in some situations to run over two separate networks, it is also possible to use different TCP ports on one interface.
Event Store uses a quorum-based replication model. When operating normally, a cluster has one database node known as the master, and the remaining nodes are slaves. The master is responsible for coordinating writes. Database nodes use a consensus algorithm to determine which database node should be master and which should be slaves, and Event Store bases the decision as to which node should be master on a number of factors, some of which are configurable.
For database nodes to have this information available to them, the nodes gossip with other nodes in the cluster. Gossip runs over the internal (and optionally the external) HTTP interfaces of database nodes, and over both internal and external interfaces of manager nodes.
Manager and database nodes need to know about one another in order to gossip. To bootstrap this process, you provide each node with gossip seeds: the addresses at which it can find other nodes. When running with manager nodes, the following approach is normally used:
The preferred method is via a DNS entry. To set this up, create a DNS entry for the cluster with an A record pointing to each member of the cluster. Each manager will then look up other nodes in the cluster during the startup process based on the DNS name. Since DNS only provides information about addresses, you need to use a consistent TCP port across the cluster for gossip.
This example shows the configuration for a three-node cluster, running in the typical setup of one manager node and one database node per physical machine, with cluster discovery via DNS. Each machine has one network interface and therefore uses different ports for internal and external traffic. All nodes in this case run Windows, so the manager nodes run as Windows services.
Figure 2 below shows the setup. The important points for writing the configuration files are the addresses and ports of each node, which are detailed in the sections that follow.
To configure the cluster correctly, there are a number of steps to follow:
Create a DNS entry named cluster1.eventstore.local with an A record for each node.
The exact procedure depends on which DNS server is in use, but the eventual lookup should read:
```
nslookup cluster1.eventstore.local
Server:   192.168.1.2
Address:  192.168.1.2#53

Name:     cluster1.eventstore.local
Address:  192.168.1.11
Name:     cluster1.eventstore.local
Address:  192.168.1.12
Name:     cluster1.eventstore.local
Address:  192.168.1.13
```
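For reference, assuming a BIND-style zone file, the corresponding A records might read as follows (the zone syntax is illustrative; use your own DNS server's tooling to create the records):

```
cluster1.eventstore.local.  IN  A  192.168.1.11
cluster1.eventstore.local.  IN  A  192.168.1.12
cluster1.eventstore.local.  IN  A  192.168.1.13
```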
All three nodes have similar configuration. The important configuration points are the IP addresses for the internal and external interfaces, the ports for each endpoint, the location of the database file, the size of the cluster, and the endpoints from which to seed gossip (in this case the local manager). We assume here that Event Store stores data on the _D:_ drive.
The configuration files are written in YAML; for the first node, the database configuration reads as follows:
Filename: database.yaml
```yaml
Db: d:\es-data
IntIp: 192.168.1.11
ExtIp: 192.168.1.11
IntTcpPort: 1112
IntHttpPort: 2112
ExtTcpPort: 1113
ExtHttpPort: 2113
DiscoverViaDns: false
GossipSeed: ['192.168.1.11:30777']
ClusterSize: 3
```
For each subsequent node, the IP addresses change, as does the gossip seed (since the seed is the manager running on the same physical machine as each node).
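For example, the second node's database configuration would read as follows, assuming the addressing scheme above (only the IP addresses and the gossip seed differ from the first node):

```yaml
Db: d:\es-data
IntIp: 192.168.1.12
ExtIp: 192.168.1.12
IntTcpPort: 1112
IntHttpPort: 2112
ExtTcpPort: 1113
ExtHttpPort: 2113
DiscoverViaDns: false
GossipSeed: ['192.168.1.12:30777']
ClusterSize: 3
```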
Again, all three nodes have similar configuration. The important configuration points are the IP addresses for the internal and external interfaces, the ports for the HTTP endpoints, the log location, and the DNS information about other nodes. Another important piece of configuration is which database nodes the manager is responsible for starting. This is defined in a separate file (the watchdog configuration), the path to which is specified as WatchdogConfig in the manager configuration.
The configuration files are written in YAML; for the first node, the manager configuration reads as follows:
```yaml
IntIp: 192.168.1.11
ExtIp: 192.168.1.11
IntHttpPort: 30777
ExtHttpPort: 30778
DiscoverViaDns: true
ClusterDns: cluster1.eventstore.local
ClusterGossipPort: 30777
EnableWatchdog: true
WatchdogConfig: c:\EventStore-Config\watchdog.esconfig
Log: d:\manager-log
```
The watchdog configuration file details which database nodes the manager is responsible for starting and supervising. Unlike the other configuration files, the watchdog configuration uses a custom format instead of YAML. Each node for which the manager is responsible has one line in the file, which starts with a # symbol and then details the command line options given to the database node when it is started. Under normal circumstances, this is the path to the database node’s configuration file.
For the first node in the example cluster, the watchdog configuration file reads as follows:
```
# --config c:\EventStore-Config\database.yaml
```
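As noted earlier, it is possible to run multiple database nodes per physical machine under one manager. In that case the watchdog file simply gains one line per database node, for example (the two configuration file names here are hypothetical):

```
# --config c:\EventStore-Config\database1.yaml
# --config c:\EventStore-Config\database2.yaml
```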
Having written configuration files for each node, you can now deploy Event Store and the configuration files. Although it is possible to use relative paths in configuration files, absolute paths are preferable to reduce the potential for confusion.
In this case, Event Store is deployed on each node in _c:\EventStore-HA-v3.0.1_, and the configuration files for that node are deployed into _c:\EventStore-Config_. No installation process is necessary; the packaged distribution can simply be unzipped, assuming the archives have been unblocked after download.
To allow non-elevated users to run HTTP servers on Windows, you must add entries to the access control list using netsh. By default, the manager node runs as NT AUTHORITY\Local Service, so this is the user which must have permission to run the HTTP server.
The commands used to add these entries on node one are as follows (run them as an elevated user):

```
# Database Node Internal HTTP Interface
netsh http add urlacl url=http://192.168.1.11:2112/ user="NT AUTHORITY\LOCAL SERVICE"

# Database Node External HTTP Interface
netsh http add urlacl url=http://192.168.1.11:2113/ user="NT AUTHORITY\LOCAL SERVICE"

# Manager Node Internal HTTP Interface
netsh http add urlacl url=http://192.168.1.11:30777/ user="NT AUTHORITY\LOCAL SERVICE"

# Manager Node External HTTP Interface
netsh http add urlacl url=http://192.168.1.11:30778/ user="NT AUTHORITY\LOCAL SERVICE"
```
You can install manager nodes as a Windows service, so they can start on boot rather than running in interactive mode. Each manager service is given an instance name, which becomes the name of the service (and part of the description for easy identification). The service is installed by default with a startup type of “Automatic (Delayed Start)”.
To install the manager node on machine 1, use the following command:
```
C:\EventStore-HA-v3.0.1\> EventStore.WindowsManager.exe install -InstanceName es-cluster1 -ManagerConfig C:\EventStore-Config\manager.yaml
```
The service will then be visible in the services list, with a description of “Event Store Manager (es-cluster1)”.
To uninstall the manager node service in the future, use the following command (where the instance name matches the name used during installation):
```
C:\EventStore-HA-v3.0.1\> EventStore.WindowsManager.exe uninstall -InstanceName es-cluster1
```
Once installed, the service can be started with the net start es-cluster1 command and stopped with the net stop es-cluster1 command.