Self Managed High Availability Clusters
Support for GreenArrow’s High Availability Cluster ends on June 30, 2024.
Table of Contents
- Overview
- Failover
- Reboots
- IP Address Management
- Data Synchronization
Overview
This page provides additional technical details of how GreenArrow’s High Availability Cluster works, beyond what’s described in the High Availability Cluster Administration page.
This information is intended for experienced Linux systems administrators who are tasked with managing the cluster. The cluster manager should be comfortable working with DRBD and PostgreSQL, managing RPM packages, working with kernel modules, reading shell scripts and log files, and configuring a Linux server’s IP addresses.
Note for Managed Customers
As part of our Managed Services, we perform the tasks described on this page for you. You should only perform the tasks described on this page if pre-approved by, and scheduled with GreenArrow.
After-hours support for issues determined not to be Critical Malfunctions may be subject to our after-hours service rate. This includes being paged by our monitoring system as a result of unscheduled maintenance.
Failover
Failover is the process of switching which server is running GreenArrow’s production services. This might be done if, for example, you need to perform hardware maintenance on the server that’s currently acting as Primary.
Failover usually takes about a minute to complete, during which GreenArrow services are offline.
Here’s how to fail over. All commands should be run as the root user:
- Demote the server that’s currently filling the Primary role so that it becomes a Secondary. This step is required because only one server can fill the Primary role at a time:
  hvmail_make_secondary
  Wait for the above command to finish executing before moving on to the next step.
  If the hvmail_make_secondary command fails, it’s safe to correct the underlying issue, then run the command a second time.
- Promote the desired server to the Primary role:
  hvmail_make_primary
- Complete the steps in the “Diagnostics” section of the High Availability Cluster Administration page to verify that services are running normally.
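The demote-then-promote order above can be sketched as a small wrapper. This is an illustrative sketch, not a GreenArrow-provided tool: it assumes key-based root SSH access to both servers, and the hostnames, the OLD_PRIMARY and NEW_PRIMARY variables, and the run_on and failover function names are all hypothetical. Only hvmail_make_secondary and hvmail_make_primary come from this page.

```shell
#!/bin/sh
# Hypothetical hostnames; replace with your own servers.
OLD_PRIMARY="${OLD_PRIMARY:-ga1.example.com}"
NEW_PRIMARY="${NEW_PRIMARY:-ga2.example.com}"

# Run a command on a remote server as root (assumes key-based SSH access).
run_on() {
    host="$1"; shift
    ssh "root@${host}" "$@"
}

# Demote first, and promote only after the demotion has succeeded,
# because only one server can fill the Primary role at a time.
failover() {
    if ! run_on "$OLD_PRIMARY" hvmail_make_secondary; then
        echo "Demotion of ${OLD_PRIMARY} failed; fix the cause and rerun." >&2
        return 1
    fi
    run_on "$NEW_PRIMARY" hvmail_make_primary
}
```

After sourcing the file, running failover performs both steps. The “Diagnostics” checks still need to be completed by hand afterwards.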
Reboots
The hvmail_make_secondary or hvmail_make_primary script (described in this document’s “Failover” section) must be run as the root user each time a server in the cluster is rebooted.
It’s safe to run the hvmail_make_secondary script automatically at boot time, but use caution when automating the hvmail_make_primary script, because only one server should be Primary at any given time.
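One conservative way to automate this is to always boot into the Secondary role and leave promotion as a manual step. The sketch below assumes that approach. The boot_as_secondary function name and the log path are hypothetical, and how the function gets called at boot (an init script, a systemd unit, or similar) depends on your distribution and is not shown.

```shell
#!/bin/sh
# Hypothetical log location; adjust as needed.
BOOT_ROLE_LOG="${BOOT_ROLE_LOG:-/var/log/hvmail_boot_role.log}"

# Bring the server up as a Secondary and record the outcome.
# Promotion to Primary is deliberately left as a manual step.
boot_as_secondary() {
    if hvmail_make_secondary >>"$BOOT_ROLE_LOG" 2>&1; then
        echo "$(date) server started as Secondary" >>"$BOOT_ROLE_LOG"
    else
        echo "$(date) hvmail_make_secondary failed; manual attention needed" >>"$BOOT_ROLE_LOG"
        return 1
    fi
}
```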
IP Address Management
All of GreenArrow’s IPs should be assigned to whichever server is acting as the Primary at any given point. This means that your Linux distribution’s standard IP address configuration files should only be used for IPs that don’t move between servers. You should list all IPs that should be assigned to the server that’s acting as the Primary in the /etc/hvmail_primary_ips file, along with each IP’s subnet mask in CIDR format.
Here’s an example of what that file would look like if you were assigning the 1.2.3.4 and 1.2.3.5 IPs, each with a /24 subnet mask (255.255.255.0):
1.2.3.4/24
1.2.3.5/24
Be sure to synchronize any changes made to the /etc/hvmail_primary_ips file between servers. For example, you might use scp to copy the file between servers after each edit.
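A copy step like this can be paired with a checksum comparison to confirm that both servers hold the same file. The following is a minimal sketch, assuming key-based root SSH access and that sha256sum is installed on both servers. The PEER hostname and the local_sum and sync_primary_ips function names are hypothetical.

```shell
#!/bin/sh
PEER="${PEER:-ga2.example.com}"     # hypothetical peer hostname
IPS_FILE=/etc/hvmail_primary_ips

# Print only the checksum field of sha256sum's output for a local file.
local_sum() {
    sha256sum "$1" | awk '{print $1}'
}

# Copy the file to the peer, then succeed only if the checksums match.
sync_primary_ips() {
    scp "$IPS_FILE" "root@${PEER}:${IPS_FILE}" || return 1
    remote=$(ssh "root@${PEER}" sha256sum "$IPS_FILE" | awk '{print $1}')
    [ "$(local_sum "$IPS_FILE")" = "$remote" ]
}
```

Running sync_primary_ips after each edit returns a non-zero status if the copy fails or the two files differ.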
After adding or removing IPs in the configuration file, apply the changes manually by using the ip command. This way you won’t create downtime by failing over to apply the changes. Here are a few examples:
- To add 1.2.3.4/24 to the eth0 interface, run:
  ip addr add 1.2.3.4/24 dev eth0
- To add 1.2.3.2 through 1.2.3.100 to the eth0 interface, run:
  for i in $(seq 2 100); do ip addr add 1.2.3.$i/24 dev eth0; done
- To remove 1.2.3.4/24 from the eth0 interface, run:
  ip addr del 1.2.3.4/24 dev eth0
Remember also to configure (or remove) the IP Address VirtualMTAs.
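When many IPs have changed, it can help to list which entries in the file are not yet assigned before running ip by hand. The sketch below prints the ip addr add commands that would be needed without changing anything. It assumes IPv4 entries only, one CIDR entry per line, and a single target interface. The missing_ip_commands function name is hypothetical.

```shell
#!/bin/sh
# Print the `ip addr add` commands needed to bring an interface in line
# with the IP list file. Nothing is changed; review the output first.
missing_ip_commands() {
    ips_file="$1"   # e.g. /etc/hvmail_primary_ips
    dev="$2"        # e.g. eth0
    # Field 4 of `ip -o -4 addr show` is the address in CIDR form.
    assigned=$(ip -o -4 addr show dev "$dev" | awk '{print $4}')
    while IFS= read -r cidr || [ -n "$cidr" ]; do
        [ -n "$cidr" ] || continue   # skip blank lines
        if ! printf '%s\n' "$assigned" | grep -qxF "$cidr"; then
            echo "ip addr add $cidr dev $dev"
        fi
    done < "$ips_file"
}
```

For example, missing_ip_commands /etc/hvmail_primary_ips eth0 prints one command per unassigned entry, which can then be reviewed and run on the server acting as the Primary.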
Data Synchronization
A High Availability Cluster synchronizes its data using the following methods:
- PostgreSQL Streaming Replication
- DRBD
The following sections provide more details on each.
PostgreSQL Streaming Replication
GreenArrow’s PostgreSQL database is synchronized using PostgreSQL streaming replication. PostgreSQL 16 is used in recent installations. Some legacy installations use older PostgreSQL releases and can be migrated to 16 upon request.
As part of the initial configuration process, GreenArrow configures streaming replication by updating the following configuration files:
- /var/hvmail/control/postgres.conf
- /var/hvmail/postgres/default/data/pg_hba.conf
- /var/hvmail/postgres/default/data/recovery.done on the server filling the Primary role and /var/hvmail/postgres/default/data/recovery.conf on the server filling the Secondary role.
The PostgreSQL Network Service page contains more details on GreenArrow’s PostgreSQL installation.
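Because the recovery file is named differently on each role, its name gives a quick hint about how a server’s PostgreSQL instance is configured. The sketch below relies only on the file names listed above. The PGDATA_DIR variable and the postgres_role_from_files function name are hypothetical, and the result is a configuration hint, not a check that replication is healthy.

```shell
#!/bin/sh
# Data directory as described on this page; override for testing.
PGDATA_DIR="${PGDATA_DIR:-/var/hvmail/postgres/default/data}"

# Infer the configured role from which recovery file is present:
# recovery.conf on the Secondary, recovery.done on the Primary.
postgres_role_from_files() {
    if [ -f "$PGDATA_DIR/recovery.conf" ]; then
        echo secondary
    elif [ -f "$PGDATA_DIR/recovery.done" ]; then
        echo primary
    else
        echo unknown
    fi
}
```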
DRBD
GreenArrow’s non-PostgreSQL data is synchronized using DRBD 8.4.
DRBD’s configuration file is located at /etc/drbd.conf.
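Before a planned failover, it can be useful to confirm that DRBD is connected and fully synchronized. The sketch below uses the drbdadm cstate and drbdadm dstate subcommands from DRBD 8.4. It assumes that a healthy cluster reports Connected and UpToDate/UpToDate for every resource. The drbd_in_sync function name is hypothetical.

```shell
#!/bin/sh
# Succeed only if every DRBD resource is connected and both sides
# report up-to-date disks. `sort -u` collapses identical per-resource
# lines, so any differing state makes the comparison fail.
drbd_in_sync() {
    cstate=$(drbdadm cstate all | sort -u) || return 1
    dstate=$(drbdadm dstate all | sort -u) || return 1
    [ "$cstate" = "Connected" ] && [ "$dstate" = "UpToDate/UpToDate" ]
}
```

For a fuller view of each resource, including synchronization progress, DRBD 8.4 also exposes its state in /proc/drbd.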
DRBD Troubleshooting
If DRBD fails with a “Can not load the drbd module” error following a package update, then the kmod-drbd83 or kmod-drbd84 package may be incompatible with the running kernel.
There are two known ways to resolve this:
- Downgrade to a kmod-drbd83 or kmod-drbd84 package that’s compatible with the running kernel.
- Reboot so that you’re running a later kernel.
Here’s an example of how to upgrade other packages while skipping kmod-drbd84:
yum update --exclude=kmod-drbd84