Instance Drain
- Table of Contents
- Introduction
- What data is moved
- Data that is not moved
- How to drain an instance
- Example Invocation
Introduction
Draining a GreenArrow instance is the process of moving the queue of undelivered email messages from one instance of GreenArrow to one or more other instance(s) of GreenArrow.
This is helpful for running a variable number of GreenArrow instances. When you have more instances provisioned than you currently need, you can drain & stop one or more of those instances. This procedure will move the data on the instance being decommissioned to one or more other instances that will remain operational.
What data is moved
- Email messages that are in-queue.
- This includes all messages, regardless of whether they have received their first delivery attempt.
- Each message’s next retry time will be maintained and respected by the destination instance.
- All other queues of email messages waiting to be processed, including:
- Lite Bounce Processor & Full Bounce Processor bounces that are waiting to be processed.
- HTTP Submission API messages that are waiting to be processed.
Additionally, the drain process waits for the Event Delivery System to deliver all events. Events are not moved between GreenArrow instances; instead, the drain process waits for the event processor to finish delivering them before it moves email messages to the other GreenArrow instance(s).
Data that is not moved
- Local log files.
- Email messages that have been delivered to local mailboxes.
- Engine statistics data.
- All GreenArrow Studio data.
- Any other data on the instance that was not explicitly listed in the “What data is moved” section above.
How to drain an instance
To drain an instance of GreenArrow, run the following greenarrow command:
greenarrow: Usage:
greenarrow service drain [DESTINATIONS...]
This command is used to shutdown and drain this server of its local data.
Running this command will do the following:
1. Stop all GreenArrow services except for the event processor, Postgres, and Redis.
2. Wait for all in-queue events to be delivered.
3. Distribute data to the server address(es) given to this command, removing it
from the local system after it has been handed-off.
4. Stop the remaining GreenArrow services.
Refer to the following documentation for more information:
https://www.greenarrowemail.com/docs/greenarrow-engine/Server-Management-and-Backups/Instance-Drain
Application Options:
--bind-ip= specify the local ip address to bind when establishing outgoing HTTPS connections
--no-wait-on-events do not wait for events to deliver before draining GreenArrow
--max-event-wait= maximum duration to wait for events to deliver before exiting failure (e.g. 15s, 2m)
--concurrency= number of outgoing messages to send concurrently (default: 2*cpu, min 10, max 64)
Help Options:
-h, --help Show this help message
Terminating this process with SIGTERM or SIGINT causes the drain to shut down cleanly, removing from the local system any messages that were successfully distributed to a destination. Terminating this process with SIGKILL and then re-running it could result in duplicate messages.
Exit Codes
The drain process will return with one of the following exit status codes:
0 | Success. This instance has been successfully drained. It could now either be retired or restarted later. |
1 | Unspecified “other” error condition. See the output on STDOUT and/or STDERR for more information. |
10 | The maximum amount of time to wait for the events queue to drain expired. |
11 | One or more messages could not be read from the local system. The messages that could be read were successfully submitted to the destination(s). See the output on STDOUT and/or STDERR for more information. |
12 | One or more messages could not be distributed to destination systems. See the output on STDOUT and/or STDERR for more information. |
40 | All messages were successfully drained, but an error was encountered while removing messages from the local system. There is a risk of message duplication if the drain is rerun or GreenArrow is restarted. |
41 | Messages from this system were partially drained, but an error was encountered while removing messages from the local system. There is a risk of message duplication if the drain is rerun or GreenArrow is restarted. |
42 | The drain was terminated with |
If the drain exits with any of the above exit codes that are less than 40, it is safe to either rerun the drain or restart GreenArrow after the drain exits.
If the drain exits with an exit code greater than or equal to 40, it is considered unsafe to rerun the drain or restart GreenArrow because of the risk of message duplication. This includes exit code 137, which may result from the drain receiving the SIGKILL signal.
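The rule above can be captured in a small helper for automation that wraps the drain. This is a minimal sketch, not part of GreenArrow: the names `DrainOutcome` and `classify_drain_exit` are hypothetical. It also assumes the caller may be using Python's `subprocess`, which reports a child killed by a signal as a negative return code rather than the 137 a shell would show, so negative values are conservatively treated as unsafe.

```python
from enum import Enum


class DrainOutcome(Enum):
    """Hypothetical labels for the follow-up action an exit status allows."""
    SUCCESS = "success"              # instance is drained
    SAFE_TO_RETRY = "safe_to_retry"  # rerun the drain or restart GreenArrow
    UNSAFE = "unsafe"                # rerun/restart risks duplicate messages


def classify_drain_exit(code: int) -> DrainOutcome:
    """Apply the documented threshold: 0 is success, 1-39 is retryable, 40+ is unsafe."""
    if code < 0:
        # Assumption: subprocess-style code for "killed by a signal" (e.g. -9).
        return DrainOutcome.UNSAFE
    if code == 0:
        return DrainOutcome.SUCCESS
    if code < 40:
        return DrainOutcome.SAFE_TO_RETRY
    return DrainOutcome.UNSAFE
```

A wrapper script could use the result to decide whether to retry automatically or stop and alert an operator.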
Destination addresses
The greenarrow service drain
command accepts one or more addresses (either in the form of IP addresses or hostname, or
a mix thereof, with an optional port (e.g. example.com:8080
)) of other GreenArrow instances that are ready to accept
the drained data. The HTTPS port (443, unless otherwise specified in the destination) must be reachable from the
instance that is being drained.
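Before starting a drain, it can help to confirm that each destination's port accepts connections from this instance. The following is a rough preflight sketch, not GreenArrow's own parsing logic: `parse_destination` and `port_reachable` are hypothetical helpers, and the parser only handles hostnames and IPv4 addresses (IPv6 literals are out of scope here). A successful TCP connection shows only that the port is open, not that the destination will accept drained data.

```python
import socket
from typing import Tuple


def parse_destination(dest: str, default_port: int = 443) -> Tuple[str, int]:
    """Split 'host[:port]' into (host, port), defaulting to the HTTPS port."""
    host, sep, port = dest.rpartition(":")
    if sep and port.isdigit():
        return host, int(port)
    return dest, default_port


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `parse_destination("example.com:8080")` yields the host and port to probe, while a bare address falls back to port 443.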
Because this process uses HTTPS, one or more of the destinations could be a load balancer, which in turn distributes the request to one or more GreenArrow instances. This is useful in cases where it might not be trivial to know the list of instances ready and available within a cluster.
The IP used by the instance being drained (either its default outgoing IP or the one specified with the --bind-ip parameter) must be matched by an IP or CIDR range specified in accept_drain_from on each of the destination instance(s).
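To check ahead of time whether a source IP would fall inside a given list of IPs and CIDR ranges, the matching idea can be sketched with Python's standard `ipaddress` module. This illustrates the concept only; it is not how GreenArrow evaluates accept_drain_from, and `source_ip_allowed` is a hypothetical helper.

```python
import ipaddress
from typing import Iterable


def source_ip_allowed(source_ip: str, allowed_entries: Iterable[str]) -> bool:
    """Return True if source_ip equals a listed IP or lies within a listed CIDR range."""
    addr = ipaddress.ip_address(source_ip)
    for entry in allowed_entries:
        # A bare IP such as "192.0.2.7" is parsed as a single-address network.
        network = ipaddress.ip_network(entry, strict=False)
        if addr in network:
            return True
    return False
```

For instance, `source_ip_allowed("10.0.0.5", ["10.0.0.0/24"])` is true, while an address outside every listed entry returns false.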
If you’re going to use a load balancer to distribute the drain requests, you must either add the load balancer’s IP address to accept_drain_from (which can be dangerous if the load balancer is exposed to the public internet) or add the load balancer’s IP or CIDR range to /var/hvmail/control/opt.engine.trusted_proxy_ips. In the latter case, the load balancer must add an X-Forwarded-For or Client-IP header with the original instance’s IP address (similar to Processing Clicks and Opens Behind a Proxy Server).
The number of workers that process incoming data on a destination instance can be tuned using incoming_drain_workers.
Data distribution
Drained data will be distributed evenly among the destinations provided. If a destination cannot be reached or the connection otherwise fails, the affected message will be sent to the next destination. The failed destination will be tried again the next time it comes up in the round-robin.
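The round-robin behavior with failover can be pictured with a short simulation. This is an illustrative sketch under stated assumptions, not GreenArrow's implementation: `distribute` and the `send` callback are hypothetical, and the sketch gives up on a message after one full pass over the destinations, a simplification chosen to keep the example finite.

```python
from typing import Callable, Dict, List, Sequence


def distribute(
    messages: Sequence[str],
    destinations: Sequence[str],
    send: Callable[[str, str], bool],
) -> Dict[str, List[str]]:
    """Assign messages round-robin; on failure, fall through to the next destination."""
    assigned: Dict[str, List[str]] = {dest: [] for dest in destinations}
    cursor = 0  # index of the destination whose turn it is
    for msg in messages:
        for attempt in range(len(destinations)):
            dest = destinations[(cursor + attempt) % len(destinations)]
            if send(dest, msg):
                assigned[dest].append(msg)
                # The next message starts with the destination after this one.
                cursor = (cursor + attempt + 1) % len(destinations)
                break
        else:
            # Simplification: stop after every destination has failed once.
            raise RuntimeError(f"no destination accepted message {msg!r}")
    return assigned
```

With three destinations where the second is down, the failed destination is still attempted each time its turn comes up, and its share of messages goes to the destination that follows it.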
Skip waiting for delivery of events
You may optionally provide the --no-wait-on-events parameter if you do not wish to wait for the event queue to be delivered prior to draining & stopping GreenArrow.
Exit status
You may rely on this command’s exit status to know whether or not the drain was successfully completed. The drain command will continue to try draining until success (or, for example, until it receives a TERM signal). If the command exits cleanly with status=0, the drain is completed and this instance is ready to be decommissioned.
If the drain process is interrupted, data that has not yet been successfully transmitted to one of the destination instances will remain on its current instance. No data is removed from the instance being drained until its reception is acknowledged by a destination instance.
If a drain process returns a failure with an exit code below 40 (see Exit Codes above), you may either: (a) re-run the drain command to try again to drain messages, or (b) restart the MTA to continue serving the email messages that remain on the node.
Example Invocation
This is what an example invocation might look like:
greenarrow service drain 172.0.0.2 127.0.0.3 127.0.0.4
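When the drain is launched from provisioning or autoscaling tooling, the command line can be assembled programmatically. The sketch below uses only the options listed in the help output above; `build_drain_command` and `run_drain` are hypothetical helper names, and it is an assumption that the options are placed before the destination addresses.

```python
import subprocess
from typing import List, Optional, Sequence


def build_drain_command(
    destinations: Sequence[str],
    bind_ip: Optional[str] = None,
    max_event_wait: Optional[str] = None,
    concurrency: Optional[int] = None,
    wait_on_events: bool = True,
) -> List[str]:
    """Build the argument list for 'greenarrow service drain'."""
    if not destinations:
        raise ValueError("at least one destination address is required")
    cmd = ["greenarrow", "service", "drain"]
    if bind_ip:
        cmd.append(f"--bind-ip={bind_ip}")
    if not wait_on_events:
        cmd.append("--no-wait-on-events")
    if max_event_wait:
        cmd.append(f"--max-event-wait={max_event_wait}")  # e.g. "15s" or "2m"
    if concurrency is not None:
        cmd.append(f"--concurrency={concurrency}")
    cmd.extend(destinations)
    return cmd


def run_drain(cmd: Sequence[str]) -> int:
    """Run the drain and return its exit status without raising on failure."""
    return subprocess.run(list(cmd), check=False).returncode
```

The returned exit status should be checked against the exit code table before deciding whether the instance can be retired.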