Instance Drain

Table of Contents
Introduction
What data is moved
Data that is not moved
How to drain an instance
Example Invocation

Introduction

Draining a GreenArrow instance is the process of moving the queue of undelivered email messages from one instance of GreenArrow to one or more other instance(s) of GreenArrow.

This is helpful for running a variable number of GreenArrow instances. When you have more instances provisioned than you currently need, you can drain & stop one or more of those instances. This procedure will move the data on the instance being decommissioned to one or more other instances that will remain operational.

What data is moved

Email messages that are in-queue.
- This includes all messages, regardless of whether they have received their first delivery attempt.
- Each message’s next retry time will be maintained and respected by the destination instance.
All other queues of email messages waiting to be processed, including:
- Lite Bounce Processor & Full Bounce Processor bounces that are waiting to be processed.
- HTTP Submission API messages that are waiting to be processed.

Additionally, the drain process waits for all events to be delivered by the Event Delivery System. Events are not moved between GreenArrow instances; instead, the drain process waits for the event processor to finish delivering them before it moves email messages to the other instance(s) of GreenArrow.

Data that is not moved

Local log files.
Email messages that have been delivered to local mailboxes.
Engine statistics data.
All GreenArrow Studio data.
Any other data on the instance that was not explicitly listed in the “What data is moved” section above.

How to drain an instance

To drain an instance of GreenArrow, run the following greenarrow command:

greenarrow: Usage:
  greenarrow service drain [DESTINATIONS...]

This command is used to shutdown and drain this server of its local data.

Running this command will do the following:

  1. Stop all GreenArrow services except for the event processor, Postgres, and Redis.

  2. Wait for all in-queue events to be delivered.

  3. Distribute data to the server address(es) given to this command, removing it
     from the local system after it has been handed-off.

  4. Stop the remaining GreenArrow services.

Refer to the following documentation for more information:

  https://www.greenarrowemail.com/docs/greenarrow-engine/Server-Management-and-Backups/Instance-Drain

Application Options:
      --bind-ip=           specify the local ip address to bind when
                           establishing outgoing HTTPS connections
      --no-wait-on-events  do not wait for events to deliver before draining
                           GreenArrow
      --max-event-wait=    maximum duration to wait for events to deliver
                           before exiting failure (e.g. 15s, 2m)
      --concurrency=       number of outgoing messages to send concurrently
                           (default: 2*cpu, min 10, max 64)
      --prof-port=         open this port number for go performance profiling

Help Options:
  -h, --help               Show this help message

Terminating this process with SIGTERM or SIGINT will cause the drain to shut down cleanly, removing from the local system any messages that were successfully distributed to a destination. Terminating this process with SIGKILL and then re-running it could result in duplicate messages.

Exit Codes

The drain process will return with one of the following exit status codes:

0	Success. This instance has been successfully drained. It could now either be retired or restarted later.
1	Unspecified “other” error condition. See the output on STDOUT and/or STDERR for more information.
10	The maximum amount of time to wait for the events queue to drain expired.
11	One or more messages could not be read from the local system. The messages that could be read were successfully submitted to the destination(s). See the output on STDOUT and/or STDERR for more information.
12	One or more messages could not be distributed to destination systems. See the output on STDOUT and/or STDERR for more information.
40	All messages were successfully drained, but an error was encountered while removing messages from the local system. There is a risk of message duplication if the drain is rerun or GreenArrow is restarted.
41	Messages from this system were partially drained, but an error was encountered while removing messages from the local system. There is a risk of message duplication if the drain is rerun or GreenArrow is restarted.
42	The drain was terminated with SIGTERM or SIGINT, but it failed to shut down cleanly. There is a risk of message duplication if the drain is rerun or GreenArrow is restarted.

If the drain exits with any of the above exit codes that are less than 40, it is safe to either rerun the drain or restart GreenArrow after the drain exits.

If the drain exits with an exit code greater than or equal to 40, it is considered unsafe to rerun the drain or restart GreenArrow because of the risk of message duplication. This includes exit code 137, which may result from the drain receiving the SIGKILL signal.
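For automation, the exit-code rules above can be folded into a small helper. The following is a minimal sketch, not part of GreenArrow: the function name classify_drain_exit and its output labels are hypothetical, and treating any unlisted exit code as unsafe is a conservative assumption rather than documented behavior.

```shell
#!/bin/sh
# Hypothetical helper: map a drain exit status to a next-step category.
# Only the listed codes (0, 1, 10, 11, 12) are treated as safe; every other
# value, including 40, 41, 42 and 137, is reported as unsafe. Treating
# unlisted codes as unsafe is an assumption made for caution.
classify_drain_exit() {
  case "$1" in
    0)          echo "drained" ;;        # instance may be retired or restarted
    1|10|11|12) echo "safe-to-retry" ;;  # rerun the drain or restart GreenArrow
    *)          echo "unsafe" ;;         # risk of message duplication
  esac
}

# Example usage (the destination addresses are placeholders):
#   greenarrow service drain 10.0.0.2 10.0.0.3
#   classify_drain_exit "$?"
```

A calling script could branch on the printed label, for example by alerting an operator instead of retrying automatically when the result is "unsafe".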

Destination addresses

The greenarrow service drain command accepts one or more addresses of other GreenArrow instances that are ready to accept the drained data. Each address may be an IP address or a hostname, optionally followed by a port (e.g. example.com:8080), and the two forms may be mixed. The HTTPS port (443, unless otherwise specified in the destination) must be reachable from the instance that is being drained.
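Before draining, it can be useful to know which host and port each destination resolves to, for instance to feed a reachability check. The sketch below is illustrative only: split_destination is a hypothetical helper, and it assumes IPv4 addresses or hostnames (how an IPv6 destination would be written is not covered here).

```shell
#!/bin/sh
# Hypothetical helper: print "HOST PORT" for a destination argument.
# When no port is given, the default HTTPS port 443 is assumed, as described
# above. Assumes IPv4 addresses or hostnames only.
split_destination() {
  case "$1" in
    *:*) echo "${1%:*} ${1##*:}" ;;  # host:port form
    *)   echo "$1 443" ;;            # bare host or IP, default port
  esac
}

# Example usage:
#   split_destination example.com:8080
#   split_destination 10.0.0.2
```

The output can be passed to whichever connectivity tool is available on the instance being drained.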

Because this process uses HTTPS, one or more of the destinations could be a load balancer, which in turn distributes the request to one or more GreenArrow instances. This is useful in cases where it might not be trivial to know the list of instances ready and available within a cluster.

The IP used by the instance being drained (either its default outgoing IP or the one specified with the --bind-ip parameter) must match an IP or CIDR range listed in accept_drain_from on each destination instance.
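To illustrate what matching an IP or CIDR range means, the following sketch checks whether an IPv4 address falls inside a range. It is a standalone pre-flight aid, not GreenArrow's own matching code: the function names ip_to_int and ip_in_cidr are hypothetical, and only IPv4 without leading zeros in the octets is handled.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
  old_ifs="$IFS"; IFS=.
  set -- $1            # intentionally unquoted: split the address on "."
  IFS="$old_ifs"
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Hypothetical check: succeed if IP ($1) is inside RANGE ($2).
# RANGE is either a single address or CIDR notation such as 10.1.0.0/16.
# The body runs in a subshell so its variables do not leak to the caller.
ip_in_cidr() (
  ip="$1"; range="$2"
  case "$range" in
    */*) net="${range%/*}"; bits="${range#*/}" ;;
    *)   net="$range"; bits=32 ;;
  esac
  # Build the netmask: the top "bits" bits set, the rest clear.
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
)

# Example usage:
#   ip_in_cidr 10.1.2.3 10.1.0.0/16 && echo "source IP is covered"
```

Running this against the source IP and each range you plan to list can catch a mismatch before the drain is started.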

If you’re going to use a load balancer to distribute the drain requests, you must either add the load balancer’s IP address to accept_drain_from (which can be dangerous if the load balancer is exposed to the public internet) or add the load balancer’s IP or CIDR range to /var/hvmail/control/opt.engine.trusted_proxy_ips. In the latter case, the load balancer must add an X-Forwarded-For or Client-IP header with the original instance’s IP address (similar to Processing Clicks and Opens Behind a Proxy Server).

The number of workers that process incoming drained data on a destination instance can be tuned using incoming_drain_workers.

Data distribution

Drained data will be distributed evenly among the destinations provided. If a destination cannot be reached or the connection otherwise fails, that message will be sent to the next server. The failed destination will be tried again the next time it comes up in the round-robin.
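The round-robin behavior described above can be pictured with a small simulation. This sketch does not talk to GreenArrow and is not its implementation: simulate_round_robin is a hypothetical function that marks one destination as permanently unreachable and shows where each message would land.

```shell
#!/bin/sh
# Simulation only. Usage: simulate_round_robin COUNT DOWN_DEST DEST...
#   COUNT      number of messages to place
#   DOWN_DEST  one destination to treat as unreachable ("" for none)
#   DEST...    destinations in round-robin order
# The body runs in a subshell so its variables do not leak to the caller.
simulate_round_robin() (
  count="$1"; down="$2"; shift 2
  n="$#"
  msg=1
  while [ "$msg" -le "$count" ]; do
    tries=0
    placed=no
    while [ "$tries" -lt "$n" ]; do
      target="$1"
      shift
      set -- "$@" "$target"   # rotate: move the head of the list to the tail
      if [ "$target" != "$down" ]; then
        echo "message $msg -> $target"
        placed=yes
        break
      fi
      tries=$((tries + 1))    # failed destination: fall through to the next
    done
    [ "$placed" = yes ] || echo "message $msg -> kept locally"
    msg=$((msg + 1))
  done
)

# Example usage: four messages, destination "b" unreachable
#   simulate_round_robin 4 b a b c
```

In the example, messages alternate between a and c, while b is still tried each time its turn comes up in the rotation.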

Skip waiting for delivery of events

You may optionally provide the --no-wait-on-events parameter if you do not wish to wait for the event queue to be delivered prior to draining & stopping GreenArrow.
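One way to combine the event-related options is to wait for events up to a limit and skip the wait only if that limit expires. The wrapper below is a sketch of such a policy, not a recommendation: drain_with_event_fallback is a hypothetical name, and only the flags and exit code 10 come from the command's documentation. Because events are not moved between instances, consider whether skipping the wait is acceptable in your environment.

```shell
#!/bin/sh
# Hypothetical wrapper. Usage: drain_with_event_fallback MAX_WAIT DEST...
# First attempt: wait up to MAX_WAIT (e.g. 15s, 2m) for events to deliver.
# If the drain exits with 10 (event wait expired), retry without waiting.
drain_with_event_fallback() {
  max_wait="$1"; shift
  status=0
  greenarrow service drain --max-event-wait="$max_wait" "$@" || status=$?
  if [ "$status" -eq 10 ]; then
    echo "event wait expired after $max_wait; draining without waiting" >&2
    status=0
    greenarrow service drain --no-wait-on-events "$@" || status=$?
  fi
  return "$status"
}

# Example usage (the destination addresses are placeholders):
#   drain_with_event_fallback 2m 10.0.0.2 10.0.0.3
```

The function returns the exit status of the last drain attempt, so callers can still apply the exit-code rules to the result.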

Exit status

You may rely on this command’s exit status to know whether or not the drain was successfully completed. The drain command will continue to try draining until success (or, for example, until it receives a TERM signal). If the command exits cleanly with status=0, the drain is completed and this instance is ready to be decommissioned.

If the drain process is interrupted, data that has not yet been successfully transmitted to one of the destination instances will remain on its current instance. No data is removed from the instance being drained until its reception is acknowledged by a destination instance.

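If a drain process returns a failure with an exit code below 40, you may either: (a) re-run the drain command to try draining the messages again, or (b) restart the MTA to continue serving the email messages that remain on the instance. For exit codes of 40 or above, see the duplication risk described under Exit Codes.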

Example Invocation

This is what an example invocation might look like:

greenarrow service drain 127.0.0.2 127.0.0.3 127.0.0.4