SimpleMH Click and Open Tracking
- Table of Contents
- Overview
- Position of Open Tracking Image
- Dynamic Data
- Example Query
- Using HTTPS
- Event Tracking Metadata Storage
- Setting a User-Defined Link ID
- Skipping Click Tracking
Overview
GreenArrow Engine offers click and open tracking when the SimpleMH method is used to inject mail. SimpleMH’s click and open tracking facility can be turned on or off on a per-Mail Class basis.
Click tracking rewrites links into a URL that GreenArrow Engine’s HTTP server listens on.
Open tracking inserts tracking images into HTML emails. If images are loaded by the recipient, an open gets registered.
If you’d like to receive notifications about click and open events, the Event Notification System can do this for you.
Click and open data is stored in two tables in GreenArrow Engine’s PostgreSQL database. The data in these tables should be treated as read-only. Here are the table structures:
clickthrough_clicks Table
Column | Type | Description |
---|---|---|
id | integer | Primary key for this table. |
urlid | integer | Primary key of the clickthrough_urls entry this record corresponds to. |
clicktime | integer | Time in seconds past the Unix epoch that the click occurred. |
emailaddress | character varying | Email address of the subscriber who clicked. |
html_or_text | character(1) | h for an HTML email, or t for a text email. |
email_code | integer | Value contained in the X-GreenArrow-Click-Tracking-ID header, if present. |
email_code_text | character varying | Value contained in the X-GreenArrow-Click-Tracking-ID header, if present. |
clickthrough_urls Table
Column | Type | Description |
---|---|---|
id | integer | Primary key for this table. |
sendid | character varying(100) | SendID of the message that was clicked or opened. |
listid | character varying(100) | ListID of the message that was clicked or opened. |
url | text | The original URL for links, or an empty string for opens. |
Position of Open Tracking Image
When SimpleMH open tracking is enabled, a tracking image will be inserted into the HTML part.
To control the position of the open tracking image, add <opentag/> to your HTML part where you’d like it positioned. This must be done before the closing body tag (</body>). The tracking image is inserted into the first of the following within the HTML part:
- In place of the first <opentag> or <opentag/> tag. If more than one of these tags exists, the others are kept in the HTML unaltered.
- Immediately before the first </body> tag.
- Appended to the end of the HTML.
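The placement order above can be sketched in a few lines of Python. This is only an illustration of the documented precedence, not GreenArrow's implementation: the function name place_tracking_image, the sample image tag, and the case-insensitive, whitespace-tolerant tag matching are all assumptions.

```python
import re

# Matches <opentag> or <opentag/>. Case-insensitivity and optional
# whitespace before the slash are assumptions made for this sketch.
OPENTAG_RE = re.compile(r"<opentag\s*/?>", re.IGNORECASE)
BODY_CLOSE_RE = re.compile(r"</body\s*>", re.IGNORECASE)

# Hypothetical tracking image; the real tag is generated by SimpleMH.
IMG = '<img src="https://tracking.example.com/open.gif">'


def place_tracking_image(html: str, image_tag: str) -> str:
    """Return html with image_tag placed using the documented precedence."""
    # 1. Replace only the first <opentag>; later ones stay unaltered.
    if OPENTAG_RE.search(html):
        return OPENTAG_RE.sub(lambda _match: image_tag, html, count=1)
    # 2. Otherwise insert immediately before the first </body>.
    body_close = BODY_CLOSE_RE.search(html)
    if body_close:
        start = body_close.start()
        return html[:start] + image_tag + html[start:]
    # 3. Otherwise append to the end of the HTML.
    return html + image_tag
```

Calling place_tracking_image("<body>hi</body>", IMG) puts the image directly before </body>, while a document containing two <opentag/> tags keeps the second one untouched.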
Dynamic Data
When dynamic data is being used, and you have control over how the URL is structured, it’s possible to reduce database bloat by putting dynamic data after a question mark. Database entries for URLs containing query strings are truncated at the question mark. The question mark and the query string following it are encoded in the re-written URL. For example, the following URL:
http://server.example.com?query=string&params=included
Would be stored in SimpleMH’s database as:
http://server.example.com
The re-written URL would look like:
http://greenarrow.example.com/click/e72/HZGVmYXVsdDEwMDAxLHQxLGh0dHA6Ly93d3cuZHJoLm5ldA/qP3F1ZXJ5PXN0cmluZyZwYXJhbXM9aW5jbHVkZWQ/scd6c91ef45
As a result, if a URL that’s inserted into a campaign is distinct for each subscriber, and contains subscriber-identification data following a question mark, SimpleMH is able to process this efficiently and creates only a single row in the clickthrough_urls table.
If a URL that’s inserted into a campaign is distinct for each subscriber, and contains subscriber-identification data that does not follow a question mark, then SimpleMH will insert a new row in the clickthrough_urls table for each user. This can lead to database bloat.
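To make the split concrete, here is a minimal Python sketch that separates a URL into the part that would be stored and the dynamic part that travels in the link. The helper names are hypothetical. The Base64 step rests on an observation about the example link (its segment after the leading q matches the unpadded Base64 of the query string); treat it as an assumption, not a documented format.

```python
import base64


def split_for_tracking(url: str) -> tuple[str, str]:
    """Split a URL at the first question mark.

    Returns (stored_part, dynamic_part); dynamic_part keeps the '?'.
    """
    base, sep, query = url.partition("?")
    return base, sep + query


def unpadded_b64(text: str) -> str:
    """URL-safe Base64 without '=' padding (assumed encoding, see above)."""
    encoded = base64.urlsafe_b64encode(text.encode("utf-8"))
    return encoded.decode("ascii").rstrip("=")


stored, dynamic = split_for_tracking(
    "http://server.example.com?query=string&params=included"
)
```

Every subscriber-specific variant of this URL yields the same stored value, which is why only one clickthrough_urls row is needed.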
Example Query
Here’s an example query that displays opens for SendID mm100525:
SELECT * FROM clickthrough_clicks WHERE urlid IN (SELECT id FROM clickthrough_urls WHERE sendid = 'mm100525' AND url = '');
The SendID is constructed in accordance with the Mail Class’s configured Statistic Report Grouping.
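The relationship between the two tables can also be explored with a join. The sketch below uses an in-memory SQLite database purely as a runnable stand-in for GreenArrow's PostgreSQL tables, with a trimmed column set and made-up rows. The function name events_for_send is hypothetical. It joins on urlid as described in the table structures and converts clicktime from Unix seconds to a UTC datetime.

```python
import sqlite3
from datetime import datetime, timezone

# Stand-in tables with invented sample data; not the real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clickthrough_urls (
        id INTEGER PRIMARY KEY, sendid TEXT, listid TEXT, url TEXT);
    CREATE TABLE clickthrough_clicks (
        id INTEGER PRIMARY KEY, urlid INTEGER,
        clicktime INTEGER, emailaddress TEXT);
    INSERT INTO clickthrough_urls VALUES (1, 'mm100525', 't1', '');
    INSERT INTO clickthrough_urls VALUES (2, 'mm100525', 't1', 'http://example.com/');
    INSERT INTO clickthrough_clicks VALUES (10, 1, 1274800000, 'a@example.com');
    INSERT INTO clickthrough_clicks VALUES (11, 2, 1274800060, 'a@example.com');
""")


def events_for_send(conn, sendid, opens):
    """Return (email, utc_time, url) rows for opens or for clicks."""
    # Opens are stored with an empty url; clicks carry the original URL.
    comparison = "=" if opens else "<>"
    rows = conn.execute(
        "SELECT c.emailaddress, c.clicktime, u.url "
        "FROM clickthrough_clicks c "
        "JOIN clickthrough_urls u ON u.id = c.urlid "
        f"WHERE u.sendid = ? AND u.url {comparison} '' "
        "ORDER BY c.clicktime",
        (sendid,),
    ).fetchall()
    return [
        (email, datetime.fromtimestamp(ts, tz=timezone.utc), url)
        for email, ts, url in rows
    ]


opens = events_for_send(conn, "mm100525", opens=True)
clicks = events_for_send(conn, "mm100525", opens=False)
```

Against the real database the same SELECT would be run read-only through a PostgreSQL client, since the data in these tables should not be modified.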
Using HTTPS
GreenArrow’s Apache instance listens on TCP ports 80 (HTTP) and 443 (HTTPS) by default.
We recommend using HTTPS on port 443 for click and open tracking. TLS Certificate Configuration shows how to configure HTTPS.
If GreenArrow’s Apache instance is on the same server as another Apache instance that is bound to port 443, one solution is to bind each Apache instance to a specific IP address. Instructions for doing this with GreenArrow Engine’s HTTP server are in the HTTP Server Configuration Document.
Event Tracking Metadata Storage
By default, SimpleMH uses an internal database for tracking recipient email addresses and link URLs. When a click, open, unsubscribe, bounce or spam complaint is received, the data is retrieved from that internal database. For most usages of GreenArrow, this default behavior is acceptable and fast.
When using GreenArrow in a clustered configuration (such as Processing Events on Dedicated Servers), however, this behavior is not desirable, because events triggered by messages delivered by one GreenArrow node might be processed on another GreenArrow node.
This is why we offer multiple options for SimpleMH Event Tracking Metadata Storage.
The system default Event Tracking Metadata Storage is set using default_event_tracking_metadata_storage.
However, some systems may have a legacy configuration file /var/hvmail/control/opt.simplemh_stateless_event_handling set to 1. In that case, the default is stateless. The default_event_tracking_metadata_storage directive takes precedence over the legacy configuration file.
Local
This is the default mode for GreenArrow. With Local Metadata Event Tracking, event metadata (such as recipient email address and Click-Tracking-ID) is stored on disk on the GreenArrow node that delivered the email.
These events (clicks, opens, etc.) must be processed by the same node that delivered the message.
This results in short click tracking links that minimally inflate your message size.
Stateless
With Stateless Metadata Event Tracking, SimpleMH can be configured to embed the email address, link URL, and other message metadata in the message itself instead of recording it in its database.
Stateless Metadata Event Tracking has two advantages:
- It allows you to offload event processing to another GreenArrow server.
- It reduces disk space requirements.
The downside of Stateless Metadata Event Tracking is that it causes the average email size to increase since the message itself is used to store this extra information.
For clicks, opens, and unsubscribes the information is embedded into the link. Configuring simplemh_compress_links can somewhat help with the length of link URLs.
For bounces, the information is inserted into the email as an X-Mailer-Info-Extra header.
Regardless of whether or not Stateless Metadata Event Tracking is configured, repeat bounce counting only takes into account the bounces for the particular server on which they are processed.
Stateless Metadata Event Tracking can be configured in the following ways:
- On the individual message, provide the following header:
  X-GreenArrow-EventTrackingMetadataStorage: stateless
- On the Mail Class, set Event Tracking Metadata Storage to Stateless Metadata Event Tracking.
- For all mail classes / messages that don’t otherwise set Event Tracking Metadata Storage, set default_event_tracking_metadata_storage.
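For the per-message option, the header can be set when the message is built, before injection. The following is a minimal sketch using Python's standard email library. The function name build_message, the addresses, and the content are placeholders. The sketch accepts only the two header values that appear in this document (stateless and external).

```python
from email.message import EmailMessage

# Header values shown in this document; other values are not assumed.
DOCUMENTED_VALUES = {"stateless", "external"}


def build_message(sender, recipient, subject, html, storage="stateless"):
    """Build a multipart message carrying the metadata storage header."""
    if storage not in DOCUMENTED_VALUES:
        raise ValueError(f"unsupported storage value: {storage!r}")
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg["X-GreenArrow-EventTrackingMetadataStorage"] = storage
    # Plain-text fallback first, then the HTML alternative.
    msg.set_content("This message is best viewed in an HTML-capable client.")
    msg.add_alternative(html, subtype="html")
    return msg


message = build_message(
    "sender@example.com",
    "recipient@example.com",
    "Hello",
    "<html><body><p>Hello</p></body></html>",
)
```

The resulting message can then be injected as usual; any other headers your Mail Class setup requires are outside the scope of this sketch.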
External
External Metadata Event Tracking is an option that lets you provide an external Postgres database to which GreenArrow will connect for its Event Tracking Metadata. This external database is used for the metadata needed to process engine_click and engine_unsub events. For other events (such as engine_open or scomp), this mode is the same as Stateless Metadata Event Tracking.
This mode offers the following advantages over the other options:
- The tracking metadata is centralized, so emails sent from one GreenArrow node can have their events processed on any other GreenArrow node that is configured to use the same External Metadata Event Tracking database connection.
- Click tracking links are as short as possible: approximately 70 bytes plus the length of your domain name. This is regardless of the length of the destination URL. The links in this mode look like this:
  https://example.com/click?Pz1HRlKdRPkW1b2Uk8tUfxIz1HRlKdRPkW1b2Uk8tUfxI903b7e002a
The downsides of External Metadata Event Tracking include:
- You’re responsible for managing the external Postgres server, configuring replication for high availability, backing it up, and pruning its dataset.
- If there’s a service disruption with your external Postgres server, no External Metadata Event Tracking clicks, opens, and unsubscribes will be processed.
To configure External Metadata Event Tracking:
- Create a Postgres Database Connection in GreenArrow using either the UI, API, or greenarrow.conf.
- Set external_metadata_event_tracking_database to that Postgres Database Connection.
- Add the following schema to your external Postgres server:
create table ga_tracking_data (
  id uuid not null primary key, -- will be a random uuid
  date date not null,           -- UTC date this entry was created
  data jsonb not null           -- json blob containing the tracking data
);
-- create an index for pruning
create index ga_tracking_data__created_at_idx on ga_tracking_data (date);
External Metadata Event Tracking can be applied to messages in the following ways:
- On the individual message, provide the following header:
  X-GreenArrow-EventTrackingMetadataStorage: external
- On the Mail Class, set Event Tracking Metadata Storage to External Metadata Event Tracking.
If external_metadata_event_tracking_database is not configured, or if it is not reachable during pre-delivery message processing, then messages that would otherwise use External Metadata Event Tracking will instead use Stateless Metadata Event Tracking.
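The selection and fallback rules for the three modes can be summarized as a small decision function. This is a sketch for reasoning about configuration, not GreenArrow code. The function and parameter names are hypothetical. The assumption that a per-message header overrides the Mail Class setting is ours; the documentation states only that the system default applies when neither is set.

```python
def resolve_storage(header=None, mail_class=None, system_default=None,
                    legacy_stateless_flag=False, external_db_ok=True):
    """Return 'local', 'stateless', or 'external' for one message.

    header / mail_class / system_default: configured mode or None.
    legacy_stateless_flag: True if the legacy control file is set to 1.
    external_db_ok: True if the external database is configured and
    reachable during pre-delivery processing.
    """
    # ASSUMPTION: the message header wins over the Mail Class setting.
    chosen = header or mail_class or system_default
    if chosen is None:
        # The directive takes precedence over the legacy file, so the
        # legacy flag only matters when no directive is set.
        chosen = "stateless" if legacy_stateless_flag else "local"
    if chosen == "external" and not external_db_ok:
        # Documented fallback when the external database is unavailable.
        chosen = "stateless"
    return chosen
```

For example, resolve_storage(mail_class="external", external_db_ok=False) returns "stateless", which mirrors the fallback described above.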
The total number of possible connections to the external_metadata_event_tracking_database can be up to the sum of apache_max_clients + simplemh_max_servers + /var/hvmail/control/opt.simplemh.redis_num_workers.
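When sizing the external Postgres server, it can help to compute this upper bound for each node and compare it with the connection limit you have available. The sketch below does only that arithmetic. The function names and the numbers used are hypothetical illustrations, not GreenArrow defaults.

```python
def max_external_db_connections(apache_max_clients, simplemh_max_servers,
                                redis_num_workers):
    """Upper bound on connections one GreenArrow node may open."""
    return apache_max_clients + simplemh_max_servers + redis_num_workers


def fits_connection_budget(per_node_max, node_count, postgres_limit):
    """True if all nodes together stay within the Postgres limit."""
    return per_node_max * node_count <= postgres_limit


# Hypothetical values for illustration only.
per_node = max_external_db_connections(
    apache_max_clients=150, simplemh_max_servers=50, redis_num_workers=10
)
ok = fits_connection_budget(per_node, node_count=3, postgres_limit=500)
```

With these made-up numbers, each node could open up to 210 connections, so three nodes would exceed a limit of 500.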
Pruning External Metadata
GreenArrow does not automatically prune data from the metadata tracking table.
To prune, ensure this index exists:
CREATE INDEX IF NOT EXISTS ga_tracking_data__created_at_idx ON ga_tracking_data (date);
Then, you can prune old data using whatever retention period you’d like:
DELETE FROM ga_tracking_data
WHERE date < ((NOW() AT TIME ZONE 'UTC') - '90 days'::interval);
VACUUM VERBOSE ga_tracking_data;
Note that a plain VACUUM generally does not return disk space to the operating system; it instead marks the space used by deleted rows as available for re-use.
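If pruning is scheduled from a script, the cutoff can be computed client-side and passed as a query parameter. The sketch below only builds the statement and its parameter; it does not connect to a database. The function name prune_statement is hypothetical, and the %s placeholder assumes a psycopg-style PostgreSQL driver. Because it compares against a date rather than a timestamp, rows created on the cutoff day itself are kept, which is slightly more conservative than the SQL above.

```python
from datetime import date, datetime, timedelta, timezone


def prune_statement(retention_days, now=None):
    """Return (sql, params) deleting rows older than retention_days.

    now: timezone-aware datetime used as "current time"; defaults to
    the current UTC time. The date column stores UTC dates.
    """
    now = now or datetime.now(timezone.utc)
    cutoff: date = (now - timedelta(days=retention_days)).date()
    # %s is the psycopg-style placeholder (an assumption about the driver).
    sql = "DELETE FROM ga_tracking_data WHERE date < %s"
    return sql, (cutoff,)


sql, params = prune_statement(
    90, now=datetime(2024, 6, 25, 20, 2, tzinfo=timezone.utc)
)
```

If VACUUM is issued from the same script, note that PostgreSQL does not allow VACUUM inside a transaction block, so the connection would need to be in autocommit mode for that statement.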
Troubleshooting External Metadata Database Connections
Errors during delivery
When sending messages, if there is a problem with your External Metadata Database Connection, then GreenArrow will fall back and generate the message using Stateless Metadata Event Tracking.
To find out why GreenArrow is falling back to Stateless when you use SMTP injection, run the following command to review the last 10 minutes of failures:
logdir_select_time --last '10 minutes' --dir /var/hvmail/log/simplemh \
| tai64nlocal \
| grep 'ERROR EXTERNAL METADATA'
If you inject using the HTTP Submission API (as opposed to SMTP), then you’ll want to run this command instead:
logdir_select_time --last '10 minutes' --dir /var/hvmail/log/simplemh2 \
| tai64nlocal \
| egrep 'ERROR EXTERNAL METADATA|PHP Warning'
Errors during event processing
If GreenArrow cannot connect to the External Metadata Database while processing a click, the end-user will receive an error message. Additional details will be logged to /var/hvmail/apache/logs/error_log.
Run the following command to view only the most recent 50 errors communicating with your External Metadata Database:
cat /var/hvmail/apache/logs/error_log \
| grep 'ERROR EXTERNAL METADATA' \
| tail -n 50
These errors will look something like this:
[Tue Jun 25 20:02:17.947168 2024] [php:warn] [pid 40635] [client 127.0.0.1:37599] PHP Warning: pg_query_params(): Query failed: ERROR: syntax error at or near "f"\nLINE 1: SELECT data, id x f b FROM ga_test_table WHERE id IN! ($1, $...\n ^ in /var/hvmail/webapp/click/click.php on line 157
[Tue Jun 25 20:02:17.947186 2024] [php:notice] [pid 40635] [client 127.0.0.1:37599] error: ERROR EXTERNAL METADATA: original_url=http://test.localhost/click?Pu02-MavNR-G1CLkUQMi8dzvfQOOWJ9Td6BnAPAfsd5c2a01b original_domain=test.localhost request_domain=127.0.0.1: cannot query the external event tracking metadata database
The log message includes:
- original_url: This is our best attempt to determine what the original URL was that was clicked on. If you have a proxy in front of GreenArrow that modifies the request, the actual URL might have been different from this.
- original_domain: The domain name that was used in the link that was clicked on.
- request_domain: The domain that was issued as a request to GreenArrow directly. This may be different from original_domain if you have a proxy in front of GreenArrow.
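When many of these errors need to be reviewed, the three fields can be pulled out of each line with a small script. The following is a minimal sketch based only on the sample line shown above. The function name is hypothetical, and the assumption that values contain no spaces comes from that one sample rather than from a documented log format.

```python
import re

# key=value pairs for the three documented fields. Values are assumed
# to run up to the next whitespace, as in the sample log line.
FIELD_RE = re.compile(r"\b(original_url|original_domain|request_domain)=(\S+)")


def parse_external_metadata_error(line):
    """Return a dict of the documented fields, or None if not an error line."""
    if "ERROR EXTERNAL METADATA" not in line:
        return None
    # The last field is followed by ': <message>', so drop a trailing colon.
    return {key: value.rstrip(":") for key, value in FIELD_RE.findall(line)}


sample = (
    "[Tue Jun 25 20:02:17.947186 2024] [php:notice] [pid 40635] "
    "[client 127.0.0.1:37599] error: ERROR EXTERNAL METADATA: "
    "original_url=http://test.localhost/click?Pu02-MavNR-G1CLkUQMi8dzvfQOOWJ9Td6BnAPAfsd5c2a01b "
    "original_domain=test.localhost request_domain=127.0.0.1: "
    "cannot query the external event tracking metadata database"
)
fields = parse_external_metadata_error(sample)
```

Comparing original_domain with request_domain across many lines is a quick way to spot requests that arrived through a proxy.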
Setting a User-Defined Link ID
If you have a specific instance of a link that you’d like to track separately from other instances of that same URL, you can add the HTML attribute data-ga-linkid="linkid" immediately after the href= attribute.
The Link ID may have a maximum length of 100 UTF-8 characters. Link IDs longer than this maximum length are ignored. Leading and trailing whitespace is trimmed.
Links that have a User-Defined Link ID are tracked separately in the statistics screen and API.
For example:
<a href="https://example.com" data-ga-linkid="link123">Our Great Link</a>
If the link ID is not quoted, it will end at the first space or > after the =, for example:
<a href="https://example.com" data-ga-linkid=link123 data-something-else="foo">A</a>
<a href="https://example.com" data-ga-linkid=link123>B</a>
This data attribute is not removed from the HTML before delivery.
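If Link IDs are generated programmatically, it may be worth checking them before they go into a template, since over-long IDs are silently ignored. The sketch below mirrors the stated rules. The function name is hypothetical. Counting characters with len() (Unicode code points), trimming before the length check, and treating an empty ID as unusable are assumptions.

```python
MAX_LINK_ID_LENGTH = 100  # documented maximum, in characters


def normalize_link_id(raw):
    """Return the trimmed Link ID, or None if it would be ignored."""
    # Leading and trailing whitespace is trimmed.
    trimmed = raw.strip()
    # ASSUMPTION: length is checked after trimming and counted in
    # code points; an empty ID is treated as unusable.
    if not trimmed or len(trimmed) > MAX_LINK_ID_LENGTH:
        return None
    return trimmed
```

A return value of None signals that the link would be tracked without a User-Defined Link ID.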
Skipping Click Tracking
If you have a link for which you’d like to skip click tracking (so the URL will not be rewritten), you can add the HTML attribute data-ga-notrack immediately after the href= attribute (or immediately after the data-ga-linkid= attribute if it is also being used).
For example:
<a href="https://example.com" data-ga-notrack>Our Great Link</a>
<a href="https://example.com" data-ga-linkid="link-123" data-ga-notrack>Our Great Link</a>
This data attribute is not removed from the HTML before delivery.
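Because both attributes are position-sensitive, generating links through a helper can keep the ordering consistent. The following is a minimal sketch. The function name tracked_anchor is hypothetical. It always quotes attribute values and emits href first, then data-ga-linkid, then data-ga-notrack.

```python
from html import escape


def tracked_anchor(href, text, link_id=None, notrack=False):
    """Build an <a> tag with GreenArrow attributes in the required order."""
    attrs = [f'href="{escape(href, quote=True)}"']
    if link_id is not None:
        # Must come immediately after href=.
        attrs.append(f'data-ga-linkid="{escape(link_id, quote=True)}"')
    if notrack:
        # Immediately after href=, or after data-ga-linkid= if present.
        attrs.append("data-ga-notrack")
    return f'<a {" ".join(attrs)}>{escape(text)}</a>'
```

For instance, tracked_anchor("https://example.com", "Our Great Link", link_id="link-123", notrack=True) reproduces the second example above.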