GreenArrow Email Software Documentation

Recover from Database Corruption

Introduction

As good of a database Postgres is, unfortunately data corruption can still happen (e.g. a disk is failing, or an unsafe shutdown occurred). Should this happen, this document is intended to help you recover from the situation.

To execute this procedure, you’ll need to be running GreenArrow 4.274.0 or later.

Recover a database with invalid unique indexes

GreenArrow uses a variety of unique indexes to track data. An example of this usage is in Studio’s subscribers tables. For mailing lists that are not using an alternate primary key, we have a unique index on the lowercased email address. If multiple subscribers somehow get into the mailing list (usually due to a database corruption), this would invalidate this unique index.

In cases like this, we recommend restoring from a backup and repairing the database corruption during the restore process.

To do this, you’ll follow the Restoring Backups procedure as normal, but STOP before running step 16 (this is the step that calls for you to run hvmail_ini_postgres --restore-from-backup .).

In its place, you’ll do the following:

  1. Execute hvmail_ini_postgres --restore-from-backup . --wait-before-creating-indexes to begin restoring the Postgres database.
  2. Wait for the above command to print schema and data loaded, type "continue" to continue restore.
  3. In another terminal/shell, connect to Postgres with the command /var/hvmail/postgres/bin/psql -U greenarrow greenarrow.
  4. Repair the database corruption. If your database corruption is in your subscriber tables, we provide a demonstration of how to do this in the “Create and execute stored procedure for repairing subscriber tables” section below.
  5. Type continue in the original terminal/shell to proceed with creation of indexes.
  6. Resume the Restoring Backups procedure, skipping step 16 and moving onto step 17.

Create and execute stored procedure for repairing subscriber tables

This query will create a stored procedure for repairing subscriber tables:

CREATE OR REPLACE FUNCTION s_remove_duplcate_emails_from_subscribers_table(
	in_table_name varchar
)
RETURNS BOOLEAN
AS $$
  BEGIN

    EXECUTE '
      CREATE TEMPORARY TABLE zzz_subscribers_table_fix
      AS
      SELECT
        mailing_list_id AS "mailing_list_id",
        LOWER(email) AS "lower_email",
        MIN(id) AS "id_to_keep"
      FROM
        ' || in_table_name || '
      GROUP BY 1,2
      HAVING COUNT(1) > 1;
    ';

    CREATE INDEX zzz_subscribers_table_fix_idx
    ON zzz_subscribers_table_fix (mailing_list_id, lower_email);

    EXECUTE '
      DELETE FROM ' || in_table_name || ' aa
      WHERE EXISTS (
        SELECT 1
        FROM zzz_subscribers_table_fix bb
        WHERE aa.mailing_list_id = bb.mailing_list_id
        AND LOWER(aa.email) = bb.lower_email
        AND aa.id <> bb.id_to_keep
      );
    ';

    DROP TABLE zzz_subscribers_table_fix;

    RETURN true;

  END
$$ LANGUAGE plpgsql;

It can be executed on individual subscriber tables like this:

SELECT s_remove_duplcate_emails_from_subscribers_table('s_subscribers_114');

Or it can be executed on all subscriber tables like this:

SELECT s_remove_duplcate_emails_from_subscribers_table(aa.subscribers_table_name)
FROM (
  ( SELECT 's_subscribers' AS subscribers_table_name )
  UNION ALL
  ( SELECT DISTINCT subscribers_table_name FROM s_mailing_lists WHERE subscribers_table_name IS NOT NULL )
) aa;

After completing the repair of all subscriber tables, you can remove the stored procedure:

DROP FUNCTION s_remove_duplcate_emails_from_subscribers_table(varchar);


Copyright © 2012–2024 GreenArrow Email