Disk Usage
Overview
The greenarrow disk_usage command is used to generate reports on GreenArrow’s disk usage. Specific filesystem paths and PostgreSQL table names are shown by greenarrow disk_usage reports when the --details option is used.
Deleting GreenArrow’s data without following procedures documented on this site can cause data corruption and render GreenArrow inoperable. Restoring GreenArrow to a working state after such deletions may incur an additional fee as discussed in our Modifications and Customizations page.
Here is an example of the default report’s output:
# greenarrow disk_usage
GreenArrow Disk Usage Report
Studio
Attachments 16KB
Bounces 112KB
Clicks 184KB
Content: Campaigns, Autoresponders, Web Forms 32KB
Imports and Exports 152KB
Opens 64KB
Sents 160KB
Spam Complaints 64KB
Subscriber Data 1008KB
Suppression Lists 72KB
Unsubscribes 112KB
Uploaded Images 28KB
Misc Studio Data 443MB
Engine
Archived Messages 16KB
Bad Addresses 68KB
Clicks and Opens 44KB
Delivery Attempt Logs 1005MB
Disk Queue 3MB
Incoming Email 168KB
Send Summary Files 368KB
SimpleMH Message Log 8KB
Time Summary Files 4MB
Misc Engine Data 3MB
General
Events 8KB
Redis 38MB
Web Server Logs 23MB
Other Logs 51MB
The Report Hierarchy
The reports that are generated by greenarrow disk_usage have a three-layer hierarchy:
Product | This is the section of the report for the named product. The products are Studio, Engine, and General.
Category | A category of data, like “Bounces” or “Delivery Attempt Logs”.
Item | A specific filesystem path or PostgreSQL table. Individual items are only shown when the --details option is used.
Each product contains multiple categories, and each category includes one or more items.
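The hierarchy can be pictured as nested data structures. The Python sketch below is a hypothetical model: the Item, Category, and Product classes are illustrative names, not part of GreenArrow. It uses the detailed report’s Clicks category, whose two items add up to the category’s 184KB total.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One filesystem path or PostgreSQL table (shown only with --details)."""
    kind: str      # "table" or "files", as labeled in the text report
    name: str
    size_kb: int


@dataclass
class Category:
    """A category of data, such as "Clicks", made up of one or more items."""
    name: str
    items: list = field(default_factory=list)

    def total_kb(self) -> int:
        # A category's TOTAL line is the sum of its items.
        return sum(item.size_kb for item in self.items)


@dataclass
class Product:
    """A report section: "Studio", "Engine", or "General"."""
    name: str
    categories: list = field(default_factory=list)


clicks = Category("Clicks", [
    Item("table", "s_stat_clicks", 128),
    Item("table", "s_links", 56),
])
studio = Product("Studio", [clicks])
print(clicks.total_kb())  # 184
```

Because the report rounds each figure it prints, item sizes will not always add up exactly to the printed TOTAL. For example, Misc Engine Data lists two 1MB entries but a 3MB total.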
Performance Considerations
The greenarrow disk_usage command runs both PostgreSQL queries and disk usage commands like du to gather its information. These operations are disk I/O intensive, so depending on how much space your GreenArrow installation uses and how fast its storage is, greenarrow disk_usage can take a long time to complete.
To mitigate this issue, greenarrow disk_usage streams its output, printing each usage figure as soon as it’s calculated. It’s safe to press Ctrl-C to cancel a report that’s still running.
Performance issues can be further mitigated by targeting a specific subsection of the report with the --area option.
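If you drive the report from a script, you can keep the streaming behavior by reading output line by line instead of waiting for the command to exit. The sketch below assumes Python 3.8+; disk_usage_command and stream_lines are hypothetical helper names, and only the --details, --json, and --area options come from this page.

```python
import shlex
import subprocess
import sys


def disk_usage_command(areas=(), details=False, as_json=False):
    """Build an argument list for greenarrow disk_usage from documented options."""
    args = ["greenarrow", "disk_usage"]
    if details:
        args.append("--details")
    if as_json:
        args.append("--json")
    for area in areas:
        # Each area is passed as its own --area option.
        args += ["--area", area]
    return args


def stream_lines(cmd):
    """Yield each output line as soon as the command writes it to the pipe."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")


cmd = disk_usage_command(areas=["Engine: Delivery Attempt Logs"], details=True)
print(shlex.join(cmd))
# greenarrow disk_usage --details --area 'Engine: Delivery Attempt Logs'

# Stand-in command so the sketch runs without GreenArrow installed.
demo = list(stream_lines([sys.executable, "-c", "print('line 1'); print('line 2')"]))
```

On a GreenArrow server you would iterate over stream_lines(cmd) and print each line as it arrives; narrowing the report with one or more areas keeps the run short.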
Options
The options listed in the sections below tell greenarrow disk_usage what operations to perform and how to format its output. All of these options are optional, and multiple options may be specified in any order.
--details
The default report (shown in the Overview section) reports disk usage at the category level. For example, the last line shows that “Other Logs” take up 51MB. Adding the --details option causes the report to indicate which filesystem paths and PostgreSQL tables make up each category. In some cases, small PostgreSQL tables are aggregated into a single “tables using less than 1MB each” line for conciseness.
The detailed report is lengthy, so you may wish to supplement the --details option with one or more --area options.
Here’s an example:
# greenarrow disk_usage --details
GreenArrow Disk Usage Report
Studio
Attachments
table: s_attachments 16KB
TOTAL 16KB
Bounces
table: s_stat_bounces 112KB
TOTAL 112KB
Clicks
table: s_stat_clicks 128KB
table: s_links 56KB
TOTAL 184KB
Content: Campaigns, Autoresponders, Web Forms
table: s_contents 32KB
TOTAL 32KB
Imports and Exports
table: s_suppressed_address_imports 32KB
table: s_subscriber_imports 48KB
table: s_subscriber_import_progresses 40KB
files: /var/hvmail/var/studio-data/subscriber_imports 8KB
files: /var/hvmail/var/studio-data/subscriber_exports 0KB
files: /var/hvmail/var/studio-data/suppressed_address_imports 0KB
files: /var/hvmail/var/studio-data/organizations 24KB
TOTAL 152KB
Opens
table: s_stat_opens 64KB
TOTAL 64KB
Sents
table: s_stat_sents 160KB
TOTAL 160KB
Spam Complaints
table: s_stat_scomps 64KB
TOTAL 64KB
Subscriber Data
table: tables using less than 1MB each 1008KB
TOTAL 1008KB
Suppression Lists
table: s_suppression_lists 32KB
table: s_suppressed_addresses 40KB
TOTAL 72KB
Unsubscribes
table: s_stat_unsubs 112KB
TOTAL 112KB
Uploaded Images
table: s_images 16KB
table: pg_largeobject 8KB
files: /var/hvmail/var/studio-data/campaign_images 4KB
TOTAL 28KB
Misc Studio Data
table: s_us_zip_codes 5MB
table: tables using less than 1MB each 2MB
files: /var/hvmail/studio 436MB
files: /var/hvmail/var/studio-tmp 28KB
TOTAL 443MB
Engine
Archived Messages
table: archived_message 16KB
TOTAL 16KB
Bad Addresses
table: bounce_bad_addresses 56KB
table: bounce_repeat_tracker 8KB
files: /var/hvmail/var/simplemh-bad-addresses.cdb 0KB
files: /var/hvmail/var/bounce_processor_repeat_tracker.cdb 4KB
TOTAL 68KB
Clicks and Opens
table: clickthrough_urls 16KB
table: clickthrough_clicks 24KB
files: /var/hvmail/var/clickthrough-tracking-emaillist 4KB
TOTAL 44KB
Delivery Attempt Logs
files: /var/hvmail/log/ram-qmail-send 305MB
files: /var/hvmail/log/bounce-qmail-send 95MB
files: /var/hvmail/log/disk-qmail-send 605MB
TOTAL 1005MB
Disk Queue
files: /var/hvmail/qmail-disk/queue 3MB
TOTAL 3MB
Incoming Email
files: /var/hvmail/maildata 168KB
TOTAL 168KB
Send Summary Files
files: /var/hvmail/log/send-summary 368KB
TOTAL 368KB
SimpleMH Message Log
table: simplemh_message_log 8KB
TOTAL 8KB
Time Summary Files
files: /var/hvmail/log/time-summary 4MB
TOTAL 4MB
Misc Engine Data
table: eng_dd_read_state 1MB
table: tables using less than 1MB each 1MB
TOTAL 3MB
General
Events
table: events 8KB
TOTAL 8KB
Redis
files: /var/hvmail/data/redis 38MB
TOTAL 38MB
Web Server Logs
files: /var/hvmail/apache/logs 23MB
TOTAL 23MB
Other Logs
files: /var/hvmail/log/bounce-processor 9MB
files: /var/hvmail/log/config-agent 992KB
files: /var/hvmail/log/dd-dispatcher 68KB
files: /var/hvmail/log/dd-logreader 980KB
files: /var/hvmail/log/event-processor 996KB
files: /var/hvmail/log/httpd 300KB
files: /var/hvmail/log/logfile-agent 128KB
files: /var/hvmail/log/logfile-summary 4KB
files: /var/hvmail/log/logfile-writer 16KB
files: /var/hvmail/log/postgres 19MB
files: /var/hvmail/log/pure-authd-studio 4KB
files: /var/hvmail/log/pure-ftpd 4KB
files: /var/hvmail/log/qmail-pop3d 976KB
files: /var/hvmail/log/qmail-smtpd 9MB
files: /var/hvmail/log/qmail-smtpd2 4MB
files: /var/hvmail/log/qmail-smtpd3 4KB
files: /var/hvmail/log/redis 944KB
files: /var/hvmail/log/redis-np 108KB
files: /var/hvmail/log/rpc 24KB
files: /var/hvmail/log/rspawn-limiter 928KB
files: /var/hvmail/log/send-summary-queue 4KB
files: /var/hvmail/log/simplemh 800KB
files: /var/hvmail/log/simplemh2 884KB
files: /var/hvmail/log/smtp-sink 4KB
files: /var/hvmail/log/studio 948KB
files: /var/hvmail/log/studio-worker 912KB
TOTAL 51MB
--area
The --area option restricts the report to a specific product or category. Areas take two forms:
- A specific product. This can be "Engine", "Studio", or "General".
- A specific "Product: Category" combination, separated by a colon and optional whitespace. For example, "Studio: Suppression Lists" or "General: Events".
It is not currently possible to use the --area option to target a specific filesystem path or PostgreSQL table.
If you include multiple --area options, then all sections that match any --area option will be printed, but no section will be printed more than once. For example, greenarrow disk_usage --area "Studio" --area "Studio: Suppression Lists" will print the report for all of Studio, and the “Suppression Lists” category will only be shown once.
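The matching rules above can be restated as a small filter. This Python sketch is a hypothetical model of the documented behavior, not GreenArrow’s implementation; parse_area and select_sections are illustrative names, and exact, case-sensitive matching of product names is an assumption.

```python
PRODUCTS = ("Studio", "Engine", "General")


def parse_area(area: str) -> tuple:
    """Split "Product" or "Product: Category" into (product, category or None)."""
    product, sep, category = area.partition(":")
    product = product.strip()
    if product not in PRODUCTS:  # assumption: product names match exactly
        raise ValueError(f"unknown product: {product!r}")
    return product, (category.strip() if sep else None)


def select_sections(sections, areas):
    """Return the (product, category) sections matching any area, in report order.

    Each section is visited once, so overlapping areas never print it twice.
    """
    wanted = [parse_area(area) for area in areas]
    return [
        (product, category)
        for product, category in sections
        if any(p == product and c in (None, category) for p, c in wanted)
    ]


sections = [
    ("Studio", "Attachments"),
    ("Studio", "Suppression Lists"),
    ("Engine", "Disk Queue"),
    ("General", "Events"),
]
print(select_sections(sections, ["Studio", "Studio: Suppression Lists"]))
# [('Studio', 'Attachments'), ('Studio', 'Suppression Lists')]
```

Iterating over the report’s sections, rather than over the requested areas, is what guarantees that a section matched by two areas appears only once.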
--json
The --json option causes the report to be printed as pretty-printed JSON.
Here’s an example of a JSON encoded report without --details turned on:
# greenarrow disk_usage --area "Engine" --json
{
"GreenArrow Disk Usage Report": {
"Engine": {
"Archived Messages": {
"total": {
"disk_used": 16,
"disk_used_human": "16KB"
}
},
"Bad Addresses": {
"total": {
"disk_used": 68,
"disk_used_human": "68KB"
}
},
"Clicks and Opens": {
"total": {
"disk_used": 44,
"disk_used_human": "44KB"
}
},
"Delivery Attempt Logs": {
"total": {
"disk_used": 1029608,
"disk_used_human": "1005MB"
}
},
"Disk Queue": {
"total": {
"disk_used": 3420,
"disk_used_human": "3MB"
}
},
"Incoming Email": {
"total": {
"disk_used": 168,
"disk_used_human": "168KB"
}
},
"Send Summary Files": {
"total": {
"disk_used": 368,
"disk_used_human": "368KB"
}
},
"SimpleMH Message Log": {
"total": {
"disk_used": 8,
"disk_used_human": "8KB"
}
},
"Time Summary Files": {
"total": {
"disk_used": 4172,
"disk_used_human": "4MB"
}
},
"Misc Engine Data": {
"total": {
"disk_used": 2688,
"disk_used_human": "3MB"
}
}
}
}
}
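In the above report, GreenArrow Disk Usage Report is a hash with a key for each product. Each product is a hash with a key for each category, and each category is a hash that contains a hash named total, structured as shown below:
total | hash | Contains the following keys:
disk_used | integer | The disk space used by the category, in kilobytes.
disk_used_human | string | The same figure in human-readable form, such as "16KB" or "1005MB".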
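Because every category carries the same total hash, a script can collect the figures without parsing the text report. This Python sketch uses a sample trimmed from the example output above; category_totals is a hypothetical helper, and it treats disk_used as kilobytes, an assumption based on the examples (16 is shown as "16KB", and 1029608 as "1005MB").

```python
import json


def category_totals(report_json: str) -> dict:
    """Map product -> category -> disk_used from a --json report."""
    report = json.loads(report_json)["GreenArrow Disk Usage Report"]
    return {
        product: {name: data["total"]["disk_used"] for name, data in categories.items()}
        for product, categories in report.items()
    }


# Trimmed from the example output above.
sample = """
{"GreenArrow Disk Usage Report": {"Engine": {
  "Archived Messages": {"total": {"disk_used": 16, "disk_used_human": "16KB"}},
  "Delivery Attempt Logs": {"total": {"disk_used": 1029608, "disk_used_human": "1005MB"}}
}}}
"""
totals = category_totals(sample)
engine_kb = sum(totals["Engine"].values())
print(engine_kb)  # 1029624
```

In practice the input would be the captured output of greenarrow disk_usage --json, optionally combined with --area to limit which products appear.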
Here’s an example of a JSON encoded report with --details turned on:
# greenarrow disk_usage --area "Engine" --json
{
"GreenArrow Disk Usage Report": {
"Engine": {
"Archived Messages": {
"total": {
"disk_used": 16,
"disk_used_human": "16KB"
}
},
"Bad Addresses": {
"total": {
"disk_used": 68,
"disk_used_human": "68KB"
}
},
"Clicks and Opens": {
"total": {
"disk_used": 44,
"disk_used_human": "44KB"
}
},
"Delivery Attempt Logs": {
"total": {
"disk_used": 1029636,
"disk_used_human": "1006MB"
}
},
"Disk Queue": {
"total": {
"disk_used": 3420,
"disk_used_human": "3MB"
}
},
"Incoming Email": {
"total": {
"disk_used": 168,
"disk_used_human": "168KB"
}
},
"Send Summary Files": {
"total": {
"disk_used": 368,
"disk_used_human": "368KB"
}
},
"SimpleMH Message Log": {
"total": {
"disk_used": 8,
"disk_used_human": "8KB"
}
},
"Time Summary Files": {
"total": {
"disk_used": 4172,
"disk_used_human": "4MB"
}
},
"Misc Engine Data": {
"total": {
"disk_used": 2688,
"disk_used_human": "3MB"
}
}
}
}
}
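When both the --json and --details options are turned on, each usage category, such as Misc Engine Data, has a components array added to it to show the usage for individual items.
Here’s how the components array is structured:
components | array | Contains one hash per item. Each hash has the following keys:
type | string | The kind of item, such as "table".
name | string | The name of the item, such as "s_subscriber_import_progresses".
disk_used | integer | The disk space used by the item, in kilobytes.
disk_used_human | string | The same figure in human-readable form, such as "40KB".
When the --postgres-bloat option is also used, table entries include bloat and bloat_human keys, as described in the --postgres-bloat section.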
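One common use of the components arrays is finding the largest individual items. The Python sketch below is hypothetical (largest_items is an illustrative name). Its sample is hand-built from the Imports and Exports figures in the detailed text report, using only the component keys shown in this document.

```python
def largest_items(report: dict, limit: int = 5) -> list:
    """Return (product, category, name, disk_used) tuples, largest first."""
    rows = []
    for product, categories in report["GreenArrow Disk Usage Report"].items():
        for category, data in categories.items():
            # "components" is only present when --details is used with --json.
            for item in data.get("components", []):
                rows.append((product, category, item["name"], item["disk_used"]))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows[:limit]


# Hand-built sample using figures from the detailed text report above.
sample = {"GreenArrow Disk Usage Report": {"Studio": {"Imports and Exports": {
    "total": {"disk_used": 152, "disk_used_human": "152KB"},
    "components": [
        {"type": "table", "name": "s_subscriber_import_progresses",
         "disk_used": 40, "disk_used_human": "40KB"},
        {"type": "table", "name": "s_subscriber_imports",
         "disk_used": 48, "disk_used_human": "48KB"},
    ],
}}}}
for row in largest_items(sample):
    print(row)
```

Using data.get("components", []) lets the same code run against reports generated without --details, where it simply finds no items.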
--postgres-bloat
The --postgres-bloat option causes the report to estimate what percentage of the disk space used by each table and its indexes is “bloat”. Bloat is empty space that was previously used by rows or index entries which have since been changed or deleted. Valid values range from 0% (no bloat) to 100% (all space is bloat).
The bloat figures are shown on a per-table basis, and individual tables are only shown in the detailed report, so specifying --postgres-bloat always causes a detailed report to be shown, regardless of whether the --details option was explicitly used.
When the --json option is not used, the PostgreSQL bloat estimate is shown in parentheses after each table’s size. For example, the s_subscriber_import_progresses table below contains 15% bloat, meaning that if all of the bloat space could be recovered, 6KB (15% of 40KB) would be freed:
table: s_subscriber_import_progresses 40KB (15%)
When the --json option is used, the PostgreSQL bloat estimate is included in two new keys: bloat, which contains a floating point number, and bloat_human, which contains a string.
{
"type": "table",
"name": "s_subscriber_import_progresses",
"disk_used": 40,
"disk_used_human": "40KB",
"bloat": 15.0,
"bloat_human": "15%"
},
The report skips calculating the bloat percentage for some short-lived tables. It marks the tables that it has skipped by reporting the bloat as null when the --json option is used, and as "-" when it is not.
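The 6KB figure in the earlier example comes from multiplying a table’s size by its bloat percentage. This Python sketch applies that arithmetic to a component hash like the one above and treats a null bloat value as “unknown”. reclaimable_kb is a hypothetical helper name.

```python
from typing import Optional


def reclaimable_kb(component: dict) -> Optional[float]:
    """Best-case KB freed if all of a table's estimated bloat were recovered."""
    bloat = component.get("bloat")
    if bloat is None:
        # null: the report skipped this table (or --postgres-bloat wasn't used).
        return None
    return component["disk_used"] * bloat / 100


example = {"type": "table", "name": "s_subscriber_import_progresses",
           "disk_used": 40, "disk_used_human": "40KB",
           "bloat": 15.0, "bloat_human": "15%"}
print(reclaimable_kb(example))  # 6.0
```

Treat the result as an upper bound, since some free space is expected and the bloat figure is itself an estimate.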
Please keep the following in mind when reviewing the PostgreSQL bloat estimates:
- Calculating the bloat of PostgreSQL tables and indexes can be resource intensive, so expect reports to take longer to complete when these figures are calculated.
- These are only estimates, because --postgres-bloat uses shortcuts such as relying on the statistics PostgreSQL gathered the last time it analyzed each table, which makes the estimates faster to calculate.
- Some bloat is a good thing, so PostgreSQL reserves some free space by design. For example, B-tree indexes attempt to keep 10% of their index pages free to reduce fragmentation. The above example’s 15% bloat figure is not unusual on a server that’s operating normally.
- We don’t have a hard rule for what we consider to be problematic bloat, but the following examples may help:
  - Usually, when PostgreSQL bloat has been an issue in the past, one table, or a small subset of tables, was taking up the majority of the disk space and had bloat figures over 30%.
  - Sometimes small tables get bloated, and it’s not worth addressing. For example, if a table occupies 12MB of space and is 75% bloated, it’s probably not worth investigating, because the best possible outcome is freeing 9MB of space.
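The rules of thumb above can be turned into a rough first-pass filter. This Python sketch is hypothetical: worth_investigating is an illustrative name, and its min_share threshold is an arbitrary example value, not a GreenArrow rule. Only the 30% bloat figure comes from the guidance above. The sample tables are made up, except that small_table mirrors the 12MB, 75% example.

```python
def worth_investigating(tables, min_bloat=30.0, min_share=0.2):
    """Flag tables that hold a large share of table space and are heavily bloated.

    tables: (name, disk_used_kb, bloat_percent or None) tuples.
    The default thresholds are illustrative only, not a GreenArrow rule.
    """
    total_kb = sum(kb for _, kb, _ in tables)
    flagged = []
    for name, kb, bloat in tables:
        if bloat is None or total_kb == 0:
            continue  # skipped by the report, or nothing to compare against
        if bloat > min_bloat and kb / total_kb >= min_share:
            flagged.append((name, kb * bloat / 100))  # best-case KB freed
    return flagged


# Hypothetical tables; "small_table" mirrors the 12MB, 75% example above.
tables = [
    ("big_table", 2_000_000, 40.0),
    ("small_table", 12 * 1024, 75.0),  # at most 9MB (9216KB) could be freed
    ("skipped_table", 500, None),
]
print(worth_investigating(tables))  # [('big_table', 800000.0)]
```

A filter like this only narrows down where to look; contact GreenArrow technical support before acting on the results.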
Please contact GreenArrow technical support if you have any questions about how to interpret or address the bloat figures that you see.
--postgres-only
The --postgres-only option causes the report to show only PostgreSQL table entries. File entries are excluded.
--postgres-only and --files-only are mutually exclusive.
--files-only
The --files-only option causes the report to show only file entries. PostgreSQL table entries are excluded.
--help
The --help option prints a concise usage summary:
# greenarrow disk_usage --help
greenarrow: Usage:
greenarrow disk_usage [OPTIONS]
This command will generate a report of the disk space used by GreenArrow.
Application Options:
--details include extra details in the output
--area= show a specific area of disk usage
--json print JSON formatted output
--postgres-bloat estimate the amount of disk space that PostgreSQL uses in excess of its minimum possible size
--postgres-only only show the PostgreSQL table portions of the report
--files-only only show the filesystem path portions of the report
Help Options:
-h, --help Show this help message
Disk Space Reclamation Options
The Usage Categories section below lists the components that make up each category shown in the report.
Some of these components have disk space reclamation options. The larger a component, the more likely it is to have a documented disk space reclamation procedure. If the procedure has been publicly documented, it’s linked to from the Usage Categories section.
There are also some disk space reclamation procedures which we haven’t documented yet, either because they’re infrequently used or because implementing them requires advanced knowledge of GreenArrow’s internals. Please contact GreenArrow technical support if you believe that part of GreenArrow is using more disk space than it should and would like to find out whether we have any undocumented disk space reclamation methods available.
When disk space is reclaimed from ordinary files, the results can be seen immediately by re-running the report.
When disk space is reclaimed from PostgreSQL tables, re-running the report usually shows that the table’s size is unchanged. This is because when data is deleted from a PostgreSQL table, the table itself continues to occupy the same amount of space that it did before. PostgreSQL simply marks the space that was used by the deleted data as being available for new data. The report will show this reclaimed space as bloat when the --postgres-bloat option is used.
For example, suppose you have a PostgreSQL table that’s occupying 2GB, and you free 1GB of space in it. The disk usage report will still show that the table is using 2GB of space. If you later add 1GB of data to that same table, the disk usage report will continue to show that the table is using 2GB of space, because the new data reuses the 1GB of space that was freed earlier.
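The 2GB example can be traced with a toy model. This Python sketch is a deliberate simplification, not how PostgreSQL or GreenArrow computes sizes or bloat. TableModel is a hypothetical name, and the model ignores fillfactor, indexes, TOAST, and the VACUUM pass PostgreSQL needs before space from deleted rows can be reused.

```python
class TableModel:
    """Toy model of a table's on-disk size versus its live data (sizes in GB).

    Simplifications: ignores fillfactor, indexes, TOAST, and the VACUUM step
    PostgreSQL needs before space from deleted rows can be reused.
    """

    def __init__(self):
        self.size = 0.0   # space the table occupies on disk
        self.live = 0.0   # space used by current rows

    def insert(self, gb):
        self.live += gb
        # Free space is reused first; the table only grows when it runs out.
        self.size = max(self.size, self.live)

    def delete(self, gb):
        self.live -= gb   # the table keeps its size; the space becomes free

    def bloat_percent(self):
        return 100 * (self.size - self.live) / self.size if self.size else 0.0


table = TableModel()
table.insert(2)
table.delete(1)
print(table.size, table.bloat_percent())  # 2.0 50.0
table.insert(1)
print(table.size, table.bloat_percent())  # 2.0 0.0
```

In this model, the freed 1GB shows up as bloat after the delete, and it disappears again once new data fills it, while the reported size stays at 2GB throughout.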
Usage Categories
The following sections show the hierarchy of files and PostgreSQL tables shown in the report and link to any relevant documentation.
Studio
The Studio portion of the report is only shown if either:
- Your GreenArrow license includes Studio.
- Studio’s total usage is at least 600MB. The reason for this threshold is that Engine and Studio have some shared code, so Engine-only installations typically have a few hundred megabytes of files that would be classified as belonging to Studio in the report. If there are at least 600MB of Studio files, that’s a sign that Studio was used at some point in the past (perhaps under a previous Studio license) and that the past license’s data is still present.
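The visibility rule above is a simple either/or test. This Python sketch restates it; show_studio_section is a hypothetical name, and converting 600MB to kilobytes with 1MB = 1024KB is an assumption based on the --json examples, where 1029608 is shown as "1005MB".

```python
# Assumes 1MB = 1024KB, which matches the --json examples
# (a disk_used of 1029608 is shown as "1005MB").
STUDIO_THRESHOLD_KB = 600 * 1024


def show_studio_section(licensed_for_studio: bool, studio_usage_kb: int) -> bool:
    """Restate the documented rule for when the Studio section appears."""
    return licensed_for_studio or studio_usage_kb >= STUDIO_THRESHOLD_KB


print(show_studio_section(False, 443 * 1024))  # False: below the threshold
```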
Attachments
s_attachments | table | Campaign attachments are stored in this table.
Bounces
s_stat_bounces | table | This table’s data retention settings can be controlled by adjusting the Campaign Bounces data retention setting.
Clicks
s_stat_clicks | table | This table’s retention settings can be controlled by adjusting the Campaign Clicks data retention setting.
s_links | table | This table is used to store the original URL, Stat ID, and a unique identifier for each URL used in a send that uses click tracking.
Content: Campaigns, Autoresponders, Web Forms
s_contents | table | This table is used to store the contents of campaigns, including their subjects, HTML versions, and text versions.
Imports and Exports
s_suppressed_address_imports | table | This table is used when importing Suppression Lists.
s_subscriber_imports | table | This table is used when importing subscribers.
s_subscriber_import_progresses | table | This table is used when importing subscribers.
/var/hvmail/var/studio-data/subscriber_imports | files | This folder is used when importing subscribers.
/var/hvmail/var/studio-data/subscriber_exports | files | This folder is used when exporting subscribers.
/var/hvmail/var/studio-data/suppressed_address_imports | files | This folder is used when importing Suppression Lists.
/var/hvmail/var/studio-data/organizations | files | This folder is used to store files uploaded via FTP. Space can be reclaimed by logging in with your FTP account and deleting files.
Opens
s_stat_opens | table | This table’s retention settings can be controlled by adjusting the Campaign Opens data retention setting.
Sents
s_stat_sents | table | This table’s data retention settings can be controlled by adjusting the Campaign Recipient Data retention setting.
Spam Complaints
s_stat_scomps | table | This table’s data retention settings can be controlled by adjusting the Campaign Spam Complaints data retention setting.
Subscriber Data
This category includes all tables used to store Studio’s subscriber records. Most mailing lists are stored in their own table, which is named using an s_subscribers_ prefix. The mailing-list-specific tables that use at least 1MB of space appear in the report when the --details option is used. The category also contains:
s_subscribers | table | This table stores subscriber data for mailing lists that are not large enough to have their own table with an s_subscribers_ prefix.
s_subscriber_statuses | table | This table stores the status of each subscriber.
s_pending_subscribers | table | This table contains subscription requests pending confirmation.
s_subscriber_recent_activities | table | This table is used to temporarily store data on recent sends, clicks, and opens.
tables using less than 1MB each | table | This entry shows the combined usage of all tables that store Studio subscriber data and use less than 1MB of disk space each.
The space used by subscriber records in a mailing list may be reclaimed by deleting that mailing list. Think carefully about your decision before doing this, though. Deleting a mailing list, then re-creating it will prevent the default unsubscribe link, bounce processing, and spam complaint processing systems from deactivating subscribers on the new list, which in turn can cause subscriber engagement and deliverability issues.
The above tables are discussed in more detail in the Direct Database Access document.
Suppression Lists
s_suppression_lists | table | This table stores the settings for each suppression list, excluding representations of individual suppressed addresses.
s_suppressed_addresses | table | This table stores representations of email addresses on suppression lists.
Unsubscribes
s_stat_unsubs | table | This table’s data retention settings can be controlled by adjusting the Campaign Unsubscribes data retention setting.
Uploaded Images
s_images | table | Images used for campaigns and autoresponders are stored in this table.
pg_largeobject | table | The pg_largeobject table is used to hold large objects. This table isn’t populated only by images, but images are the category in this report that most often contributes to it.
/var/hvmail/var/studio-data/campaign_images | files | Images used for campaigns and autoresponders are stored in this directory.
Misc Studio Data
This category includes all tables whose names have an s_ prefix and which aren’t listed elsewhere in the report. As a result, you may see tables listed in your report which aren’t listed below.
s_us_zip_codes | table | This table contains data on US zip codes and is used by the Segmentation Builder.
tables using less than 1MB each | table | This entry shows the combined usage of all tables in this category that use less than 1MB of disk space each.
/var/hvmail/studio | files | All files in /var/hvmail/studio.
/var/hvmail/var/studio-tmp | files | All files in /var/hvmail/var/studio-tmp.
Engine
Archived Messages
archived_message | table | Sample messages get recorded in this table when a Mail Class has the Archive a Sample of Messages option turned on.
Bad Addresses
bounce_bad_addresses | table | This table stores the addresses that are eligible for Bad Address Suppression.
bounce_repeat_tracker | table | This table is used by the Bounce Processor to determine when to deactivate subscribers for repeated bounces.
/var/hvmail/var/simplemh-bad-addresses.cdb | files | This file is used by Bad Address Suppression. It contains a subset of the
/var/hvmail/var/bounce_processor_repeat_tracker.cdb | files | This file is used by the Bounce Processor to determine when to deactivate subscribers for repeated bounces.
Clicks and Opens
clickthrough_urls | table | The original URLs used in SimpleMH click tracking are stored here.
clickthrough_clicks | table | Data on each SimpleMH click and open that takes place is stored here. The command
/var/hvmail/var/clickthrough-tracking-emaillist | files | See SimpleMH and Studio Remote List Email Address Retention for details on what this directory contains and how to control its retention settings.
Delivery Attempt Logs
/var/hvmail/log/ram-qmail-send | files | Delivery attempt logs for GreenArrow’s ram-queue.
/var/hvmail/log/bounce-qmail-send | files | Delivery attempt logs for GreenArrow’s bounce-queue.
/var/hvmail/log/disk-qmail-send | files | Delivery attempt logs for GreenArrow’s disk-queue.
Delivery attempt log data retention settings can be adjusted using the hvmail_set log_disk_space command.
Disk Queue
/var/hvmail/qmail-disk/queue | files | Disk-queue messages are stored here. The usage of this directory can be reduced by doing any of the following:
Incoming Email
/var/hvmail/maildata | files | Mailboxes used for incoming email are stored here.
Send Summary Files
/var/hvmail/log/send-summary | files | These files are used by Send Statistics. The Troubleshooting Disk Space Issues document contains a section on moving
SimpleMH Message Log
simplemh_message_log | table | This table can optionally be used to log messages as they pass through SimpleMH. See Logging all SimpleMH Messages.
Time Summary Files
/var/hvmail/log/time-summary | files | Files used by Dynamic Delivery Statistics.
Misc Engine Data
tables using less than 1MB each | table | This entry shows the combined usage of all GreenArrow PostgreSQL tables in this category that use less than 1MB of disk space each.
General
The General categories are shared by Engine and Studio.
Events
events | table | Individual events recorded by the Event Notification System are stored here until they’re delivered.
Redis
/var/hvmail/data/redis | files | This folder contains Redis data.
Web Server Logs
/var/hvmail/apache/logs | files | This folder contains web server logs.
Other Logs
The Other Logs category contains one entry for each file or folder within /var/hvmail/log that isn’t counted elsewhere.
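To see why these figures are I/O intensive, consider what a du-style measurement has to do: stat every file under each entry. The Python sketch below approximates du -sk for each entry in a log directory while skipping entries counted in other categories. It is a hypothetical illustration, not GreenArrow’s implementation, and its figures may differ from the report’s. It is POSIX only, because it relies on st_blocks. The demo runs against a throwaway directory. On a server you might pass /var/hvmail/log and exclude ram-qmail-send, bounce-qmail-send, disk-qmail-send, send-summary, and time-summary, which appear under other categories above.

```python
import os
import tempfile


def disk_usage_kb(path: str) -> int:
    """Approximate `du -sk PATH` by summing allocated blocks (POSIX only)."""
    seen = set()
    blocks = 0

    def add(p):
        nonlocal blocks
        st = os.lstat(p)
        if (st.st_dev, st.st_ino) in seen:  # count hard links once, like du
            return
        seen.add((st.st_dev, st.st_ino))
        blocks += st.st_blocks  # 512-byte units on Linux

    add(path)
    if os.path.isdir(path) and not os.path.islink(path):
        for root, dirs, files in os.walk(path):
            for name in dirs + files:
                add(os.path.join(root, name))
    return blocks * 512 // 1024


def other_log_sizes(log_dir: str, exclude=()) -> dict:
    """Size in KB of each entry in log_dir that isn't counted in another category."""
    return {
        name: disk_usage_kb(os.path.join(log_dir, name))
        for name in sorted(os.listdir(log_dir))
        if name not in exclude
    }


# Demo against a throwaway directory instead of /var/hvmail/log.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("ram-qmail-send", "studio"):
        os.mkdir(os.path.join(demo_dir, name))
    demo_sizes = other_log_sizes(demo_dir, exclude={"ram-qmail-send"})
print(sorted(demo_sizes))  # ['studio']
```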