Some useful, non-obvious Postgres patterns
There’s nothing earth-shattering here, but I can recall that each of these principles were not obvious to me at some point in time. Perhaps some of them are not obvious to you right now.
- Always define explicit
ON DELETEsemantics - If in doubt, use
ON DELETE SET NULL - Mutually exclusive columns
- Prohibit hidden nulls in
jsonbcolumns - Declare your
updated_atcolumnsNOT NULLto make sorting easier - Use the
citextextension for email addresses
Always define explicit ON DELETE semantics
When you create a foreign key constraint using REFERENCES,
the default behaviour for ON DELETE is NO ACTION.
That means deletes on the foreign table
will fail if they break any referring rows.
Sometimes that’s okay and what you want to happen.
But plenty of times it won’t be
and in those cases it tends to bite you
when it’s least convenient.
For that reason,
it’s prudent to always make your ON DELETE clauses explicit,
even if the action is NO ACTION or RESTRICT.
Doing that as a rule
means there will never be occasions
when you forget to declare them by accident,
leading to unexpected DELETE failures
down the line.
You can enforce the rule with a crude grep-based lint check in your CI pipeline:
#!/bin/sh
MISSING_ON_DELETE=$(git grep -i references -- '*.sql' | grep -iv 'on delete')
if [ "$MISSING_ON_DELETE" != "" ]; then
echo "ON DELETE not declared on foreign keys:"
echo "$MISSING_ON_DELETE"
exit 1
fi
Of course,
any failure here might just be a newline
that was inserted before the ON DELETE clause.
But it’s not too much hardship to fixup your pull requests
so that ON DELETE is always on the same line as REFERENCES.
The payoff is that you’ll never end up in situations
where deleting data accidentally fails
due to the default semantics.
If in doubt, use ON DELETE SET NULL
Following from the previous point,
any time there’s a shred of doubt
about which ON DELETE behaviour is correct,
consider opting for ON DELETE SET NULL.
The case for ON DELETE CASCADE is often obvious,
if the foreign row has clear ownership of referring data.
Things become more uncertain when the decision is between
ON DELETE SET NULL and ON DELETE RESTRICT or ON DELETE NO ACTION.
The fallout from erring towards ON DELETE SET NULL
is usually easier to deal with
in those cases.
For instance, consider the extreme case of a GDPR request from a user to delete all their data. You have a legal obligation to ensure everything gets deleted, even if it means leaving some nulls behind in other tables. Any left-behind nulls can be handled according to your business logic. Sometimes it’s fine to leave them in place and just code round them. Other times you might want an automated process that surfaces them for human intervention, or perhaps there are other users in your system that ownership can be transferred to. Failing deletes are typically less preferable to those alternatives.
Mutually exclusive columns
It can happen in a data model that you want an either/or relationship between two (or more) columns. When that scenario arises, you should enforce it with a constraint so it’s physically impossible for clients to create conflicting data.
To give a concrete example, imagine a customisable navigation menu where each item can link to either a URL or some entity from the database. The simplified table might look like this:
CREATE TABLE menu_items (
id uuid NOT NULL DEFAULT gen_random_uuid() PRIMARY KEY,
position integer NOT NULL,
entity_id uuid REFERENCES entities(id) ON DELETE SET NULL,
url text
);
Here entity_id and url are mutually exclusive,
and you want to guarantee
they can’t both be set on a single row.
A CHECK constraint that counts non-null values
can do that:
CREATE TABLE menu_items (
id uuid NOT NULL DEFAULT gen_random_uuid() PRIMARY KEY,
position integer NOT NULL,
entity_id uuid REFERENCES entities(id) ON DELETE SET NULL,
url text,
CONSTRAINT ck_menu_items_mutex CHECK (
(entity_id IS NOT NULL)::integer +
(url IS NOT NULL)::integer <= 1
)
);
Perhaps you want to add a third type of menu item later,
linking to records from a views table:
ALTER TABLE menu_items
ADD view_id uuid REFERENCES views(id) ON DELETE SET NULL;
If so, you should amend the constraint at the same time:
ALTER TABLE menu_items
ADD view_id uuid REFERENCES views(id) ON DELETE SET NULL,
DROP CONSTRAINT ck_menu_items_mutex,
ADD CONSTRAINT ck_menu_items_mutex CHECK (
(entity_id IS NOT NULL)::integer +
(view_id IS NOT NULL)::integer +
(url IS NOT NULL)::integer <= 1
);
In this way you’ve guaranteed that mutual exclusivity is upheld between the relevant columns.
Prohibit hidden nulls in jsonb columns
When storing data as jsonb,
it’s common to declare the column as NOT NULL
so clients aren’t forced to perform a null check
before dereferencing:
data jsonb NOT NULL DEFAULT '{}',
However there’s a lurking footgun here
because the string 'null' is valid JSON
that will evaluate to null when parsed.
You can preclude this from happening
with a constraint:
data jsonb NOT NULL DEFAULT '{}',
CONSTRAINT ck_data_not_null CHECK (
data <> 'null'
)
Now clients are freed from the burden
of null-checking data
before dereferencing it.
Declare your updated_at columns NOT NULL to make sorting easier
It’s common practice
to add created_at and updated_at columns to tables
if you want to track when rows were last changed.
The definition of created_at is clear:
created_at timestamptz NOT NULL DEFAULT now(),
It’s tempting to define updated_at as nullable
so that clients can easily determine
whether a given row has been updated since creation:
updated_at timestamptz,
However doing that has implications on sort order,
if you ever want to sort rows
in order of last-updated on the frontend.
For example,
if you SELECT with ORDER BY updated_at DESC,
the rows with a null updated_at
will be sorted to the beginning of the result set.
Typically that’s not what your users want to see.
Adding NULLS FIRST to your SELECT doesn’t help either,
because what users really want
is an interleaved sort order of last-updated-or-created.
For that, you can define updated_at as NOT NULL too:
updated_at timestamptz NOT NULL DEFAULT now(),
Now when you ORDER BY updated_at DESC,
rows will be returned in an order that’s helpful to users.
But what if you also wanted to use the updated_at column
to determine whether or not a row has been updated?
That’s best handled with a separate, generated column:
is_updated boolean GENERATED ALWAYS AS (
updated_at > created_at
) STORED,
This leans on the fact that now() returns the time
from the start of the current transaction.
So when new rows are created,
it’s guaranteed that the default values
for created_at and updated_at
will always be the same.
Now clients can use the is_updated column
to check whether rows have been updated
and still have a user-friendly natural sort order
for updated_at.
Use the citext extension for email addresses
Occasionally users have their CAPS LOCK key pressed
without realising.
So if you need to identify them
using an email address that they typed,
you should ignore case
everywhere you make a comparison.
The traditional way to do case-insensitive string comparison in SQL
is by calling LOWER() on both operands.
That’s fine,
but it’s easy to forget
and if you do forget,
it isn’t always spotted right away.
Fortunately there’s a Postgres extension that means you don’t have to remember to do explicit case-insensitive comparisons;
CREATE EXTENSION citext;
With that in place,
you can declare your email address columns as type citext:
email_address citext NOT NULL,
There’s no need to call LOWER()
when working with this data now,
the extension will automatically ignore casing
on your behalf.