Automating MySQL EXPLAIN checks
About a month ago,
we had an outage
caused by a slow-running query in MySQL.
This particular slow query
wasn’t spotted when it was deployed
because it depended on data
inserted by client browsers
and the related preference in Firefox
was not enabled at that point.
A few weeks after it shipped,
the client pref was flipped on
and as the table grew,
it slowed down MySQL increasingly
until the whole of Firefox Accounts
became unresponsive.
And of course,
because Sod’s Law
is one of the fundamental forces of nature,
this happened late on a Friday night.
There were some complicating factors
that slowed down diagnosis,
but it’s also fair to say
we could have caught it at source
with an EXPLAIN of the offending query
during code review.
Because of that,
I decided to try and automate
EXPLAIN checks for our MySQL queries.
My broad objective was to write a script
that extracts individual queries from stored procedures,
turns them into EXPLAIN statements
and then feeds those into MySQL.
The results would then be used
to emit warnings and fail pull requests
when bad smells like filesort or a full table scan
are detected.
Identifying procedure names
Our stored procedures are versioned and the old versions continue to live in the tree so that reverse migrations can be applied if necessary. The first hurdle, then, is to identify which stored procedure versions are currently active in the codebase.
This proves to be as simple as
grepping for CALL statements
in our JavaScript code,
then using awk and cut
to pull out the name
of the called procedure:
grep 'CALL ' lib/db/mysql.js | awk -F 'CALL +' '{print $2}' | cut -d '(' -f 1
Similarly,
the relevant source file for each procedure
can be found by grepping
for CREATE PROCEDURE statements.
Here git grep is used
because it includes the file name
in the output:
git grep "CREATE PROCEDURE `*$PROCEDURE" | cut -d ':' -f 1
Extracting SELECTs
The next job
is to extract queries
from those stored procedures
and turn them into valid EXPLAIN statements.
For the first cut,
I am ignoring
INSERT, UPDATE and DELETE,
and just focusing on SELECT instead.
If that proves valuable,
it should be pretty straightforward
for a future pull request
to transform other query types
to an equivalent SELECT
and build an EXPLAIN from that.
For this step I wanted to parse the SQL but none of the off-the-shelf parsers in npm are able to process our stored procedures. That’s okay though because we don’t actually need a complete parser, we just need a solution that’s good enough to extract individual queries from our codebase.
Looking at the code, the majority of our stored procedures conform to some basic assumptions:
- There are never multiple queries on a single line.
CREATE PROCEDUREand its matchingEND;start at column 1.- Arguments to procedures are named either
inXxxorxxxArg. - SQL comment delimiters never appear inside string literals.
That means we can process our SQL files line by line
and use simple regular expressions
to pull out individual SELECT queries:
const COMMENT = /--.+$/
const CREATE_PROCEDURE = /^CREATE PROCEDURE `?([A-Z]+_[0-9]+)/i
const END_PROCEDURE = /^END;$/i
const SELECT = /^\s*SELECT/i
function extractSelects (path, procedure) {
let isProcedure = false, isSelect = false
const src = fs.readFileSync(path, { encoding: 'utf8' })
const lines = src.split('\n')
return lines
.reduce((selects, line) => {
line = line.replace(COMMENT, '')
if (isProcedure) {
if (END_PROCEDURE.test(line)) {
isProcedure = isSelect = false
} else {
if (isSelect) {
selects[selects.length - 1] += ` ${line.trim()}`
} else if (SELECT.test(line)) {
selects.push(line.trim())
isSelect = true
}
if (line.indexOf(';') !== -1) {
isSelect = false
}
}
} else if (procedure) {
const match = CREATE_PROCEDURE.exec(line)
if (match && match.length === 2 && match[1] === procedure) {
isProcedure = true
}
} else {
isProcedure = CREATE_PROCEDURE.test(line)
}
return selects
}, [])
.map(select => purgeUnbalancedParentheses(select))
}
function purgeUnbalancedParentheses (select) {
const openingCount = select.split('(').length
const closingCount = select.split(')').length
if (openingCount < closingCount) {
for (let i = 0; i < closingCount - openingCount; ++i) {
const index = select.lastIndexOf(')')
select = select.substr(0, index) + select.substr(index + 1)
}
} else if (openingCount > closingCount) {
for (let i = 0; i < openingCount - closingCount; ++i) {
const index = select.indexOf('(')
select = select.substr(0, index) + select.substr(index + 1)
}
}
return select
}
This is good enough to work
for 90% of our queries.
That’s fine
because currently we automate EXPLAIN checks
for 0% of our queries,
so it’s still a significant improvement
even though it’s not perfect.
Replacing arguments
In many cases, the code above returns a string that looks like this:
SELECT * FROM table WHERE column = columnArg;
Here columnArg is the name of
an argument to the stored procedure
the query comes from.
Running it through MySQL
outside that context
will fail,
because it doesn’t know
what columnArg is.
So for the EXPLAIN to work,
we must replace the argument name
with a literal value.
And the literal value
needs to be of the correct type
in order for MySQL to perform
valid analysis.
To fix this, we can rewind to the beginning of the script and add a step that inserts a bunch of known values into the database. Then we can write a function that replaces arguments by looking up the argument name in a dictionary of known values and making the appropriate replacement:
function replaceArgs (query) {
return query
.replace(/([ \(])`?in((?:[A-Z][A-Za-z]+)+)`?/g, replaceArg)
.replace(/([ \(])`?([a-z]+(?:[A-Z][A-Za-z]+)*)Arg`?/g, replaceArg)
}
function replaceArg (match, delimiter, arg) {
arg = arg.toLowerCase()
let value = KNOWN_ARGS.has(arg) ? KNOWN_ARGS.get(arg) : ''
if (Buffer.isBuffer(value)) {
value = `UNHEX("${value.toString('hex')}")`
} else if (typeof value === 'string') {
value = `"${value}"`
}
return `${delimiter}${value}`
}
Interpreting results
All that’s left now
is to prefix each SELECT with EXPLAIN,
run it through MySQL
and interpret the results.
There are two columns of interest
in the EXPLAIN result set,
type and Extra.
These can be set to many values,
but I’m mostly interested in
four particular bad smells:
If
typeisALL, it means the query is performing a full table scan.If
typeisindex, it means the query is doing a full index scan.If
Extraincludesusing filesort, it means an expensive sort operation is involved.If
Extraincludesusing temporary table, it means a temporary table is being created.
With those in mind,
the following function
can be used to turn
the EXPLAIN results
into an array of warnings:
const TYPE_FULL_TABLE_SCAN = /^all$/i
const TYPE_FULL_INDEX_SCAN = /^index$/i
const EXTRA_FILESORT = /filesort/i
const EXTRA_TEMPORARY_TABLE = /temporary/i
function warn (explainRows) {
return explainRows.reduce((warnings, row) => {
if (TYPE_FULL_TABLE_SCAN.test(row.type)) {
warnings.push('full table scan')
} else if (TYPE_FULL_INDEX_SCAN.test(row.type)) {
warnings.push('full index scan')
}
if (EXTRA_FILESORT.test(row.Extra)) {
warnings.push('filesort')
}
if (EXTRA_TEMPORARY_TABLE.test(row.Extra)) {
warnings.push('temporary table')
}
return warnings
}, [])
}
Ignoring procedures
The last thing I want to add is a mechanism for ignoring specific stored procedures. This is so we can turn it on in CI without it failing for queries that we know can’t be parsed yet. Then, any new query that causes the script to fail will get a big red ❌ against its respective pull request.
Assuming the existence of a file called .explain-ignore,
containing one procedure name per line,
we can parse it into a JS Set instance
like so:
function parseIgnoreFile () {
return new Set(
fs.readFileSync('.explain-ignore', { encoding: 'utf8' })
.split('\n')
.map(procedure => procedure.trim())
.filter(procedure => !! procedure)
)
}
Now we can filter our list of procedures and skip processing the ones that we want to ignore.
The final script
The end result of all this is a script that I merged into our database repo today. You can see the final script in the pull request. It’s also running in CI and the output looks like this:
$ npm run explain
> fxa-auth-db-mysql@1.120.0 explain /home/travis/build/mozilla/fxa-auth-db-mysql
> node scripts/explain-warn
Found 0 warnings and failed to explain 0 queries.
The command "npm run explain" exited with 0.