Automating MySQL EXPLAIN checks

17 September 2018

About a month ago, we had an outage caused by a slow-running query in MySQL. This particular slow query wasn’t spotted when it was deployed because it depended on data inserted by client browsers and the related preference in Firefox was not enabled at that point. A few weeks after it shipped, the client pref was flipped on and as the table grew, it slowed down MySQL increasingly until the whole of Firefox Accounts became unresponsive. And of course, because Sod’s Law is one of the fundamental forces of nature, this happened late on a Friday night. There were some complicating factors that slowed down diagnosis, but it’s also fair to say we could have caught it at source with an EXPLAIN of the offending query during code review. Because of that, I decided to try and automate EXPLAIN checks for our MySQL queries.

My broad objective was to write a script that extracts individual queries from stored procedures, turns them into EXPLAIN statements and then feeds those into MySQL. The results would then be used to emit warnings and fail pull requests when bad smells like filesort or a full table scan are detected.

Identifying procedure names

Our stored procedures are versioned and the old versions continue to live in the tree so that reverse migrations can be applied if necessary. The first hurdle, then, is to identify which stored procedure versions are currently active in the codebase.

This proves to be as simple as grepping for CALL statements in our JavaScript code, then using awk and cut to pull out the name of the called procedure:

grep 'CALL ' lib/db/mysql.js | awk -F 'CALL +' '{print $2}' | cut -d '(' -f 1

Similarly, the relevant source file for each procedure can be found by grepping for CREATE PROCEDURE statements. Here git grep is used because it includes the file name in the output:

git grep "CREATE PROCEDURE `*$PROCEDURE" | cut -d ':' -f 1

Extracting SELECTs

The next job is to extract queries from those stored procedures and turn them into valid EXPLAIN statements. For the first cut, I am ignoring INSERT, UPDATE and DELETE, and just focusing on SELECT instead. If that proves valuable, it should be pretty straightforward for a future pull request to transform other query types to an equivalent SELECT and build an EXPLAIN from that.

For this step I wanted to parse the SQL but none of the off-the-shelf parsers in npm are able to process our stored procedures. That’s okay though because we don’t actually need a complete parser, we just need a solution that’s good enough to extract individual queries from our codebase.

Looking at the code, the majority of our stored procedures conform to some basic assumptions:

There are never multiple queries on a single line.
CREATE PROCEDURE and its matching END; start at column 1.
Arguments to procedures are named either inXxx or xxxArg.
SQL comment delimiters never appear inside string literals.

That means we can process our SQL files line by line and use simple regular expressions to pull out individual SELECT queries:

const COMMENT = /--.+$/
const CREATE_PROCEDURE = /^CREATE PROCEDURE `?([A-Z]+_[0-9]+)/i
const END_PROCEDURE = /^END;$/i
const SELECT = /^\s*SELECT/i

function extractSelects (path, procedure) {
  let isProcedure = false, isSelect = false

  const src = fs.readFileSync(path, { encoding: 'utf8' })
  const lines = src.split('\n')

  return lines
    .reduce((selects, line) => {
      line = line.replace(COMMENT, '')

      if (isProcedure) {
        if (END_PROCEDURE.test(line)) {
          isProcedure = isSelect = false
        } else {
          if (isSelect) {
            selects[selects.length - 1] += ` ${line.trim()}`
          } else if (SELECT.test(line)) {
            selects.push(line.trim())
            isSelect = true
          }

          if (line.indexOf(';') !== -1) {
            isSelect = false
          }
        }
      } else if (procedure) {
        const match = CREATE_PROCEDURE.exec(line)

        if (match && match.length === 2 && match[1] === procedure) {
          isProcedure = true
        }
      } else {
        isProcedure = CREATE_PROCEDURE.test(line)
      }

      return selects
    }, [])
    .map(select => purgeUnbalancedParentheses(select))
}

function purgeUnbalancedParentheses (select) {
  const openingCount = select.split('(').length
  const closingCount = select.split(')').length

  if (openingCount < closingCount) {
    for (let i = 0; i < closingCount - openingCount; ++i) {
      const index = select.lastIndexOf(')')
      select = select.substr(0, index) + select.substr(index + 1)
    }
  } else if (openingCount > closingCount) {
    for (let i = 0; i < openingCount - closingCount; ++i) {
      const index = select.indexOf('(')
      select = select.substr(0, index) + select.substr(index + 1)
    }
  }

  return select
}

This is good enough to work for 90% of our queries. That’s fine because currently we automate EXPLAIN checks for 0% of our queries, so it’s still a significant improvement even though it’s not perfect.

Replacing arguments

In many cases, the code above returns a string that looks like this:

SELECT * FROM table WHERE column = columnArg;

Here columnArg is the name of an argument to the stored procedure the query comes from. Running it through MySQL outside that context will fail, because it doesn’t know what columnArg is. So for the EXPLAIN to work, we must replace the argument name with a literal value. And the literal value needs to be of the correct type in order for MySQL to perform valid analysis.

To fix this, we can rewind to the beginning of the script and add a step that inserts a bunch of known values into the database. Then we can write a function that replaces arguments by looking up the argument name in a dictionary of known values and making the appropriate replacement:

function replaceArgs (query) {
  return query
    .replace(/([ \(])`?in((?:[A-Z][A-Za-z]+)+)`?/g, replaceArg)
    .replace(/([ \(])`?([a-z]+(?:[A-Z][A-Za-z]+)*)Arg`?/g, replaceArg)
}

function replaceArg (match, delimiter, arg) {
  arg = arg.toLowerCase()

  let value = KNOWN_ARGS.has(arg) ? KNOWN_ARGS.get(arg) : ''

  if (Buffer.isBuffer(value)) {
    value = `UNHEX("${value.toString('hex')}")`
  } else if (typeof value === 'string') {
    value = `"${value}"`
  }

  return `${delimiter}${value}`
}

Interpreting results

All that’s left now is to prefix each SELECT with EXPLAIN, run it through MySQL and interpret the results. There are two columns of interest in the EXPLAIN result set, type and Extra. These can be set to many values, but I’m mostly interested in four particular bad smells:

If type is ALL, it means the query is performing a full table scan.
If type is index, it means the query is doing a full index scan.
If Extra includes using filesort, it means an expensive sort operation is involved.
If Extra includes using temporary table, it means a temporary table is being created.

With those in mind, the following function can be used to turn the EXPLAIN results into an array of warnings:

const TYPE_FULL_TABLE_SCAN = /^all$/i
const TYPE_FULL_INDEX_SCAN = /^index$/i
const EXTRA_FILESORT = /filesort/i
const EXTRA_TEMPORARY_TABLE = /temporary/i

function warn (explainRows) {
  return explainRows.reduce((warnings, row) => {
    if (TYPE_FULL_TABLE_SCAN.test(row.type)) {
      warnings.push('full table scan')
    } else if (TYPE_FULL_INDEX_SCAN.test(row.type)) {
      warnings.push('full index scan')
    }

    if (EXTRA_FILESORT.test(row.Extra)) {
      warnings.push('filesort')
    }

    if (EXTRA_TEMPORARY_TABLE.test(row.Extra)) {
      warnings.push('temporary table')
    }

    return warnings
  }, [])
}

Ignoring procedures

The last thing I want to add is a mechanism for ignoring specific stored procedures. This is so we can turn it on in CI without it failing for queries that we know can’t be parsed yet. Then, any new query that causes the script to fail will get a big red ❌ against its respective pull request.

Assuming the existence of a file called .explain-ignore, containing one procedure name per line, we can parse it into a JS Set instance like so:

function parseIgnoreFile () {
  return new Set(
    fs.readFileSync('.explain-ignore', { encoding: 'utf8' })
      .split('\n')
      .map(procedure => procedure.trim())
      .filter(procedure => !! procedure)
  )
}

Now we can filter our list of procedures and skip processing the ones that we want to ignore.

The final script

The end result of all this is a script that I merged into our database repo today. You can see the final script in the pull request. It’s also running in CI and the output looks like this:

$ npm run explain
> fxa-auth-db-mysql@1.120.0 explain /home/travis/build/mozilla/fxa-auth-db-mysql
> node scripts/explain-warn
Found 0 warnings and failed to explain 0 queries.
The command "npm run explain" exited with 0.