Three ways to shoot yourself in the foot with Google Cloud Run

8 December 2024

This post is an expansion of a comment I made on HN recently and continues my Extreme Learning series. Previously I discussed how I’ve managed to break production with Redis, with PostgreSQL and with healthchecks. Now I’ll show you how I did it with Cloud Run too.

For context, we migrated from Managed Instance Groups to Cloud Run at work last year. Cloud Run promises to simplify your production infrastructure and mostly we found that to be true. But we also discovered some hidden gotchas that can catch you out if you’re not paying attention.

1. Use websockets without changing the default request timeout

By default, Cloud Run times out requests after 5 minutes, and it treats a websocket connection as one long-running HTTP request. If you’re doing anything that uses a long-running connection, communicating via a websocket for example, you’ll want to change that setting. Otherwise at best you’ll have lots of reconnection overhead and at worst you’ll have weird bugs in production that nobody can reproduce locally.

The upper limit on the request timeout in Cloud Run is one hour, so you’ll need proper reconnection logic on clients if your connections run longer than that. (But you should have proper reconnection logic regardless, of course, because you’re working over the internet.)

If you’re changing this setting in the GCP web console, it should look like this:

Screenshot of the request timeout setting in GCP

Or if you’re using Terraform, you should set timeout = "3600s" in the template block of the google_cloud_run_v2_service resource.
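For reference, here is a minimal sketch of where that setting lives in Terraform. The resource label, service name, region and container image are placeholders made up for illustration, so swap in your own:

```hcl
resource "google_cloud_run_v2_service" "example" {
  name     = "example-service" # hypothetical service name
  location = "europe-west1"    # hypothetical region

  template {
    # Maximum duration of a single request, websocket connections included.
    # 3600s (one hour) is the upper limit mentioned above.
    timeout = "3600s"

    containers {
      # Placeholder: Google's public sample image, not a real workload.
      image = "us-docker.pkg.dev/cloudrun/container/hello"
    }
  }
}
```

Note that timeout is an argument of the template block rather than a top-level argument of the resource.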

2. Ignore the difference between first and second generation environments

Cloud Run has two separate execution environments, each with its own tradeoffs. The first generation environment emulates Linux (imperfectly) and has faster cold starts. The second generation runs on real Linux and has faster CPU and faster network throughput. If you don’t specify a choice, it defaults to first generation.

For our part, we valued faster network throughput so opted for second generation. That setting looks like this in the web console:

Screenshot of the execution environment setting in GCP

In Terraform, we set execution_environment = "EXECUTION_ENVIRONMENT_GEN2" in the template block.
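As a rough sketch, the setting sits inside the template block of the google_cloud_run_v2_service resource. Everything here apart from the execution_environment line is a placeholder for illustration:

```hcl
resource "google_cloud_run_v2_service" "example" {
  name     = "example-service" # hypothetical service name
  location = "europe-west1"    # hypothetical region

  template {
    # Opt in to the second generation execution environment.
    # The first generation equivalent is "EXECUTION_ENVIRONMENT_GEN1".
    execution_environment = "EXECUTION_ENVIRONMENT_GEN2"

    containers {
      # Placeholder: Google's public sample image, not a real workload.
      image = "us-docker.pkg.dev/cloudrun/container/hello"
    }
  }
}
```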

3. Don’t pay attention to autoscaling settings

With the default settings in place, Cloud Run will add instances to keep average CPU usage at around 60% and scale down to zero when there’s no traffic. You may want to change both of those depending on your usage.

In our application, CPU usage turned out to be a suboptimal metric for scaling up, so we tweaked the maximum number of concurrent requests per instance to compensate. We also have periods of very low activity at weekends, so we set the minimum instance count to 1 to eliminate cold starts.

Here are those settings in the web console:

Screenshot of the maximum concurrent requests setting in GCP

Screenshot of the minimum instances setting in GCP

In Terraform, those settings are max_instance_request_concurrency = 50 and scaling { min_instance_count = 1 }, both inside the template block.
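Put together, a minimal sketch might look like the following. The values 50 and 1 are the ones quoted above. The resource label, service name, region and image are placeholders, and the right numbers for your service will depend on your own traffic:

```hcl
resource "google_cloud_run_v2_service" "example" {
  name     = "example-service" # hypothetical service name
  location = "europe-west1"    # hypothetical region

  template {
    # Cap on simultaneous requests per instance. Lowering it makes
    # Cloud Run add instances sooner under load.
    max_instance_request_concurrency = 50

    scaling {
      # Keep one instance warm so quiet periods don't end in a cold start.
      min_instance_count = 1
    }

    containers {
      # Placeholder: Google's public sample image, not a real workload.
      image = "us-docker.pkg.dev/cloudrun/container/hello"
    }
  }
}
```

Bear in mind that a minimum instance count above zero means paying for that instance even when it is idle.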