Requirements
Let’s explore what’s needed for this project.
User Story
The user story for this piece of functionality is shown in Figure 6-1.

Figure 6-1. Project 2 user story
Elaborated Requirements
This project also has the following specific requirements:
- Suggestions should be presented when a user types three or more characters.
- 95% of suggestion requests should return suggestions in less than 500 ms, as anything longer than this may be perceived as slow.
- The solution should be reliable and low cost.
- The solution should scale to thousands of simultaneous requests without degradation.
Solution
What is required here is a reliable and scalable solution for looking up skills from a list of tags. Given the relatively small size of the data and the need for rapid response times, you’ll keep this data in an in-memory trie data structure.
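To make the design concrete, here is a minimal sketch of the kind of in-memory trie the service could use. Go is used for illustration, and the type and function names (`Trie`, `Insert`, `Suggest`) are this sketch's own, not taken from the project code:

```go
package main

import "sort"

// node is a single trie node, keyed by rune; end marks a complete tag.
type node struct {
	children map[rune]*node
	end      bool
}

// Trie stores tags for prefix lookup.
type Trie struct{ root *node }

func NewTrie() *Trie { return &Trie{root: &node{children: map[rune]*node{}}} }

// Insert adds a tag to the trie, creating nodes as needed.
func (t *Trie) Insert(tag string) {
	n := t.root
	for _, r := range tag {
		child, ok := n.children[r]
		if !ok {
			child = &node{children: map[rune]*node{}}
			n.children[r] = child
		}
		n = child
	}
	n.end = true
}

// Suggest returns up to limit tags starting with prefix, in sorted order.
func (t *Trie) Suggest(prefix string, limit int) []string {
	n := t.root
	for _, r := range prefix {
		child, ok := n.children[r]
		if !ok {
			return nil // no tag has this prefix
		}
		n = child
	}
	var out []string
	var walk func(cur *node, acc string)
	walk = func(cur *node, acc string) {
		if cur.end {
			out = append(out, acc)
		}
		for r, c := range cur.children {
			walk(c, acc+string(r))
		}
	}
	walk(n, prefix)
	sort.Strings(out) // map iteration order is random, so sort for stable output
	if len(out) > limit {
		out = out[:limit]
	}
	return out
}
```

Because the whole tag list fits comfortably in memory, a walk below the prefix node like this is fast enough to meet the 500 ms target with plenty of headroom.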
Summary of Services
Here is a summary of the Google Cloud services you will be using in this solution.
Cloud Storage
In Chapter 2, you collected tags from BigQuery’s Stack Overflow dataset and stored them as an object in Cloud Storage. Now, you’ll retrieve that object and use it to populate the skills that your service will use to generate suggestions for the user.
Cloud Run
You previously used Cloud Functions as the runtime for your application. However, while Cloud Functions are great for occasionally running code triggered by events, they are not intended to be used for long-running services. As the service will need to set up the trie data structure in memory, you don’t want to have to do that each time there is a request. Instead, the requirement is for a long-running service, or at least one that can handle a large number of requests once started.
As you want a service that is long-running and can scale dynamically, you will use Cloud Run. In Cloud Run, the deployed units are referred to as services, rather than functions as in Cloud Functions.
Cloud Run is the underlying technology of the Cloud Function you used in Chapter 5. Here, using it directly gives you more control over the container and how it runs. Specifically, you can scale the service to handle thousands of simultaneous requests.
If Cloud Run was a means of transport, it would be like a rental car; you have more flexibility than a taxi, but you have to drive it yourself. However, you still don’t have to worry about the maintenance and upkeep of the car.
Cloud Run can scale in different ways:
Multiple instances
Cloud Run automatically scales up the number of container instances based on the number of requests by monitoring an internal request queue. It can also scale down to zero when no requests are pending. This follows the 12-factor principle of favoring scaling horizontally rather than vertically.
Concurrency
For languages with good concurrency support, like Go or Java, a single instance can handle multiple requests at the same time, unlike Cloud Functions, where a function handles a single request at a time.
Resources
As with Cloud Functions, you can vertically scale an instance, allocating more memory and CPU.
However, Cloud Run cannot scale infinitely; there are limits on the number of instances and the amount of memory and CPU available. For example:
- Concurrency is limited to a maximum of 1,000 simultaneous requests per instance.
- Memory is limited to 32 GB per instance.
- File system is limited to 32 GB per instance.
- CPU is limited to 8 vCPUs per instance.
- The number of instances is limited to 100 per region.
- For larger CPU and memory allocations, the maximum number of instances is lower, and it varies depending on the capacity of the Google Cloud region.
See the Cloud Run quotas and limits documentation for more details.
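These limits correspond to flags on `gcloud run deploy`. As an illustration (the service name, image path, and region below are placeholders), a deployment that sets scaling and resources explicitly, within the quotas above, might look like this:

```shell
# Deploy a Cloud Run service with explicit scaling and resource settings.
# Service name, image, and region are placeholders for illustration.
gcloud run deploy skills-suggest \
  --image gcr.io/PROJECT_ID/skills-suggest \
  --region europe-west2 \
  --concurrency 80 \
  --max-instances 10 \
  --memory 512Mi \
  --cpu 1
```

Setting `--max-instances` is also a simple cost guardrail, as it caps how far Cloud Run can scale out under unexpected load.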
A single Cloud Run request is limited to 60 minutes of execution time. When Cloud Run is not receiving requests, it throttles the instance's CPU to zero and terminates the instance after 60 minutes of inactivity.
Although Cloud Run does have limits, they are generous, and it should be possible to build many services within the restrictions. Cloud Run is a great service to use if you can; it is cost-effective since you are allocating resources directly from Borg only when you need them and not paying for them when you don’t.
Tip
When I first used Cloud Run, I tried to deploy a containerized version of the Ghost blogging platform with it, thinking if it did not receive much traffic, it would scale to zero, and this would be a cost-effective way of running it.
However, my Ghost instance had a significant startup time, upward of a minute. When the instance terminated after inactivity, the next request would be met with a “Preparing Ghost” message while it started up again. This is understandable, as Ghost was designed to run on a server as a long-running task and not a serverless platform. While Cloud Run is great for many use cases, it is not suitable for all applications.
However, if you reach these limitations, or you are running an existing application that does not fit within the constraints of Cloud Run, it may be necessary to consider a lower-level service like GKE Autopilot. You will have an opportunity to look at this option in Chapter 14. For this project's service, by contrast, even if Cloud Run scales down and requires a new instance to serve requests, the instance should be ready quickly, and the user should not notice a significant impact.