As in Chapter 5 with Cloud Functions, this Cloud Run service is currently using a default service account with broad permissions.
Cloud Run is also allowing unauthenticated invocations of the service. This may be OK for testing, but in a production environment, you would want to secure the service, and you will see how to do that in Chapter 11.
Taken together, these two risks mean that anyone on the internet can call code that runs under a service account whose permissions could do damage if code with security vulnerabilities were accidentally or maliciously deployed.
For safety, you can create a new service account with the minimum permissions required to run the service. In this case, that will be permission to read the object from Cloud Storage and nothing more. This is the principle of least privilege, which was not one of the original 12 factors, as those principles did not have much to say about security. However, security was emphasized when the 12 factors were revisited, and the principle of least privilege is a good practice recommended by the major cloud providers.
Total time: The time from the initiation of the request until the receipt of the response.
In this case, the total time is 227 ms, which is good, as it is below the target of 500 ms for the service. The connect time of 77 ms depends on your network conditions and isn’t something your service has control over. The remaining 150 ms is the processing time, the time actually spent by the skill-service handling your request, which is an indicator of the service’s performance.
However, this is just one query. You can test the performance of the service by sending multiple requests sequentially to see how the response time varies.
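As a minimal sketch of sending requests one after another and comparing the results with the 500 ms target, the Python below times a series of calls and summarizes them. The function names (time_requests, summarize, fetch) are illustrative and not part of the service, and the URL in the closing comment is a placeholder you would replace with your own service URL. Note that it measures total time only; it does not split out connect time and processing time.

```python
import statistics
import time
import urllib.request
from typing import Callable, Dict, List


def time_requests(send: Callable[[], object], count: int = 10) -> List[float]:
    """Call send() count times, one after another, and return each duration in ms."""
    durations_ms = []
    for _ in range(count):
        start = time.perf_counter()
        send()
        durations_ms.append((time.perf_counter() - start) * 1000)
    return durations_ms


def summarize(durations_ms: List[float], target_ms: float = 500.0) -> Dict[str, float]:
    """Report the spread of the timings and how many met the target."""
    return {
        "min_ms": min(durations_ms),
        "median_ms": statistics.median(durations_ms),
        "max_ms": max(durations_ms),
        "within_target": sum(1 for d in durations_ms if d <= target_ms),
    }


def fetch(url: str) -> bytes:
    """Send one GET request and read the whole response body."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


# Example use (placeholder URL, replace it with your own service URL):
#   timings = time_requests(lambda: fetch("https://YOUR-SERVICE-URL/"), count=20)
#   print(summarize(timings))
```

If the first request is much slower than the rest, that is likely the cost of starting a container rather than of handling the request itself.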
Improving Performance
You have deployed to Cloud Run using the default settings, which are:
- 1 vCPU
- 512 MB memory
- Concurrency: 80
- Minimum instances: 0
- Maximum instances: 100
There are two things to note. The first is that the container startup latency is approximately two seconds. This means that if no container is running, it takes about two seconds to start a new one. You could set the minimum number of instances to 1 instead of 0 so that there is always one container running.
The second is that the container's CPU utilization gets high, reaching above 80%. This means that the container is not able to process requests as fast as it could. Any requests that come in while the container is busy are queued and processed when the container is free.
To address this, you could either increase the number of CPUs for a container from 1 to 2, or reduce the concurrency from 80 to 40 so that fewer requests are processed at the same time.
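To see why either change helps, here is a rough back-of-the-envelope calculation in Python. It assumes a deliberately naive model in which an instance at full concurrency shares its CPU evenly between requests; real requests also spend time waiting on I/O, so treat the numbers as illustrative only. The function name is hypothetical.

```python
def cpu_share_per_request(vcpus: float, concurrency: int) -> float:
    """vCPU available to each request when an instance is at full concurrency.

    Naive model: assumes the CPU is split evenly between in-flight requests.
    """
    return vcpus / concurrency


settings = {
    "default (1 vCPU, concurrency 80)": (1, 80),
    "more CPU (2 vCPU, concurrency 80)": (2, 80),
    "lower concurrency (1 vCPU, concurrency 40)": (1, 40),
}

for label, (vcpus, concurrency) in settings.items():
    share = cpu_share_per_request(vcpus, concurrency)
    print(f"{label}: {share:.4f} vCPU per request")
```

Under this model both options double the CPU share available to each request, from 0.0125 to 0.025 vCPU. The difference is where the extra capacity comes from: more CPU makes each instance larger, while lower concurrency means Cloud Run may need to start more instances to serve the same load.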
The beauty of Cloud Run is that you can change these settings without rebuilding or redeploying your code. You can change the settings of the service using the gcloud run services update command, which rolls out the new configuration as a new revision of the service.
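As a sketch of what such an update could look like, the Python below assembles a gcloud run services update command from the settings discussed above. The helper build_update_command is hypothetical, and the region europe-west2 is an assumption; use the region your service is deployed in. The --min-instances, --cpu, and --concurrency flags belong to the gcloud CLI, but check gcloud run services update --help for the options available in your version.

```python
import shlex
from typing import List, Optional


def build_update_command(
    service: str,
    region: str,
    min_instances: Optional[int] = None,
    cpu: Optional[int] = None,
    concurrency: Optional[int] = None,
) -> List[str]:
    """Build the argument list for `gcloud run services update`.

    Only the settings that are passed in are added to the command, so
    anything left as None keeps its current value on the service.
    """
    command = ["gcloud", "run", "services", "update", service, f"--region={region}"]
    if min_instances is not None:
        command.append(f"--min-instances={min_instances}")
    if cpu is not None:
        command.append(f"--cpu={cpu}")
    if concurrency is not None:
        command.append(f"--concurrency={concurrency}")
    return command


# Keep one instance warm and halve the concurrency.
# The region is an assumption; replace it with your own.
command = build_update_command(
    "skill-service", "europe-west2", min_instances=1, concurrency=40
)
print(shlex.join(command))

# To apply the change, you could run the command with:
#   import subprocess
#   subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if you later pass it to subprocess.run.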