By default, the Cloud Function uses the default service account for the project, service-ACCOUNT_ID@PROJECT_ID.iam.gserviceaccount.com. This is not a good idea, as it grants broad access to all resources in the project.
You can create a new service account that has the minimum permissions required for the function to work and no more. Specifically, the service account needs the following permissions:
- Execute BigQuery queries using the bigquery.jobUser predefined role.
- Write to Cloud Storage using the storage.objectAdmin predefined role, as it will need to be able to both create new objects and delete previous ones.
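A minimal sketch of creating such an account with gcloud might look like the following. The service account name `tag-updater` and the `PROJECT_ID` variable are illustrative assumptions, not values from this chapter:

```
# Create a dedicated service account (name "tag-updater" is illustrative)
gcloud iam service-accounts create tag-updater \
  --display-name="Tag updater service account"

# Grant only the two roles the function needs, at the project level
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:tag-updater@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:tag-updater@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"
```

You would then deploy the function with this account attached (for example, via the `--service-account` flag on `gcloud functions deploy`) rather than letting it fall back to the project default.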
Terraform Implementation
As you can see, although the solution is simple, there are still many steps to set it up.
In Chapter 12, you will see how Terraform can be used to fully automate this type of deployment; in Appendix A, the deployment of the whole of Skills Mapper has been automated. For now, here is a peek at how to do the same with Terraform.
To use this Terraform implementation, you need to have Terraform installed and configured, and you also need to have created a Google Cloud project, as described in Chapter 4. Use a new project so as not to conflict with what you have set up in this chapter.
The reason to introduce Terraform here is that you may be getting put off by all the gcloud commands. They are useful for learning but not essential: when you want to move to a reproducible environment, Terraform will come to your rescue.
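To give a flavor of what that looks like, here is an illustrative Terraform sketch of the service account, its role bindings, and a weekly scheduler job. The resource names, the `project_id` variable, and the `function_url` variable are assumptions for illustration, not values from this chapter:

```hcl
# Illustrative sketch only; names and variables are assumptions.
variable "project_id" {}
variable "function_url" {} # HTTPS trigger URL of the deployed Cloud Function

# Dedicated least-privilege service account for the function
resource "google_service_account" "tag_updater" {
  account_id   = "tag-updater"
  display_name = "Tag updater service account"
}

resource "google_project_iam_member" "bigquery_job_user" {
  project = var.project_id
  role    = "roles/bigquery.jobUser"
  member  = "serviceAccount:${google_service_account.tag_updater.email}"
}

resource "google_project_iam_member" "storage_object_admin" {
  project = var.project_id
  role    = "roles/storage.objectAdmin"
  member  = "serviceAccount:${google_service_account.tag_updater.email}"
}

# Weekly Cloud Scheduler job that invokes the function
resource "google_cloud_scheduler_job" "update_tags" {
  name     = "update-tags-weekly"
  schedule = "0 6 * * 1" # 06:00 every Monday

  http_target {
    http_method = "POST"
    uri         = var.function_url

    oidc_token {
      service_account_email = google_service_account.tag_updater.email
    }
  }
}
```

With a configuration like this in place, the whole setup becomes a single `terraform apply`, and tearing it down again is `terraform destroy`.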
Evaluation
Now let’s look at how the solution will scale and how much it will cost.
How Will This Solution Scale?
The scaling of this solution is not a great concern, as it is a single task that runs weekly. It is also very unlikely that there will be a significant change in the number of tags to retrieve from the Stack Overflow dataset.
However, if you did want to schedule the task more frequently or even add tasks to collect data from other sources, you could easily do so by adding more Cloud Functions and changing the frequency of the Cloud Scheduler jobs.
How Much Will This Solution Cost?
The costs of this solution are very close to zero (and I mean close). The cost will likely be less than $0.01 per month:
- Cloud Storage data is charged at $0.026 per GB per month. This solution uses less than 1 MB of storage, so the cost is negligible.
- Cloud Functions are charged at $0.0000002 per GB-second of memory. This solution uses less than 256 MB of memory for less than a minute per month, so the cost is negligible.
- Cloud Scheduler is charged at $0.01 per 100,000 invocations. This solution uses less than five invocations per month, so the cost is negligible too.
- BigQuery queries are charged only after the first 1 TB of data scanned per month. This solution scans less than 10 MB of data per month, so there will be no cost.
- You will also be charged for moving around small amounts of data between services, but again, this is negligible.
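The estimates above can be checked with some back-of-the-envelope arithmetic, using the per-unit prices just quoted (the usage figures are the rough upper bounds from the list, not measured values):

```python
# Back-of-the-envelope monthly cost from the per-unit prices quoted above.
storage_gb = 0.001                       # under 1 MB stored
storage_cost = storage_gb * 0.026        # $0.026 per GB per month

fn_gb_seconds = 0.25 * 60                # 256 MB for under a minute per month
fn_cost = fn_gb_seconds * 0.0000002      # $0.0000002 per GB-second

scheduler_cost = 5 * (0.01 / 100_000)    # five invocations per month

bigquery_cost = 0.0                      # well inside the 1 TB/month free tier

total = storage_cost + fn_cost + scheduler_cost + bigquery_cost
print(f"Estimated monthly cost: ${total:.6f}")
```

Even with generous rounding, the total lands at a small fraction of a cent per month, well under the $0.01 claimed.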
This is the type of service that makes a lot of sense in a cloud native environment. A task that may previously have needed a dedicated server can now be run for virtually nothing.
Summary
You have built a solution that can be highly reliable and will run for minimal cost. This service should be able to sit in the background running for years uninterrupted, if needed.
The following are the Google Cloud services and tools used in the solution:
- gcloud CLI is used for interacting with the Google Cloud API.
- bq is used for working with BigQuery at the command line.
- gsutil is used for working with Cloud Storage.
- BigQuery is used for querying the Stack Overflow public dataset.
- Cloud Storage is used as a simple way of storing the list of tags.
- Cloud Functions is used as a high-level abstraction to run code serverlessly.
- Cloud Scheduler is used as the mechanism for scheduling runs of the job.
In the next project, you will take the list of tags that this service has provided and make it available for a user to select skills from.