In this initial project, you will write the first part of the Skills Mapper application. You will be introduced to some of the higher-level abstractions in Google Cloud and be shown how you can solve a real-world problem at a minimal cost.

You will learn how to meet the requirement in three ways:

  • Manually, using the gcloud CLI alone
  • Automated, using a Cloud Function and Cloud Scheduler
  • Fully automated, using the Cloud Function and Terraform to deploy

Note

The code for this chapter is in the tag-updater folder of the GitHub repository.

Requirements

Let’s dive into the requirements for this project.

User Story

The user story for this piece of functionality can be written as shown in Figure 5-1.

Figure 5-1. Project 1 user story

Elaborated Requirements

This project also has the following specific requirements:

  • The list of skills should include technologies, tools, and techniques, and be comprehensive and unambiguous.
  • New skills emerge frequently but not daily, so updating the list once a week is sufficient.
  • The solution should be reliable, require minimal maintenance, and be low cost.
  • The resultant list of skills should be easy to consume by future services.

Solution

Maintaining a list of technical skills is a big undertaking. Fortunately, Stack Overflow already does this by maintaining a crowdsourced list of over 63,000 tags, the terms used to categorize questions. Google Cloud provides all Stack Overflow data, including tags, as a public dataset in BigQuery, its enterprise data warehouse service.

To obtain an up-to-date list of technical skills, you can extract them directly from the public dataset in BigQuery.

With cloud native solutions, we favor simplicity. The simplest approach is to store the list of terms in a file. Cloud Storage is the Google Cloud service for storing object data like this. If you store the file in Cloud Storage, it will be easily consumable by other services.

You need a small amount of code to extract Stack Overflow tags from the BigQuery dataset and to store the resultant list of skills as a file in Cloud Storage. Cloud Functions is an effective way of running this type of glue code, as you only pay for the short amount of time the code is running. This is a serverless solution, meaning it is a fully managed service with no servers to maintain.
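As a rough illustration of how little glue code is involved, here is a minimal Python sketch. It is not the chapter's implementation (that lives in the tag-updater folder). The table name `bigquery-public-data.stackoverflow.tags`, the `tag_name` column, the `tags.txt` object name, and the helper names `tags_to_text` and `update_tags` are all assumptions made for this example, and `update_tags` needs the `google-cloud-bigquery` and `google-cloud-storage` packages plus credentials to run.

```python
# Sketch only: the table, column, bucket, and object names are assumptions.
TAGS_QUERY = (
    "SELECT tag_name "
    "FROM `bigquery-public-data.stackoverflow.tags` "
    "ORDER BY tag_name"
)


def tags_to_text(tag_names):
    """Return one tag per line, sorted, with blanks and duplicates dropped."""
    cleaned = {name.strip() for name in tag_names if name and name.strip()}
    return "".join(f"{name}\n" for name in sorted(cleaned))


def update_tags(bucket_name, object_name="tags.txt"):
    """Query the public dataset and write the tag list to Cloud Storage."""
    # Imported lazily so tags_to_text can be used without the client libraries.
    from google.cloud import bigquery, storage

    rows = bigquery.Client().query(TAGS_QUERY).result()
    body = tags_to_text(row["tag_name"] for row in rows)

    blob = storage.Client().bucket(bucket_name).blob(object_name)
    blob.upload_from_string(body, content_type="text/plain")
    return len(body.splitlines())  # number of tags written
```

Keeping the formatting step in a pure function means it can be checked locally, while the two client calls stay small enough to wrap in a Cloud Function entry point.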

You need to update the list of skills once a week. Cloud Scheduler is a fully managed service that runs jobs on a schedule. You can use it to trigger the Cloud Function every week, creating a new list of skills, and to retry if there is a failure.
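To make the weekly schedule and retry behavior concrete, the following Python sketch describes such a job as data. The cron expression, time zone, retry count, and the helper names `weekly_job` and `create_weekly_job` are illustrative assumptions, as is the idea that the function is reachable at an HTTPS URL. Authentication settings are omitted, and `create_weekly_job` needs the `google-cloud-scheduler` package and credentials.

```python
def weekly_job(project, location, job_id, function_url, retries=3):
    """Describe a weekly Cloud Scheduler job as a plain dict (sketch only)."""
    parent = f"projects/{project}/locations/{location}"
    job = {
        "name": f"{parent}/jobs/{job_id}",
        "schedule": "0 0 * * 0",  # unix-cron format: 00:00 every Sunday
        "time_zone": "Etc/UTC",
        "http_target": {"uri": function_url},
        "retry_config": {"retry_count": retries},  # retry failed runs
    }
    return parent, job


def create_weekly_job(project, location, job_id, function_url):
    """Create the job in Cloud Scheduler; requires cloud credentials."""
    # Imported lazily so weekly_job can be used without the client library.
    from google.cloud import scheduler_v1

    parent, job = weekly_job(project, location, job_id, function_url)
    job["http_target"]["http_method"] = scheduler_v1.HttpMethod.POST
    client = scheduler_v1.CloudSchedulerClient()
    return client.create_job(request={"parent": parent, "job": job})
```

The same settings (a schedule, a target, and a retry count) can equally be supplied through the gcloud CLI or Terraform, which are the approaches this chapter works through.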
