Set up a Dataflow job to write messages from a Pub/Sub subscription to a BigQuery table in Google Cloud Platform (GCP)
Google Cloud Dataflow is a data processing service that can be used for streaming and batch applications. Users can set up pipelines in Dataflow to integrate and process large datasets.
With Pub/Sub, users can set up Dataflow pipelines to write messages from a Pub/Sub topic or subscription to a BigQuery table.
The IoT Cloud Tester application provides an easy interface to set up a Dataflow job that writes messages from a Pub/Sub topic to a BigQuery table in Google Cloud Platform.
To set up a Dataflow job that writes messages from a Pub/Sub topic to a BigQuery table:
In the 'Dataflow' tab, click on the 'Create Job' tab.
Enter the job name.
Select the 'Pub/Sub Topic to BigQuery' option.
Get the list of topics and select one.
Get the available cloud storage buckets for the project and select one.
Enter the file name. This file is used by the Dataflow job.
Set up the dataset and table to be used to write the messages from the Pub/Sub topic. Note that the table schema should match the structure of the Pub/Sub topic messages.
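The note about matching the table schema to the message structure can be checked on a sample message before the job is created. The following is a minimal sketch, assuming the messages are flat JSON objects. The column names, the sample message, and the helper find_schema_mismatches are hypothetical and are not taken from the IoT Cloud Tester application.

```python
import json

# Hypothetical BigQuery table schema: column name -> BigQuery type.
TABLE_SCHEMA = {
    "device_id": "STRING",
    "temperature": "FLOAT",
    "humidity": "FLOAT",
    "timestamp": "TIMESTAMP",
}


def find_schema_mismatches(message_json, schema):
    """Compare the top-level keys of a JSON message with the table columns.

    Returns (missing, extra): columns absent from the message, and
    message keys that have no matching column. Types are not checked.
    """
    payload = json.loads(message_json)
    message_keys = set(payload)
    columns = set(schema)
    return sorted(columns - message_keys), sorted(message_keys - columns)


# Hypothetical message, shaped as a device might publish it.
sample = json.dumps({
    "device_id": "dev_23992",
    "temperature": 24.5,
    "humidity": 61.0,
    "timestamp": "2021-06-01T10:15:00Z",
})

missing, extra = find_schema_mismatches(sample, TABLE_SCHEMA)
print("missing columns:", missing)
print("unexpected keys:", extra)
```

Running a check like this on one representative message can catch a name mismatch between the message and the table early.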
The Dataflow job 'subscription_to_bq' is created immediately with a pending status.
A POST request is made to GCP to create the Dataflow job. In this case, we're using the prebuilt template PubSub_Subscription_to_BigQuery to write Pub/Sub subscription messages to BigQuery.
POST https://dataflow.googleapis.com/v1b3/projects/second-inquiry-315605/locations/asia-east1/templates:launch?gcsPath=gs://dataflow-templates/latest/PubSub_Subscription_to_BigQuery HTTP/1.1
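The request above can be assembled roughly as follows. This is a sketch, not the application's actual code. It assumes the template takes its input and output through the inputSubscription and outputTableSpec parameters, and that the bucket and file name chosen in the steps map to the tempLocation field of the request body. The bucket name example-bucket and the helper build_launch_request are hypothetical. The code only builds the URL and JSON body; sending it would also require an OAuth 2.0 bearer token in the Authorization header.

```python
import json
from urllib.parse import urlencode

DATAFLOW_API = "https://dataflow.googleapis.com/v1b3"
TEMPLATE_PATH = "gs://dataflow-templates/latest/PubSub_Subscription_to_BigQuery"


def build_launch_request(project, location, job_name, subscription,
                         dataset, table, temp_location):
    """Return (url, body) for a Dataflow templates:launch call."""
    # Keep ':' and '/' unescaped so the URL reads like the one in the article.
    query = urlencode({"gcsPath": TEMPLATE_PATH}, safe="/:")
    url = (f"{DATAFLOW_API}/projects/{project}/locations/{location}"
           f"/templates:launch?{query}")
    body = {
        "jobName": job_name,
        "parameters": {
            # Fully qualified subscription name.
            "inputSubscription": f"projects/{project}/subscriptions/{subscription}",
            # Target table in project:dataset.table form.
            "outputTableSpec": f"{project}:{dataset}.{table}",
        },
        # Assumption: the selected bucket/file is used as the temp location.
        "environment": {"tempLocation": temp_location},
    }
    return url, body


url, body = build_launch_request(
    project="second-inquiry-315605",
    location="asia-east1",
    job_name="subscription_to_bq",
    subscription="subscription_628331",
    dataset="device_data",
    table="environment_subscription",
    temp_location="gs://example-bucket/temp",  # hypothetical bucket
)
print("POST", url)
print(json.dumps(body, indent=2))
```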
Server response for job creation.
The newly created Dataflow job can be viewed in the Google Cloud console.
Now let us see the Dataflow job in action. We'll have a device publish messages to the topic behind that subscription and verify that the data is written to the BigQuery table.
Below, device 'dev_23992' is publishing to the topic 'environment'. We have a subscription 'subscription_628331' to the topic 'environment'. Above, we also set up a Dataflow job to write that subscription's messages to the 'environment_subscription' table in the device_data dataset in BigQuery.
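For readers who want to reproduce the publish step without the application, the following sketch builds a request for the Pub/Sub REST publish method, which expects the message data base64-encoded. It assumes the topic lives in the same project as the Dataflow job. The payload fields and the helper build_publish_request are hypothetical, and the request is only built, not sent.

```python
import base64
import json

PUBSUB_API = "https://pubsub.googleapis.com/v1"


def build_publish_request(project, topic, payload):
    """Return (url, body) for a Pub/Sub topics.publish REST call."""
    url = f"{PUBSUB_API}/projects/{project}/topics/{topic}:publish"
    # The REST API carries message data as a base64-encoded string.
    data = json.dumps(payload).encode("utf-8")
    body = {"messages": [{"data": base64.b64encode(data).decode("ascii")}]}
    return url, body


# Hypothetical reading; real keys must match the BigQuery table columns.
reading = {"device_id": "dev_23992", "temperature": 24.5, "humidity": 61.0}

publish_url, publish_body = build_publish_request(
    "second-inquiry-315605", "environment", reading
)
print("POST", publish_url)
print(json.dumps(publish_body))
```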
We can verify in BigQuery that the streaming subscription messages are written to the table.