BigQuery 102-Practical Use Cases

Sean Zhang
3 min readAug 5, 2022

--

If you find this article useful, please follow me. I write articles about skills you use in your daily work.

102.1 — I want my table back

To retrieve a table at a timestamp, use SYSTEM_TIME:

To restore a table:

To get the table back to the prod environment:

If the table has been deleted:

102.2 I need to control access to the tables I own.

When you give a service account dataset level, users have access to all

the tables in the dataset.

— To grant on a table level:

Click Table, Share Table instead of Share dataset

— If there are certain columns you don’t want to share with the user:

Use SELECT to filter columns and use WHERE to filter rows.
Save result: Set a destination table for query results.
Downside: This derived table is a PoT snapshot table, so there is a lag.

— You can create a view instead

After you created the view, if the user does not have access to the original table, they cannot retrieve data from the view. So you will have to grant the view to run the query on your behalf following the steps below:Go to original table 1. Click “Share” — Authorized Views — Column level permission (E.g. PII)— Click Through Schema, Set, Data Catalog — Policy Tags.2. Create tags like ‘high (e.g. SSN)’, ‘medium(e.g. Customer Name)’3. Go back to table, select schema, add policy tag to the table.Note: IAM like BigQuery Admin will be overriden by Policy Tags.To make that tag’s data avaialble to a user:— Data Catalog — Add members to a tag with Fine-Grained Reader role.

102.3 Querying is slow or generates a lot of costs.

Usually, there is a trade-off between a normalized table and a de-normalized table. Normalized tables use the Snowflake/Star schema, which requires a lot of joins thus increasing the cost of querying. De-normalized tables (Storing as many fields as possible in one table)require fewer joins and usually generate less cost in BigQuery.

BigQuery offers better approaches: line. item, partitions, and clusters.

For nested structure in big query, here is the definition on Google Doc(It is really not as intuitive):

To create a column with nested data, set the data type of the column to RECORD in the schema. A RECORD can be accessed as a STRUCT type in standard SQL. A STRUCT is a container of ordered fields.To create a column with repeated data, set the mode of the column to REPEATED in the schema. A repeated field can be accessed as an ARRAY type in standard SQL.A RECORD column can have REPEATED mode, which is represented as an array of STRUCT types. Also, a field within a record can be repeated, which is represented as a STRUCT that contains an ARRAY. An array cannot contain another array directly. For more information, see Declaring an ARRAY type.

Full Use Cases as Below:

References:

“BigQuery: Converting Normalized Datasets to Alternative Schema.” BigQuery: Converting Normalized Datasets to Alternative Schema, 15 Nov. 11AD, https://www.youtube.com/watch?v=XKpXBvdOOM4&list=PLuJRcdtonlDCzqZZ-t5-yoxcO97_I6P6B.

--

--