
Cheatsheet - BigQuery (bq)

The bq tool is a Python-based command-line interface for Google BigQuery, part of the Google Cloud SDK.
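
A quick way to confirm the tool is installed and on your PATH:

# Print the installed CLI version
bq version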

Initial Setup & Authentication

First, ensure you have authenticated and set your project.

# Authenticate with your user account (opens a browser window)
gcloud auth login

# Set your default Google Cloud project
gcloud config set project [YOUR_PROJECT_ID]

# Verify bq is working (prints metadata for your default project)
bq show --format=prettyjson

Core Syntax

Most commands follow this pattern: bq [GLOBAL_FLAGS] [COMMAND] [COMMAND_FLAGS] [RESOURCE].

Global Flags (placed between bq and the command; see the example after this list):

  • --location=[LOCATION]: Specify a location (a region like us-central1 or a multi-region like EU). Overrides the default.
  • --project_id=[PROJECT_ID]: Run the command in a different project.
  • --format=[FORMAT]: Change output format (e.g., json, csv, prettyjson).
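
A quick sketch of flag placement (the project, dataset, and table names are placeholders):

# Show a table's metadata from another project, printed as JSON
bq --project_id=other_project --format=prettyjson show some_dataset.some_table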

1. Querying Data

This is the most common use case for the bq tool.

# Run a query using standard SQL (note: the bq CLI still defaults to LEGACY SQL,
# so the flag is required; backtick-quoted table names fail under legacy SQL)
bq query --nouse_legacy_sql 'SELECT station_id, name FROM `bigquery-public-data.san_francisco.bikeshare_stations` LIMIT 5'

# --nouse_legacy_sql is shorthand for --use_legacy_sql=false
bq query --use_legacy_sql=false 'SELECT ...'

# Check a query's validity and estimate cost without running it (CRITICAL for cost control)
bq query --dry_run 'SELECT ...'
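
The byte estimate can also be captured for scripting by combining the dry run with the global --format flag; the field name below comes from the Jobs API, and the exact output shape may vary by CLI version:

# Dry run as JSON; look for statistics.totalBytesProcessed
bq --format=prettyjson query --nouse_legacy_sql --dry_run 'SELECT ...'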

# Save query results to a new table (fails if the table already contains data)
bq query --destination_table my_dataset.new_table 'SELECT ...'

# Overwrite an existing table with query results
bq query --replace --destination_table my_dataset.existing_table 'SELECT ...'

# Append query results to an existing table
bq query --append_table --destination_table my_dataset.existing_table 'SELECT ...'

# Run a query with named parameters for scripting (parameters require standard SQL)
bq query --nouse_legacy_sql --parameter='status:STRING:active' \
  'SELECT * FROM my_dataset.users WHERE status = @status'
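
Parameters are typed as name:type:value; an integer variant might look like this (the table and column names are illustrative):

# Named INT64 parameter; leaving the type empty (name::value) defaults to STRING
bq query --nouse_legacy_sql --parameter='min_score:INT64:80' \
  'SELECT * FROM my_dataset.users WHERE score >= @min_score'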

2. Inspecting Resources (List, Show, Preview)

List - bq ls

# List all datasets in the current project
bq ls

# List all tables and views within a specific dataset
bq ls my_dataset

# List resources in another project
bq ls other_project:some_dataset

# List recent jobs (useful for finding job IDs)
bq ls -j -n 10
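
By default -j lists only your own jobs; adding -a (--all) widens this to every user in the project:

# List the 10 most recent jobs from all users
bq ls -j -a -n 10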

Describe - bq show

# Show detailed information and schema for a table
bq show my_dataset.my_table

# Show information about a dataset
bq show my_dataset

# Show full details about a specific job
bq show --job [JOB_ID]

Preview - bq head

# Show the first 100 rows of a table (default)
bq head my_dataset.my_table

# Show a specific number of rows
bq head -n 10 my_dataset.my_table
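
head can also page through a table with -s (--start_row):

# Skip the first 100 rows, then show the next 10
bq head -s 100 -n 10 my_dataset.my_table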

3. Managing Datasets

# Create a new dataset in your default location
bq mk my_new_dataset

# Create a dataset with a description and a specific location (the global flag goes before mk)
bq --location=US mk --description="My cool dataset" my_other_dataset

# Delete an empty dataset (will prompt for confirmation)
bq rm my_dataset

# Force delete a dataset AND all tables within it (USE WITH CAUTION)
bq rm -r -f my_dataset
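
Dataset defaults can be changed after creation with bq update; for example, a default lifetime for newly created tables (value in seconds):

# New tables in this dataset expire after 2 hours unless overridden
bq update --default_table_expiration 7200 my_dataset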

4. Managing Tables

# Create a table with an explicit schema
bq mk --table my_dataset.new_table name:STRING,signup_date:DATE,score:FLOAT64
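
Inline schemas get unwieldy beyond a few columns; bq mk also accepts the path to a JSON schema file (the file below is a sketch of the expected format):

# schema.json: [{"name": "name", "type": "STRING", "mode": "REQUIRED"},
#               {"name": "signup_date", "type": "DATE", "mode": "NULLABLE"}]
bq mk --table my_dataset.new_table ./schema.json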

# Create a partitioned and clustered table (common for performance)
bq mk --table \
  --time_partitioning_field=event_timestamp \
  --clustering_fields=user_id,event_type \
  my_dataset.logs \
  event_timestamp:TIMESTAMP,user_id:STRING,event_type:STRING

# Delete a table (use -f to skip the confirmation prompt)
bq rm -f my_dataset.my_table

# Copy a table
bq cp my_dataset.source_table my_dataset.destination_table

# Copy a table to a different project and dataset
bq cp project_a:dataset_a.table_a project_b:dataset_b.table_b

# Update a table's description or expiration
bq update --description="Updated description" my_dataset.my_table
bq update --expiration 3600 my_dataset.my_table  # Expires in 1 hour

5. Loading Data (Ingestion)

bq load

Loads data from a local file or from Google Cloud Storage (GCS).

# Load a local CSV file into an existing table, skipping the header
bq load --source_format=CSV --skip_leading_rows=1 \
  my_dataset.my_table ./data.csv

# Load from GCS and auto-detect the schema to CREATE a new table
bq load --autodetect --source_format=CSV \
  my_dataset.new_table gs://my-bucket/data/transactions.csv

# Load a newline-delimited JSON file from GCS
bq load --source_format=NEWLINE_DELIMITED_JSON \
  my_dataset.logs_table gs://my-bucket/logs/log-*.json

# Overwrite all data in the destination table
bq load --replace ...
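
Messy source files can usually be accommodated with load flags; the flags below are standard bq load flags, the values illustrative:

# Tab-delimited file, tolerate up to 10 bad rows, allow short trailing columns
bq load --source_format=CSV --field_delimiter=tab --max_bad_records=10 \
  --allow_jagged_rows my_dataset.my_table gs://my-bucket/data/messy.tsv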

6. Exporting Data (Extraction)

bq extract

Exports data from a BigQuery table to Google Cloud Storage (GCS).

# Export an entire table to a single CSV file in GCS
bq extract my_dataset.my_table gs://my-bucket/exports/table.csv

# Export as compressed, newline-delimited JSON, sharded into multiple files
# The wildcard * is required when the data is larger than 1 GB.
bq extract --destination_format=NEWLINE_DELIMITED_JSON --compression=GZIP \
  my_dataset.big_table gs://my-bucket/exports/big_table-*.json.gz
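
Avro is also a supported destination format and preserves column types better than CSV:

# Export as Avro (add a * wildcard for tables over 1 GB)
bq extract --destination_format=AVRO \
  my_dataset.my_table gs://my-bucket/exports/table.avro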