Cheatsheet - BigQuery (bq)
The bq tool is a Python-based command-line interface for Google BigQuery, part of the Google Cloud SDK.
Initial Setup & Authentication
First, ensure you have authenticated and set your project.
# Authenticate with your user account (opens a browser window)
gcloud auth login
# Set your default Google Cloud project
gcloud config set project [YOUR_PROJECT_ID]
# Verify bq is working (shows your default project and location)
bq show --format=prettyjson
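For scripts and CI, you can authenticate with a service account key instead of a user account (a minimal sketch; the key file path is a placeholder):
# Activate a service account so bq runs non-interactively
gcloud auth activate-service-account --key-file=/path/to/key.json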
Core Syntax
Most commands follow this pattern: bq [COMMAND] [RESOURCE].
Global Flags (use after bq):
--location=[LOCATION]: Specify a region (e.g., us-central1, EU). Overrides the default.
--project_id=[PROJECT_ID]: Run the command in a different project.
--format=[FORMAT]: Change output format (e.g., json, csv, prettyjson).
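For example, global flags sit between bq and the command (the project and dataset names below are placeholders):
# List tables in another project's dataset, with pretty JSON output
bq --project_id=other_project --format=prettyjson ls some_dataset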
1. Querying Data
This is the most common use case for the bq tool.
# Run a query using standard SQL (the bq CLI defaults to legacy SQL unless configured otherwise)
bq query --nouse_legacy_sql 'SELECT station_id, name FROM `bigquery-public-data.san_francisco.bikeshare_stations` LIMIT 5'
# --nouse_legacy_sql is shorthand for --use_legacy_sql=false; include it to avoid accidental legacy SQL usage
bq query --nouse_legacy_sql 'SELECT ...'
# Validate a query and report the bytes it would process, without running it (CRITICAL for cost control)
bq query --dry_run 'SELECT ...'
# Save query results to a new table (will fail if the table exists)
bq query --destination_table my_dataset.new_table 'SELECT ...'
# Overwrite an existing table with query results
bq query --replace --destination_table my_dataset.existing_table 'SELECT ...'
# Append query results to an existing table
bq query --append_table --destination_table my_dataset.existing_table 'SELECT ...'
# Run a query with named parameters for scripting (parameters require standard SQL)
bq query --nouse_legacy_sql --parameter='status:STRING:active' 'SELECT * FROM my_dataset.users WHERE status = @status'
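For longer queries, keeping the SQL in a file works well (a sketch; query.sql is a placeholder filename):
# Read the SQL from a file and cap the number of rows printed
bq query --nouse_legacy_sql --max_rows=50 "$(cat query.sql)"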
2. Inspecting Resources (List, Show, Preview)
List - bq ls
# List all datasets in the current project
bq ls
# List all tables and views within a specific dataset
bq ls my_dataset
# List resources in another project
bq ls other_project:some_dataset
# List recent jobs (useful for finding job IDs)
bq ls -j -n 10
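bq ls can also list the projects your credentials can see (a small sketch using the -p shortcut for --projects):
# List all projects available to your account
bq ls -p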
Describe - bq show
# Show detailed information and schema for a table
bq show my_dataset.my_table
# Show information about a dataset
bq show my_dataset
# Show full details about a specific job
bq show --job [JOB_ID]
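To capture just a table's schema for reuse, --schema can be combined with --format (a sketch; my_dataset.my_table and schema.json are placeholders):
# Print only the schema as JSON and save it for later use with bq mk or bq load
bq show --schema --format=prettyjson my_dataset.my_table > schema.json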
Preview - bq head
# Show the first 100 rows of a table (default)
bq head my_dataset.my_table
# Show a specific number of rows
bq head -n 10 my_dataset.my_table
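bq head can also start at an offset and restrict the columns shown (a sketch; the column names are placeholders):
# Show 10 rows starting at row 100, only the listed columns
bq head --start_row=100 -n 10 --selected_fields=name,score my_dataset.my_table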
3. Managing Datasets
# Create a new dataset in your default location
bq mk my_new_dataset
# Create a dataset with a description and a specific location
bq mk --location=US --description="My cool dataset" my_other_dataset
# Delete an empty dataset (will prompt for confirmation)
bq rm my_dataset
# Force delete a dataset AND all tables within it (USE WITH CAUTION)
bq rm -r -f my_dataset
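Dataset defaults can be changed after creation with bq update (a sketch; the expiration is in seconds):
# New tables in this dataset will expire 7 days (604800 s) after creation by default
bq update --default_table_expiration 604800 my_dataset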
4. Managing Tables
# Create a table with an explicit schema
bq mk --table my_dataset.new_table name:STRING,signup_date:DATE,score:FLOAT64
# Create a partitioned and clustered table (common for performance)
bq mk --table my_dataset.logs \
--time_partitioning_field event_timestamp \
--clustering_fields user_id,event_type \
event_timestamp:TIMESTAMP,user_id:STRING,event_type:STRING
# Delete a table (use -f to skip the confirmation prompt)
bq rm -f my_dataset.my_table
# Copy a table
bq cp my_dataset.source_table my_dataset.destination_table
# Copy a table to a different project and dataset
bq cp project_a:dataset_a.table_a project_b:dataset_b.table_b
# Update a table's description or expiration
bq update --description="Updated description" my_dataset.my_table
bq update --expiration 3600 my_dataset.my_table # Expires in 1 hour
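Instead of an inline schema, bq mk (and bq load) accept a path to a JSON schema file (a sketch; ./schema.json is a placeholder path):
# Create a table from a schema definition stored in a local JSON file
bq mk --table my_dataset.new_table ./schema.json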
5. Loading Data (Ingestion)
bq load
Loads data from a local file or from Google Cloud Storage (GCS).
# Load a local CSV file into an existing table, skipping the header
bq load --source_format=CSV --skip_leading_rows=1 \
my_dataset.my_table ./data.csv
# Load from GCS and auto-detect the schema to CREATE a new table
bq load --autodetect --source_format=CSV \
my_dataset.new_table gs://my-bucket/data/transactions.csv
# Load a newline-delimited JSON file from GCS
bq load --source_format=NEWLINE_DELIMITED_JSON \
my_dataset.logs_table gs://my-bucket/logs/log-*.json
# Overwrite all data in the destination table
bq load --replace ...
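When autodetection is not reliable, you can supply an explicit schema file and tolerate a few bad rows (a sketch; the file names and threshold are placeholders):
# Load with an explicit schema, allowing up to 10 malformed rows before the job fails
bq load --source_format=CSV --skip_leading_rows=1 \
--schema=./schema.json --max_bad_records=10 \
my_dataset.my_table gs://my-bucket/data/transactions.csv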
6. Exporting Data (Extraction)
bq extract
Exports data from a BigQuery table to Google Cloud Storage (GCS).
# Export an entire table to a single CSV file in GCS
bq extract my_dataset.my_table gs://my-bucket/exports/table.csv
# Export as compressed, newline-delimited JSON, sharded into multiple files
# The wildcard * is required when the data is larger than 1 GB.
bq extract --destination_format=NEWLINE_DELIMITED_JSON --compression=GZIP \
my_dataset.big_table gs://my-bucket/exports/big_table-*.json.gz
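Avro exports preserve column types and are often the safest format for re-importing (a sketch; bucket and table names are placeholders):
# Export to sharded Avro files with Snappy compression
bq extract --destination_format=AVRO --compression=SNAPPY \
my_dataset.big_table gs://my-bucket/exports/big_table-*.avro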