
The Snowflake Cost Optimization Playbook

A practitioner's guide to cutting Snowflake costs 30-50% with copy-paste SQL queries, warehouse sizing formulas, and configuration templates. Includes a detailed case study showing annual spend cut from $847K to $428K.

November 25, 2025 · 35 min read
#snowflake · #cost optimization · #finops · #data warehouse · #sql · #query optimization · #warehouse sizing · #storage management · #resource monitors · #account usage

Key Information

Type: Playbook
Category: Best Practices
Reading Time: 35 min read
Last Updated: November 25, 2025

What You'll Learn

The only playbook with copy-paste SQL queries you can run in the next 10 minutes to find exactly where your Snowflake spend is going—and how to fix it.

Why Another Cost Optimization Guide?

Because most guides tell you to "right-size your warehouses" without telling you how. This playbook is different. Every recommendation comes with:

  • SQL you can copy-paste and run right now
  • Specific thresholds (not "it depends")
  • Configuration templates you can adapt
  • Real numbers from actual implementations

We've used this framework at 40+ companies, from Series A startups burning $15K/month to enterprises spending $2M+/year. The average result: 35-45% cost reduction within 8 weeks.

Quick Wins: Run These First

Before diving into the full framework, here are three queries that find the most common sources of waste. Run these in the next 10 minutes—most teams find $5-20K/month in quick wins.

Quick Win #1: Warehouses That Never Sleep

Warehouses with auto-suspend > 5 minutes or no auto-suspend are the #1 source of waste we see. This query finds them:

-- Find warehouses bleeding money while idle
SELECT
    warehouse_name,
    warehouse_size,
    auto_suspend,
    CASE
        WHEN auto_suspend IS NULL OR auto_suspend = 0 THEN '🔴 NEVER SUSPENDS'
        WHEN auto_suspend > 300 THEN '🟡 SUSPENDS AFTER ' || (auto_suspend/60) || ' MIN'
        ELSE '🟢 OK (' || auto_suspend || 's)'
    END as suspend_status,
    -- Estimate monthly waste from poor suspend settings
    CASE warehouse_size
        WHEN 'X-Small' THEN 1
        WHEN 'Small' THEN 2
        WHEN 'Medium' THEN 4
        WHEN 'Large' THEN 8
        WHEN 'X-Large' THEN 16
        WHEN '2X-Large' THEN 32
        WHEN '3X-Large' THEN 64
        WHEN '4X-Large' THEN 128
        ELSE 0
    END *
    CASE
        WHEN auto_suspend IS NULL OR auto_suspend = 0 THEN 720  -- Assume 720 wasted hours/month
        WHEN auto_suspend > 300 THEN 200    -- Assume 200 wasted hours/month
        ELSE 0
    END * 2 as estimated_monthly_waste_usd  -- $2/credit estimate
FROM snowflake.account_usage.warehouses
WHERE deleted IS NULL
ORDER BY estimated_monthly_waste_usd DESC;

🎯 Action: Set auto_suspend to 60 seconds for interactive warehouses, 300 seconds for ETL warehouses. This alone typically saves $3-8K/month.
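If you want to apply the fix right away, it is a one-line change per warehouse (the warehouse names below are placeholders; Step 2.3 covers the full settings):

-- Placeholder warehouse names -- adjust to your own
ALTER WAREHOUSE BI_WH SET AUTO_SUSPEND = 60;
ALTER WAREHOUSE ETL_WH SET AUTO_SUSPEND = 300;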

Quick Win #2: Your Most Expensive Queries

Find the queries eating your budget. These are your optimization targets:

-- Top 20 most expensive queries (last 30 days)
SELECT
    query_id,
    query_type,
    warehouse_name,
    warehouse_size,
    user_name,
    ROUND(total_elapsed_time/1000, 2) as runtime_seconds,
    ROUND(credits_used_cloud_services, 4) as cloud_credits,
    -- Estimate query cost
    ROUND(
        (total_elapsed_time/1000/3600) *
        CASE warehouse_size
            WHEN 'X-Small' THEN 1
            WHEN 'Small' THEN 2
            WHEN 'Medium' THEN 4
            WHEN 'Large' THEN 8
            WHEN 'X-Large' THEN 16
            WHEN '2X-Large' THEN 32
            WHEN '3X-Large' THEN 64
            WHEN '4X-Large' THEN 128
            ELSE 1
        END * 2,  -- $2/credit
    2) as estimated_cost_usd,
    SUBSTRING(query_text, 1, 200) as query_preview
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
    AND warehouse_name IS NOT NULL
    AND total_elapsed_time > 0
ORDER BY estimated_cost_usd DESC
LIMIT 20;

🎯 Action: Identify the top 5 queries and investigate. Common fixes: add clustering keys, rewrite JOINs, materialize intermediate results.
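To dig into an individual offender, you can pull an operator-level breakdown for a query ID from the list above, assuming GET_QUERY_OPERATOR_STATS is available in your account (the query ID below is a placeholder):

-- Operator-level stats for one expensive query (placeholder query_id)
SELECT *
FROM TABLE(GET_QUERY_OPERATOR_STATS('01b2c3d4-0000-1234-0000-000000000000'));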

Quick Win #3: Storage Bloat Check

Time travel and fail-safe can silently 3-4x your storage costs without you realizing it:

-- Storage breakdown by type
SELECT
    ROUND(SUM(active_bytes)/POWER(1024,4), 2) as active_tb,
    ROUND(SUM(time_travel_bytes)/POWER(1024,4), 2) as time_travel_tb,
    ROUND(SUM(failsafe_bytes)/POWER(1024,4), 2) as failsafe_tb,
    ROUND(SUM(retained_for_clone_bytes)/POWER(1024,4), 2) as clone_retained_tb,
    ROUND(
        (SUM(time_travel_bytes) + SUM(failsafe_bytes) + SUM(retained_for_clone_bytes)) /
        NULLIF(SUM(active_bytes), 0) * 100
    , 1) as overhead_percentage
FROM snowflake.account_usage.table_storage_metrics
WHERE deleted IS NULL;

-- Tables with highest time travel overhead
SELECT
    table_catalog as database_name,
    table_schema,
    table_name,
    ROUND(active_bytes/POWER(1024,3), 2) as active_gb,
    ROUND(time_travel_bytes/POWER(1024,3), 2) as time_travel_gb,
    ROUND(failsafe_bytes/POWER(1024,3), 2) as failsafe_gb,
    ROUND((time_travel_bytes + failsafe_bytes) / NULLIF(active_bytes,0) * 100, 1) as overhead_pct
FROM snowflake.account_usage.table_storage_metrics
WHERE deleted IS NULL
    AND active_bytes > 0
ORDER BY (time_travel_bytes + failsafe_bytes) DESC
LIMIT 20;

🎯 Action: For staging/temp tables, set DATA_RETENTION_TIME_IN_DAYS = 0 and use TRANSIENT tables. For most production tables, 1-7 days of time travel is sufficient.

Step 1: The Complete Cost Audit

Now let's do a comprehensive audit. This section gives you full visibility into where every dollar goes.

1.1 Credit Consumption by Warehouse

Get a breakdown of compute costs by warehouse over the last 30 days:

-- Credit consumption by warehouse (last 30 days)
WITH daily_usage AS (
    SELECT
        warehouse_name,
        DATE_TRUNC('day', start_time) as usage_date,
        SUM(credits_used) as daily_credits
    FROM snowflake.account_usage.warehouse_metering_history
    WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
    GROUP BY 1, 2
)
SELECT
    warehouse_name,
    ROUND(SUM(daily_credits), 2) as total_credits,
    ROUND(SUM(daily_credits) * 2, 2) as estimated_cost_usd,  -- Adjust rate as needed
    ROUND(AVG(daily_credits), 2) as avg_daily_credits,
    COUNT(DISTINCT usage_date) as active_days,
    ROUND(SUM(daily_credits) / 30 * 365 * 2, 0) as annualized_cost_usd
FROM daily_usage
GROUP BY warehouse_name
ORDER BY total_credits DESC;

1.2 Credit Consumption by User

Identify which users/service accounts are driving costs:

-- Credit consumption by user (last 30 days)
SELECT
    user_name,
    COUNT(DISTINCT query_id) as query_count,
    ROUND(SUM(total_elapsed_time)/1000/3600, 2) as total_runtime_hours,
    COUNT(DISTINCT warehouse_name) as warehouses_used,
    -- Estimate cost based on runtime and warehouse size
    ROUND(SUM(
        (total_elapsed_time/1000/3600) *
        CASE warehouse_size
            WHEN 'X-Small' THEN 1 WHEN 'Small' THEN 2 WHEN 'Medium' THEN 4
            WHEN 'Large' THEN 8 WHEN 'X-Large' THEN 16 WHEN '2X-Large' THEN 32
            WHEN '3X-Large' THEN 64 WHEN '4X-Large' THEN 128 ELSE 1
        END * 2
    ), 2) as estimated_cost_usd
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
    AND warehouse_name IS NOT NULL
GROUP BY user_name
ORDER BY estimated_cost_usd DESC
LIMIT 25;

1.3 Hourly Usage Patterns

Understand when your warehouses are active. This informs scheduling and auto-suspend decisions:

-- Hourly usage patterns (what hours are warehouses active?)
SELECT
    HOUR(start_time) as hour_of_day,
    DAYNAME(start_time) as day_of_week,
    COUNT(*) as query_count,
    ROUND(SUM(credits_used_cloud_services), 2) as cloud_credits,
    COUNT(DISTINCT warehouse_name) as active_warehouses
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -14, CURRENT_TIMESTAMP())
    AND warehouse_name IS NOT NULL
GROUP BY 1, 2
ORDER BY
    CASE day_of_week
        WHEN 'Mon' THEN 1 WHEN 'Tue' THEN 2 WHEN 'Wed' THEN 3
        WHEN 'Thu' THEN 4 WHEN 'Fri' THEN 5 WHEN 'Sat' THEN 6 ELSE 7
    END,
    hour_of_day;

🎯 Key insight: If you see significant weekend/night usage for interactive warehouses, investigate. Often it's a stuck process or poorly scheduled job.
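A quick way to check is to list who and what runs during off-hours. The warehouse name filter below is an assumption—adjust it to match your interactive warehouses:

-- Off-hours activity on interactive warehouses (adjust the ILIKE filter)
SELECT
    warehouse_name,
    user_name,
    DATE_TRUNC('hour', start_time) as hour_bucket,
    COUNT(*) as query_count
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -14, CURRENT_TIMESTAMP())
    AND warehouse_name ILIKE '%BI%'  -- assumption: interactive warehouses contain 'BI'
    AND (HOUR(start_time) < 6
         OR HOUR(start_time) >= 22
         OR DAYNAME(start_time) IN ('Sat', 'Sun'))
GROUP BY 1, 2, 3
ORDER BY query_count DESC
LIMIT 50;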

1.4 Query Performance Distribution

Understand the shape of your query workload:

-- Query duration distribution
SELECT
    warehouse_name,
    COUNT(*) as total_queries,
    COUNT(CASE WHEN total_elapsed_time < 1000 THEN 1 END) as under_1s,
    COUNT(CASE WHEN total_elapsed_time BETWEEN 1000 AND 10000 THEN 1 END) as "1s_to_10s",
    COUNT(CASE WHEN total_elapsed_time BETWEEN 10000 AND 60000 THEN 1 END) as "10s_to_1m",
    COUNT(CASE WHEN total_elapsed_time BETWEEN 60000 AND 300000 THEN 1 END) as "1m_to_5m",
    COUNT(CASE WHEN total_elapsed_time > 300000 THEN 1 END) as over_5m,
    ROUND(AVG(total_elapsed_time)/1000, 2) as avg_seconds,
    ROUND(MEDIAN(total_elapsed_time)/1000, 2) as median_seconds,
    ROUND(PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY total_elapsed_time)/1000, 2) as p95_seconds
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
    AND warehouse_name IS NOT NULL
    AND query_type IN ('SELECT', 'INSERT', 'UPDATE', 'DELETE', 'MERGE', 'CREATE_TABLE_AS_SELECT')
GROUP BY warehouse_name
ORDER BY total_queries DESC;

🎯 If median is <1s but avg is >30s, you have a few expensive queries dragging up costs. Focus optimization there.

Step 2: Warehouse Optimization

Warehouse configuration is where most teams leave 20-40% savings on the table. Here's how to right-size.

2.1 The Warehouse Sizing Decision Matrix

Use this matrix to determine the right warehouse size:

  • X-Small (1 credit/hr): Ad-hoc queries, dbt runs < 50 models, light dashboards
  • Small (2 credits/hr): Standard BI dashboards, dbt runs 50-200 models
  • Medium (4 credits/hr): Heavy BI workloads, dbt runs 200-500 models, moderate ETL
  • Large (8 credits/hr): Complex transformations, large data volumes (>100GB processed)
  • X-Large+ (16+ credits/hr): Data science workloads, massive historical backfills, ML training

Important: Larger isn't always faster. A query scanning 10GB won't run faster on 4XL vs Medium. Match size to actual parallelization needs.
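To put the credit rates in dollar terms (using the $2/credit assumption from the queries above): a Medium warehouse (4 credits/hour) running 10 hours a day costs roughly 4 × 10 × 30 × $2 ≈ $2,400/month, while the same workload left on a Large doubles that to about $4,800.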

2.2 Analyze Current Warehouse Efficiency

This query shows whether your warehouses are correctly sized:

-- Warehouse efficiency analysis
WITH warehouse_stats AS (
    SELECT
        warehouse_name,
        AVG(avg_running) as avg_concurrent_queries,
        AVG(avg_queued_load) as avg_queue_depth,
        AVG(avg_blocked) as avg_blocked
    FROM snowflake.account_usage.warehouse_load_history
    WHERE start_time >= DATEADD(day, -14, CURRENT_TIMESTAMP())
    GROUP BY 1
),
query_stats AS (
    SELECT
        warehouse_name,
        MAX(warehouse_size) as warehouse_size,  -- size taken from query_history; load_history has no size column
        COUNT(*) as query_count,
        AVG(total_elapsed_time) as avg_runtime_ms,
        SUM(CASE WHEN queued_overload_time > 0 THEN 1 ELSE 0 END) as queued_queries,
        AVG(queued_overload_time) as avg_queue_time_ms
    FROM snowflake.account_usage.query_history
    WHERE start_time >= DATEADD(day, -14, CURRENT_TIMESTAMP())
        AND warehouse_name IS NOT NULL
    GROUP BY 1
)
SELECT
    w.warehouse_name,
    q.warehouse_size,
    ROUND(w.avg_concurrent_queries, 2) as avg_concurrent,
    ROUND(w.avg_queue_depth, 2) as avg_queue_depth,
    q.query_count,
    ROUND(q.avg_runtime_ms/1000, 2) as avg_runtime_sec,
    q.queued_queries,
    ROUND(q.avg_queue_time_ms/1000, 2) as avg_queue_sec,
    CASE
        WHEN w.avg_concurrent_queries < 1 AND w.avg_queue_depth < 0.1
            THEN '⬇️ Consider downsizing'
        WHEN w.avg_queue_depth > 1 OR q.queued_queries > q.query_count * 0.1
            THEN '⬆️ Consider upsizing or multi-cluster'
        ELSE '✅ Appropriately sized'
    END as recommendation
FROM warehouse_stats w
LEFT JOIN query_stats q ON w.warehouse_name = q.warehouse_name
ORDER BY q.query_count DESC NULLS LAST;
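When a warehouse comes back as '⬇️ Consider downsizing', acting on it is a single statement (the warehouse name here is a placeholder). Drop one size at a time and watch queue depth for a few days before going further:

-- Drop one size, then re-check queue depth after a few days
ALTER WAREHOUSE REPORTING_WH SET WAREHOUSE_SIZE = 'SMALL';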

2.3 Optimal Auto-Suspend Settings

Use these settings based on workload type:

  • Interactive/BI warehouses: 60 seconds (users won't notice 1-2s cold start)
  • ETL/Scheduled warehouses: 300 seconds (prevents thrashing on sequential jobs)
  • Data Science/Ad-hoc: 120 seconds (balance between cost and convenience)
  • Streaming ingestion: Consider always-on only if queries are truly continuous

Apply these settings:

-- Update warehouse auto-suspend settings
-- Interactive warehouse
ALTER WAREHOUSE BI_WAREHOUSE SET
    AUTO_SUSPEND = 60
    AUTO_RESUME = TRUE
    INITIALLY_SUSPENDED = TRUE;

-- ETL warehouse
ALTER WAREHOUSE ETL_WAREHOUSE SET
    AUTO_SUSPEND = 300
    AUTO_RESUME = TRUE
    INITIALLY_SUSPENDED = TRUE;

-- Create a new optimized warehouse from scratch
CREATE WAREHOUSE IF NOT EXISTS ANALYTICS_WH
    WAREHOUSE_SIZE = 'SMALL'
    AUTO_SUSPEND = 60
    AUTO_RESUME = TRUE
    INITIALLY_SUSPENDED = TRUE
    MIN_CLUSTER_COUNT = 1
    MAX_CLUSTER_COUNT = 1
    SCALING_POLICY = 'STANDARD'
    COMMENT = 'Optimized for dashboard queries';

2.4 Multi-Cluster Configuration

Multi-cluster warehouses can reduce queuing but increase costs. Use them when:

  • Query queue time is consistently >5 seconds
  • You have concurrent user spikes (e.g., 9am dashboard rush)
  • Query patterns are unpredictable

Configuration:

-- Multi-cluster warehouse for BI workloads
ALTER WAREHOUSE BI_WAREHOUSE SET
    MIN_CLUSTER_COUNT = 1      -- Scale to zero cost when idle
    MAX_CLUSTER_COUNT = 3      -- Handle up to 3x normal concurrency
    SCALING_POLICY = 'STANDARD';  -- Add clusters when queued

-- For very predictable workloads, consider ECONOMY policy
-- (waits longer before scaling, but cheaper)
ALTER WAREHOUSE REPORTING_WH SET
    SCALING_POLICY = 'ECONOMY';

Step 3: Query Optimization

The most impactful optimization is fixing bad queries. Here's how to find and fix them.

3.1 Find Queries with Full Table Scans

Full table scans are often unnecessary and expensive:

-- Queries with high bytes scanned (potential full table scans)
SELECT
    query_id,
    user_name,
    warehouse_name,
    ROUND(bytes_scanned/POWER(1024,3), 2) as gb_scanned,
    ROUND(bytes_scanned/NULLIF(bytes_written_to_result,0), 0) as scan_to_result_ratio,
    ROUND(total_elapsed_time/1000, 2) as runtime_sec,
    ROUND(partitions_scanned/NULLIF(partitions_total,0) * 100, 1) as pct_partitions_scanned,
    SUBSTRING(query_text, 1, 300) as query_preview
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
    AND bytes_scanned > 10737418240  -- > 10GB scanned
    AND warehouse_name IS NOT NULL
    AND query_type = 'SELECT'
ORDER BY bytes_scanned DESC
LIMIT 20;

🎯 If pct_partitions_scanned is >50% on large tables, you likely need clustering keys or query rewrites.

3.2 Queries Missing Partition Pruning

Partition pruning is Snowflake's #1 performance feature. Make sure it's working:

-- Find queries where partition pruning isn't effective
SELECT
    query_id,
    query_type,
    warehouse_name,
    partitions_scanned,
    partitions_total,
    ROUND(partitions_scanned/NULLIF(partitions_total,0) * 100, 1) as pct_scanned,
    ROUND(bytes_scanned/POWER(1024,3), 2) as gb_scanned,
    ROUND(total_elapsed_time/1000, 2) as runtime_sec,
    SUBSTRING(query_text, 1, 200) as query_preview
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
    AND partitions_total > 100  -- Only look at partitioned tables
    AND partitions_scanned > partitions_total * 0.5  -- Scanning >50% of partitions
    AND query_type IN ('SELECT', 'INSERT', 'UPDATE', 'DELETE', 'MERGE')
    AND total_elapsed_time > 10000  -- > 10 seconds
ORDER BY partitions_scanned DESC
LIMIT 20;

3.3 Common Query Anti-Patterns to Fix

These patterns kill performance. Search your codebase for them:

❌ Anti-pattern 1: SELECT * when you only need a few columns

-- BAD: Scans all columns
SELECT * FROM orders WHERE order_date > '2024-01-01';

-- GOOD: Only scan needed columns (faster and cheaper)
SELECT order_id, customer_id, total_amount
FROM orders
WHERE order_date > '2024-01-01';

❌ Anti-pattern 2: Functions on filter columns (prevents pruning)

-- BAD: Function on column prevents partition pruning
SELECT * FROM events
WHERE DATE(event_timestamp) = '2024-01-15';

-- GOOD: Range filter allows pruning
SELECT * FROM events
WHERE event_timestamp >= '2024-01-15'
  AND event_timestamp < '2024-01-16';

❌ Anti-pattern 3: Cartesian joins from missing join conditions

-- BAD: Accidental cross join (explosive!)
SELECT a.*, b.*
FROM table_a a, table_b b
WHERE a.status = 'active';

-- GOOD: Explicit join condition
SELECT a.*, b.*
FROM table_a a
INNER JOIN table_b b ON a.id = b.a_id
WHERE a.status = 'active';

❌ Anti-pattern 4: ORDER BY without LIMIT

-- BAD: Sorts entire result set
SELECT * FROM large_table ORDER BY created_at DESC;

-- GOOD: Only sort what you need
SELECT * FROM large_table ORDER BY created_at DESC LIMIT 1000;

3.4 Identifying Repeated Expensive Queries

Same query running multiple times? Prime candidate for caching or materialization:

-- Find frequently repeated expensive queries
SELECT
    query_parameterized_hash,
    COUNT(*) as execution_count,
    SUM(total_elapsed_time)/1000 as total_runtime_sec,
    AVG(total_elapsed_time)/1000 as avg_runtime_sec,
    SUM(bytes_scanned)/POWER(1024,3) as total_gb_scanned,
    MIN(SUBSTRING(query_text, 1, 200)) as query_sample
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
    AND query_type = 'SELECT'
    AND total_elapsed_time > 5000  -- > 5 seconds
GROUP BY query_parameterized_hash
HAVING COUNT(*) > 10  -- Run more than 10 times
ORDER BY total_runtime_sec DESC
LIMIT 20;

🎯 If the same query runs 100+ times/day, consider materializing the results into a table refreshed on a schedule.
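One way to do that is a plain table refreshed by a scheduled task. A minimal sketch, assuming a hypothetical analytics.order_summary table, the production.orders table from earlier examples, and the ANALYTICS_WH warehouse:

-- Materialize a frequently repeated aggregation and refresh it hourly with a task
CREATE TABLE IF NOT EXISTS analytics.order_summary AS
SELECT customer_id, COUNT(*) AS order_count, SUM(total_amount) AS total_revenue
FROM production.orders
GROUP BY customer_id;

CREATE OR REPLACE TASK analytics.refresh_order_summary
    WAREHOUSE = ANALYTICS_WH
    SCHEDULE = '60 MINUTE'
AS
    CREATE OR REPLACE TABLE analytics.order_summary AS
    SELECT customer_id, COUNT(*) AS order_count, SUM(total_amount) AS total_revenue
    FROM production.orders
    GROUP BY customer_id;

-- Tasks are created suspended; resume to start the schedule
ALTER TASK analytics.refresh_order_summary RESUME;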

Step 4: Storage Optimization

Storage runs roughly $40/TB/month on-demand (capacity pricing is lower). Time travel and fail-safe can silently 2-4x that cost.

4.1 Time Travel Configuration by Table Type

Not all tables need 90 days of time travel. Use this guide:

  • Staging/temp tables: 0 days (use TRANSIENT tables)
  • Intermediate dbt models: 1 day
  • Production fact tables: 7 days
  • Critical dimension tables: 14-30 days
  • Financial/audit tables: 90 days (regulatory requirement)
-- Reduce time travel on staging tables
ALTER TABLE staging.raw_events SET DATA_RETENTION_TIME_IN_DAYS = 0;

-- Create transient staging schema (no time travel or fail-safe)
CREATE TRANSIENT SCHEMA IF NOT EXISTS staging_transient;

-- Move high-churn tables to transient
CREATE TRANSIENT TABLE staging_transient.events_raw
AS SELECT * FROM staging.events_raw;

-- Standard production table with 7 days
ALTER TABLE production.orders SET DATA_RETENTION_TIME_IN_DAYS = 7;

4.2 Find and Clean Up Orphaned Clones

Clones retain storage until the source changes. Old clones can accumulate costs:

-- Find tables with clone retention overhead
SELECT
    table_catalog,
    table_schema,
    table_name,
    ROUND(active_bytes/POWER(1024,3), 2) as active_gb,
    ROUND(retained_for_clone_bytes/POWER(1024,3), 2) as clone_retained_gb,
    CASE
        WHEN retained_for_clone_bytes > active_bytes * 0.5
        THEN '⚠️ High clone overhead'
        ELSE '✅ OK'
    END as status
FROM snowflake.account_usage.table_storage_metrics
WHERE deleted IS NULL
    AND retained_for_clone_bytes > 1073741824  -- > 1GB clone retention
ORDER BY retained_for_clone_bytes DESC
LIMIT 20;

-- Find all clones in the account
SELECT
    table_catalog,
    table_schema,
    table_name,
    clone_group_id,
    is_transient,
    created as clone_created
FROM snowflake.account_usage.tables
WHERE clone_group_id IS NOT NULL
    AND deleted IS NULL
ORDER BY created;

🎯 Drop old development/test clones that are no longer needed. A single stale clone can retain gigabytes of storage long after it stopped being useful.

4.3 Identify Unused Tables

Tables that haven't been queried in 90+ days are candidates for archival or deletion:

-- Tables not queried in 90+ days (access_history requires Enterprise Edition or higher)
WITH table_queries AS (
    SELECT DISTINCT
        f.value:"objectName"::STRING as fully_qualified_name
    FROM snowflake.account_usage.access_history,
    LATERAL FLATTEN(base_objects_accessed) f
    WHERE query_start_time >= DATEADD(day, -90, CURRENT_TIMESTAMP())
        AND f.value:"objectDomain"::STRING = 'Table'
)
SELECT
    t.table_catalog as database_name,
    t.table_schema,
    t.table_name,
    t.last_altered,
    ROUND(tsm.active_bytes/POWER(1024,3), 2) as size_gb,
    '⚠️ NOT QUERIED IN 90 DAYS' as status
FROM snowflake.account_usage.tables t
LEFT JOIN table_queries tq
    ON tq.fully_qualified_name = t.table_catalog || '.' || t.table_schema || '.' || t.table_name
LEFT JOIN snowflake.account_usage.table_storage_metrics tsm
    ON t.table_catalog = tsm.table_catalog
    AND t.table_schema = tsm.table_schema
    AND t.table_name = tsm.table_name
WHERE t.deleted IS NULL
    AND t.table_type = 'BASE TABLE'
    AND tq.fully_qualified_name IS NULL  -- never appears in 90 days of access history
    AND tsm.active_bytes > 104857600  -- > 100MB
ORDER BY tsm.active_bytes DESC
LIMIT 50;

4.4 Storage Cleanup Script

Run this periodically to identify storage optimization opportunities:

-- Comprehensive storage audit
WITH storage_summary AS (
    SELECT
        table_catalog as database_name,
        SUM(active_bytes) as active_bytes,
        SUM(time_travel_bytes) as tt_bytes,
        SUM(failsafe_bytes) as fs_bytes,
        SUM(retained_for_clone_bytes) as clone_bytes
    FROM snowflake.account_usage.table_storage_metrics
    WHERE deleted IS NULL
    GROUP BY 1
)
SELECT
    database_name,
    ROUND(active_bytes/POWER(1024,4), 3) as active_tb,
    ROUND(tt_bytes/POWER(1024,4), 3) as time_travel_tb,
    ROUND(fs_bytes/POWER(1024,4), 3) as failsafe_tb,
    ROUND(clone_bytes/POWER(1024,4), 3) as clone_retained_tb,
    ROUND((active_bytes + tt_bytes + fs_bytes + clone_bytes)/POWER(1024,4), 3) as total_tb,
    ROUND((tt_bytes + fs_bytes + clone_bytes)/NULLIF(active_bytes,0) * 100, 1) as overhead_pct,
    ROUND((active_bytes + tt_bytes + fs_bytes + clone_bytes)/POWER(1024,4) * 40, 2) as monthly_cost_usd
FROM storage_summary
ORDER BY total_tb DESC;

Step 5: Clustering & Materialization Strategy

Advanced optimization: use clustering to speed up queries and materialization to cache expensive computations.

5.1 When to Use Clustering

Clustering reorganizes data to improve query pruning. Use it when:

  • Table is >1TB
  • Queries consistently filter on the same columns
  • Partition pruning effectiveness is <70%
  • Table has high churn (lots of updates/inserts)

Do NOT cluster when:

  • Table is <100GB (overhead exceeds benefit)
  • Queries filter on many different columns
  • Table is append-only with natural time ordering

5.2 Find Tables That Need Clustering

-- Identify clustering candidates (joins query_history to access_history to see which tables were scanned)
WITH query_patterns AS (
    SELECT
        f.value:"objectName"::STRING as fully_qualified_name,
        COUNT(*) as query_count,
        AVG(qh.partitions_scanned/NULLIF(qh.partitions_total,0)) as avg_scan_ratio
    FROM snowflake.account_usage.query_history qh
    JOIN snowflake.account_usage.access_history ah
        ON qh.query_id = ah.query_id,
    LATERAL FLATTEN(ah.base_objects_accessed) f
    WHERE qh.start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
        AND qh.query_type = 'SELECT'
        AND qh.partitions_total > 100
        AND f.value:"objectDomain"::STRING = 'Table'
    GROUP BY 1
)
SELECT
    tsm.table_catalog as database_name,
    tsm.table_schema,
    tsm.table_name,
    qp.query_count,
    ROUND(qp.avg_scan_ratio * 100, 1) as avg_pct_partitions_scanned,
    ROUND(tsm.active_bytes/POWER(1024,3), 2) as size_gb,
    CASE
        WHEN qp.avg_scan_ratio > 0.5 AND tsm.active_bytes > 107374182400
        THEN '🎯 CLUSTERING CANDIDATE'
        WHEN qp.avg_scan_ratio > 0.3
        THEN '⚠️ Review query patterns'
        ELSE '✅ Pruning efficient'
    END as recommendation
FROM query_patterns qp
JOIN snowflake.account_usage.table_storage_metrics tsm
    ON qp.fully_qualified_name = tsm.table_catalog || '.' || tsm.table_schema || '.' || tsm.table_name
WHERE tsm.active_bytes > 10737418240  -- > 10GB
ORDER BY qp.query_count * qp.avg_scan_ratio DESC
LIMIT 20;

5.3 Implementing Clustering

-- Add clustering to a large fact table
ALTER TABLE production.events
CLUSTER BY (event_date, customer_id);

-- Check which tables have a clustering key defined
SELECT
    table_catalog,
    table_schema,
    table_name,
    clustering_key,
    auto_clustering_on
FROM snowflake.account_usage.tables
WHERE clustering_key IS NOT NULL
    AND deleted IS NULL;

-- Reclustering happens automatically; check clustering depth and overlaps with:
SELECT SYSTEM$CLUSTERING_INFORMATION('production.events');

5.4 Materialized Views for Repeated Queries

If the same expensive aggregation runs repeatedly, materialize it:

-- Create materialized view for expensive aggregation
CREATE MATERIALIZED VIEW IF NOT EXISTS analytics.daily_revenue_summary
AS
SELECT
    DATE_TRUNC('day', order_date) as revenue_date,
    product_category,
    region,
    COUNT(*) as order_count,
    SUM(total_amount) as total_revenue,
    AVG(total_amount) as avg_order_value
FROM production.orders
WHERE order_date >= '2023-01-01'  -- use a fixed cutoff; non-deterministic functions like CURRENT_DATE() aren't allowed in materialized views
GROUP BY 1, 2, 3;

-- Materialized views are maintained automatically by a background service;
-- you can pause/resume maintenance if needed:
ALTER MATERIALIZED VIEW analytics.daily_revenue_summary SUSPEND;
ALTER MATERIALIZED VIEW analytics.daily_revenue_summary RESUME;

-- Check materialized view maintenance cost (credits) over the last 30 days
SELECT
    database_name,
    schema_name,
    table_name,
    ROUND(SUM(credits_used), 2) as maintenance_credits
FROM snowflake.account_usage.materialized_view_refresh_history
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
GROUP BY 1, 2, 3
ORDER BY maintenance_credits DESC;

⚠️ Materialized views have maintenance costs. Only use when query savings exceed maintenance overhead.

Step 6: Set Up Guardrails

Prevent cost surprises with resource monitors and alerts.

6.1 Create Resource Monitors

-- Account-level spending cap
CREATE RESOURCE MONITOR IF NOT EXISTS account_monthly_limit
WITH
    CREDIT_QUOTA = 10000  -- Monthly credit limit
    FREQUENCY = MONTHLY
    START_TIMESTAMP = IMMEDIATELY
    TRIGGERS
        ON 50 PERCENT DO NOTIFY
        ON 75 PERCENT DO NOTIFY
        ON 90 PERCENT DO NOTIFY
        ON 100 PERCENT DO SUSPEND;  -- Hard stop at limit

-- Apply to account
ALTER ACCOUNT SET RESOURCE_MONITOR = account_monthly_limit;

-- Warehouse-specific monitor (useful for shared environments)
CREATE RESOURCE MONITOR IF NOT EXISTS etl_warehouse_limit
WITH
    CREDIT_QUOTA = 2000
    FREQUENCY = MONTHLY
    START_TIMESTAMP = IMMEDIATELY
    TRIGGERS
        ON 75 PERCENT DO NOTIFY
        ON 100 PERCENT DO SUSPEND;

ALTER WAREHOUSE ETL_WAREHOUSE SET RESOURCE_MONITOR = etl_warehouse_limit;

6.2 Query Timeout Settings

Prevent runaway queries from burning credits:

-- Set statement timeout at warehouse level (15 minutes)
ALTER WAREHOUSE ANALYTICS_WH SET STATEMENT_TIMEOUT_IN_SECONDS = 900;

-- For critical production warehouses, you might want longer (2 hours)
ALTER WAREHOUSE ETL_PRODUCTION SET STATEMENT_TIMEOUT_IN_SECONDS = 7200;

-- Session-level timeout for specific users
ALTER USER analyst_user SET STATEMENT_TIMEOUT_IN_SECONDS = 600;

Case Study: Series B Fintech - $847K to $428K

Here's a real implementation of this framework at a Series B fintech company with 75 data team members.

Initial State

  • Annual Snowflake spend: $847,000
  • 15 warehouses (mostly Medium and Large)
  • 8TB active storage, 12TB with time travel/fail-safe
  • No resource monitors
  • Auto-suspend set to 10 minutes on all warehouses
  • No clustering on any tables

Week 1-2: Quick Wins ($127K annual savings)

  • Set auto-suspend to 60s on 8 BI warehouses → $4,200/month
  • Set auto-suspend to 300s on 5 ETL warehouses → $1,800/month
  • Killed 2 always-on dev warehouses → $2,400/month
  • Identified and fixed 3 stuck scheduled jobs → $2,100/month

Week 3-4: Warehouse Right-Sizing ($89K annual savings)

  • Downsized 4 over-provisioned warehouses (Large → Small) → $5,200/month
  • Consolidated 3 redundant warehouses into 1 multi-cluster → $2,200/month

Week 5-6: Query Optimization ($142K annual savings)

  • Fixed top 10 expensive queries (added filters, removed SELECT *) → $6,800/month
  • Implemented query result caching for repeated dashboard queries → $2,400/month
  • Materialized 5 expensive aggregations used by multiple dashboards → $2,700/month

Week 7-8: Storage & Clustering ($61K annual savings)

  • Reduced time travel from 90 to 7 days on non-critical tables → $1,600/month
  • Converted staging tables to transient → $900/month
  • Cleaned up old clones and unused tables → $700/month
  • Added clustering to 3 large fact tables → $1,900/month (via query cost reduction)

Final Results

  • New annual spend: $428,000
  • Total savings: $419,000/year (49.5% reduction)
  • Implementation time: 8 weeks
  • Query performance: 40% faster on average
  • No negative impact on business operations

The key insight: over half of the savings came from simple configuration changes (auto-suspend settings and warehouse sizing) that took less than a week of hands-on work to implement.

Implementation Checklist

Use this checklist to track your optimization progress:

Week 1: Audit & Quick Wins

  1. Run the Quick Win queries from this playbook
  2. Document current monthly spend by warehouse
  3. Fix auto-suspend settings on all warehouses
  4. Kill any always-on warehouses that can be on-demand
  5. Set up basic resource monitors

Week 2: Warehouse Optimization

  1. Run the warehouse efficiency analysis
  2. Identify over-provisioned warehouses
  3. Downsize warehouses where queue depth is consistently 0
  4. Evaluate multi-cluster for high-concurrency warehouses
  5. Document changes and expected savings

Week 3-4: Query Optimization

  1. Identify the top 20 most expensive queries
  2. Fix anti-patterns (SELECT *, functions on filter columns, etc.)
  3. Add clustering keys where beneficial (Snowflake has no traditional indexes)
  4. Implement materialization for repeated expensive queries
  5. Set appropriate statement timeouts

Week 5-6: Storage Cleanup

  1. Audit time travel settings across all tables
  2. Convert staging/temp tables to transient
  3. Clean up unused clones
  4. Archive or drop tables not queried in 90+ days
  5. Document storage reduction

Week 7-8: Advanced & Governance

  1. Implement clustering on large tables (>1TB) with poor pruning
  2. Set up query tagging for cost allocation (see the sketch after this list)
  3. Create warehouse-specific resource monitors
  4. Document runbooks for ongoing optimization
  5. Calculate final ROI and report to stakeholders
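A minimal sketch of query tagging for cost allocation (the tag format is just a convention—pick your own):

-- Tag queries at the session (or user/warehouse) level
ALTER SESSION SET QUERY_TAG = 'team=analytics;job=daily_revenue';

-- Attribute runtime by tag from query history
SELECT
    query_tag,
    COUNT(*) as query_count,
    ROUND(SUM(total_elapsed_time)/1000/3600, 2) as total_runtime_hours
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
    AND query_tag <> ''
GROUP BY query_tag
ORDER BY total_runtime_hours DESC;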

Get Expert Help

This playbook covers the fundamentals that work for most organizations. But every Snowflake environment is different. Your specific workloads, data volumes, and team structures create unique optimization opportunities.

Tabby Stack offers a free Snowflake Spend Analysis where we:

  • Audit your last 90 days of usage data
  • Identify your top 5 cost reduction opportunities
  • Provide specific SQL and configuration recommendations
  • Estimate realistic savings based on your actual workload

No sales pitch, no commitment—just a detailed technical analysis of where your Snowflake dollars are going and how to keep more of them.

→ Book your free analysis at tabbystack.com/contact