
Tabby Stack dbt Style Guide

Battle-tested SQL formatting, naming conventions, project structure, testing, and documentation standards from 8+ years in the dbt ecosystem. Includes a ready-to-use .sqlfluff config that enforces everything automatically.

February 17, 2026 · 15 min read
Tags: dbt, SQL, SQLFluff, data modeling, analytics engineering, best practices, style guide

Key Information

Type: Guide
Category: Best Practices
Reading Time: 15 min read
Last Updated: February 17, 2026

Your new analytics engineer just spent three days figuring out why some models use leading commas and others don't. Your data prep layer has five different naming conventions because five different people built it. Your PR reviews have become style debates instead of logic reviews.

This is the tax you pay for not having a style guide. PR reviews that should take 20 minutes drag into hours. Onboarding that should take a week takes three. Every model becomes a choose-your-own-adventure where the reader has to learn this file's conventions before they can understand this file's logic.

We've been in the dbt ecosystem since 2017, back when Fishtown Analytics was still our neighbor in Philadelphia. We've seen what works across dozens of implementations, and we've codified it here. These aren't theoretical best practices. They're battle-tested conventions that actually stick when real teams, real deadlines, and real production pressure show up.

What's in this guide:

  • SQL formatting, naming, project structure, testing, and documentation standards
  • A ready-to-use .sqlfluff config that enforces everything automatically
  • A PR checklist your reviewers can use today

Download the complete PDF and .sqlfluff config at the end.

General Principles

Four principles underpin everything in this guide. When a specific rule doesn't cover your situation, return to these.

1. Optimize for the reader, not the writer.

Code is read 10x more often than it's written. Every decision should prioritize someone reading the code for the first time, not minimizing your keystrokes.

This means: descriptive names over abbreviations (users_with_subscription_history not usr_sub_hist), explicit syntax even when optional, and generous whitespace between logical sections.

2. Only as complex as absolutely necessary.

Every line of code is a liability. Complexity should be earned, not assumed.

Before adding abstraction, ask: Can this be expressed more simply? Would a new team member understand this without explanation? Are you solving a real problem or a hypothetical future one?

3. Consistency beats "better."

A codebase that consistently uses one convention is easier to navigate than one that uses "better" conventions inconsistently.

The hierarchy: Project consistency beats team conventions. Team conventions beat this guide. An intentional, consistent deviation is far better than inconsistent adherence.

4. Make the right thing easy.

Structure your project so following conventions is the path of least resistance. Use SQLFluff to enforce formatting automatically. Put templates where developers will find them. Document conventions in the repo, not a wiki nobody checks.

When doing the right thing requires extra effort, people stop doing it.

Quick Reference

| Category | Convention |
| --- | --- |
| Keywords | lowercase |
| Indentation | 2 spaces |
| Comma placement | Leading |
| Aliases | Explicit `as` keyword |
| Line length | 120 characters max |
| Model prefixes | `base_`, `prep_`, `enr_`, `intg_`, `core_` |
| Base model naming | `base_{source}__{table}` |

Print this. Tape it to your monitor.

SQL Formatting Rules

Keywords and Functions

All SQL keywords, functions, data types, and literals: lowercase.

-- Good
select
    id
    , upper(trim(name)) as name_normalized
    , case when status = 'active' then true else false end as is_active
    , cast(created_at as date) as created_date
from users
where deleted_at is null

-- Bad
SELECT
    id,
    UPPER(TRIM(name)) AS name_normalized,
    CASE WHEN status = 'active' THEN TRUE ELSE FALSE END AS is_active,
    CAST(created_at AS DATE) AS created_date
FROM users
WHERE deleted_at IS NULL

This isn't about right or wrong; it's about picking one and enforcing it everywhere. We picked lowercase because it's easier on the eyes in long queries, and modern editors provide syntax highlighting anyway.

Indentation

Two spaces. Never tabs. Configure your editor once and forget it.

select
  column_1
  , column_2
  , case
      when condition_1 then 'value_1'
      when condition_2 then 'value_2'
      else 'default'
    end as derived_column
from table_name
where condition

Comma Placement

Leading commas-commas at the beginning of lines.

-- Good: Leading commas
select
    id
    , first_name
    , last_name
    , email
    , created_at
from users

-- Bad: Trailing commas
select
    id,
    first_name,
    last_name,
    email,
    created_at
from users

Here's what they don't tell you about leading commas: When you add a column, your git diff changes one line instead of two. When you comment out a column, it's one keystroke. When you're scanning a long select list, the commas create a visual gutter that's easier to follow. Small things that compound across thousands of lines.

No trailing comma after the last item. SQLFluff will catch this.
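To make the git-diff point concrete, here's the same one-column addition under both styles (the `phone` column is hypothetical):

```sql
-- Leading commas: adding `phone` is a one-line diff
select
    id
    , email
    , phone        -- new line; the `email` line above is untouched
from users

-- Trailing commas: the same addition is a two-line diff,
-- because the previous line must also change from `email` to `email,`
```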

Aliasing

Always use the explicit `as` keyword, for both columns and tables.

-- Good: Explicit aliasing
select
    u.id as user_id
    , u.email as user_email
    , o.id as order_id
from users as u
left join orders as o
    on u.id = o.user_id

-- Bad: Implicit aliasing
select
    u.id user_id
    , u.email user_email
    , o.id order_id
from users u
left join orders o
    on u.id = o.user_id

Implicit aliasing saves a few characters. Explicit aliasing saves every future reader from wondering "wait, is that a typo or an alias?"

Line Length

Keep lines under 120 characters. Break long lines at logical points:

select
    s.created_at as subscription_created_at
    , s.id as subscription_id
    , s.account_id
    , coalesce(
        s.annual_revenue,
        s.monthly_revenue * 12,
        0
      ) as estimated_arr
from subscriptions as s

Join Formatting

Align join conditions vertically when you have multiple:

select
    s.subscription_id
    , a.account_id
    , p.plan_name
from subscriptions as s
left join accounts as a
    on s.account_id = a.id
    and s.workspace_id = a.workspace_id
left join plans as p
    on s.plan_id = p.id
    and s.billing_interval = p.billing_interval

Where Clause Formatting

Each condition on its own line, aligned:

where
    s.created_at::date >= '2023-01-01'
    and s.plan_type <> 'internal'
    and s.status = 'active'

CTE Patterns

CTEs are where most dbt model logic lives. Get these right.

CTE Naming

Use descriptive snake_case names that explain the transformation:

with normalized_domains as (...)         -- Describes the transformation
, accounts_with_duplicates as (...)      -- Indicates intermediate state
, accounts_deduped as (...)              -- Final cleaned version
, subscription_events as (...)           -- Describes the data content

Bad CTE names: cte1, temp, data, final. Good CTE names tell you what's in the box without opening it.

CTE Structure

Three rules:

  1. Start with `with` for the first CTE
  2. Use a leading comma and a blank line before each subsequent CTE
  3. End with a simple `select * from final_cte`

-- Pre-normalize domains for efficient joining
with normalized_domains as (
    select distinct
        domain
        , lower(trim(domain)) as domain_normalized
    from {{ ref('base_clearbit__company_domains') }}
)

, accounts_with_enrichment as (
    select
        a.id as account_id
        , a.company_name
        , a.domain
        , d.industry
    from {{ ref('base_app__accounts') }} as a
    left join normalized_domains as d
        on lower(trim(a.domain)) = d.domain_normalized
)

, accounts_deduped as (
    select *
    from accounts_with_enrichment
    qualify row_number() over (
        partition by account_id
        order by domain nulls last
    ) = 1
)

select * from accounts_deduped

Progressive Transformation Pattern

Structure CTEs to build progressively from raw data to final output:

  1. Normalization CTEs: prepare data for efficient joining
  2. Base CTEs: core data with initial joins
  3. Enrichment CTEs: add derived columns and business logic
  4. Deduplication CTEs: handle data quality issues
  5. Final select: output the cleaned dataset

This isn't arbitrary. When something breaks, you can comment out later CTEs and inspect intermediate states. Debug time drops from hours to minutes.
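For example, in the model above you can temporarily point the final select at an earlier CTE to inspect the pre-deduplication state:

```sql
-- Temporary, for debugging only: swap the final select to an earlier CTE,
-- inspect the output, then revert.
-- select * from accounts_deduped
select * from accounts_with_enrichment
```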

CTE Comments

Add a single-line comment above CTEs explaining purpose:

-- Pre-normalize company domains for efficient joining
with normalized_domains as (
    ...
)

-- Deduplicate accounts to ensure unique account_id
select *
from accounts_base
qualify row_number() over (...)

Naming Conventions

Model Prefixes

| Layer | Prefix | Example |
| --- | --- | --- |
| Source Mirror | `base_` | `base_stripe__subscriptions` |
| Data Prep | `prep_` | `prep_stripe__subscriptions` |
| Enrich | `enr_` | `enr_subscription_events` |
| Integrate | `intg_` | `intg_customer_subscriptions` |
| Core | `core_` | `core_monthly_recurring_revenue` |
| Serve | *(consumer name)* | `retail_usa_daily_orders` |

Base Model Naming

Base models follow the pattern: base_{source_schema}__{table_name}

base_stripe__subscriptions
base_stripe__invoices
base_app_public__users
base_salesforce__accounts
base_segment__tracks

The double underscore (__) separates the source schema from the table name. This matters when you're scanning a list of 200 models and need to find all Stripe sources fast.

Column Naming

Use snake_case with descriptive suffixes:

| Type | Suffix | Examples |
| --- | --- | --- |
| Boolean flags | `is_`, `has_` | `is_active`, `has_valid_payment_method` |
| Timestamps | `_at` | `created_at`, `converted_at`, `churned_at` |
| Dates | `_date` | `trial_start_date`, `renewal_date` |
| Counts | `_count` | `seat_count`, `login_count` |
| Amounts | `_amount` | `mrr_amount`, `discount_amount` |
| IDs | `_id` | `user_id`, `account_id` |
| Percentages | `_pct`, `_rate` | `churn_rate`, `conversion_pct` |
| Normalized | `_normalized` | `email_normalized`, `domain_normalized` |

These suffixes aren't just style; they're type hints. When you see `_at`, you know it's a timestamp. When you see `is_`, you know it's a boolean. Your BI tool and your analysts will thank you.
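In practice, these suffixes get applied in the data prep layer during renaming. A sketch with hypothetical source column names:

```sql
select
    id as user_id                                      -- ID: _id
    , created as created_at                            -- timestamp: _at
    , cast(trial_start as date) as trial_start_date    -- date: _date
    , (status = 'active') as is_active                 -- boolean: is_
    , logins as login_count                            -- count: _count
from {{ ref('base_app__users') }}
```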

Table Aliases

Short, 1-2 character aliases in joins:

from subscriptions as s
left join accounts as a
    on s.account_id = a.id
left join users as u
    on a.owner_id = u.id
left join plans as p
    on s.plan_id = p.id

Common aliases: u for users, a for accounts, s for subscriptions, i for invoices, p for plans/products, e for events, o for orders.

Project Structure

Directory Layout

dbt/
├── models/
│   ├── source_mirror/       # 1:1 copies from source systems
│   │   ├── stripe/
│   │   ├── salesforce/
│   │   └── ...
│   ├── data_prep/           # Cleaned, renamed, typed
│   │   ├── stripe/
│   │   └── ...
│   ├── enrich/              # Source-specific business logic
│   ├── integrate/           # Cross-source combinations
│   ├── core/                # Org-wide canonical tables
│   └── serve/               # Consumer-specific delivery
│       ├── retail_usa/
│       └── finance/
├── macros/
│   ├── communication/     # Domain-specific macros
│   │   ├── advance_notice/
│   │   └── reminders/
│   └── utilities/
├── tests/                 # Singular tests
├── seeds/
├── snapshots/
├── docs/                  # Documentation
├── dbt_project.yml
└── .sqlfluff

Purpose-Driven Layer Architecture

Name your directories for what they do. Six directories, six purposes:

| Layer | Directory | Purpose |
| --- | --- | --- |
| Source Mirror | `source_mirror/` | 1:1 source copy, changelog, track deletes |
| Data Prep | `data_prep/` | Field renaming, type casting, no business logic |
| Enrich | `enrich/` | Source-specific business logic |
| Integrate | `integrate/` | Combine data from different systems |
| Core | `core/` | Org-wide analytical tables |
| Serve | `serve/` | Consumer-specific delivery |

Source Mirror

  • Schema: source_mirror
  • Purpose: 1:1 copy of the source table. Track schema changes, log deletes, maintain a full changelog. Nothing else.
  • Materialization (how dbt builds the model: as a view, table, or incremental refresh): View, or incremental (only processes new or changed rows) for high-volume sources
{{
    config(
        materialized='view',
        schema='source_mirror'
    )
}}

select
    *
    , _fivetran_synced as _loaded_at
    , _fivetran_deleted as _is_deleted
from {{ source('stripe', 'subscriptions') }}

Data Prep

  • Schema: data_prep
  • Purpose: Field renaming, type casting, null handling. No business logic. If you're writing a CASE statement, you've gone too far.
  • Materialization: View
{{
    config(
        materialized='view',
        schema='data_prep'
    )
}}

select
    id as subscription_id
    , cast(created_at as timestamp_ntz) as created_at
    , nullif(trim(plan_name), '') as plan_name
from {{ ref('base_stripe__subscriptions') }}

Enrich

  • Schema: enrich
  • Purpose: Source-specific business logic: deduplication, status derivation, temporal calculations. Each model still lives within a single source domain.
  • Materialization: View (table for expensive computations)
{{
    config(
        materialized='view',
        schema='enrich'
    )
}}

with subscription_events as (
    select
        subscription_id
        , plan_name
        , created_at
        , lead(created_at) over (
            partition by account_id order by created_at
          ) as next_event_at
    from {{ ref('prep_stripe__subscriptions') }}
)

select * from subscription_events

Integrate

  • Schema: integrate
  • Purpose: Combine data from different systems. This is where Stripe subscriptions meet Salesforce accounts meet product usage events. All combinations across systems live here and nowhere else.
  • Materialization: View (table for large cross-source joins)
{{
    config(
        materialized='view',
        schema='integrate'
    )
}}

select
    a.account_id
    , a.company_name
    , s.subscription_id
    , s.plan_name
    , u.last_login_at
from {{ ref('enr_salesforce__accounts') }} as a
left join {{ ref('enr_stripe__subscriptions') }} as s
    on a.stripe_customer_id = s.customer_id
left join {{ ref('enr_app__user_activity') }} as u
    on a.account_id = u.account_id

Core

  • Schema: core
  • Purpose: Org-wide analytical tables. Canonical definitions of customers, subscriptions, revenue. Governed, tested, documented. The tables your CFO trusts.
  • Materialization: Table (always)
{{
    config(
        materialized='table',
        schema='core',
        cluster_by=['account_id', 'created_at']
    )
}}

Serve

  • Schema: Named for the consumer, e.g. retail_usa_data, finance_reporting
  • Purpose: Consumer-specific views. Pre-filtered, pre-aggregated, shaped for a specific team, dashboard, or data sync to business tools.
  • Materialization: Table or incremental
{{
    config(
        materialized='table',
        schema='retail_usa_data',
        cluster_by=['store_id', 'order_date']
    )
}}

The Fast Lane

Sometimes data needs to move faster than six layers allow. A real-time inventory feed or a fraud-detection signal can't wait for a full pipeline run. For these cases, build a fast lane: a model that goes directly from Source Mirror to Serve, skipping the layers in between. Tag it, document why it exists, and revisit it quarterly. The fast lane is a conscious trade-off, not a shortcut.
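A fast-lane model might look like the sketch below. The `fast_lane` tag value is our convention, not a dbt built-in (dbt's `tags` config and `is_incremental()` are real; the model and column names are hypothetical):

```sql
{{
    config(
        materialized='incremental',
        schema='retail_usa_data',
        tags=['fast_lane']
    )
}}

-- Fast lane: reads the source mirror directly,
-- skipping data prep, enrich, and integrate. Revisit quarterly.
select *
from {{ ref('base_app__inventory_events') }}
{% if is_incremental() %}
where _loaded_at > (select max(_loaded_at) from {{ this }})
{% endif %}
```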

Model Configuration

Config Block Placement

Config blocks go at the top of the file. Always.

{{
    config(
        materialized='table',
        cluster_by=['subscription_id', 'created_at']
    )
}}

-- Model logic starts here
with ...

Materialization Strategy

| Layer | Default | Override When |
| --- | --- | --- |
| Source Mirror | view | High-volume sources needing incremental loads |
| Data Prep | view | Large reference tables frequently joined |
| Enrich | view | Expensive computations, complex window functions |
| Integrate | view | Large cross-source joins with multiple aggregations |
| Core | table | Always tables. These are your governed, canonical models |
| Serve | table | Incremental for high-volume delivery targets |
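Rather than repeating these defaults in every model's config block, set them once per directory in dbt_project.yml (a sketch assuming the directory layout above; `your_project` is a placeholder):

```yaml
models:
  your_project:
    source_mirror:
      +materialized: view
    data_prep:
      +materialized: view
    enrich:
      +materialized: view
    integrate:
      +materialized: view
    core:
      +materialized: table
    serve:
      +materialized: table
```

Individual models can still override the directory default in their own config block.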

Clustering

Use cluster_by for tables with common filter/join patterns:

{{
    config(
        materialized='table',
        cluster_by=['account_id', 'created_at']
    )
}}

Choose clustering keys based on: frequently filtered columns, common join keys, date/time partitioning columns.
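For example, a table clustered by `account_id, created_at` lets the warehouse prune data efficiently for queries shaped like this (hypothetical table name):

```sql
select *
from core_subscription_events
where account_id = 42
  and created_at >= '2025-01-01'
```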

Post-Hooks

Use post-hooks for granting permissions or logging:

{{
    config(
        materialized='table',
        schema='core',
        post_hook=[
            "grant select on {{ this }} to role analyst",
            "{{ log_model_refresh(this.identifier) }}"
        ]
    )
}}

Schema Change Handling

In dbt_project.yml (this setting applies to incremental models):

models:
  your_project:
    +on_schema_change: "sync_all_columns"

Testing Strategy

Test Hierarchy

  1. Generic tests (column-level): quick validation with broad coverage
  2. Singular tests (dataset-level): complex business-logic validation

Generic Tests

Apply to primary keys and critical columns in schema.yml:

version: 2

models:
  - name: prep_subscriptions
    columns:
      - name: subscription_id
        description: "Unique identifier for the subscription"
        tests:
          - not_null
          - unique

      - name: created_at
        description: "Timestamp when the subscription was created"
        tests:
          - not_null

Every primary key gets not_null and unique. No exceptions. This is the minimum bar: the tests that catch 80% of data issues with 5% of the effort.

Singular Tests

Create SQL files in tests/ for complex validation:

-- tests/assert_core_mrr_data_quality.sql

with validation_checks as (
  select
    -- Check for duplicate records
    count(*) as total_records
    , count(distinct subscription_id) as unique_subscriptions

    -- Validate temporal consistency
    , sum(case when churned_at < created_at then 1 else 0 end) as invalid_dates

    -- Validate MRR calculation accuracy
    , sum(
        case
          when mrr_amount != (
            base_mrr + expansion_mrr - contraction_mrr
          )
          then 1
          else 0
        end
      ) as incorrect_mrr_totals

  from {{ ref('core_monthly_recurring_revenue') }}
)

-- Return rows only when failures are detected (dbt convention)
select
  'Data Quality Issues Found' as test_name
  , *
from validation_checks
where
  total_records != unique_subscriptions
  or invalid_dates > 0
  or incorrect_mrr_totals > 0

Test Naming Convention

tests/
├── assert_{model_name}_data_quality.sql
├── assert_{model_name}_outliers.sql
└── no_{business_rule_violation}.sql

Tests return 0 rows on success. Any returned rows indicate failure.
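At its simplest, a singular test is just a query for rows that should never exist. A minimal sketch, assuming the `mrr_amount` column referenced above:

```sql
-- tests/no_negative_mrr.sql
-- Fails (returns rows) if any subscription has negative MRR
select
    subscription_id
    , mrr_amount
from {{ ref('core_monthly_recurring_revenue') }}
where mrr_amount < 0
```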

Documentation Standards

Schema.yml Structure

version: 2

models:
  - name: model_name
    description: |
      **Brief title or purpose**

      Detailed description of the model's business purpose.

      **Key Features:**
      - Feature 1
      - Feature 2

      **Business Questions Answered:**
      - Question 1?
      - Question 2?

    meta:
      owner: "Team or Person Name"
      tags: ["domain", "category", "tier-1"]
      business_domain: "Domain Name"

    columns:
      - name: column_name
        description: |
          Clear description of what this column represents.
          Include calculation logic for derived columns:
          Formula: column_a + column_b
        tests:
          - not_null
          - unique

Calculated Columns

Document the formula. Future you will thank present you.

- name: health_score
  description: |
    Composite customer health score (0-100 scale) combining:
    - Product usage frequency (30% weight)
    - Feature adoption breadth (25% weight)
    - Support ticket sentiment (20% weight)
    - NPS survey responses (25% weight)
    Scores < 50 indicate churn risk requiring intervention.

- name: days_to_convert
  description: |
    Calculated: Days between trial start and paid conversion.
    Formula: DATEDIFF(day, trial_started_at, converted_at)
    NULL if not yet converted.

Deprecation Notices

Mark deprecated models clearly:

- name: mart_revenue_legacy
  description: |
    **DEPRECATED - Use `core_monthly_recurring_revenue` instead**

    Legacy model maintained for backward compatibility.
    Scheduled for removal: 2025-Q2
  meta:
    tags: ["legacy", "scheduled-removal"]

Macro Conventions

One Macro Per File

Each macro in its own file with a matching name:

macros/
├── communication/
│   ├── advance_notice/
│   │   ├── email_attempt_dates.sql
│   │   ├── sms_attempt_dates.sql
│   │   ├── attempt_date_cols.sql
│   │   └── shipped_diff_cols.sql
│   └── reminders/
│       └── ...
└── utilities/
    └── explore_data.sql

Macro Naming

Use descriptive snake_case that indicates output:

{action}_{subject}_{output_type}.sql

email_attempt_dates.sql
shipped_diff_cols.sql
create_feature_registry_table.sql

Macro Structure

{% macro attempt_date_cols(prefix, channel_expr, channel_name) %}
  -- Emits attempt 1..3 dates like: {prefix}1_{channel_name}_date
  {% for n in [1, 2, 3] %}
  , to_char(
      max(case when {{ channel_expr }} and attempt_number = {{ n }} then contact_at end),
      'YYYY-MM-DD'
    ) as {{ prefix }}{{ n }}_{{ channel_name }}_date
  {% endfor %}
{% endmacro %}

Wrapper Macros

Create thin wrappers for common use cases:

-- email_attempt_dates.sql
{% macro email_attempt_dates(prefix) %}
  {{ attempt_date_cols(prefix, "activity_type = 'Email'", "email") }}
{% endmacro %}

-- sms_attempt_dates.sql
{% macro sms_attempt_dates(prefix) %}
  {{ attempt_date_cols(prefix, "activity_type = 'SMS'", "sms") }}
{% endmacro %}
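For reference, here's how the wrappers compose inside a model. Because the macro emits leading-comma columns, it slots straight into a select list (`prep_app__communication_events` and the `reminder` prefix are hypothetical):

```sql
select
    order_id
    -- Emits columns like reminder1_email_date .. reminder3_sms_date
    {{ email_attempt_dates('reminder') }}
    {{ sms_attempt_dates('reminder') }}
from {{ ref('prep_app__communication_events') }}
group by order_id
```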

SQLFluff Configuration

Here's the complete .sqlfluff config that enforces everything in this guide:

[sqlfluff]
templater = dbt
dialect = snowflake
exclude_rules = L034, L044
max_line_length = 120

[sqlfluff:templater:dbt]
profiles_dir = ~/.dbt
profile = your_profile_name
target = dev

[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = lower

[sqlfluff:rules:capitalisation.functions]
extended_capitalisation_policy = lower

[sqlfluff:rules:capitalisation.literals]
capitalisation_policy = lower

[sqlfluff:rules:capitalisation.types]
extended_capitalisation_policy = lower

[sqlfluff:rules:aliasing.table]
aliasing = explicit

[sqlfluff:rules:aliasing.column]
aliasing = explicit

[sqlfluff:rules:aliasing.expression]
aliasing = explicit

[sqlfluff:rules:layout.long_lines]
ignore_comment_lines = true
ignore_comment_clauses = true

[sqlfluff:rules:structure.column_order]
select_clause_element_order_preference = any

[sqlfluff:indentation]
indent_unit = space
tab_space_size = 2

[sqlfluff:layout:type:comma]
line_position = leading

[sqlfluff:rules:convention.select_trailing_comma]
select_clause_trailing_comma = forbid

[sqlfluff:rules:convention.terminator]
require_final_newline = true

Common Commands

# Lint all models
sqlfluff lint models/

# Lint specific layer
sqlfluff lint models/source_mirror/
sqlfluff lint models/data_prep/
sqlfluff lint models/enrich/
sqlfluff lint models/integrate/
sqlfluff lint models/core/
sqlfluff lint models/serve/

# Auto-fix linting issues
sqlfluff fix models/

# Fix specific file
sqlfluff fix models/data_prep/prep_stripe__subscriptions.sql

# Format SQL files
sqlfluff format models/

CI/CD Integration

Add SQLFluff to your pipeline:

# .github/workflows/dbt-lint.yml
name: dbt Lint

on: [pull_request]

jobs:
  sqlfluff:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install SQLFluff
        run: pip install sqlfluff sqlfluff-templater-dbt
      - name: Lint SQL
        run: sqlfluff lint models/ --dialect snowflake

Action: Add this workflow to your repo today. Takes 5 minutes. Style debates in PRs drop to zero.
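To catch issues before they even reach CI, SQLFluff also ships official pre-commit hooks. A sketch, assuming you already use pre-commit (pin `rev` to the SQLFluff version you have installed):

```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/sqlfluff/sqlfluff
    rev: 3.0.0   # pin to your installed SQLFluff version
    hooks:
      - id: sqlfluff-lint
      - id: sqlfluff-fix
```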

Get the Complete Package

Everything in this guide, plus a one-page cheatsheet and PR reviewer checklist, is available for download:

Download the Tabby Stack dbt Style Guide (PDF + .sqlfluff config)

The PDF includes a complete example model showing all conventions working together, and a PR checklist you can paste into your review template.

When Standards Aren't Enough

This guide solves the consistency problem. But consistency is table stakes.

The harder problems require more than a style guide: model sprawl that nobody can navigate, tests that don't catch real issues, Snowflake bills that grow 30% month over month with no visibility.

We help data teams get their dbt projects to a state where following the rules is the easy path and the architecture actually scales. If you've adopted standards but still feel like you're fighting your data stack, that's the work we do.