How It Works

Basecut doesn’t just export tables—it extracts connected subgraphs of your data by following foreign key relationships. This ensures every snapshot is self-contained and referentially valid.

The Core Algorithm

1. Start from Root Tables

You define one or more root tables with optional WHERE clauses:

version: '1'
name: 'my-snapshot'

from:
  - table: users
    where: 'created_at > :since'
    params:
      since: '2024-01-01'

  - table: orders
    where: 'status = :status'
    params:
      status: 'completed'

Basecut executes these queries to get your seed rows—the starting points for extraction.
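To make the seed step concrete, here is a minimal Python sketch of how root entries with `where` and `params` could be turned into parameterized queries. It is an illustration only, not Basecut's implementation: the `fetch_seed_rows` helper, the SQLite database, and the sample rows are all assumptions made for the example. SQLite happens to accept the same `:name` placeholder style used in the config above.

```python
import sqlite3

def fetch_seed_rows(conn, roots):
    """Run each root's optional WHERE clause and return {table: [row dicts]}.

    `roots` mirrors the `from:` list in the config (hypothetical in-memory form).
    """
    conn.row_factory = sqlite3.Row
    seeds = {}
    for root in roots:
        # Table names cannot be bound as parameters, so this sketch assumes
        # they come from trusted configuration, never from user input.
        sql = f'SELECT * FROM "{root["table"]}"'
        if root.get("where"):
            sql += f' WHERE {root["where"]}'
        # Values are bound by name (:since, :status), not spliced into the SQL.
        rows = conn.execute(sql, root.get("params", {})).fetchall()
        seeds.setdefault(root["table"], []).extend(dict(r) for r in rows)
    return seeds

def demo_seeds():
    """Build a tiny throwaway database and select seed rows from it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, created_at TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT);
        INSERT INTO users VALUES (1, '2023-06-01'), (2, '2024-03-15');
        INSERT INTO orders VALUES (10, 1, 'completed'), (11, 2, 'pending');
    """)
    roots = [
        {"table": "users", "where": "created_at > :since",
         "params": {"since": "2024-01-01"}},
        {"table": "orders", "where": "status = :status",
         "params": {"status": "completed"}},
    ]
    return fetch_seed_rows(conn, roots)
```

Calling `demo_seeds()` returns the one user created after the cutoff and the one completed order; those rows would then be the starting points for traversal.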

2. Follow Relationships Recursively

From each seed row, Basecut traverses foreign keys in both directions.

Upstream (Parents): Follow foreign keys to referenced tables

orders.user_id → users.id
line_items.product_id → products.id

Downstream (Children): Follow reverse foreign keys to dependent tables

users.id ← orders.user_id
orders.id ← shipments.order_id

You control how far to traverse in each direction:

traverse:
  parents: 5 # Follow parent FKs 5 levels deep
  children: 10 # Follow child FKs 10 levels deep
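The traversal can be pictured as a breadth-first walk over foreign key edges with a separate hop budget for each direction. The sketch below runs on in-memory rows and rests on several assumptions that the text above does not confirm: every row has an `id` primary key, each row is visited once, and rows reached by going upstream are not expanded downstream again (one common way to avoid pulling in unrelated data). The `traverse` function and sample data are hypothetical.

```python
from collections import deque

# (child_table, child_column, parent_table, parent_column)
FOREIGN_KEYS = [
    ("orders", "user_id", "users", "id"),
    ("line_items", "order_id", "orders", "id"),
    ("line_items", "product_id", "products", "id"),
    ("shipments", "order_id", "orders", "id"),
]

DATA = {
    "users": [{"id": 1}, {"id": 2}],
    "products": [{"id": 7}, {"id": 8}],
    "orders": [{"id": 10, "user_id": 1}, {"id": 11, "user_id": 2}],
    "line_items": [
        {"id": 100, "order_id": 10, "product_id": 7},
        {"id": 101, "order_id": 11, "product_id": 7},
    ],
    "shipments": [{"id": 1000, "order_id": 10}],
}

def traverse(data, seeds, max_parents, max_children):
    """Return the set of (table, id) keys reachable from the seeds."""
    seeds = list(seeds)
    index = {(t, r["id"]): r for t, rows in data.items() for r in rows}
    seen = set(seeds)
    # Queue entries: (table, row, parent_hops, child_hops, may_descend)
    queue = deque((t, index[(t, i)], 0, 0, True) for t, i in seeds)
    while queue:
        table, row, up, down, may_descend = queue.popleft()
        for child_t, child_c, parent_t, parent_c in FOREIGN_KEYS:
            # Upstream: this row holds the FK column, so fetch its parent.
            if child_t == table and up < max_parents and row[child_c] is not None:
                for parent in data.get(parent_t, []):
                    key = (parent_t, parent["id"])
                    if parent[parent_c] == row[child_c] and key not in seen:
                        seen.add(key)
                        # Assumption: rows reached upstream only go further up.
                        queue.append((parent_t, parent, up + 1, down, False))
            # Downstream: other rows point at this row, so fetch its children.
            if parent_t == table and may_descend and down < max_children:
                for child in data.get(child_t, []):
                    key = (child_t, child["id"])
                    if child[child_c] == row[parent_c] and key not in seen:
                        seen.add(key)
                        queue.append((child_t, child, up, down + 1, True))
    return seen
```

Starting from `("users", 1)` with `max_parents=5` and `max_children=10`, the walk reaches that user's order, line item, shipment, and product, and leaves the second user's rows out even though they share a product. Lowering `max_children` to 1 stops after the order.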

Basecut also supports virtual foreign keys — application-enforced relationships that aren’t declared as database constraints. Define them in your config and they’re traversed exactly like real FK edges. See YAML Reference.

3. Ensure Referential Integrity

As Basecut discovers rows, it tracks dependencies:

Before including a row, ensure all its foreign key targets are included
If a parent row is missing, add it (even if it requires traversing beyond depth limits)
Result: zero broken foreign key references
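The integrity rule amounts to closing the selected set over its parents: keep adding missing foreign key targets until none are left. The following is a minimal sketch under assumed conditions (in-memory rows, an `id` primary key on every table, hypothetical helper names); it is meant to show the idea, not how Basecut implements it.

```python
# (child_table, child_column, parent_table, parent_column)
FOREIGN_KEYS = [
    ("orders", "user_id", "users", "id"),
    ("line_items", "order_id", "orders", "id"),
    ("line_items", "product_id", "products", "id"),
]

DATA = {
    "users": [{"id": 1}, {"id": 2}],
    "products": [{"id": 7}],
    "orders": [{"id": 10, "user_id": 1}, {"id": 11, "user_id": 2}],
    "line_items": [{"id": 100, "order_id": 10, "product_id": 7}],
}

def _index(data):
    """Map (table, id) to the row dict for quick lookup."""
    return {(t, r["id"]): r for t, rows in data.items() for r in rows}

def broken_references(data, selected, foreign_keys=FOREIGN_KEYS):
    """List ((table, id), parent_table, value) for FK targets not in `selected`."""
    index = _index(data)
    broken = []
    for table, row_id in sorted(selected):
        row = index[(table, row_id)]
        for child_t, child_c, parent_t, parent_c in foreign_keys:
            if child_t != table or row[child_c] is None:
                continue
            targets = [p for p in data[parent_t] if p[parent_c] == row[child_c]]
            if not any((parent_t, p["id"]) in selected for p in targets):
                broken.append(((table, row_id), parent_t, row[child_c]))
    return broken

def close_over_parents(data, selected, foreign_keys=FOREIGN_KEYS):
    """Add every missing parent, with no depth limit, until nothing is missing."""
    index = _index(data)
    closed = set(selected)
    pending = list(closed)
    while pending:
        table, row_id = pending.pop()
        row = index[(table, row_id)]
        for child_t, child_c, parent_t, parent_c in foreign_keys:
            if child_t != table or row[child_c] is None:
                continue
            for parent in data[parent_t]:
                key = (parent_t, parent["id"])
                if parent[parent_c] == row[child_c] and key not in closed:
                    closed.add(key)
                    pending.append(key)  # the new parent may need parents too
    return closed
```

Selecting only `("line_items", 100)` leaves two broken references (its order and its product). After `close_over_parents`, the set also contains the order, the product, and the order's user, and `broken_references` returns an empty list.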

4. Apply Limits and Filters

Prevent runaway extraction with safeguards:

limits:
  rows:
    per_table: 1000 # Max 1000 rows per table
    total: 50000 # Max 50,000 rows overall

If limits are hit, Basecut stops adding new rows but ensures already-included rows remain referentially valid.
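One way to picture the two limits is a small counter that admits a row only while both the per-table and the total budget have room. The `RowBudget` class below is a hypothetical sketch. In particular, the `required` flag encodes one possible reading of the sentence above (rows needed to keep included rows valid are still admitted after a limit is hit); that reading is an assumption, not documented Basecut behavior.

```python
from collections import Counter

class RowBudget:
    """Track row counts against per-table and overall limits (illustrative only)."""

    def __init__(self, per_table, total):
        self.per_table = per_table
        self.total = total
        self.counts = Counter()

    @property
    def used(self):
        """Total number of rows admitted so far."""
        return sum(self.counts.values())

    def try_add(self, table, required=False):
        """Count the row and return True if it fits or is required for integrity."""
        fits = self.counts[table] < self.per_table and self.used < self.total
        if fits or required:
            # Assumption: required parents bypass the limits so that no
            # already-included row is left with a dangling reference.
            self.counts[table] += 1
            return True
        return False
```

With `RowBudget(per_table=2, total=3)`, a third `users` row is rejected by the per-table limit, a second `orders` row is rejected once three rows are counted overall, and a row passed with `required=True` is still admitted.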

Example Walkthrough

Given this schema:

users
+-- orders (users.id -> orders.user_id)
|   +-- line_items (orders.id -> line_items.order_id)
|   |   `-- products (line_items.product_id -> products.id)
|   `-- shipments (orders.id -> shipments.order_id)

With configuration:

version: '1'
name: 'my-snapshot'

from:
  - table: users
    where: 'email = :email'
    params:
      email: 'alice@example.com'

traverse:
  parents: 3
  children: 10

Extraction process:

Seed: Find users row for alice@example.com → 1 row
Downstream L1: Find all orders where user_id = alice.id → 5 rows
Downstream L2: Find all line_items for those 5 orders → 23 rows
Upstream L1 (from line_items): Find all products referenced → 8 rows
Downstream L2: Find all shipments for those 5 orders → 5 rows

Result: 42 rows across 5 tables, all referentially valid. If you insert them into an empty database, no foreign key constraints will be violated.

Why This Approach Works

Referential Integrity: You can restore snapshots to empty databases without constraint violations. No manual dependency sorting needed.

Reproducibility: Same configuration + same source data yields the same snapshot shape. If you use random sampling, set sampling.seed for stable row selection.

Composability: Combine multiple root tables to extract overlapping data graphs. Basecut deduplicates automatically.

Safety: Limits prevent accidentally extracting your entire production database. Anonymization runs inline during extraction.
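For context on the dependency sorting that a referentially valid snapshot spares you, here is a sketch of how a parent-before-child table order could be derived from foreign key edges using Python's standard `graphlib` module (Python 3.9+). The `insert_order` helper and the edge list are assumptions for illustration; how Basecut orders restores internally is not described here.

```python
from graphlib import TopologicalSorter

# (child_table, child_column, parent_table, parent_column)
FOREIGN_KEYS = [
    ("orders", "user_id", "users", "id"),
    ("line_items", "order_id", "orders", "id"),
    ("line_items", "product_id", "products", "id"),
    ("shipments", "order_id", "orders", "id"),
]

def insert_order(foreign_keys):
    """Return table names ordered so every parent precedes its children."""
    sorter = TopologicalSorter()
    for child_t, _child_c, parent_t, _parent_c in foreign_keys:
        if child_t != parent_t:
            # The child depends on the parent, so the parent is its predecessor.
            sorter.add(child_t, parent_t)
        # Self-referencing tables are skipped here; they need row-level
        # ordering or deferred constraints, which this sketch does not cover.
    return list(sorter.static_order())
```

For the schema in the walkthrough, `users` and `products` come before `orders` and `line_items`, and `orders` comes before `shipments`. If the tables reference each other in a cycle, `static_order()` raises `graphlib.CycleError`.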

Next Steps

Execution Modes

Choose where extraction runs: self-hosted agent or local CLI

Storage Providers

Configure S3, GCS, or local filesystem storage

YAML Reference

All configuration options explained

Snapshot Rules

Protect PII during extraction
