Skip to main content
Basecut doesn’t just export tables—it extracts connected subgraphs of your data by following foreign key relationships. This ensures every snapshot is self-contained and referentially valid.

The Core Algorithm

1. Start from Root Tables

You define one or more root tables with optional WHERE clauses:
version: '1'
name: 'my-snapshot'

from:
  - table: users
    where: 'created_at > :since'
    params:
      since: '2024-01-01'

  - table: orders
    where: 'status = :status'
    params:
      status: 'completed'
Basecut executes these queries to get your seed rows—the starting points for extraction.

2. Follow Relationships Recursively

From each seed row, Basecut traverses foreign keys in both directions: Upstream (Parents): Follow foreign keys to referenced tables
  • orders.user_id → users.id
  • line_items.product_id → products.id
Downstream (Children): Follow reverse foreign keys to dependent tables
  • users.id ← orders.user_id
  • orders.id ← shipments.order_id
You control how far to traverse in each direction:
traverse:
  parents: 5 # Follow parent FKs 5 levels deep
  children: 10 # Follow child FKs 10 levels deep
Basecut also supports virtual foreign keys — application-enforced relationships that aren’t declared as database constraints. Define them in your config and they’re traversed exactly like real FK edges. See YAML Reference.

3. Ensure Referential Integrity

As Basecut discovers rows, it tracks dependencies:
  • Before including a row, ensure all its foreign key targets are included
  • If a parent row is missing, add it (even if it requires traversing beyond depth limits)
  • Result: zero broken foreign key references

4. Apply Limits and Filters

Prevent runaway extraction with safeguards:
limits:
  rows:
    per_table: 1000 # Max 1000 rows per table
    total: 50000 # Max 50,000 rows overall
If limits are hit, Basecut stops adding new rows but ensures already-included rows remain referentially valid.

Example Walkthrough

Given this schema:
users
+-- orders (users.id -> orders.user_id)
|   +-- line_items (orders.id -> line_items.order_id)
|   |   `-- products (line_items.product_id -> products.id)
|   `-- shipments (orders.id -> shipments.order_id)
With configuration:
version: '1'
name: 'my-snapshot'

from:
  - table: users
    where: 'email = :email'
    params:
      email: 'alice@example.com'

traverse:
  parents: 3
  children: 10
Extraction process:
  1. Seed: Find users row for alice@example.com → 1 row
  2. Downstream L1: Find all orders where user_id = alice.id → 5 rows
  3. Downstream L2: Find all line_items for those 5 orders → 23 rows
  4. Upstream L1 (from line_items): Find all products referenced → 8 rows
  5. Downstream L3: Find all shipments for those 5 orders → 5 rows
Result: 42 rows across 5 tables, all referentially valid. If you insert them into an empty database, no foreign key constraints will be violated.

Why This Approach Works

Referential Integrity: You can restore snapshots to empty databases without constraint violations. No manual dependency sorting needed. Reproducibility: Same configuration + same source data yields the same snapshot shape. If you use random sampling, set sampling.seed for stable row selection. Composability: Combine multiple root tables to extract overlapping data graphs. Basecut deduplicates automatically. Safety: Limits prevent accidentally extracting your entire production database. Anonymization runs inline during extraction.

Next Steps

Execution Modes

Choose where extraction runs: self-hosted agent or local CLI

Storage Providers

Configure S3, GCS, or local filesystem storage

YAML Reference

All configuration options explained

Snapshot Rules

Protect PII during extraction