Skip to main content
Basecut doesn’t just export tables—it extracts connected subgraphs of your data by following foreign key relationships. This ensures every snapshot is self-contained and referentially valid.

The Core Algorithm

1. Start from Root Tables

You define one or more root tables with optional WHERE clauses:
version: '1'
name: 'my-snapshot'

from:
  - table: users
    where: 'created_at > :since'
    params:
      since: '2024-01-01'

  - table: orders
    where: 'status = :status'
    params:
      status: 'completed'
Basecut executes these queries to get your seed rows—the starting points for extraction.

2. Follow Relationships Recursively

From each seed row, Basecut traverses foreign keys in both directions: Upstream (Parents): Follow foreign keys to referenced tables
  • orders.user_id → users.id
  • line_items.product_id → products.id
Downstream (Children): Follow reverse foreign keys to dependent tables
  • users.id ← orders.user_id
  • orders.id ← shipments.order_id
You control how far to traverse in each direction:
traverse:
  parents: 5 # Follow parent FKs 5 levels deep
  children: 10 # Follow child FKs 10 levels deep

3. Ensure Referential Integrity

As Basecut discovers rows, it tracks dependencies:
  • Before including a row, ensure all its foreign key targets are included
  • If a parent row is missing, add it (even if it requires traversing beyond depth limits)
  • Result: zero broken foreign key references

4. Apply Limits and Filters

Prevent runaway extraction with safeguards:
limits:
  rows:
    per_table: 1000 # Max 1000 rows per table
    total: 50000 # Max 50,000 rows overall
If limits are hit, Basecut stops adding new rows but ensures already-included rows remain referentially valid.

Example Walkthrough

Given this schema:
users
+-- orders (users.id -> orders.user_id)
|   +-- line_items (orders.id -> line_items.order_id)
|   |   `-- products (line_items.product_id -> products.id)
|   `-- shipments (orders.id -> shipments.order_id)
With configuration:
version: '1'
name: 'my-snapshot'

from:
  - table: users
    where: 'email = :email'
    params:
      email: 'alice@example.com'

traverse:
  parents: 3
  children: 10
Extraction process:
  1. Seed: Find users row for alice@example.com → 1 row
  2. Downstream L1: Find all orders where user_id = alice.id → 5 rows
  3. Downstream L2: Find all line_items for those 5 orders → 23 rows
  4. Upstream L1 (from line_items): Find all products referenced → 8 rows
  5. Downstream L3: Find all shipments for those 5 orders → 5 rows
Result: 42 rows across 5 tables, all referentially valid. If you insert them into an empty database, no foreign key constraints will be violated.

Why This Approach Works

Referential Integrity: You can restore snapshots to empty databases without constraint violations. No manual dependency sorting needed. Reproducibility: Same configuration + same source data yields the same snapshot shape. If you use random sampling, set sampling.seed for stable row selection. Composability: Combine multiple root tables to extract overlapping data graphs. Basecut deduplicates automatically. Safety: Limits prevent accidentally extracting your entire production database. Anonymization runs inline during extraction.

Next Steps