The basecut.yml file defines what Basecut extracts, how traversal is bounded, how PII is anonymized, and where output is written.

Complete Example

version: '1'
name: 'my-snapshot'

from:
  - table: users
    where: 'id = :id'
    params:
      id: 42
  - table: billing.invoices
    where: 'org_id = :org_id'
    params:
      org_id: 10

exclude:
  - audit_logs
  - internal_tokens

include_all:
  - countries
  - currencies

traverse:
  parents: 10
  children: 2
  stop_at:
    - user_roles
    - order_tags

virtual_foreign_keys:
  - from: posts.author_id
    to: users.id
  - from:
      table: billing.invoice_items
      columns: [invoice_id, tenant_id]
    to:
      table: billing.invoices
      columns: [id, tenant_id]

limits:
  rows:
    per_table: 1000
    total: 100000
    tables:
      events: 200
      logs: 0 # 0 = unlimited for this table (use exclude to skip a table)

sampling:
  mode: random
  seed: 42

anonymize:
  mode: auto
  excluded_domains:
    - yourcompany.com
  rules:
    users:
      email: fake_email
      phone: fake_phone
    '*.password_hash': redact

output: ./snapshots
# Or object form:
# output:
#   provider: s3
#   bucket: my-team-snapshots
#   region: us-east-1

Top-Level Fields

| Field | Required | Type | Description |
| --- | --- | --- | --- |
| version | Yes | string | Config version. Must be '1'. |
| name | No | string | Snapshot name fallback used by CLI if --name is not provided. |
| from | Yes | array | One or more root selectors to start extraction. |
| exclude | No | array | Tables to exclude entirely from extraction. |
| include_all | No | array | Lookup/reference tables extracted in full before traversal. |
| traverse | No | object | Traversal controls (parents, children, stop_at). |
| virtual_foreign_keys | No | array | User-defined FK edges for relationships not declared in DB constraints. |
| limits | No | object | Row-count budgets and overrides. |
| sampling | No | object | Row selection mode and random seed. |
| anonymize | No | string/object | off, auto, or object with mode + rules. |
| output | Yes | string/object | Output destination (./path, s3://..., gs://..., or provider object). |

from

from defines the starting rows.
from:
  - table: users
    where: 'id = :id'
    params:
      id: 1
  • Use table: users for public.users.
  • Use table: schema.table for non-public schemas.
  • where supports named params (such as :id); supply their values in params.

traverse

traverse:
  parents: 10
  children: 2
  stop_at:
    - user_roles
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| parents | int | 10 | Max parent FK depth (-1 = unlimited). |
| children | int | 2 | Max child FK depth (0 = disable downstream traversal). |
| stop_at | array | - | Junction/link tables where FK expansion should stop. |
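One way to read these depth settings is as a hop budget counted from the root rows. The sketch below is only an illustration of that reading, not Basecut's implementation: the helper names may_follow and may_expand_from are hypothetical, and it assumes depth is counted in FK hops starting at 1 and that -1 means unlimited in either direction (the table above documents it only for parents).

```python
from typing import Iterable

UNLIMITED = -1  # documented for parents; assumed here to apply generally


def may_follow(limit: int, hops: int) -> bool:
    """Return True if an FK edge at hop number `hops` (1-based) is within `limit`."""
    if limit == UNLIMITED:
        return True
    return hops <= limit


def may_expand_from(table: str, stop_at: Iterable[str]) -> bool:
    """Return False for tables listed in stop_at: their FKs are not expanded further."""
    return table not in set(stop_at)
```

Under this reading, children: 0 rejects even the first hop downstream, while parents: 10 allows ten hops upstream and rejects the eleventh.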

virtual_foreign_keys

Use virtual_foreign_keys when relationships between tables are enforced in application code but not declared as database FK constraints.
Single-column (compact) form:
virtual_foreign_keys:
  - from: posts.author_id
    to: users.id
  - from: comments.post_id
    to: public.posts.id
Composite-column (structured) form:
virtual_foreign_keys:
  - from:
      table: billing.invoice_items
      columns: [invoice_id, tenant_id]
    to:
      table: invoices
      columns: [id, tenant_id]
Notes:
  • Virtual FK edges are traversed the same way as real FKs during extraction.
  • Bare table names are normalized to public.<table>.
  • Compact endpoints must be table.column or schema.table.column.
  • from.columns and to.columns must have the same number of columns.
  • Duplicate virtual FK definitions are rejected.
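The validation rules in these notes can be sketched in a few lines of Python. This is a hypothetical illustration, not Basecut's parser: Endpoint, parse_compact, parse_structured, and validate_edges are made-up names, and the error messages are invented for the example.

```python
from typing import List, NamedTuple, Tuple


class Endpoint(NamedTuple):
    schema: str
    table: str
    columns: Tuple[str, ...]


def parse_compact(endpoint: str) -> Endpoint:
    """Parse 'table.column' or 'schema.table.column'; bare tables go to public."""
    parts = endpoint.split(".")
    if not all(parts) or len(parts) not in (2, 3):
        raise ValueError(f"expected table.column or schema.table.column: {endpoint!r}")
    if len(parts) == 2:
        parts = ["public"] + parts
    schema, table, column = parts
    return Endpoint(schema, table, (column,))


def parse_structured(table: str, columns: List[str]) -> Endpoint:
    """Parse the structured form; a bare table name is normalized to public.<table>."""
    schema, _, name = table.rpartition(".")
    return Endpoint(schema or "public", name, tuple(columns))


def validate_edges(edges: List[Tuple[Endpoint, Endpoint]]) -> None:
    """Reject mismatched column counts and duplicate definitions."""
    seen = set()
    for src, dst in edges:
        if len(src.columns) != len(dst.columns):
            raise ValueError("from.columns and to.columns must have the same length")
        if (src, dst) in seen:
            raise ValueError("duplicate virtual FK definition")
        seen.add((src, dst))
```

Because bare names are normalized before comparison, posts.author_id and public.posts.author_id resolve to the same endpoint in this sketch, so defining both would count as a duplicate.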

limits.rows

limits:
  rows:
    per_table: 1000
    total: 100000
    tables:
      users: 200
      audit_logs: 0 # 0 = unlimited for this table (use exclude to skip a table)
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| per_table | int | 1000 | Default row-count cap per table. |
| total | int | 100000 | Total row-count cap across the snapshot. |
| tables | map[string]int | - | Per-table row-count overrides (0 means unlimited for that table; use exclude to skip extraction). |
Cycle handling: When tables form circular foreign key relationships (cycles), Basecut automatically caps the combined row count for the cycle group based on the sum of effective table limits. For example, if three tables in a cycle each have per_table: 1000, the cycle budget is 3000 total rows. Adjust per_table or add individual tables entries to control cycle sizes.
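The cycle budget arithmetic can be sketched as follows. This is a rough illustration with a hypothetical cycle_budget helper, not Basecut's code. How a 0 (unlimited) override interacts with a cycle is not documented here, so the sketch assumes it makes the whole cycle budget unlimited and returns None in that case.

```python
from typing import Dict, List, Optional


def cycle_budget(
    cycle_tables: List[str],
    per_table: int = 1000,
    overrides: Optional[Dict[str, int]] = None,
) -> Optional[int]:
    """Sum the effective per-table limits for the tables in one cycle group."""
    overrides = overrides or {}
    # Effective limit: the tables override if present, otherwise per_table.
    limits = [overrides.get(table, per_table) for table in cycle_tables]
    if any(limit == 0 for limit in limits):
        return None  # assumption: an unlimited table makes the cycle unlimited
    return sum(limits)


tables = ["orders", "customers", "addresses"]  # hypothetical cycle
print(cycle_budget(tables))                           # 3000
print(cycle_budget(tables, overrides={"orders": 200}))  # 2200
```

Lowering a single table's override shrinks the cycle budget by the same amount, which is usually the simplest way to rein in a large cycle.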

anonymize

Shorthand:
anonymize: auto
Object form:
anonymize:
  mode: manual # auto | manual | off
  excluded_domains: # optional: global email domains to skip anonymization
    - company.com
    - test.example.com
  rules:
    users:
      email: fake_email
    partners:
      email:
        strategy: fake_email
        params:
          excluded_domains: [partner.org]  # per-rule: replaces global list for this rule
    audit_logs:
      email:
        strategy: fake_email
        params:
          excluded_domains: []  # explicit empty: anonymize ALL emails, ignore global list
    '*.password_hash': redact
Notes:
  • mode: auto enables built-in PII auto-detection plus explicit rules.
  • mode: manual applies only explicit rules.
  • mode: off disables anonymization.
  • Table-grouped rule keys can be unqualified (users) for public.users.
  • excluded_domains (top-level) is an optional list of email domains that bypass anonymization. Emails matching these domains (case-insensitive, exact match) are left unchanged, whether the email strategy was set via explicit rules or auto-detection. Useful for preserving internal/test emails like @yourcompany.com.
  • Per-rule params.excluded_domains overrides the global list for that specific rule. If present (even as an empty list), the global excluded_domains are ignored for that rule. A rule without params.excluded_domains inherits the global list.
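The precedence between the global and per-rule excluded_domains lists can be sketched as below. The function names are hypothetical and this is not Basecut's implementation; it only mirrors the rules stated in the notes (presence of the per-rule key wins, even when empty, and matching is case-insensitive and exact).

```python
from typing import Dict, List, Optional


def effective_excluded_domains(
    global_domains: List[str], rule_params: Optional[Dict]
) -> List[str]:
    """Per-rule list replaces the global list whenever the key is present."""
    if rule_params is not None and "excluded_domains" in rule_params:
        return list(rule_params["excluded_domains"])
    return list(global_domains)


def is_excluded(email: str, domains: List[str]) -> bool:
    """Case-insensitive exact match on the part after the last '@'."""
    _, sep, domain = email.rpartition("@")
    if not sep:
        return False
    return domain.lower() in {d.lower() for d in domains}


global_domains = ["company.com"]
partners = effective_excluded_domains(global_domains, {"excluded_domains": ["partner.org"]})
print(is_excluded("ann@company.com", partners))                         # False
print(is_excluded("ann@partner.org", partners))                         # True
print(is_excluded("bob@mail.company.com", global_domains))              # False
```

The last call shows the exact-match rule: a subdomain such as mail.company.com is not covered by company.com and would need its own entry.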

output

String shorthand:
output: ./snapshots
# output: s3://my-bucket/team-a
# output: gs://my-bucket/team-a
Object form:
output:
  provider: local # local | s3 | gcs
  path: ./snapshots
For cloud providers:
output:
  provider: s3
  bucket: my-bucket
  region: us-east-1
  endpoint: https://<accountid>.r2.cloudflarestorage.com # optional for S3-compatible backends
  prefix: team-a/snapshots
Notes:
  • For provider: s3, bucket is required.
  • For Cloudflare R2, use provider: s3 with an endpoint, and set region: auto.
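To show how the string shorthand relates to the object form, here is a minimal sketch with a hypothetical normalize_output helper. It is not Basecut's parser. In particular, splitting s3://bucket/prefix into bucket and prefix fields, and reusing those same fields for gs:// URLs, are assumptions made for this example.

```python
from typing import Dict, Union


def normalize_output(output: Union[str, Dict[str, str]]) -> Dict[str, str]:
    """Turn the output setting into a provider object (local | s3 | gcs)."""
    if isinstance(output, dict):
        if output.get("provider") == "s3" and not output.get("bucket"):
            raise ValueError("bucket is required for provider: s3")
        return dict(output)
    for scheme, provider in (("s3://", "s3"), ("gs://", "gcs")):
        if output.startswith(scheme):
            # Assumed mapping: first path segment is the bucket, the rest is the prefix.
            bucket, _, prefix = output[len(scheme):].partition("/")
            if not bucket:
                raise ValueError(f"missing bucket in {output!r}")
            result = {"provider": provider, "bucket": bucket}
            if prefix:
                result["prefix"] = prefix
            return result
    # Anything else is treated as a local filesystem path.
    return {"provider": "local", "path": output}
```

Fields that only exist in the object form, such as region and endpoint, have no place in the shorthand under this sketch, which is one reason to prefer the object form for S3-compatible backends like R2.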

Minimal Config

version: '1'

from:
  - table: users
    where: 'id = :id'
    params:
      id: 1

anonymize: auto
output: ./snapshots