The basecut.yml file defines what Basecut extracts, how traversal is bounded, how PII is anonymized, and where output is written.

Complete Example

version: '1'
name: 'my-snapshot'

from:
  - table: users
    where: 'id = :id'
    params:
      id: 42
  - table: billing.invoices
    where: 'org_id = :org_id'
    params:
      org_id: 10

exclude:
  - audit_logs
  - internal_tokens

include_all:
  - countries
  - currencies

traverse:
  parents: 10
  children: 2
  stop_at:
    - user_roles
    - order_tags

limits:
  rows:
    per_table: 1000
    total: 100000
    tables:
      events: 200
      logs: 0 # 0 = unlimited for this table

sampling:
  mode: random
  seed: 42

anonymize:
  mode: auto
  rules:
    users:
      email: fake_email
      phone: fake_phone
    '*.password_hash': redact

output: ./snapshots
# Or object form:
# output:
#   provider: s3
#   bucket: my-team-snapshots
#   region: us-east-1

Top-Level Fields

| Field | Required | Type | Description |
|---|---|---|---|
| version | Yes | string | Config version. Must be '1'. |
| name | No | string | Snapshot name fallback used by CLI if --name is not provided. |
| from | Yes | array | One or more root selectors to start extraction. |
| exclude | No | array | Tables to exclude entirely from extraction. |
| include_all | No | array | Lookup/reference tables extracted in full before traversal. |
| traverse | No | object | Traversal controls (parents, children, stop_at). |
| limits | No | object | Row-count budgets and overrides. |
| sampling | No | object | Row selection mode and random seed. |
| anonymize | No | string/object | off, auto, or object with mode + rules. |
| output | Yes | string/object | Output destination (./path, s3://..., gs://..., or provider object). |

from

from defines the starting rows.
from:
  - table: users
    where: 'id = :id'
    params:
      id: 1
  • Use table: users for public.users.
  • Use table: schema.table for non-public schemas.
  • where supports named params (:id); values go in params.
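
The points above can be combined in a single from list. The following is a minimal sketch, not a confirmed pattern: it assumes where accepts an ordinary SQL predicate that may reference more than one named param, and the status column is hypothetical.

```yaml
from:
  # Root in the public schema: unqualified table name.
  - table: users
    where: 'id = :id'
    params:
      id: 42
  # Root in a non-public schema: schema.table form.
  # Assumption: one predicate may reference several named params.
  # The status column is hypothetical.
  - table: billing.invoices
    where: 'org_id = :org_id AND status = :status'
    params:
      org_id: 10
      status: paid
```

Each key under params matches a :name placeholder in the where string of the same selector.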

traverse

traverse:
  parents: 10
  children: 2
  stop_at:
    - user_roles
| Field | Type | Default | Description |
|---|---|---|---|
| parents | int | 10 | Max parent FK depth (-1 = unlimited). |
| children | int | 2 | Max child FK depth (0 = disable downstream traversal). |
| stop_at | array | - | Junction/link tables where FK expansion should stop. |
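
As an illustration of the two special values listed above, this sketch should follow parent foreign keys without a depth limit while skipping downstream rows entirely. It uses only the documented values and is not an officially recommended preset.

```yaml
traverse:
  parents: -1   # -1 = unlimited parent FK depth
  children: 0   # 0 = disable downstream traversal
```

A combination like this may suit cases where each root row only needs the rows it references, not the rows that reference it.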

limits.rows

limits:
  rows:
    per_table: 1000
    total: 100000
    tables:
      users: 200
      audit_logs: 0 # 0 = unlimited for this table; use exclude to skip it
| Field | Type | Default | Description |
|---|---|---|---|
| per_table | int | 1000 | Default row-count cap per table. |
| total | int | 100000 | Total row-count cap across the snapshot. |
| tables | map[string]int | - | Per-table row-count overrides (0 means unlimited for that table; use exclude to skip extraction). |

Cycle handling: When tables form circular foreign key relationships, Basecut automatically caps the combined row count for the cycle group based on the sum of the effective limits of the tables in that group. For example, if three tables in a cycle each fall under per_table: 1000, the cycle budget is 3000 total rows. To control cycle sizes, adjust per_table or add per-table entries under tables.
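
To make the cycle arithmetic concrete, here is a sketch with one per-table override. The table names are hypothetical. It also assumes that a table's "effective limit" is its tables entry when one exists and per_table otherwise, which is one reading of the paragraph above rather than confirmed behavior.

```yaml
limits:
  rows:
    per_table: 1000
    total: 100000
    tables:
      teams: 200   # override for one member of the cycle
# Hypothetical cycle: employees -> departments -> teams -> employees
# Assumed effective limits: employees 1000, departments 1000, teams 200
# Assumed cycle budget: 1000 + 1000 + 200 = 2200 rows
```

Under that reading, lowering a single override shrinks the budget for the whole cycle group, not just for the overridden table.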

anonymize

Shorthand:
anonymize: auto
Object form:
anonymize:
  mode: manual # auto | manual | off
  rules:
    users:
      email: fake_email
      phone:
        strategy: partial_mask
        params:
          visible: 4
    '*.password_hash': redact
Notes:
  • mode: auto enables built-in PII auto-detection plus explicit rules.
  • mode: manual applies only explicit rules.
  • mode: off disables anonymization.
  • Table-grouped rule keys can be unqualified (users) for public.users.
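
The notes above suggest that auto-detection and explicit rules can be layered. In the sketch below, the schema-qualified key billing.invoices is an assumption inferred from the note about unqualified keys, and contact_email is a hypothetical column.

```yaml
anonymize:
  mode: auto               # built-in PII detection plus the rules below
  rules:
    users:                 # unqualified key, i.e. public.users
      email: fake_email
    billing.invoices:      # assumed schema-qualified form
      contact_email: fake_email   # hypothetical column
    '*.password_hash': redact
```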

output

String shorthand:
output: ./snapshots
# output: s3://my-bucket/team-a
# output: gs://my-bucket/team-a
Object form:
output:
  provider: local # local | s3 | gcs
  path: ./snapshots
For cloud providers:
output:
  provider: s3
  bucket: my-bucket
  region: us-east-1
  endpoint: https://<accountid>.r2.cloudflarestorage.com # optional for S3-compatible backends
  prefix: team-a/snapshots
Notes:
  • For provider: s3, bucket is required.
  • For Cloudflare R2, use provider: s3 + endpoint and set region: auto.
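
Putting the R2 note into object form gives something like the following sketch. The bucket and prefix values are placeholders, and the `<accountid>` part of the endpoint is left as in the example above and must be replaced with a real account ID.

```yaml
output:
  provider: s3             # R2 is addressed through the s3 provider
  bucket: my-bucket        # required for provider: s3
  region: auto             # per the R2 note above
  endpoint: https://<accountid>.r2.cloudflarestorage.com
  prefix: team-a/snapshots
```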

Minimal Config

version: '1'

from:
  - table: users
    where: 'id = :id'
    params:
      id: 1

anonymize: auto
output: ./snapshots