The `basecut.yml` file defines what Basecut extracts, how traversal is bounded, how PII is anonymized, and where output is written.
Complete Example
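The config below is an illustrative sketch assembled only from the fields documented in this reference; it is not a canonical example. The snapshot name, table name, filter, and output path are hypothetical, and the `where` value is assumed to be a SQL-style condition.

```yaml
# Illustrative sketch; names and values are hypothetical.
version: '1'
name: support-repro        # fallback snapshot name when --name is not passed

from:
  - table: users           # unqualified, so public.users
    where: 'id = :id'      # assumed SQL-style condition with a named param
    params:
      id: 42

traverse:
  parents: 10              # documented default
  children: 2              # documented default

limits:
  rows:
    per_table: 1000        # documented default
    total: 100000          # documented default

anonymize: auto
output: ./snapshots
```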
Top-Level Fields
| Field | Required | Type | Description |
|---|---|---|---|
| `version` | Yes | string | Config version. Must be `'1'`. |
| `name` | No | string | Snapshot name fallback used by the CLI if `--name` is not provided. |
| `from` | Yes | array | One or more root selectors to start extraction. |
| `exclude` | No | array | Tables to exclude entirely from extraction. |
| `include_all` | No | array | Lookup/reference tables extracted in full before traversal. |
| `traverse` | No | object | Traversal controls (`parents`, `children`, `stop_at`). |
| `virtual_foreign_keys` | No | array | User-defined FK edges for relationships not declared in DB constraints. |
| `limits` | No | object | Row-count budgets and overrides. |
| `sampling` | No | object | Row selection mode and random seed. |
| `anonymize` | No | string/object | `off`, `auto`, or object with `mode` + `rules`. |
| `output` | Yes | string/object | Output destination (`./path`, `s3://...`, `gs://...`, or provider object). |
from
`from` defines the starting rows.
- Use `table: users` for `public.users`.
- Use `table: schema.table` for non-public schemas.
- `where` supports named params (`:id`); values go in `params`.
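As a sketch of how these rules combine, the fragment below defines two root selectors. The table names and filter values are hypothetical, and it assumes that `where` holds a SQL-style condition and that `where` and `params` sit alongside `table` in each entry.

```yaml
from:
  # Unqualified name, so this selects from public.users.
  - table: users
    where: 'id = :id'
    params:
      id: 42
  # Schema-qualified name for a table outside the public schema.
  - table: billing.invoices
    where: 'status = :status'
    params:
      status: open
```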
traverse
| Field | Type | Default | Description |
|---|---|---|---|
| `parents` | int | 10 | Max parent FK depth (`-1` = unlimited). |
| `children` | int | 2 | Max child FK depth (`0` = disable downstream traversal). |
| `stop_at` | array | - | Junction/link tables where FK expansion should stop. |
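A minimal sketch of a `traverse` block using the three fields above. The junction table name is hypothetical, and `stop_at` entries are assumed to be plain table names.

```yaml
traverse:
  parents: -1        # follow parent FKs with no depth limit
  children: 1        # follow child FKs one level down
  stop_at:
    - user_roles     # hypothetical junction table; FK expansion stops here
```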
virtual_foreign_keys
Use `virtual_foreign_keys` when relationships between tables are enforced in application code
but not declared as database FK constraints.
Single-column (compact) form:
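A sketch of what compact entries might look like, assuming each entry pairs a `from` endpoint with a `to` endpoint and that `from` names the referencing column. The tables and columns are hypothetical.

```yaml
virtual_foreign_keys:
  # Assumed direction: `from` is the referencing column, `to` the referenced one.
  - from: orders.user_id                  # bare table name, so public.orders
    to: users.id
  - from: billing.invoices.account_id     # schema.table.column endpoint
    to: accounts.id
```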
- Virtual FK edges are traversed the same way as real FKs during extraction.
- Bare table names are normalized to `public.<table>`.
- Compact endpoints must be `table.column` or `schema.table.column`.
- `from.columns` and `to.columns` must have the same number of columns.
- Duplicate virtual FK definitions are rejected.
limits.rows
| Field | Type | Default | Description |
|---|---|---|---|
| `per_table` | int | 1000 | Default row-count cap per table. |
| `total` | int | 100000 | Total row-count cap across the snapshot. |
| `tables` | map[string]int | - | Per-table row-count overrides (`0` means unlimited for that table; use `exclude` to skip extraction). |
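The fields in this table could be combined as in the sketch below. The nesting under `limits.rows` follows the section heading; the table names and numbers are hypothetical.

```yaml
limits:
  rows:
    per_table: 500       # default cap applied to every table
    total: 50000         # cap across the whole snapshot
    tables:
      events: 5000       # hypothetical table given a higher cap
      countries: 0       # 0 means unlimited for this table
```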
per_table: 1000, the cycle budget is 3000 total rows. Adjust per_table or add individual tables entries to control cycle sizes.
anonymize
Shorthand:
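Going by the top-level field table, the string form accepts `off` or `auto`. A minimal sketch:

```yaml
anonymize: auto
```

Some YAML 1.1 parsers read a bare `off` as a boolean, so quoting it (`'off'`) may be the safer spelling; this reference does not say how Basecut treats the unquoted form.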
- `mode: auto` enables built-in PII auto-detection plus explicit `rules`.
- `mode: manual` applies only explicit `rules`.
- `mode: off` disables anonymization.
- Table-grouped rule keys can be unqualified (`users`) for `public.users`.
- `excluded_domains` (top-level) is an optional list of email domains that bypass anonymization. Emails matching these domains (case-insensitive, exact match) are left unchanged, whether the email strategy was set via explicit rules or auto-detection. Useful for preserving internal/test emails like `@yourcompany.com`.
- Per-rule `params.excluded_domains` overrides the global list for that specific rule. If present (even as an empty list), the global `excluded_domains` are ignored for that rule. A rule without `params.excluded_domains` inherits the global list.
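The override behaviour described above can be sketched as follows. The exact shape of a rule is not shown in this reference, so the column-level nesting and the `strategy` key are assumptions, as are the column names and whether domain entries carry a leading `@`.

```yaml
anonymize:
  mode: auto
  excluded_domains:              # global list
    - yourcompany.com
  rules:
    users:                       # unqualified key, so public.users
      contact_email:             # hypothetical column
        strategy: email          # assumed key name for the rule's strategy
        # No params.excluded_domains here, so the global list is inherited.
      billing_email:             # hypothetical column
        strategy: email
        params:
          excluded_domains: []   # present but empty, so the global list is ignored
```

With this sketch, `contact_email` values at `yourcompany.com` would be left unchanged, while every `billing_email` value would be anonymized.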
output
String shorthand:
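A sketch of the string form, using the three destination styles listed in the top-level field table. The path and bucket names are hypothetical.

```yaml
output: ./snapshots
# Alternatives, one at a time:
# output: s3://my-bucket/snapshots
# output: gs://my-bucket/snapshots
```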
- For `provider: s3`, `bucket` is required.
- For Cloudflare R2, use `provider: s3` + `endpoint` and set `region: auto`.
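A sketch of the provider object for Cloudflare R2, using only the keys named above. The bucket name is hypothetical and the account ID in the endpoint is a placeholder to fill in.

```yaml
output:
  provider: s3
  bucket: my-snapshots     # required for provider: s3 (hypothetical name)
  endpoint: 'https://<account-id>.r2.cloudflarestorage.com'   # placeholder account ID
  region: auto
```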