A snapshot is a point-in-time capture of a subset of your database, packaged with everything needed to restore that exact data state to any compatible database.

What Makes a Snapshot

Basecut doesn't just export tables; it creates self-contained, referentially valid snapshots by following foreign key relationships. This ensures every snapshot can be applied to an empty database without constraint violations.

Key Characteristics

  1. Referentially Complete: Every row includes all its foreign key dependencies
  2. Point-in-Time: Captures database state at a specific moment
  3. Self-Contained: Includes schema metadata for validation
  4. Portable: Can be restored to any database with matching schema
  5. Versioned: Multiple versions can coexist for the same snapshot name

Snapshot Anatomy

Every snapshot consists of three required components and one optional component:

1. Manifest (manifest.json)

Contains metadata about the snapshot:
{
  "version": "1.0",
  "created_at": "2024-01-29T10:30:00Z",
  "source": {
    "host": "prod-db.internal",
    "database": "myapp",
    "database_name": "myapp_production"
  },
  "stats": {
    "total_rows": 2847,
    "table_count": 23,
    "warning_count": 0,
    "extraction_duration_ms": 45320,
    "uncompressed_size": 12543200,
    "compressed_size": 3145600
  },
  "config": {
    "root_table": "public.users",
    "max_upstream_depth": 5,
    "max_downstream_depth": 10,
    "sample_mode": "random"
  },
  "schema_hash": "a1b2c3d4e5f6..."
}
Purpose:
  • Identifies the snapshot uniquely
  • Records extraction context for debugging
  • Stores git metadata for reproducibility
  • Tracks performance metrics
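As a minimal sketch of working with this metadata, the following Python snippet parses a trimmed-down manifest (field names taken from the example above; the real file may carry additional fields) and derives a compression ratio from its stats:

```python
import json

# Example manifest, trimmed to the stats used here. In practice this JSON
# would be loaded from the snapshot's manifest.json.
manifest = json.loads("""
{
  "stats": {
    "total_rows": 2847,
    "table_count": 23,
    "uncompressed_size": 12543200,
    "compressed_size": 3145600
  }
}
""")

stats = manifest["stats"]
ratio = stats["compressed_size"] / stats["uncompressed_size"]
summary = (f"{stats['total_rows']} rows across {stats['table_count']} tables, "
           f"compression ratio {ratio:.2f}")
print(summary)  # → 2847 rows across 23 tables, compression ratio 0.25
```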

2. Schema (schema.json)

Schema metadata for all included tables at extraction time. Purpose:
  • Validate target database compatibility before restore
  • Detect schema drift between snapshot and target
  • Document database structure at extraction time
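A hypothetical sketch of the drift check described above, assuming schema metadata can be viewed as a mapping of table to column types (the real schema.json layout may differ):

```python
# Assumed shape: table -> {column: type}. Not the actual schema.json format.
snapshot_schema = {"public.users": {"id": "bigint", "email": "text"}}
target_schema   = {"public.users": {"id": "bigint", "email": "varchar"}}

drift = []
for table, cols in snapshot_schema.items():
    if table not in target_schema:
        drift.append(f"missing table: {table}")
        continue
    for col, col_type in cols.items():
        target_type = target_schema[table].get(col)
        if target_type is None:
            drift.append(f"missing column: {table}.{col}")
        elif target_type != col_type:
            drift.append(f"type change: {table}.{col} {col_type} -> {target_type}")

print(drift)  # any entries here would block or warn before restore
```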

3. Data Files (tables/*.csv.gz)

One compressed CSV file per table with extracted rows:
tables/
├── public.users.csv.gz                (234 KB, 847 rows)
├── public.organizations.csv.gz        (12 KB, 45 rows)
├── public.orders.csv.gz               (523 KB, 2,134 rows)
├── public.line_items.csv.gz           (891 KB, 8,912 rows)
├── public.products.csv.gz             (156 KB, 432 rows)
└── public.payment_methods.csv.gz      (34 KB, 178 rows)
Format:
  • Standard CSV with header row
  • Gzip compressed for efficiency
  • NULL values represented as empty fields
  • Special characters escaped per RFC 4180
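The format above can be round-tripped with nothing but the Python standard library; this sketch writes and reads back a tiny table, mapping empty fields back to NULL (`None`):

```python
import csv
import gzip
import io

# Write a tiny table the way the data files are described: RFC 4180 CSV
# with a header row, gzip-compressed, NULLs as empty fields.
raw = io.BytesIO()
with gzip.open(raw, "wt", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "email", "deleted_at"])
    writer.writerow(["1", "a@example.com", ""])  # empty field = NULL

# Read it back, converting empty fields to None.
raw.seek(0)
with gzip.open(raw, "rt", newline="") as f:
    rows = [
        {k: (v if v != "" else None) for k, v in row.items()}
        for row in csv.DictReader(f)
    ]

print(rows)  # deleted_at comes back as None
```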

4. Warnings (warnings.json) - Optional

Only present if extraction encountered warnings:
{
  "warnings": [
    {
      "code": "LIMIT_REACHED",
      "message": "per_table limit (1000) reached for table 'audit_logs'",
      "table": "public.audit_logs",
      "severity": "warning"
    }
  ]
}
Purpose:
  • Document extraction issues without failing
  • Warn users about partial data
  • Aid in debugging configuration problems
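For example, a script could inspect warnings.json after creation to flag tables that may contain partial data (shape taken from the example above):

```python
import json

# warnings.json, when present, lists non-fatal extraction issues.
warnings_doc = json.loads("""
{
  "warnings": [
    {
      "code": "LIMIT_REACHED",
      "message": "per_table limit (1000) reached for table 'audit_logs'",
      "table": "public.audit_logs",
      "severity": "warning"
    }
  ]
}
""")

# Tables hit by a row limit may contain partial data.
partial_tables = [w["table"] for w in warnings_doc["warnings"]
                  if w["code"] == "LIMIT_REACHED"]
print(partial_tables)  # → ['public.audit_logs']
```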

Snapshot Lifecycle

1. Creation

Local Execution (default):
basecut snapshot create \
  --config basecut.yml \
  --name "dev-seed" \
  --source "$BASECUT_DATABASE_URL"
Process:
  1. CLI reads configuration from basecut.yml
  2. Connects to source database
  3. Executes root queries to get seed rows
  4. Recursively follows foreign keys (upstream + downstream)
  5. Applies anonymization rules to PII fields
  6. Compresses data to CSV.gz files
  7. Generates manifest and schema metadata
  8. Uploads to configured storage (S3/GCS/local)

2. Storage

Snapshots are stored in your configured provider. S3 example:
s3://my-basecut-snapshots/
└── team-a/                         # optional prefix
    └── snapshots/
        └── 8f4f4b0e-fbc7-4f70.../ # job id
            ├── manifest.json
            ├── schema.json
            ├── warnings.json (optional)
            └── tables/
                ├── public.users.csv.gz
                └── public.orders.csv.gz
Name/tag references (for example dev-seed:brave-lion) are resolved via API metadata, not by directly constructing storage paths.

3. Restoration

Basic restore:
basecut snapshot restore dev-seed:latest --target "$BASECUT_DATABASE_URL"
Process:
  1. CLI resolves the snapshot version (e.g. :latest resolves to brave-lion)
  2. Downloads manifest and schema from storage
  3. Validates target database schema compatibility
  4. Determines insertion order (respecting foreign keys)
  5. Inserts data in dependency order
  6. Verifies referential integrity
  7. Reports completion with row counts
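Step 4 amounts to a topological sort of the table dependency graph: referenced tables must be loaded before the tables that point at them. A minimal sketch with made-up dependencies, using Python's standard `graphlib`:

```python
from graphlib import TopologicalSorter

# Hypothetical FK dependencies: each table maps to the tables it references,
# which must therefore be inserted first.
fk_deps = {
    "public.orders": {"public.users", "public.products"},
    "public.users": {"public.organizations"},
    "public.products": set(),
    "public.organizations": set(),
}

# static_order() yields referenced tables before referencing ones.
insertion_order = list(TopologicalSorter(fk_deps).static_order())
print(insertion_order)
```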

Guarantees

Referential Integrity

Guarantee: Every snapshot is self-contained and referentially valid. What this means:
  • Every foreign key constraint will be satisfied after restore
  • No “dangling references” or broken relationships
  • Can restore to an empty database without errors
How it works:
  • Basecut tracks all foreign key relationships during extraction
  • Before including a row, ensures all referenced rows are included
  • If a parent is missing, adds it (even beyond depth limits)

Schema Preservation

Guarantee: Schema metadata is captured at extraction time. What this means:
  • Restore validates target schema matches snapshot schema
  • Detects missing tables, columns, or type changes
  • Prevents data corruption from schema drift

Size Estimation

Estimate snapshot size before creating:
Snapshot size ≈ (Raw data size × compression ratio) + overhead

Where:
- compression ratio ≈ 0.1 to 0.3 (gzip)
- overhead ≈ 50-100KB (manifest + schema)
Source Data    Estimated Snapshot Size
10MB           1-3MB
100MB          10-30MB
1GB            100-300MB
10GB           1-3GB
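The estimate above in code form, with the mid-range values from this section as defaults (the ratio and overhead are rough figures, not guarantees):

```python
def estimate_snapshot_bytes(raw_bytes, ratio=0.2, overhead=75_000):
    """Rough snapshot size: raw data x gzip ratio, plus metadata overhead.

    ratio defaults to the middle of the 0.1-0.3 gzip range; overhead to the
    middle of the 50-100KB manifest + schema range.
    """
    return raw_bytes * ratio + overhead

MB = 1_000_000
print(estimate_snapshot_bytes(100 * MB) / MB)  # ~20 MB for 100MB of raw data
```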

Performance Characteristics

Creation Time

Dataset Size     Extraction Time
< 10k rows       5-30 seconds
10k-100k rows    30s-5 minutes
100k-1M rows     5-30 minutes
> 1M rows        Use agents (30m-2h)

Snapshot vs Backup

Basecut is NOT a backup tool. Do not use snapshots as your database backup strategy.
Feature          Snapshot                       Backup
Purpose          Testing with realistic data    Disaster recovery
Completeness     Subset of data                 Full database
Size             Small (10MB-1GB typical)       Full database size
Anonymization    Built-in PII protection        Raw production data
Use snapshots for:
  • Seeding development databases
  • Sharing realistic test data with team
  • Reproducing production bugs locally
Use backups for:
  • Disaster recovery
  • Point-in-time recovery
  • Compliance archiving

Next Steps