A snapshot is a point-in-time capture of a subset of your database, packaged with everything needed to restore that exact data state to any compatible database.
What Makes a Snapshot
Basecut doesn’t just export tables—it creates self-contained, referentially-valid snapshots by following foreign key relationships. This ensures every snapshot can be applied to an empty database without constraint violations.
Key Characteristics
- Referentially Complete: Every row includes all its foreign key dependencies
- Point-in-Time: Captures database state at a specific moment
- Self-Contained: Includes schema metadata for validation
- Portable: Can be restored to any database with matching schema
- Versioned: Multiple versions can coexist for the same snapshot name
Snapshot Anatomy
Every snapshot consists of three required components and one optional component:
1. Manifest (manifest.json)
Contains metadata about the snapshot:
```json
{
  "version": "1.0",
  "created_at": "2024-01-29T10:30:00Z",
  "source": {
    "host": "prod-db.internal",
    "database": "myapp_production"
  },
  "stats": {
    "total_rows": 2847,
    "table_count": 23,
    "warning_count": 0,
    "extraction_duration_ms": 45320,
    "uncompressed_size": 12543200,
    "compressed_size": 3145600
  },
  "config": {
    "root_table": "public.users",
    "max_upstream_depth": 5,
    "max_downstream_depth": 10,
    "sample_mode": "random"
  },
  "schema_hash": "a1b2c3d4e5f6..."
}
```
Purpose:
- Identifies the snapshot uniquely
- Records extraction context for debugging
- Stores git metadata for reproducibility
- Tracks performance metrics
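For orientation, here is a minimal Python sketch that reads a manifest and derives the compression ratio from its stats. The field names are taken from the example above; everything else is illustrative, not Basecut's actual tooling:

```python
import json

# Parse a (truncated) manifest and summarize the snapshot.
# Field names follow the manifest example above.
manifest = json.loads("""
{
  "version": "1.0",
  "stats": {"total_rows": 2847, "table_count": 23,
            "uncompressed_size": 12543200, "compressed_size": 3145600},
  "config": {"root_table": "public.users"}
}
""")

stats = manifest["stats"]
ratio = stats["compressed_size"] / stats["uncompressed_size"]
print(f"{stats['total_rows']} rows across {stats['table_count']} tables")
print(f"compression ratio: {ratio:.2f}")
```

Note that the example manifest's ratio (about 0.25) falls inside the 0.1-0.3 gzip range quoted in the size-estimation section below.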
2. Schema (schema.json)
Schema metadata for all included tables at extraction time.
Purpose:
- Validate target database compatibility before restore
- Detect schema drift between snapshot and target
- Document database structure at extraction time
3. Data Files (tables/*.csv.gz)
One compressed CSV file per table with extracted rows:
```
tables/
├── public.users.csv.gz            (234 KB,   847 rows)
├── public.organizations.csv.gz    (12 KB,     45 rows)
├── public.orders.csv.gz           (523 KB, 2,134 rows)
├── public.line_items.csv.gz       (891 KB, 8,912 rows)
├── public.products.csv.gz         (156 KB,   432 rows)
└── public.payment_methods.csv.gz  (34 KB,    178 rows)
```
Format:
- Standard CSV with header row
- Gzip compressed for efficiency
- NULL values represented as empty fields
- Special characters escaped per RFC 4180
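These conventions can be reproduced with Python's standard csv and gzip modules. This is a sketch of the format rules, not Basecut's actual writer:

```python
import csv
import gzip
import io

# Write one table the way the format notes describe: header row,
# gzip compression, NULLs as empty fields, RFC 4180 quoting.
rows = [
    {"id": 1, "email": "a@example.com", "note": 'says "hi", twice'},
    {"id": 2, "email": None, "note": ""},  # NULL email -> empty field
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "email", "note"])
writer.writeheader()
for row in rows:
    # Represent SQL NULL as an empty field, per the format notes above.
    writer.writerow({k: ("" if v is None else v) for k, v in row.items()})

# Fields containing commas or quotes get quoted, with quotes doubled
# (RFC 4180); the whole file is then gzip-compressed.
data = gzip.compress(buf.getvalue().encode("utf-8"))
print(buf.getvalue())
```

One ambiguity of this representation is that an empty field stands for both NULL and the empty string; a round-trip restorer has to pick one interpretation per column type.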
4. Warnings (warnings.json) - Optional
Only present if extraction encountered warnings:
```json
{
  "warnings": [
    {
      "code": "LIMIT_REACHED",
      "message": "per_table limit (1000) reached for table 'audit_logs'",
      "table": "public.audit_logs",
      "severity": "warning"
    }
  ]
}
```
Purpose:
- Document extraction issues without failing
- Warn users about partial data
- Aid in debugging configuration problems
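A consumer of the warnings file might surface these entries after a run. A sketch using the structure shown above:

```python
import json

# Load a warnings document (structure taken from the example above)
# and print each entry in a human-readable form.
warnings_doc = json.loads("""
{"warnings": [{"code": "LIMIT_REACHED",
               "message": "per_table limit (1000) reached for table 'audit_logs'",
               "table": "public.audit_logs",
               "severity": "warning"}]}
""")

for w in warnings_doc["warnings"]:
    print(f"[{w['severity']}] {w['table']}: {w['message']}")
```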
Snapshot Lifecycle
1. Creation
Local Execution (default):
```bash
basecut snapshot create \
  --config basecut.yml \
  --name "dev-seed" \
  --source "$BASECUT_DATABASE_URL"
```
Process:
- CLI reads configuration from basecut.yml
- Connects to source database
- Executes root queries to get seed rows
- Recursively follows foreign keys (upstream + downstream)
- Applies anonymization rules to PII fields
- Compresses data to CSV.gz files
- Generates manifest and schema metadata
- Uploads to configured storage (S3/GCS/local)
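The foreign-key walk above can be sketched as a bounded breadth-first traversal. The tables and relationships here are hypothetical, and the real extractor works at the row level rather than the table level shown here:

```python
from collections import deque

# Hypothetical FK graph: PARENTS maps a table to the tables it
# references; CHILDREN is the reverse direction.
PARENTS = {"orders": ["users"], "line_items": ["orders", "products"]}
CHILDREN = {"users": ["orders"], "orders": ["line_items"],
            "products": ["line_items"]}

def reachable(root, max_up=5, max_down=10):
    """Tables reachable from root within the configured depth limits."""
    seen = {root}
    queue = deque([(root, 0, 0)])  # (table, upstream depth, downstream depth)
    while queue:
        table, up, down = queue.popleft()
        if up < max_up:  # follow FKs upstream to parents
            for parent in PARENTS.get(table, []):
                if parent not in seen:
                    seen.add(parent)
                    queue.append((parent, up + 1, down))
        if down < max_down:  # follow FKs downstream to children
            for child in CHILDREN.get(table, []):
                if child not in seen:
                    seen.add(child)
                    queue.append((child, up, down + 1))
    return seen

print(sorted(reachable("users")))
```

The default depths match the max_upstream_depth and max_downstream_depth values from the manifest example; per the referential-integrity guarantee, missing parents are pulled in even when a depth limit would otherwise exclude them.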
2. Storage
Snapshots are stored in your configured provider:
S3 Example:
```
s3://my-basecut-snapshots/
└── team-a/                         # optional prefix
    └── snapshots/
        └── 8f4f4b0e-fbc7-4f70.../  # job id
            ├── manifest.json
            ├── schema.json
            ├── warnings.json       (optional)
            └── tables/
                ├── public.users.csv.gz
                └── public.orders.csv.gz
```
Name/tag references (for example dev-seed:brave-lion) are resolved via API metadata, not by directly constructing storage paths.
3. Restoration
Basic restore:
```bash
basecut snapshot restore dev-seed:latest --target "$BASECUT_DATABASE_URL"
```
Process:
- CLI resolves snapshot version (:latest → brave-lion)
- Downloads manifest and schema from storage
- Validates target database schema compatibility
- Determines insertion order (respecting foreign keys)
- Inserts data in dependency order
- Verifies referential integrity
- Reports completion with row counts
Guarantees
Referential Integrity
Guarantee: Every snapshot is self-contained and referentially valid.
What this means:
- Every foreign key constraint will be satisfied after restore
- No “dangling references” or broken relationships
- Can restore to an empty database without errors
How it works:
- Basecut tracks all foreign key relationships during extraction
- Before including a row, it ensures all referenced rows are included
- If a parent is missing, adds it (even beyond depth limits)
Schema Preservation
Guarantee: Schema metadata is captured at extraction time.
What this means:
- Restore validates target schema matches snapshot schema
- Detects missing tables, columns, or type changes
- Prevents data corruption from schema drift
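One way to implement such a check is to hash a canonical serialization of the schema and compare it against the manifest's schema_hash field. Basecut's actual hashing scheme is an assumption here; the point is that canonical JSON keeps the hash stable across key ordering:

```python
import hashlib
import json

def schema_hash(schema: dict) -> str:
    """Content hash of a schema; sort_keys makes the hash order-independent."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

snapshot_schema = {"public.users": {"id": "bigint", "email": "text"}}
target_schema = {"public.users": {"id": "bigint"}}  # missing column

if schema_hash(snapshot_schema) != schema_hash(target_schema):
    print("schema drift detected: refusing to restore")
```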
Size Estimation
Estimate snapshot size before creating:
Snapshot size ≈ (Raw data size × compression ratio) + overhead
Where:
- compression ratio ≈ 0.1 to 0.3 (gzip)
- overhead ≈ 50-100KB (manifest + schema)
| Source Data | Estimated Snapshot Size |
|---|---|
| 10MB | 1-3MB |
| 100MB | 10-30MB |
| 1GB | 100-300MB |
| 10GB | 1-3GB |
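The estimate can be applied in a few lines; the mid-range ratio of 0.2 and the 75 KB overhead below are assumptions within the stated ranges:

```python
# Snapshot size estimate: raw size x gzip compression ratio + fixed
# overhead for the manifest and schema metadata.
def estimate_snapshot_bytes(raw_bytes, ratio=0.2, overhead=75_000):
    return raw_bytes * ratio + overhead

for raw_mb in (10, 100, 1024):
    est = estimate_snapshot_bytes(raw_mb * 1024 * 1024)
    print(f"{raw_mb} MB raw -> ~{est / 1024 / 1024:.1f} MB snapshot")
```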
Creation Time
| Dataset Size | Extraction Time |
|---|---|
| < 10k rows | 5-30 seconds |
| 10k-100k rows | 30s-5 minutes |
| 100k-1M rows | 5-30 minutes |
| > 1M rows | Use agents (30m-2h) |
Snapshot vs Backup
Basecut is NOT a backup tool. Do not use snapshots as your database backup strategy.
| Feature | Snapshot | Backup |
|---|---|---|
| Purpose | Testing with realistic data | Disaster recovery |
| Completeness | Subset of data | Full database |
| Size | Small (10MB-1GB typical) | Full database size |
| Anonymization | Built-in PII protection | Raw production data |
Use snapshots for:
- Seeding development databases
- Sharing realistic test data with team
- Reproducing production bugs locally
Use backups for:
- Disaster recovery
- Point-in-time recovery
- Compliance archiving
Next Steps