A snapshot is a point-in-time capture of a subset of your database, packaged with everything needed to restore that exact data state to any compatible database.
What Makes a Snapshot
Basecut doesn’t just export tables—it creates self-contained, referentially-valid snapshots by following foreign key relationships. This ensures every snapshot can be applied to an empty database without constraint violations.
Key Characteristics
- Referentially Complete: Every row includes all its foreign key dependencies
- Point-in-Time: Captures database state at a specific moment
- Self-Contained: Includes schema metadata for validation
- Portable: Can be restored to any database with matching schema
- Versioned: Multiple versions can coexist for the same snapshot name
Snapshot Anatomy
Every snapshot consists of three required components and one optional component:
1. Manifest (manifest.json)
Contains metadata about the snapshot:
```json
{
  "version": "1.0",
  "created_at": "2024-01-29T10:30:00Z",
  "source": {
    "host": "prod-db.internal",
    "database": "myapp_production"
  },
  "stats": {
    "total_rows": 2847,
    "table_count": 23,
    "warning_count": 0,
    "extraction_duration_ms": 45320,
    "uncompressed_size": 12543200,
    "compressed_size": 3145600
  },
  "config": {
    "root_table": "public.users",
    "max_upstream_depth": 5,
    "max_downstream_depth": 10,
    "sample_mode": "random"
  },
  "schema_hash": "a1b2c3d4e5f6..."
}
```
Purpose:
- Identifies the snapshot uniquely
- Records extraction context for debugging
- Stores git metadata for reproducibility
- Tracks performance metrics
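For orientation, here is a minimal Python sketch that reads a manifest and derives the compression ratio from its stats. The field names are taken from the example above; everything else is illustrative, not Basecut's actual tooling:

```python
import json

# Parse a (truncated) manifest and summarize the snapshot.
# Field names follow the manifest example above.
manifest = json.loads("""
{
  "version": "1.0",
  "stats": {"total_rows": 2847, "table_count": 23,
            "uncompressed_size": 12543200, "compressed_size": 3145600},
  "config": {"root_table": "public.users"}
}
""")

stats = manifest["stats"]
ratio = stats["compressed_size"] / stats["uncompressed_size"]
print(f"{stats['total_rows']} rows across {stats['table_count']} tables")
print(f"compression ratio: {ratio:.2f}")
```

Note that the example manifest's ratio (about 0.25) falls inside the 0.1-0.3 gzip range quoted in the size-estimation section below.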
2. Schema (schema.json)
Schema metadata for all included tables at extraction time.
Purpose:
- Validate target database compatibility before restore
- Detect schema drift between snapshot and target
- Document database structure at extraction time
3. Data Files (tables/*.csv.gz)
One compressed CSV file per table with extracted rows:
```
tables/
├── public.users.csv.gz            (234 KB,   847 rows)
├── public.organizations.csv.gz    (12 KB,     45 rows)
├── public.orders.csv.gz           (523 KB, 2,134 rows)
├── public.line_items.csv.gz       (891 KB, 8,912 rows)
├── public.products.csv.gz         (156 KB,   432 rows)
└── public.payment_methods.csv.gz  (34 KB,    178 rows)
```
Format:
- Standard CSV with header row
- Gzip compressed for efficiency
- NULL values represented as empty fields
- Special characters escaped per RFC 4180
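These conventions can be reproduced with Python's standard csv and gzip modules. This is a sketch of the format rules, not Basecut's actual writer:

```python
import csv
import gzip
import io

# Write one table the way the format notes describe: header row,
# gzip compression, NULLs as empty fields, RFC 4180 quoting.
rows = [
    {"id": 1, "email": "a@example.com", "note": 'says "hi", twice'},
    {"id": 2, "email": None, "note": ""},  # NULL email -> empty field
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "email", "note"])
writer.writeheader()
for row in rows:
    # Represent SQL NULL as an empty field, per the format notes above.
    writer.writerow({k: ("" if v is None else v) for k, v in row.items()})

# Fields containing commas or quotes get quoted, with quotes doubled
# (RFC 4180); the whole file is then gzip-compressed.
data = gzip.compress(buf.getvalue().encode("utf-8"))
print(buf.getvalue())
```

One ambiguity of this representation is that an empty field stands for both NULL and the empty string; a round-trip restorer has to pick one interpretation per column type.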
4. Warnings (warnings.json) - Optional
Only present if extraction encountered warnings:
```json
{
  "warnings": [
    {
      "code": "LIMIT_REACHED",
      "message": "per_table limit (1000) reached for table 'audit_logs'",
      "table": "public.audit_logs",
      "severity": "warning"
    }
  ]
}
```
Purpose:
- Document extraction issues without failing
- Warn users about partial data
- Aid in debugging configuration problems
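A consumer of the warnings file might surface these entries after a run. A sketch using the structure shown above:

```python
import json

# Load a warnings document (structure taken from the example above)
# and print each entry in a human-readable form.
warnings_doc = json.loads("""
{"warnings": [{"code": "LIMIT_REACHED",
               "message": "per_table limit (1000) reached for table 'audit_logs'",
               "table": "public.audit_logs",
               "severity": "warning"}]}
""")

for w in warnings_doc["warnings"]:
    print(f"[{w['severity']}] {w['table']}: {w['message']}")
```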
Snapshot Lifecycle
1. Creation
Local Execution (default):
```bash
basecut snapshot create \
  --config basecut.yml \
  --name "dev-seed" \
  --source "$BASECUT_DATABASE_URL"
```
Process:
- CLI reads configuration from basecut.yml
- Connects to source database
- Executes root queries to get seed rows
- Recursively follows foreign keys (upstream + downstream)
- Applies anonymization rules to PII fields
- Compresses data to CSV.gz files
- Generates manifest and schema metadata
- Uploads to configured storage (S3/GCS/local)
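The foreign-key walk above can be sketched as a bounded breadth-first traversal. The tables and relationships here are hypothetical, and the real extractor works at the row level rather than the table level shown here:

```python
from collections import deque

# Hypothetical FK graph: PARENTS maps a table to the tables it
# references; CHILDREN is the reverse direction.
PARENTS = {"orders": ["users"], "line_items": ["orders", "products"]}
CHILDREN = {"users": ["orders"], "orders": ["line_items"],
            "products": ["line_items"]}

def reachable(root, max_up=5, max_down=10):
    """Tables reachable from root within the configured depth limits."""
    seen = {root}
    queue = deque([(root, 0, 0)])  # (table, upstream depth, downstream depth)
    while queue:
        table, up, down = queue.popleft()
        if up < max_up:  # follow FKs upstream to parents
            for parent in PARENTS.get(table, []):
                if parent not in seen:
                    seen.add(parent)
                    queue.append((parent, up + 1, down))
        if down < max_down:  # follow FKs downstream to children
            for child in CHILDREN.get(table, []):
                if child not in seen:
                    seen.add(child)
                    queue.append((child, up, down + 1))
    return seen

print(sorted(reachable("users")))
```

The default depths match the max_upstream_depth and max_downstream_depth values from the manifest example; per the referential-integrity guarantee, missing parents are pulled in even when a depth limit would otherwise exclude them.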
2. Storage
Snapshots are stored in your configured provider:
S3 Example:
```
s3://my-basecut-snapshots/
└── team-a/                         # optional prefix
    └── snapshots/
        └── 8f4f4b0e-fbc7-4f70.../  # job id
            ├── manifest.json
            ├── schema.json
            ├── warnings.json       (optional)
            └── tables/
                ├── public.users.csv.gz
                └── public.orders.csv.gz
```
Name/tag references (for example dev-seed:brave-lion) are resolved via API metadata, not by directly constructing storage paths.
3. Restoration
Basic restore:
```bash
basecut snapshot restore dev-seed:latest --target "$BASECUT_DATABASE_URL"
```
Process:
- CLI resolves snapshot version (:latest → brave-lion)
- Downloads manifest and schema from storage
- Validates target database schema compatibility
- Determines insertion order (respecting foreign keys)
- Inserts data in dependency order
- Verifies referential integrity
- Reports completion with row counts
Guarantees
Referential Integrity
Guarantee: Every snapshot is self-contained and referentially valid.
What this means:
- Every foreign key constraint will be satisfied after restore
- No “dangling references” or broken relationships
- Can restore to an empty database without errors
How it works:
- Basecut tracks all foreign key relationships during extraction
- Before including a row, it ensures all referenced rows are included
- If a parent is missing, adds it (even beyond depth limits)
Schema Preservation
Guarantee: Schema metadata is captured at extraction time.
What this means:
- Restore validates target schema matches snapshot schema
- Detects missing tables, columns, or type changes
- Prevents data corruption from schema drift
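One way to implement such a check is to hash a canonical serialization of the schema and compare it against the manifest's schema_hash field. Basecut's actual hashing scheme is an assumption here; the point is that canonical JSON keeps the hash stable across key ordering:

```python
import hashlib
import json

def schema_hash(schema: dict) -> str:
    """Content hash of a schema; sort_keys makes the hash order-independent."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

snapshot_schema = {"public.users": {"id": "bigint", "email": "text"}}
target_schema = {"public.users": {"id": "bigint"}}  # missing column

if schema_hash(snapshot_schema) != schema_hash(target_schema):
    print("schema drift detected: refusing to restore")
```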
Size Estimation
Estimate snapshot size before creating:
Snapshot size ≈ (Raw data size × compression ratio) + overhead
Where:
- compression ratio ≈ 0.1 to 0.3 (gzip)
- overhead ≈ 50-100KB (manifest + schema)
| Source Data | Estimated Snapshot Size |
|---|---|
| 10MB | 1-3MB |
| 100MB | 10-30MB |
| 1GB | 100-300MB |
| 10GB | 1-3GB |
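The estimate can be applied in a few lines; the mid-range ratio of 0.2 and the 75 KB overhead below are assumptions within the stated ranges:

```python
# Snapshot size estimate: raw size x gzip compression ratio + fixed
# overhead for the manifest and schema metadata.
def estimate_snapshot_bytes(raw_bytes, ratio=0.2, overhead=75_000):
    return raw_bytes * ratio + overhead

for raw_mb in (10, 100, 1024):
    est = estimate_snapshot_bytes(raw_mb * 1024 * 1024)
    print(f"{raw_mb} MB raw -> ~{est / 1024 / 1024:.1f} MB snapshot")
```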
Creation Time
| Dataset Size | Extraction Time |
|---|---|
| < 10k rows | 5-30 seconds |
| 10k-100k rows | 30s-5 minutes |
| 100k-1M rows | 5-30 minutes |
| > 1M rows | Use agents (30m-2h) |
Snapshot vs Backup
Basecut is NOT a backup tool. Do not use snapshots as your database backup strategy.
| Feature | Snapshot | Backup |
|---|---|---|
| Purpose | Testing with realistic data | Disaster recovery |
| Completeness | Subset of data | Full database |
| Size | Small (10MB-1GB typical) | Full database size |
| Anonymization | Built-in PII protection | Raw production data |
Use snapshots for:
- Seeding development databases
- Sharing realistic test data with team
- Reproducing production bugs locally
Use backups for:
- Disaster recovery
- Point-in-time recovery
- Compliance archiving
Next Steps