What Makes a Snapshot
Basecut doesn’t just export tables: it creates self-contained, referentially valid snapshots by following foreign key relationships. This ensures every snapshot can be applied to an empty database without constraint violations.
Key Characteristics
- Referentially Complete: Every row includes all its foreign key dependencies
- Point-in-Time: Captures database state at a specific moment
- Self-Contained: Includes schema metadata for validation
- Portable: Can be restored to any database with matching schema
- Versioned: Multiple versions can coexist for the same snapshot name
Snapshot Anatomy
Every snapshot consists of three required components and one optional component:
1. Manifest (manifest.json)
Contains metadata about the snapshot:
- Identifies the snapshot uniquely
- Records extraction context for debugging
- Stores git metadata for reproducibility
- Tracks performance metrics
2. Schema (schema.json)
Schema metadata for all included tables at extraction time.
Purpose:
- Validate target database compatibility before restore
- Detect schema drift between snapshot and target
- Document database structure at extraction time
3. Data Files (tables/*.csv.gz)
One compressed CSV file per table with extracted rows:
- Standard CSV with header row
- Gzip compressed for efficiency
- NULL values represented as empty fields
- Special characters escaped per RFC 4180
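The format rules above can be exercised with Python's standard library alone. The following is a minimal sketch, not Basecut's own reader: `read_table` is a hypothetical helper name, and it assumes UTF-8 encoding, which the source does not state. It reads one table file, uses the header row for column names, and maps empty fields to `None`.

```python
import csv
import gzip


def read_table(path):
    """Read one gzip-compressed CSV table into a list of dicts.

    Empty fields become None, following the NULL convention described above.
    UTF-8 is an assumption made for this sketch.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle)  # the first row is the header
        return [
            {column: (value if value != "" else None) for column, value in row.items()}
            for row in reader
        ]
```

Because NULL is written as an empty field, a reader like this cannot tell a NULL apart from an empty string. Whether that matters depends on the columns involved.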
4. Warnings (warnings.json) - Optional
Only present if extraction encountered warnings:
- Document extraction issues without failing
- Warn users about partial data
- Aid in debugging configuration problems
Snapshot Lifecycle
1. Creation
Local Execution (default):
- CLI reads configuration from basecut.yml
- Connects to source database
- Executes root queries to get seed rows
- Recursively follows foreign keys (upstream + downstream)
- Applies anonymization rules to PII fields
- Compresses data to CSV.gz files
- Generates manifest and schema metadata
- Uploads to configured storage (S3/GCS/local)
2. Storage
Snapshots are stored in your configured provider. Snapshot references (such as dev-seed:brave-lion) are resolved via API metadata, not by directly constructing storage paths.
3. Restoration
Basic restore:
- CLI resolves snapshot version (:latest → brave-lion)
- Downloads manifest and schema from storage
- Validates target database schema compatibility
- Determines insertion order (respecting foreign keys)
- Inserts data in dependency order (or uses an automatic two-phase fallback for eligible nullable FK cycles)
- Verifies referential integrity
- Reports completion with row counts
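Determining an insertion order that respects foreign keys is a topological sort. The sketch below illustrates the idea with Python's standard `graphlib` module (Python 3.9+). It is not Basecut's implementation, and the function name and table names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def insertion_order(foreign_keys):
    """Return table names ordered so that parents come before children.

    foreign_keys maps each table to the set of tables it references.
    """
    sorter = TopologicalSorter(foreign_keys)
    try:
        return list(sorter.static_order())
    except CycleError as exc:
        # A cycle has no valid single-pass order. A real tool would need a
        # fallback here, such as the two-phase insert mentioned above.
        raise ValueError(f"foreign key cycle: {exc.args[1]}") from exc


# Hypothetical schema used only for illustration.
fks = {
    "orders": {"users"},
    "order_items": {"orders", "products"},
    "users": set(),
    "products": set(),
}
order = insertion_order(fks)
```

In the resulting order, `users` precedes `orders`, and both `orders` and `products` precede `order_items`. Tables with no dependency between them may appear in either relative order.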
Guarantees
Referential Integrity
Guarantee: Every snapshot is self-contained and referentially valid.
What this means:
- Every foreign key constraint will be satisfied after restore
- No “dangling references” or broken relationships
- Can restore to an empty database without errors
How it works:
- Basecut tracks all foreign key relationships during extraction
- Before including a row, ensures all referenced rows are included
- If a parent is missing, adds it (even beyond depth limits)
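The rule that every referenced parent row is included amounts to a closure over the foreign key graph. The sketch below shows one way to express it, under assumptions: rows are identified by hypothetical `(table, primary_key)` tuples, and `parents_of` stands in for whatever lookup the extractor really performs. It is an illustration of the idea, not Basecut's algorithm.

```python
from collections import deque


def close_over_parents(selected, parents_of):
    """Expand a set of row ids until every referenced parent is present.

    selected   -- iterable of row ids, e.g. ("orders", 7)
    parents_of -- callable returning the row ids that a given row references
    """
    included = set(selected)
    queue = deque(included)
    while queue:
        row = queue.popleft()
        for parent in parents_of(row):
            if parent not in included:
                # Parents are always added; no depth limit applies to them.
                included.add(parent)
                queue.append(parent)
    return included


# Hypothetical in-memory foreign key graph used only for this example.
REFERENCES = {
    ("order_items", 1): [("orders", 7), ("products", 3)],
    ("orders", 7): [("users", 42)],
}

rows = close_over_parents([("order_items", 1)], lambda r: REFERENCES.get(r, []))
```

Starting from a single `order_items` row, the result also contains its order, its product, and the user who placed the order, so no included row points at a missing parent.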
Schema Preservation
Guarantee: Schema metadata is captured at extraction time.
What this means:
- Restore validates target schema matches snapshot schema
- Detects missing tables, columns, or type changes
- Prevents data corruption from schema drift
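The three kinds of drift listed above (missing tables, missing columns, type changes) can be detected with a plain dictionary comparison. This sketch assumes a simplified `{table: {column: type}}` shape for both schemas. The real layout of schema.json is not described here, so treat the structure and the function name as hypothetical.

```python
def schema_drift(snapshot, target):
    """Compare two {table: {column: type}} mappings and list mismatches."""
    problems = []
    for table, columns in snapshot.items():
        if table not in target:
            problems.append(f"missing table: {table}")
            continue
        for column, col_type in columns.items():
            actual = target[table].get(column)
            if actual is None:
                problems.append(f"missing column: {table}.{column}")
            elif actual != col_type:
                problems.append(
                    f"type change: {table}.{column} {col_type} -> {actual}"
                )
    return problems


# Hypothetical schemas used only for illustration.
snapshot_schema = {
    "users": {"id": "integer", "email": "text"},
    "orders": {"id": "integer"},
}
target_schema = {"users": {"id": "bigint"}}
issues = schema_drift(snapshot_schema, target_schema)
```

An empty result means the target has every table and column the snapshot expects, with matching types. Extra tables or columns in the target are deliberately ignored in this sketch.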
Size Estimation
Estimate snapshot size before creating:
| Source Data | Estimated Snapshot Size |
|---|---|
| 10MB | 1-3MB |
| 100MB | 10-30MB |
| 1GB | 100-300MB |
| 10GB | 1-3GB |
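Every row of the table corresponds to a snapshot of roughly 10% to 30% of the source data size. The helper below encodes that range as a rough rule of thumb. The function name is hypothetical and the percentages are read off the table, so actual results will vary with how well the data compresses.

```python
def estimate_snapshot_size(source_bytes, low_pct=10, high_pct=30):
    """Return a (low, high) snapshot size estimate in bytes.

    The 10% and 30% defaults are inferred from the table above.
    """
    return source_bytes * low_pct // 100, source_bytes * high_pct // 100


MB = 1000 ** 2
low, high = estimate_snapshot_size(100 * MB)
```

For 100MB of source data this yields a range of 10MB to 30MB, matching the second row of the table.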
Performance Characteristics
Creation Time
| Dataset Size | Extraction Time |
|---|---|
| < 10k rows | 5-30 seconds |
| 10k-100k rows | 30 seconds-5 minutes |
| 100k-1M rows | 5-30 minutes |
| > 1M rows | Use agents (30 minutes-2 hours) |
Snapshot vs Backup
| Feature | Snapshot | Backup |
|---|---|---|
| Purpose | Testing with realistic data | Disaster recovery |
| Completeness | Subset of data | Full database |
| Size | Small (10MB-1GB typical) | Full database size |
| Anonymization | Built-in PII protection | Raw production data |
Use snapshots for:
- Seeding development databases
- Sharing realistic test data with your team
- Reproducing production bugs locally
Use backups for:
- Disaster recovery
- Point-in-time recovery
- Compliance archiving
Next Steps
- How It Works: Deep dive into the extraction algorithm
- Snapshot Versioning: Managing multiple versions of snapshots
- Storage Providers: Configure S3, GCS, or local filesystem
- CLI Reference: Complete command documentation