Basecut can create snapshots in two execution modes. Both produce identical snapshots; the only difference is where the processing happens.
For production workloads, self-hosted agents are recommended. They provide better security (your credentials stay in your environment), better performance (same region as your database), and predictable costs.
## Choose Your Execution Mode
Use this decision tree to select the right mode for your use case:
### Quick Decision Guide
1. Do you have a self-hosted agent with network access to your source database?
   - No → use Local Execution (the default)
   - Yes → continue
2. Are you working in production or with production-sized data?
   - Yes → use Self-Hosted Agents (recommended for production)
   - No → continue
3. Otherwise, either mode works: Local Execution requires no setup, while your deployed agent offers better performance and security.
### Quick Reference
| Scenario | Recommended Mode | Reason |
|---|---|---|
| Private DB + agent in VPC | Self-Hosted Agent | Agent can reach private network safely |
| Private DB + no agent path | Local | No remote worker can reach the database |
| Quick testing / development | Local | Faster setup, no infrastructure needed |
| Production workloads | Self-Hosted Agent | Better security, performance, and reliability |
| Large datasets (> 1M rows) | Self-Hosted Agent | Doesn't consume local resources |
| CI/CD pipelines | Self-Hosted Agent | Consistent execution environment |
| Small datasets (< 100k rows) | Local | Quick and simple |
| Team collaboration | Self-Hosted Agent | Centralized execution and job history |
Default behavior: Local execution (no `--async` flag)
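The size guideline above can be sketched as a small wrapper that picks a command. This is an illustration, not a Basecut feature: the 1M-row threshold comes from the table, and the script only prints the command it would run.

```shell
# choose_mode prints the command suggested by the size guideline above
# (> 1M rows -> agent execution). It only echoes; it never runs basecut.
choose_mode() {
  rows="$1"
  if [ "$rows" -gt 1000000 ]; then
    echo "basecut snapshot create --config basecut.yml --async"
  else
    echo "basecut snapshot create --config basecut.yml"
  fi
}

choose_mode 2000000   # large dataset -> agent execution
choose_mode 50000     # small dataset -> local execution
```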
## Agent Execution (Recommended for Production)
Deploy agents in your infrastructure to process extractions. Your CLI submits jobs to the Basecut API, self-hosted agents pick up the work, and results are stored in your configured S3/GCS bucket.
### When to Use
- Production databases - Agent runs in cloud environment with stable network and resources
- Large extractions - Process millions of rows without consuming local CPU/memory
- Team workflows - Centralized execution with job history and shared snapshots
- CI/CD pipelines - No local database access needed—agent connects directly to your DB
### How It Works

```shell
basecut snapshot create \
  --config basecut.yml \
  --name "prod-snapshot" \
  --async
```
1. CLI submits the job to the Basecut API with your configuration
2. A self-hosted agent picks up the job from the queue
3. The agent connects to your database (requires network access)
4. Extraction runs on agent infrastructure
5. The snapshot is uploaded to configured cloud storage (S3/GCS)
6. The CLI polls for job status and reports completion
Storage: Always uses cloud storage (S3 or GCS)
Requirements:
- Agent needs network access to your database
- Database credentials passed securely via config
- Cloud storage bucket configured in `basecut.yml`
### Configuration

```yaml
# basecut.yml
output:
  provider: s3
  bucket: my-basecut-snapshots
  region: us-east-1
  # prefix: snapshots/  # optional path prefix
```

Agent execution is enabled with `--async`. The flag is ignored when `output.provider: local` (agents can't write to your local filesystem); in that case, Basecut runs locally and prints a warning.
## Local Execution
The CLI runs extraction directly on your machine. Useful for development and databases without public access.
### When to Use
- No agent deployment yet - Start immediately with no infrastructure
- DB reachable from your machine - Local dev DB, tunnel, or VPN from laptop
- Development/testing - Quick iteration without job queue latency
- Small datasets - Extraction completes in seconds on your laptop
- Air-gapped environments - No internet access required (with local storage)
### How It Works

```shell
basecut snapshot create \
  --config basecut.yml \
  --name "dev-snapshot" \
  --source "postgresql://localhost:5432/myapp"
```
1. CLI reads the configuration
2. Connects directly to your database
3. Runs the extraction algorithm locally
4. Stores the snapshot in the local filesystem or uploads it to cloud storage
5. Reports completion immediately
Storage: Can use local filesystem or cloud storage (your choice)
Requirements:
- Database accessible from your machine
- Sufficient local CPU/memory for extraction
- Disk space for snapshot files (if using local storage)
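The requirements above can be verified with a quick preflight sketch. The commands below are illustrative, not part of Basecut; the host, port, and path values are placeholders.

```shell
# Preflight sketch for local execution (illustrative, not a Basecut command).
DB_HOST="${DB_HOST:-localhost}"
DB_PORT="${DB_PORT:-5432}"
SNAP_DIR="${SNAP_DIR:-.}"

# 1. Is the database reachable from this machine?
#    Uncomment if netcat is available; pg_isready is a Postgres alternative.
# nc -z "$DB_HOST" "$DB_PORT" || { echo "cannot reach $DB_HOST:$DB_PORT"; exit 1; }

# 2. Is there disk space for snapshot files (local storage only)?
#    df -kP: POSIX output, 2nd line / 4th field is available space in KB.
free_kb=$(df -kP "$SNAP_DIR" | awk 'NR==2 {print $4}')
echo "free space at $SNAP_DIR: ${free_kb} KB"
```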
### Configuration

For local filesystem storage:

```yaml
# basecut.yml
output:
  provider: local
  path: /absolute/path/to/snapshots
  format: directory
```
For cloud storage (the CLI uploads after extraction):

```yaml
# basecut.yml
output:
  provider: s3  # or gcs
  bucket: my-basecut-snapshots
  region: us-east-1
```
Local snapshots stored on the filesystem are tied to the machine that created them; restoring on a different machine requires access to the same file path.
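One way to move a filesystem snapshot between machines is to archive the directory and unpack it at the same path on the target. This is a generic sketch, not a Basecut feature; the snapshot contents below (`manifest.json`) are placeholders, not Basecut's actual layout.

```shell
# Demo: pack a snapshot directory for transfer. The snapshot created here is
# a stand-in; point SNAP_ROOT/SNAP_NAME at a real snapshot instead.
SNAP_ROOT="$(mktemp -d)"
SNAP_NAME="dev-snapshot"
mkdir -p "$SNAP_ROOT/$SNAP_NAME"
echo '{}' > "$SNAP_ROOT/$SNAP_NAME/manifest.json"   # placeholder file

# -C keeps paths in the archive relative to the snapshot root.
tar -czf "$SNAP_NAME.tar.gz" -C "$SNAP_ROOT" "$SNAP_NAME"

# On the target machine, unpack to the SAME root so restore finds the files:
#   tar -xzf dev-snapshot.tar.gz -C /absolute/path/to/snapshots
```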
## Comparison

| Feature | Self-Hosted Agent | Local Execution |
|---|---|---|
| Database Access | Agent connects from your infrastructure | CLI connects directly |
| Processing Power | Your infrastructure (EC2, GKE, etc.) | Your local machine |
| Storage | S3 or GCS (required) | Local filesystem or cloud |
| Authentication | basecut login or BASECUT_API_KEY required | basecut login or BASECUT_API_KEY required |
| Job Queue | Yes (track history, retry failures) | No (immediate execution) |
| Best For | Production, large datasets, teams | Development, quick testing, small extractions |
| Cost | Your infrastructure costs (EC2, storage) | Free (your local resources) |
| Security | Credentials stay in your environment | Direct database access |
## Switching Between Modes

The same configuration file works for both modes; select the mode with CLI flags:

```shell
# Agent execution
basecut snapshot create --config basecut.yml --async

# Local execution
basecut snapshot create --config basecut.yml
```

`--async` queues agent execution; omitting it runs locally.
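In CI the same flags apply. Below is a hedged sketch of a GitHub Actions workflow: the schedule, job names, and secret name are assumptions, and the install step is left to you; only the `basecut snapshot create --config basecut.yml --async` command and the `BASECUT_API_KEY` variable come from this page.

```yaml
# .github/workflows/snapshot.yml (sketch; adapt names and install step)
name: nightly-snapshot
on:
  schedule:
    - cron: "0 2 * * *"   # example schedule: 02:00 UTC daily
jobs:
  snapshot:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Install the basecut CLI here (installation method not covered on this page).
      - name: Create snapshot on a self-hosted agent
        env:
          BASECUT_API_KEY: ${{ secrets.BASECUT_API_KEY }}
        run: basecut snapshot create --config basecut.yml --async
```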
## Next Steps