## Why Self-Hosted?

**Security & Compliance**

- Your database credentials never leave your environment
- Data processing happens entirely in your VPC/cloud
- Meet strict compliance requirements (SOC 2, HIPAA, GDPR)
- Zero-trust architecture

**Performance**

- Choose your own compute resources (CPU, memory)
- Deploy in the same region as your database
- Scale agents based on your workload
- No network latency to external services

**Cost**

- Pay only for your infrastructure (EC2, GCS, etc.)
- No per-job fees or processing charges
- Predictable costs at scale
## When to Use Self-Hosted Agents

- **Production workloads**: Recommended default for all production snapshots
- **Private databases**: Behind VPN/VPC without public access
- **Compliance requirements**: Data cannot leave your infrastructure
- **High volume**: Processing dozens or hundreds of snapshots
## Railway (Fastest)

Deploy a Basecut agent to Railway in one click, then set the `BASECUT_API_KEY` and `BASECUT_DATABASE_URL` environment variables on the service.
## Docker Deployment

### Basic Docker Run
Run a single agent container with the following environment variables:

| Variable | Required | Description |
|---|---|---|
| `BASECUT_API_KEY` | Yes | Organization-scoped API key (`bc_live_*` or `bc_test_*`) |
| `BASECUT_DATABASE_URL` | Yes | Database connection string for the agent |
| `AWS_ACCESS_KEY_ID` | No | AWS credentials (or mount `~/.aws`) |
| `AWS_SECRET_ACCESS_KEY` | No | AWS credentials |
| `AWS_REGION` | No | AWS region fallback when `output.region` is not set in `basecut.yml` |
| `GOOGLE_APPLICATION_CREDENTIALS` | No | Path to GCS service account key |
The agent also accepts optional flags: `--poll-interval`, `--heartbeat-interval`, `--agent-id`, and `--run-once`.
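A minimal invocation might look like the following sketch; the image name `basecut/agent` and the credential values are assumptions, not confirmed by this doc:

```shell
# Single-agent run; "basecut/agent" image name and credentials are placeholders.
docker run -d \
  --name basecut-agent \
  -e BASECUT_API_KEY="bc_live_xxxx" \
  -e BASECUT_DATABASE_URL="postgres://basecut_agent:password@prod-db.internal:5432/db" \
  -e AWS_REGION="us-east-1" \
  basecut/agent:latest
```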
### Docker Compose
For production deployments, use Docker Compose to run multiple agents alongside monitoring.

## Kubernetes Deployment
Deploy agents in a Kubernetes cluster with autoscaling.

## AWS ECS Deployment
Run agents on ECS Fargate with IAM roles for S3 access.

## Agent Configuration
### Network Access
Agents need outbound access to:

- **Basecut API** (`api.basecut.dev:443`)
  - Poll for jobs
  - Report job status
  - Upload job logs
- **Your database** (e.g., `prod-db.internal:5432`)
  - Execute extraction queries
  - Read schema metadata
- **Cloud storage** (S3/GCS)
  - Upload snapshot artifacts

Typical egress rules:

- Outbound HTTPS (443) to `api.basecut.dev`
- Outbound PostgreSQL (5432) to your database
- Outbound HTTPS (443) to `s3.amazonaws.com` or `storage.googleapis.com`
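As a quick sanity check of egress from the agent host, something like the following can help (hostnames taken from the rules above; `curl` and `nc` availability is assumed):

```shell
# Check outbound reachability from the agent host.
curl -sSI https://api.basecut.dev | head -n 1      # Basecut API
nc -zv prod-db.internal 5432                       # your database
curl -sSI https://s3.amazonaws.com | head -n 1     # cloud storage endpoint
```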
### Database Credentials
Best practice: use a read-only database user. Supply the connection string via:

- Environment variable: `BASECUT_DATABASE_URL=postgres://basecut_agent:password@host:5432/db`
- AWS Secrets Manager / Kubernetes Secrets
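One way to provision such a read-only user in Postgres; the role name `basecut_agent` matches the connection string above, while the database and schema names are placeholders:

```shell
# Provision a read-only Postgres user for the agent (names are placeholders).
psql "$ADMIN_DATABASE_URL" <<'SQL'
CREATE USER basecut_agent WITH PASSWORD 'change-me';
GRANT CONNECT ON DATABASE app TO basecut_agent;
GRANT USAGE ON SCHEMA public TO basecut_agent;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO basecut_agent;
-- Also cover tables created after this point:
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO basecut_agent;
SQL
```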
## Monitoring and Logging
Monitor agents via container logs and your platform's health checks.

## Troubleshooting
### Agent Not Picking Up Jobs
Check agent logs for these common causes:

- Invalid `BASECUT_API_KEY` (should start with `bc_live_` or `bc_test_`)
- Network access blocked to `api.basecut.dev`
- Agent registered to the wrong organization
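A quick way to scan for these, assuming the Docker setup above with a container named `basecut-agent`:

```shell
# Scan recent agent logs for auth or network errors (container name assumed).
docker logs --tail 100 basecut-agent 2>&1 | grep -iE 'error|unauthorized|timeout'
```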
### Database Connection Failures
Test database access from the agent. Common causes:

- Firewall blocking the database port
- Database requires SSL (`?sslmode=require`)
- Read-only user lacks schema permissions
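To isolate network issues from agent issues, run `psql` from the agent container's network context; the container name `basecut-agent` is an assumption from the Docker example:

```shell
# Run psql in the agent's network namespace to test connectivity directly.
docker run --rm --network container:basecut-agent postgres:16 \
  psql "$BASECUT_DATABASE_URL" -c 'SELECT 1;'
```

If this succeeds while the agent still fails, the problem is likely credentials or permissions rather than the network path.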
### S3/GCS Upload Failures
Verify cloud credentials. Common causes:

- IAM role/service account lacks `PutObject` permission
- Bucket doesn't exist or is in the wrong region
- Missing or invalid S3 region on the agent (`output.region` or `AWS_REGION`)
- Network egress blocked to the cloud storage API

For S3-compatible stores:

- Use `output.provider: s3` with `output.endpoint`
- Set `output.region` explicitly (Cloudflare R2: `region: auto`)
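Putting those keys together, a `basecut.yml` output section for an S3-compatible store might look like this sketch; the `bucket` key, bucket name, and account ID are assumptions:

```yaml
# Sketch of a basecut.yml output section for Cloudflare R2 (placeholders throughout).
output:
  provider: s3
  bucket: my-snapshots
  endpoint: https://<account-id>.r2.cloudflarestorage.com
  region: auto   # R2 expects region: auto
```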
## Cost Optimization

### Right-Size Agent Resources
| Snapshot Size | vCPU | Memory | Recommended For |
|---|---|---|---|
| < 10k rows | 0.5 | 1GB | Development/small teams |
| 10k-100k rows | 1.0 | 2GB | Most production workloads |
| 100k-1M rows | 2.0 | 4GB | Large databases |
| > 1M rows | 4.0 | 8GB | Enterprise-scale |
### Autoscaling
Scale agents based on job queue depth:

- Kubernetes HPA: scale on CPU/memory utilization
- ECS Service Autoscaling: scale on the custom CloudWatch metric `JobQueueDepth`
- Target: 1 agent per 5-10 jobs in queue
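For the Kubernetes case, a CPU-based HPA is a reasonable starting point; this sketch assumes the agents run as a Deployment named `basecut-agent`:

```yaml
# Sketch: HPA scaling an assumed basecut-agent Deployment on CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: basecut-agent
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: basecut-agent
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```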
### Spot Instances
Agents are fault-tolerant; jobs resume if an agent dies:

- ECS: use Fargate Spot for up to 70% cost savings
- Kubernetes: use spot node pools
- EC2: use Spot Instances with interruption handling
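For the ECS option, Fargate Spot is selected via a capacity provider strategy when creating the service; the cluster, service, and task definition names below are placeholders:

```shell
# Sketch: run the agent service on Fargate Spot (all names are placeholders).
aws ecs create-service \
  --cluster basecut-agents \
  --service-name basecut-agent \
  --task-definition basecut-agent \
  --desired-count 2 \
  --capacity-provider-strategy capacityProvider=FARGATE_SPOT,weight=1
```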