## Why Self-Hosted?

**Security & Compliance**

- Your database credentials never leave your environment
- Data processing happens entirely in your VPC/cloud
- Meet strict compliance requirements (SOC 2, HIPAA, GDPR)
- Zero-trust architecture

**Performance**

- Choose your own compute resources (CPU, memory)
- Deploy in the same region as your database
- Scale agents based on your workload
- No network latency to external services

**Cost**

- Pay only for your infrastructure (EC2, GCS, etc.)
- No per-job fees or processing charges
- Predictable costs at scale
## When to Use Self-Hosted Agents

- **Production workloads**: Recommended default for all production snapshots
- **Private databases**: Behind VPN/VPC without public access
- **Compliance requirements**: Data cannot leave your infrastructure
- **High volume**: Processing dozens or hundreds of snapshots
## Railway (Fastest)

Deploy a Basecut agent to Railway in one click, then set the `BASECUT_API_KEY` and `BASECUT_DATABASE_URL` environment variables on the service.
## Docker Deployment

### Basic Docker Run
Run a single agent container with the following environment variables:

| Variable | Required | Description |
|---|---|---|
| `BASECUT_API_KEY` | Yes | Organization-scoped API key (`bc_live_*` or `bc_test_*`) |
| `BASECUT_DATABASE_URL` | Yes | Database connection string for the agent |
| `AWS_ACCESS_KEY_ID` | No | AWS credentials (or mount `~/.aws`) |
| `AWS_SECRET_ACCESS_KEY` | No | AWS credentials |
| `AWS_REGION` | No | AWS region fallback when `output.region` is not set in `basecut.yml` |
| `GOOGLE_APPLICATION_CREDENTIALS` | No | Path to GCS service account key |
The agent also accepts optional flags: `--poll-interval`, `--heartbeat-interval`, `--agent-id`, and `--run-once`.
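A minimal invocation might look like the following sketch; the image name `basecut/agent` and the credential values are assumptions, not confirmed by this doc:

```shell
# Single-agent run; "basecut/agent" image name and credentials are placeholders.
docker run -d \
  --name basecut-agent \
  -e BASECUT_API_KEY="bc_live_xxxx" \
  -e BASECUT_DATABASE_URL="postgres://basecut_agent:password@prod-db.internal:5432/db" \
  -e AWS_REGION="us-east-1" \
  basecut/agent:latest
```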
### Docker Compose
For production deployments, use Docker Compose to run multiple agents alongside monitoring.

## Kubernetes Deployment
Deploy agents in a Kubernetes cluster with autoscaling.

## AWS ECS Deployment
Run agents on ECS Fargate with IAM roles for S3 access.

## Agent Configuration
### Network Access
Agents need outbound access to:

- **Basecut API** (`api.basecut.dev:443`)
  - Poll for jobs
  - Report job status
  - Upload job logs
- **Your database** (e.g., `prod-db.internal:5432`)
  - Execute extraction queries
  - Read schema metadata
- **Cloud storage** (S3/GCS)
  - Upload snapshot artifacts

Typical egress rules:

- Outbound HTTPS (443) to `api.basecut.dev`
- Outbound PostgreSQL (5432) to your database
- Outbound HTTPS (443) to `s3.amazonaws.com` or `storage.googleapis.com`
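As a quick sanity check of egress from the agent host, something like the following can help (hostnames taken from the rules above; `curl` and `nc` availability is assumed):

```shell
# Check outbound reachability from the agent host.
curl -sSI https://api.basecut.dev | head -n 1      # Basecut API
nc -zv prod-db.internal 5432                       # your database
curl -sSI https://s3.amazonaws.com | head -n 1     # cloud storage endpoint
```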
### Database Credentials
Best practice: use a read-only database user. Supply the connection string via:

- Environment variable: `BASECUT_DATABASE_URL=postgres://basecut_agent:password@host:5432/db`
- AWS Secrets Manager / Kubernetes Secrets
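One way to provision such a read-only user in Postgres; the role name `basecut_agent` matches the connection string above, while the database and schema names are placeholders:

```shell
# Provision a read-only Postgres user for the agent (names are placeholders).
psql "$ADMIN_DATABASE_URL" <<'SQL'
CREATE USER basecut_agent WITH PASSWORD 'change-me';
GRANT CONNECT ON DATABASE app TO basecut_agent;
GRANT USAGE ON SCHEMA public TO basecut_agent;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO basecut_agent;
-- Also cover tables created after this point:
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO basecut_agent;
SQL
```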
## Monitoring and Logging
Monitor agents via container logs and your platform's health checks.

## Troubleshooting
### Agent Not Picking Up Jobs
Check agent logs for these common causes:

- Invalid `BASECUT_API_KEY` (should start with `bc_live_` or `bc_test_`)
- Network access blocked to `api.basecut.dev`
- Agent registered to the wrong organization
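A quick way to scan for these, assuming the Docker setup above with a container named `basecut-agent`:

```shell
# Scan recent agent logs for auth or network errors (container name assumed).
docker logs --tail 100 basecut-agent 2>&1 | grep -iE 'error|unauthorized|timeout'
```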
### Database Connection Failures
Test database access from the agent. Common causes:

- Firewall blocking the database port
- Database requires SSL (`?sslmode=require`)
- Read-only user lacks schema permissions
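To isolate network issues from agent issues, run `psql` from the agent container's network context; the container name `basecut-agent` is an assumption from the Docker example:

```shell
# Run psql in the agent's network namespace to test connectivity directly.
docker run --rm --network container:basecut-agent postgres:16 \
  psql "$BASECUT_DATABASE_URL" -c 'SELECT 1;'
```

If this succeeds while the agent still fails, the problem is likely credentials or permissions rather than the network path.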
### S3/GCS Upload Failures
Verify cloud credentials. Common causes:

- IAM role/service account lacks `PutObject` permission
- Bucket doesn't exist or is in the wrong region
- Missing or invalid S3 region on the agent (`output.region` or `AWS_REGION`)
- Network egress blocked to the cloud storage API

For S3-compatible stores:

- Use `output.provider: s3` with `output.endpoint`
- Set `output.region` explicitly (Cloudflare R2: `region: auto`)
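Putting those keys together, a `basecut.yml` output section for an S3-compatible store might look like this sketch; the `bucket` key, bucket name, and account ID are assumptions:

```yaml
# Sketch of a basecut.yml output section for Cloudflare R2 (placeholders throughout).
output:
  provider: s3
  bucket: my-snapshots
  endpoint: https://<account-id>.r2.cloudflarestorage.com
  region: auto   # R2 expects region: auto
```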
## Cost Optimization

### Right-Size Agent Resources
| Snapshot Size | vCPU | Memory | Recommended For |
|---|---|---|---|
| < 10k rows | 0.5 | 1GB | Development/small teams |
| 10k-100k rows | 1.0 | 2GB | Most production workloads |
| 100k-1M rows | 2.0 | 4GB | Large databases |
| > 1M rows | 4.0 | 8GB | Enterprise-scale |
### Autoscaling
Scale agents based on job queue depth:

- Kubernetes HPA: scale on CPU/memory utilization
- ECS Service Autoscaling: scale on the custom CloudWatch metric `JobQueueDepth`
- Target: 1 agent per 5-10 jobs in queue
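For the Kubernetes case, a CPU-based HPA is a reasonable starting point; this sketch assumes the agents run as a Deployment named `basecut-agent`:

```yaml
# Sketch: HPA scaling an assumed basecut-agent Deployment on CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: basecut-agent
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: basecut-agent
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```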
### Spot Instances
Agents are fault-tolerant; jobs resume if an agent dies:

- ECS: use Fargate Spot for up to 70% cost savings
- Kubernetes: use spot node pools
- EC2: use Spot Instances with interruption handling
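For the ECS option, Fargate Spot is selected via a capacity provider strategy when creating the service; the cluster, service, and task definition names below are placeholders:

```shell
# Sketch: run the agent service on Fargate Spot (all names are placeholders).
aws ecs create-service \
  --cluster basecut-agents \
  --service-name basecut-agent \
  --task-definition basecut-agent \
  --desired-count 2 \
  --capacity-provider-strategy capacityProvider=FARGATE_SPOT,weight=1
```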