project-page active pal-e-backup
project-pal-e-backup updated 2026-03-16

pal-e-backup

Vision

Off-site backup and disaster recovery for the entire Pal-E platform. If the PC dies, everything can be rebuilt from cloud backups — databases, git repos, identity, secrets, object storage. One unified pipeline, one cloud destination, managed as IaC.

User Stories

Role | Key | Story | Success Metric
Platform owner | sleep-at-night | I want to know that a disk failure won't destroy my platform | Full restore from cloud backups tested and documented
Platform owner | backup-confidence | I want to be alerted if any backup fails | Alert fires within 1 hour of a missed backup
Platform owner | restore-speed | I want to restore the full platform in under 2 hours | Restore time documented from DR test
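
The backup-confidence metric can be read as a deadline check. The sketch below is a minimal Python illustration, not the project's actual verification job. It assumes backups run daily (as described in this page) and that the time of the last successful backup is known; the names `backup_is_overdue` and `alert_deadline` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumptions: backups run once per day, and the alert must fire
# no later than 1 hour after a backup was due but did not happen.
BACKUP_INTERVAL = timedelta(hours=24)
ALERT_WINDOW = timedelta(hours=1)


def backup_is_overdue(last_success: datetime, now: datetime) -> bool:
    """Return True when more than one backup interval has passed since the last success."""
    return now - last_success > BACKUP_INTERVAL


def alert_deadline(last_success: datetime) -> datetime:
    """Latest time by which an alert must have fired to meet the 1-hour metric."""
    return last_success + BACKUP_INTERVAL + ALERT_WINDOW


# Example with a hypothetical last successful run at 02:00 UTC.
last_ok = datetime(2026, 3, 16, 2, 0, tzinfo=timezone.utc)
print(backup_is_overdue(last_ok, last_ok + timedelta(hours=23)))
print(alert_deadline(last_ok).isoformat())
```

Under these assumptions, running the check at least once an hour would be enough to detect a missed backup before the deadline.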

Plan

Active: plan-pal-e-backup — Off-Site Platform Backup

7 phases: foundation (S3 + repo), database backups, Forgejo backup, MinIO mirror, identity/secrets, monitoring/verification, disaster recovery test.

Board

board-pal-e-backup — Pal-E Backup Board. Continuous kanban.

Status

  • Current backup coverage: pal-e-docs DB and woodpecker DB have CNPG WAL archiving to local MinIO. Terraform state backed up to local MinIO daily. Everything else has zero backup.
  • Off-site backup: None. All backups are on the same disk as the data they protect.
  • Backup verification: cnpg-backup-verify CronJob exists but is currently failing.

Milestones

None yet. First milestone will be defined when Phase 7 (DR test) completes — "Platform Protected."

Architecture

flowchart TD
    subgraph k3s["k3s Node (archbox)"]
        PG_DOCS["pal-e-docs DB<br/>CNPG"]
        PG_WP["woodpecker DB<br/>CNPG"]
        PG_BBALL["basketball-api DB<br/>plain pod"]
        PG_MCD["mcd-tracker DB<br/>plain pod"]
        FORGEJO["Forgejo<br/>git repos + SQLite"]
        KEYCLOAK["Keycloak<br/>H2 file DB"]
        MINIO["MinIO<br/>WAL + TF state + assets"]
        K8S_SECRETS["k8s Secrets"]
    end

    subgraph cron["Backup CronJobs"]
        PGDUMP["pg_dump<br/>daily"]
        GDUMP["gitea dump<br/>daily"]
        MIRROR["mc mirror<br/>daily"]
        KEXPORT["keycloak export<br/>daily"]
        SDUMP["secrets export<br/>daily (encrypted)"]
    end

    subgraph cloud["External S3 (Backblaze B2)"]
        S3["s3://pal-e-backups/"]
        S3_PG["postgres/"]
        S3_FG["forgejo/"]
        S3_MM["minio-mirror/"]
        S3_KC["keycloak/"]
        S3_SEC["k8s-secrets/"]
    end

    PG_DOCS --> PGDUMP
    PG_WP --> PGDUMP
    PG_BBALL --> PGDUMP
    PG_MCD --> PGDUMP
    FORGEJO --> GDUMP
    MINIO --> MIRROR
    KEYCLOAK --> KEXPORT
    K8S_SECRETS --> SDUMP

    PGDUMP --> S3_PG
    GDUMP --> S3_FG
    MIRROR --> S3_MM
    KEXPORT --> S3_KC
    SDUMP --> S3_SEC

    S3_PG --> S3
    S3_FG --> S3
    S3_MM --> S3
    S3_KC --> S3
    S3_SEC --> S3

Backup Flow. Five CronJobs run daily in the k3s cluster. Each targets a specific data category, compresses/encrypts as appropriate, and uploads to a single external S3 bucket organized by directory. A verification job checks freshness and alerts on failure.
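
The bucket layout described above can be sketched as a lookup from backup job to destination prefix. This is a hedged illustration only: the job names, prefixes, and bucket name come from the architecture diagram, while the `backup_uri` helper, the date-stamped subdirectory, and the example filename are assumptions that the plan does not specify.

```python
from datetime import datetime, timezone

# Bucket name and job -> prefix mapping, as shown in the architecture diagram.
BUCKET = "pal-e-backups"
JOB_PREFIXES = {
    "pg_dump": "postgres/",
    "gitea dump": "forgejo/",
    "mc mirror": "minio-mirror/",
    "keycloak export": "keycloak/",
    "secrets export": "k8s-secrets/",
}


def backup_uri(job: str, filename: str, when: datetime) -> str:
    """Build a destination URI for one backup artifact.

    The per-day subdirectory is an assumed convention, not part of the plan.
    """
    prefix = JOB_PREFIXES[job]  # raises KeyError for an unknown job
    return f"s3://{BUCKET}/{prefix}{when:%Y-%m-%d}/{filename}"


# Hypothetical artifact name for the pal-e-docs database dump.
run_time = datetime(2026, 3, 16, tzinfo=timezone.utc)
print(backup_uri("pg_dump", "pal-e-docs.sql.gz", run_time))
```

Keeping one prefix per data category would let a freshness check, or a restore, work on each category independently.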

Repos

Repo | Platform | Role | Status
pal-e-backup | Forgejo | Terraform + backup scripts + CronJob manifests | planned

Inbox

No untriaged items.