Rollback Runbook

Quick Rollback via Helm

# List recent releases to find the target revision
helm history zkprova

# Roll back to a specific revision
# (omit <revision> to roll back to the previous release)
helm rollback zkprova <revision>

# Example: roll back to revision 5
helm rollback zkprova 5

Helm rollback restores the Kubernetes manifests and image tags of the target revision, and records the rollback itself as a new revision in the release history. It does not roll back the database; see the post-rollback checklist below.
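The Helm steps above can be wrapped in a small helper. The following is a minimal sketch, not part of the existing tooling: the function name `rollback_release` and the 5-minute timeout are assumptions chosen for illustration. It prints the history for confirmation, rolls back with `--wait` so the command blocks until workloads are ready, and then prints the release status.

```shell
# Hypothetical helper (not an existing script in this repository).
# Usage: rollback_release <release> [revision]
rollback_release() {
  release="$1"
  revision="${2:-}"   # empty: Helm rolls back to the previous release

  # Print the release history so the operator can confirm the target revision.
  helm history "$release" || return 1

  # --wait blocks until the rolled-back workloads are ready or --timeout expires.
  # The revision argument is only passed when one was given.
  helm rollback "$release" ${revision:+"$revision"} --wait --timeout 5m || return 1

  # Show the resulting state of the release.
  helm status "$release"
}
```

For example, `rollback_release zkprova 5` mirrors the manual commands above; the timeout value is an assumption and should be tuned to how long the deployments usually take to become ready.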

Rollback via GitHub Actions

1. Open Actions > Deploy in the GitHub UI.
2. Click Run workflow.
3. Select the target environment (staging or production).

The workflow then builds and deploys from the current tip of the main branch.

To deploy a specific older commit:

1. Create a temporary branch from the known-good commit:

git checkout -b hotfix/rollback-<sha> <good-commit-sha>
git push origin hotfix/rollback-<sha>

2. Open a PR to main, merge it, and let the automatic staging deploy run.
3. Manually promote to production via workflow_dispatch once staging is verified.
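One caveat worth checking before opening the PR: if the known-good commit is already an ancestor of main, a branch cut directly from it contains nothing new, so the PR has no changes to merge. A common alternative, sketched here as an assumption rather than as the team's documented procedure, is to branch from main and revert every commit made after the known-good one. The function name `revert_range_onto_branch` is hypothetical.

```shell
# Hypothetical helper: create hotfix/rollback-<short-sha> from a base branch
# and revert everything after the known-good commit in a single commit.
# Usage: revert_range_onto_branch <good-commit-sha> [base-branch]
revert_range_onto_branch() {
  good_sha="$1"
  base="${2:-main}"
  branch="hotfix/rollback-$(git rev-parse --short "$good_sha")" || return 1

  # Start from the base branch so the PR carries an actual diff.
  git checkout -b "$branch" "$base" || return 1

  # Stage the reverts of all commits after good_sha without committing each one.
  # Note: merge commits in the range would need "-m <parent-number>" and are
  # not handled by this sketch.
  git revert --no-commit "$good_sha"..HEAD || return 1

  git commit -m "Revert to $good_sha" || return 1
}
```

After this, `git push origin <branch>` and the PR, staging, and promotion steps proceed as described above. The resulting tree matches the known-good commit while the history of main stays intact.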

Decision Framework: Rollback vs Hotfix Forward

Signal	Rollback	Hotfix Forward
Users actively impacted	Yes	—
Root cause is unclear	Yes	—
Fix is obvious and small (< 30 min)	—	Yes
Database migration ran (non-reversible)	—	Yes
Multiple commits since last good deploy	Yes	—

Default to rollback when in doubt — restoring service is the priority.

Post-Rollback Verification Checklist

[ ] kubectl rollout status deployment/zkprova-zkprova-backend --timeout=120s
[ ] kubectl rollout status deployment/zkprova-zkprova-frontend --timeout=120s
[ ] Run smoke tests: bash scripts/smoke-test.sh <deploy-url>
[ ] Check application logs: kubectl logs -l app=zkprova-backend --tail=50
[ ] Verify database connectivity: hit /health and confirm "database": "ok"
[ ] Check for failed Kubernetes events: kubectl get events --sort-by=.lastTimestamp | tail -20
[ ] If a migration was part of the bad deploy, assess whether a reverse migration is needed
[ ] Notify the team in Slack with: environment, rolled-back revision, reason, and next steps
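The automatable items in the checklist can be chained in one script. This is a minimal sketch under stated assumptions: the function names `check_health_db` and `verify_rollback` are hypothetical, and the health check assumes /health returns JSON containing a "database" field, as the checklist implies. The migration assessment and the Slack notification remain manual.

```shell
# Hypothetical helper: succeeds only if <deploy-url>/health responds
# and reports "database": "ok" (whitespace around the colon is tolerated).
check_health_db() {
  curl -fsS "$1/health" | grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'
}

# Hypothetical helper: run the automatable checklist items in order and
# stop at the first hard failure.
# Usage: verify_rollback <deploy-url>
verify_rollback() {
  deploy_url="$1"

  kubectl rollout status deployment/zkprova-zkprova-backend --timeout=120s || return 1
  kubectl rollout status deployment/zkprova-zkprova-frontend --timeout=120s || return 1

  bash scripts/smoke-test.sh "$deploy_url" || return 1

  if ! check_health_db "$deploy_url"; then
    echo "database health check failed" >&2
    return 1
  fi

  # Logs and events need a human eye, so they are printed rather than asserted on.
  kubectl logs -l app=zkprova-backend --tail=50
  kubectl get events --sort-by=.lastTimestamp | tail -20
}
```

Run it as `verify_rollback <deploy-url>` from the repository root so that the relative path to scripts/smoke-test.sh resolves.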