Rollback Runbook
Quick Rollback via Helm
# List recent releases to find the target revision
helm history zkprova
# Roll back to a specific revision (omit the revision to roll back to the previous one)
helm rollback zkprova <revision>
# Example: roll back to revision 5
helm rollback zkprova 5
Helm rollback restores the previous Kubernetes manifests and image tags. The database is not rolled back automatically — see the post-rollback checklist below.
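The commands above can be wrapped in a small shell function so that the history check, the rollback, and a final status check always run together. This is a minimal sketch, not the project's own tooling: the function name `rollback_release` and the 5-minute timeout are illustrative assumptions, while `--max`, `--wait`, and `--timeout` are standard Helm flags.

```shell
# Sketch: roll back a Helm release and wait for its workloads to become ready.
# Usage: rollback_release <release> [revision]
rollback_release() {
  local release="${1:-}" revision="${2:-}"
  if [ -z "$release" ]; then
    echo "usage: rollback_release <release> [revision]" >&2
    return 2
  fi
  # Show the last few revisions so the operator can confirm the target.
  helm history "$release" --max 5 || return 1
  # --wait blocks until resources are ready or the timeout expires.
  if [ -n "$revision" ]; then
    helm rollback "$release" "$revision" --wait --timeout 5m || return 1
  else
    # With no revision argument, Helm rolls back to the previous release.
    helm rollback "$release" --wait --timeout 5m || return 1
  fi
  # Print the resulting release state.
  helm status "$release"
}
```

For example, `rollback_release zkprova 5` targets revision 5, and `rollback_release zkprova` targets the previous revision. Neither call touches the database.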
Rollback via GitHub Actions
- Open Actions > Deploy in the GitHub UI
- Click Run workflow
- Select the target environment (`staging` or `production`)
- The workflow will build and deploy from the current `main` branch tip
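The same run can probably be started from a terminal with the GitHub CLI. The sketch below assumes the workflow is named `Deploy` (as in the Actions menu) and that it accepts an input called `environment`; the input name and the function name `trigger_deploy` are assumptions, so check the workflow file before relying on them.

```shell
# Sketch: trigger the Deploy workflow from the command line with the GitHub CLI.
# Assumption: the workflow accepts a workflow_dispatch input named "environment".
# Usage: trigger_deploy <staging|production>
trigger_deploy() {
  local environment="${1:-}"
  case "$environment" in
    staging|production) ;;
    *)
      echo "usage: trigger_deploy <staging|production>" >&2
      return 2
      ;;
  esac
  # --ref selects the branch whose workflow file and code are used.
  gh workflow run Deploy --ref main -f "environment=$environment"
}
```

Rejecting anything other than `staging` or `production` before calling `gh` keeps a typo from reaching the workflow.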
To deploy a specific older commit:
- Create a temporary branch from the known-good commit:
- Open a PR to `main`, merge it, and let the auto-staging deploy trigger
- Manually promote to production via `workflow_dispatch` once staging is verified
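One possible way to prepare such a branch is sketched below. A branch that points directly at an ancestor of `main` contains nothing new to merge, so this sketch starts from the tip of `main` and reverts every commit made after the known-good one, which leaves the branch with the known-good content. The function name `prepare_rollback_branch`, the `rollback-to-` branch prefix, and the remote name `origin` are assumptions.

```shell
# Sketch: build a temporary branch whose content matches a known-good commit.
# Assumptions: remote "origin", a commit SHA as argument, and no merge commits
# between the known-good commit and main (git revert needs -m for those).
# Usage: prepare_rollback_branch <known-good-sha>
prepare_rollback_branch() {
  local good="${1:-}"
  if [ -z "$good" ]; then
    echo "usage: prepare_rollback_branch <known-good-sha>" >&2
    return 2
  fi
  local branch="rollback-to-$good"
  git fetch origin main || return 1
  # Start from the current tip of main so the PR has something to merge.
  git switch -c "$branch" origin/main || return 1
  # Stage the reversal of every commit after the known-good one.
  git revert --no-commit "$good..HEAD" || return 1
  git commit -m "Revert main to $good" || return 1
  git push -u origin "$branch"
}
```

After the push, the PR to `main` can be opened from the new branch in the GitHub UI.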
Decision Framework: Rollback vs Hotfix Forward
| Signal | Rollback | Hotfix Forward |
|---|---|---|
| Users actively impacted | Yes | — |
| Root cause is unclear | Yes | — |
| Fix is obvious and small (< 30 min) | — | Yes |
| Database migration ran (non-reversible) | — | Yes |
| Multiple commits since last good deploy | Yes | — |
Default to rollback when in doubt — restoring service is the priority.
Post-Rollback Verification Checklist
- [ ] `kubectl rollout status deployment/zkprova-zkprova-backend --timeout=120s`
- [ ] `kubectl rollout status deployment/zkprova-zkprova-frontend --timeout=120s`
- [ ] Run smoke tests: `bash scripts/smoke-test.sh <deploy-url>`
- [ ] Check application logs: `kubectl logs -l app=zkprova-backend --tail=50`
- [ ] Verify database connectivity: hit `/health` and confirm `"database": "ok"`
- [ ] Check for failed Kubernetes events: `kubectl get events --sort-by=.lastTimestamp | tail -20`
- [ ] If a migration was part of the bad deploy, assess whether a reverse migration is needed
- [ ] Notify the team in Slack with: environment, rolled-back revision, reason, and next steps
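Some of the checklist items can be scripted. The following sketch covers only the rollout status, `/health`, and events checks, and stops at the first failure; the smoke tests, log review, migration assessment, and Slack notification remain manual. The function name `verify_rollback` is an assumption, and the health check assumes the endpoint returns JSON containing `"database": "ok"` as described in the checklist.

```shell
# Sketch: run the automatable checklist items and stop at the first failure.
# Usage: verify_rollback <deploy-url>
verify_rollback() {
  local url="${1:-}"
  if [ -z "$url" ]; then
    echo "usage: verify_rollback <deploy-url>" >&2
    return 2
  fi
  local name
  for name in backend frontend; do
    kubectl rollout status "deployment/zkprova-zkprova-$name" --timeout=120s || return 1
  done
  # -f makes curl fail on HTTP errors; grep checks the reported database state.
  if ! curl -fsS "$url/health" | grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'; then
    echo "health check failed: database is not reported as ok" >&2
    return 1
  fi
  # Recent events are printed for manual review, not evaluated.
  kubectl get events --sort-by=.lastTimestamp | tail -20
}
```

A non-zero exit status means one of the checks failed and the rollback should be investigated before the team is notified.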