Zero-Downtime Castopod Upgrade on Kubernetes
Upgrading a production podcast platform without dropping a single listener connection.
The Challenge
Our Castopod instance at kastaspuods.lt needed an upgrade from v1.13.7 to v1.13.8. Requirements:
- Zero downtime – listeners were actively streaming podcasts
- No data loss – the database contains all podcast metadata and analytics
- Include the bug fix – v1.13.8 contains a fix we contributed for federated comments
The Strategy
1. Backup First, Always
Before touching anything, we ran a full backup using Borgmatic:
kubectl exec -n kastaspuods deploy/borgmatic -- borgmatic --stats
Result: 435MB database dumped, compressed to 199MB, shipped to Hetzner Storage Box.
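For reference, backups on a setup like this are driven by a borgmatic config roughly along these lines (a minimal sketch in the pre-1.8 sectioned format; paths, the repository URL, and credentials are placeholders, not our actual values):

location:
  source_directories:
    - /var/www/castopod/media
  repositories:
    - ssh://uXXXXX@uXXXXX.your-storagebox.de:23/./castopod-backups
storage:
  compression: zstd
  encryption_passphrase: "change-me"
hooks:
  mysql_databases:
    - name: castopod          # dumped before each archive is created
      hostname: mariadb
      username: castopod
      password: "change-me"

The mysql_databases hook is what produces the database dump; Borg then deduplicates the archive and ships it to the Storage Box.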
2. Pin Your Versions
Our deployment was using castopod/castopod:latest – a ticking time bomb. We changed to:
image: castopod/castopod:1.13.8
Explicit versions mean reproducible deployments and controlled upgrades.
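Before applying anything, it helps to confirm what the cluster is actually running. Assuming a single container in the pod spec (deployment and namespace names as used elsewhere in this post), this prints the live image tag:

kubectl get deploy/app -n kastaspuods -o jsonpath='{.spec.template.spec.containers[0].image}'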
3. Rolling Update Strategy
The key to zero downtime is Kubernetes' RollingUpdate strategy:
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0
    maxSurge: 1
What this means:
- maxUnavailable: 0 – Never terminate an old pod until a new one is ready
- maxSurge: 1 – Allow one extra pod during the rollout
With 2 replicas, the rollout proceeds:
1. Spin up 1 new pod (now 3 total)
2. Wait for the new pod to be Ready
3. Terminate 1 old pod (back to 2)
4. Repeat until all pods are new
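Step 2 only protects listeners if "Ready" actually means the new pod can serve traffic, which is the job of a readiness probe. A minimal sketch using a TCP check on Castopod's port 8000 (the probe type, delay, and thresholds are assumptions to adapt to your deployment):

readinessProbe:
  tcpSocket:
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3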
4. Apply and Watch
kubectl apply -f app-deployment.yaml
kubectl rollout status deployment/app --timeout=180s
Total rollout time: ~90 seconds. Zero dropped connections.
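If the new pods had never become Ready, the rollout would simply have stalled with the old pods still serving; the standard escape hatch, which we did not need, is a rollback:

kubectl rollout history deployment/app
kubectl rollout undo deployment/app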
5. Post-Upgrade Verification
CodeIgniter handles most post-upgrade tasks automatically. We checked migration status and cleared the caches:
kubectl exec deploy/app -- php spark migrate:status
kubectl exec deploy/app -- php spark cache:clear
kubectl exec deploy/redis -- redis-cli flushall
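As a final external sanity check (the URL is just our public hostname; adapt to yours), we can confirm the site answers with HTTP 200 after the cache flush:

curl -s -o /dev/null -w "%{http_code}\n" https://kastaspuods.lt/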
The Result
| Metric | Value |
|---|---|
| Downtime | 0 seconds |
| Rollout time | ~90 seconds |
| Data loss | None |
| Backup size | 199MB compressed |
Lessons Learned
- Backup before everything – Takes 60 seconds, saves hours of panic
- Pin versions explicitly – latest is not a version strategy
- Use maxUnavailable: 0 – The single most important setting for zero downtime
- Keep YAML in sync with the cluster – Our YAML said 1 replica, the cluster had 2 (see the drift check after this list)
- Check upstream releases – Our bug report was fixed, no patching needed
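The drift check mentioned above: kubectl diff prints exactly where the live object differs from the manifest, so a mismatched replica count shows up before you apply anything:

kubectl diff -f app-deployment.yaml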
The Bug That Got Fixed
We had reported Issue #577 – federated comments from Mastodon showed “Jan 1, 1970” due to a column mismatch in a UNION query. We patched it manually, reported upstream, and v1.13.8 includes the official fix.
Architecture
Traffic: Ingress -> Nginx (S3 proxy) -> Castopod:8000
Data: Castopod -> MariaDB + Redis
Backup: Borgmatic -> mysqldump -> Borg -> Hetzner
kastaspuods.lt is a Lithuanian podcast hosting platform running on Kubernetes.