Hello 👋
Welcome to another week — and another opportunity to grow into a strong, confident DevOps, Infrastructure, or Platform Engineer.
Today’s issue is brought to you by The Engineering Ladder — where we share practical, career-shaping lessons in DevOps and Software Engineering to help you level up with clarity and direction.
💡 PS: Before we dive into today’s topic, I want to quickly share something important with you…
If you’ve been following The Engineering Ladder, you already know one thing I believe deeply:
👉 Real tech careers are built on evidence, not just interest.
That belief is exactly why we built CloudOps Academy.
CloudOps Academy is a hands-on training program for DevOps Engineers, Infrastructure Engineers, and Platform Engineers who want more than theory.
We focus on helping engineers build real systems, understand how production environments work, and gain the confidence to perform in real roles — not just pass interviews.
At CloudOps Academy, you don’t just “learn tools.”
You learn how to:
✅ Design and operate real cloud infrastructure
✅ Work with Docker, CI/CD, monitoring, and automation the way teams do in production
✅ Think like a reliability-focused engineer, not just a script writer
✅ Build projects you can confidently explain in interviews
✅ Grow from uncertainty to clarity with structured guidance and mentorship
Our goal is simple:
to help you become job-ready, confident, and credible as an engineer.
If you’re serious about building a strong DevOps or Cloud career — and you want guidance from engineers who are actively working in the field — we’d love to talk.
📞 Phone: +237 653 583 000
📧 Email: [email protected]
No pressure.
Just clarity on whether CloudOps Academy is the right next step for you.
Now, let’s get into today’s lesson 👇
Three engineers. Three different companies. Three different deployment strategies.
All three had the same goal — ship new code to production without breaking anything.
The first engineer worked at a payment processing company. They deployed a new pricing calculation feature using a rolling deployment. Halfway through the rollout, they realized the new pricing logic had a bug. But by then, half their servers were already running the broken version. Some users got correct prices. Some got wrong ones. Reconciling the mess took two days.
The second engineer worked at a food delivery startup. They pushed a new order assignment algorithm using a blue-green deployment. The new version had a memory leak nobody caught in testing. Within four minutes of switching all traffic over, response times spiked across the board. They switched back in 30 seconds. No user was affected for more than a few minutes.
The third engineer worked at a social media company. They rolled out a new feed ranking algorithm using a canary deployment. They sent 2% of traffic to the new version first. Engagement metrics dropped. They caught it before 98% of their users ever saw the change. They pulled back, fixed the algorithm, and tried again the following week.
Same goal. Very different outcomes.
The difference was not the quality of the code. It was the choice of deployment strategy.
Today we are going to break down all three — what they are, how they work, when to use each one, and what breaks them.
Why Your Deployment Strategy Matters More Than You Think
Most engineers think about deployment as the last step — the thing you do after the real work is done.
That thinking is what causes production incidents.
Your deployment strategy determines:
How many users get hurt if something goes wrong
How fast you can recover when it does
Whether you can catch problems before they reach everyone
How confident your team feels shipping code
A bad deployment strategy does not just cause downtime. It slows down your entire engineering culture. Teams become afraid to ship. Features sit in staging for weeks. Everyone dreads Fridays.
A good deployment strategy does the opposite. It gives your team confidence. It makes shipping feel safe. And safe shipping means faster shipping.
Let us look at each strategy clearly.
Strategy One: Rolling Deployment
What it is
A rolling deployment replaces old instances of your application with new ones gradually, one at a time or in small batches. At any point during the deployment, some instances are running the old version and some are running the new version.
Think of it like replacing the crew on a ship while it is still sailing. You swap one crew member at a time. The ship never stops. But for a period, you have a mixed crew — some old, some new.
How it works in practice
Imagine you have six servers running version 1.0 of your application. You want to deploy version 2.0.
A rolling deployment does this:
Start: [v1] [v1] [v1] [v1] [v1] [v1]
Step 1: [v2] [v1] [v1] [v1] [v1] [v1]
Step 2: [v2] [v2] [v1] [v1] [v1] [v1]
Step 3: [v2] [v2] [v2] [v1] [v1] [v1]
Step 4: [v2] [v2] [v2] [v2] [v1] [v1]
Step 5: [v2] [v2] [v2] [v2] [v2] [v1]
Done: [v2] [v2] [v2] [v2] [v2] [v2]
Each step removes one old server and adds one new one. Traffic is always being served. No downtime.
With Docker Swarm — which we covered last week — this is exactly what happens when you run:
docker service update \
--image your-registry/app:v2.0 \
--update-parallelism 1 \
--update-delay 15s \
--update-order start-first \
your-app

When rolling deployment works well
Your application is stateless — meaning a user can be served by any instance without losing context
Old and new versions can safely run alongside each other
Your team deploys frequently and wants a simple, low-overhead process
You have a moderate number of servers — not so many that the rollout takes hours
What breaks it
The payment company story at the top of this article is the perfect example.
Rolling deployments fail badly when old and new versions are not compatible with each other. If your new code changes an API response format, some users get the old format from old servers and some get the new format from new servers. Depending on your application, this inconsistency can cause real problems — especially in systems where sessions, carts, or state are involved.
Rolling deployments also make rollback harder. You cannot instantly go back to v1.0 because some servers are already on v2.0 and some are not. You have to roll the rollback out gradually too, which takes the same amount of time as the original deployment.
Rollback strategy
docker service rollback your-app

Swarm reverses the process — replacing v2.0 instances with v1.0 instances one at a time. It works, but it is not instant.
Strategy Two: Blue-Green Deployment
What it is
Blue-green deployment keeps two identical production environments running at all times. One is called blue — it is the current live environment serving all your users. The other is called green — it is the new version sitting ready, fully deployed, fully tested, but not yet receiving any traffic.
When you are ready to go live, you flip a switch — usually at the load balancer level — and all traffic moves from blue to green instantly. Blue stays running but goes idle.
If something goes wrong, you flip the switch back. All traffic returns to blue. Green goes idle again. The entire rollback takes seconds.
How it works in practice
Imagine you are using Nginx as your load balancer. Your blue environment runs on servers with IP addresses in the 10.0.1.x range. Your green environment runs on 10.0.2.x.
Your Nginx config normally looks like this:
upstream app {
server 10.0.1.1:3000;
server 10.0.1.2:3000;
server 10.0.1.3:3000;
}

You deploy v2.0 to your green servers. You test it thoroughly — real smoke tests, real health checks, real data. When you are satisfied, you update Nginx:
upstream app {
server 10.0.2.1:3000;
server 10.0.2.2:3000;
server 10.0.2.3:3000;
}

Reload Nginx:
nginx -s reload

Traffic moves to green. The switch is instant. Blue is still running, ready to receive traffic again the moment you need it.
This is exactly how the food delivery startup in our opening story recovered in 30 seconds. The blue environment never went away. It was sitting there, fully healthy, waiting.
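The switch itself is usually scripted rather than done by hand-editing configs. Below is a minimal sketch of one common approach: keep one upstream file per environment and repoint a symlink that Nginx includes. The directory and file names here are illustrative assumptions, not part of the setup described above.

```shell
#!/bin/sh
# Sketch: automating the blue/green switch with a symlink that Nginx includes.
set -eu

CONF_DIR="./nginx-demo"
mkdir -p "$CONF_DIR"

# One upstream file per environment.
cat > "$CONF_DIR/blue.conf" <<'EOF'
upstream app {
    server 10.0.1.1:3000;
    server 10.0.1.2:3000;
}
EOF

cat > "$CONF_DIR/green.conf" <<'EOF'
upstream app {
    server 10.0.2.1:3000;
    server 10.0.2.2:3000;
}
EOF

# Nginx would `include .../live.conf`; start with blue live.
ln -sfn blue.conf "$CONF_DIR/live.conf"

# The switch: atomically repoint the symlink, then reload Nginx.
switch_to() {
    ln -sfn "$1.conf" "$CONF_DIR/live.conf"
    # nginx -s reload   # uncomment on a real host
}

switch_to green
readlink "$CONF_DIR/live.conf"   # prints: green.conf
```

Rolling back is just `switch_to blue` and another reload. Because repointing a symlink is atomic, Nginx never reads a half-written config.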
When blue-green works well
You need the ability to roll back instantly — not gradually
Your deployment carries significant risk — a major refactor, a new payment integration, a database schema overhaul
You want to test the new version with real production infrastructure before anyone uses it
Your application handles stateful sessions or carts where version inconsistency would cause real problems
What breaks it
Cost. Blue-green requires you to run two full production environments simultaneously. For small teams on a budget, doubling your infrastructure cost — even temporarily — is not always realistic.
Database migrations. If your new version requires a database schema change, you have a problem. The green environment needs the new schema. But if you roll back to blue, blue needs the old schema. The database is shared between both. You cannot have it both ways.
The solution is the same expand-and-contract pattern we covered last week. Add new columns as nullable before the deployment. Remove old columns only after green is confirmed stable and blue is decommissioned.
Sessions. If a user logs in on blue, their session might live in blue's memory. After the traffic switch, their next request goes to green — which has no record of that session. They get logged out.
Solve this by storing sessions in a shared external store — Redis, for example — that both blue and green can access.
Rollback strategy
# Point traffic back to blue in Nginx
nginx -s reload

Instant. The fastest rollback of any strategy.
Strategy Three: Canary Deployment
What it is
A canary deployment sends a small percentage of your real production traffic to the new version while the majority of users stay on the old version. You watch metrics carefully. If everything looks good, you gradually increase the percentage until the new version is serving everyone. If something looks wrong, you pull the canary back and nobody else is affected.
The name comes from coal miners who used to carry canaries into mines. If the air became toxic, the canary would show signs first — giving miners time to get out before the poison reached them. Your canary deployment is the same idea. A small slice of traffic hits the new version first. If it reacts badly, you catch it before it hits everyone.
How it works in practice
You have your application running in production. You deploy v2.0 alongside v1.0. You configure your load balancer to send 5% of traffic to v2.0 and 95% to v1.0.
Using Nginx, this looks like:
upstream app {
server 10.0.1.1:3000 weight=95; # v1.0
server 10.0.2.1:3000 weight=5; # v2.0 canary
}

You watch your dashboards for 30 minutes. You look at:
Error rates — is v2.0 throwing more errors than v1.0?
Latency — are requests to v2.0 slower?
Business metrics — are users on v2.0 completing checkouts at the same rate?
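These checks can also be scripted, so a bad canary gets pulled even when nobody is staring at a dashboard. Here is a minimal sketch of the decision step; the 1.5x threshold and the hard-coded rates are illustrative assumptions, and in practice the values would come from your monitoring system (Prometheus, CloudWatch, and the like).

```shell
#!/bin/sh
# Sketch: an automated canary gate with hard-coded stand-in numbers.

# should_promote BASELINE_ERROR_RATE CANARY_ERROR_RATE
# Succeeds (exit 0) only if the canary's error rate is within 1.5x of the
# baseline's. awk handles the floating-point comparison.
should_promote() {
    awk -v b="$1" -v c="$2" 'BEGIN { exit !(c <= b * 1.5) }'
}

if should_promote 0.010 0.012; then
    echo "promote"    # canary looks healthy: raise its traffic share
else
    echo "rollback"   # canary is misbehaving: pull it from the pool
fi
```

A real version would also gate on latency percentiles and a business metric, not error rate alone.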
If everything looks good, you increase the canary to 20%, then 50%, then 100%:
upstream app {
server 10.0.1.1:3000 weight=50; # v1.0
server 10.0.2.1:3000 weight=50; # v2.0
}

If something looks wrong at any point:
upstream app {
server 10.0.1.1:3000; # back to v1.0, now the only active server
server 10.0.2.1:3000 down; # canary pulled (Nginx rejects weight=0, so mark it down)
}

The social media company in our opening story caught their feed algorithm problem at 2% traffic. Only 2 out of every 100 users ever saw the broken version. The other 98 never knew anything happened.
When canary works well
You are making changes that affect user behaviour — UI changes, algorithm changes, pricing changes
You want real production validation before committing to a full rollout
Your team has monitoring and dashboards good enough to catch subtle problems quickly
You are deploying to a large user base where even a small percentage represents meaningful signal
What breaks it
Thin traffic. If you only have 500 users, 5% canary means 25 people. That is not enough signal to tell you anything meaningful. Canary deployments work best when your traffic is large enough that a small percentage still represents real usage patterns.
Complex session state. Like rolling deployments, canary means two versions run simultaneously. Users can jump between versions across requests if they are not pinned to one. This can cause inconsistency in stateful applications.
Solve this with sticky sessions — the load balancer sends the same user to the same version for the duration of their session.
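In Nginx, one simple way to get sticky behaviour is `ip_hash`, which maps each client IP to a fixed upstream server. A sketch based on the canary config above (the weights still influence the split, but it becomes approximate, since whole clients rather than individual requests are assigned to each version):

```nginx
upstream app {
    ip_hash;                          # same client IP always lands on the same server
    server 10.0.1.1:3000 weight=95;   # v1.0
    server 10.0.2.1:3000 weight=5;    # v2.0 canary
}
```

For finer control, cookie-based stickiness or `split_clients` are common alternatives.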
Missing metrics. If you do not have dashboards and alerts set up before you run a canary, you are flying blind. The power of canary is the ability to observe and compare. Without observability, it is just a partial rollout with no safety net.
Rollback strategy
Pull the canary out of the traffic pool in your load balancer and reload. Instant. Only a small percentage of users were ever affected.
Choosing the Right Strategy
Here is a simple way to think about it:
Use rolling when you deploy frequently, your versions are compatible with each other, and you want low infrastructure overhead. It is your everyday workhorse.
Use blue-green when the stakes are high, you need instant rollback, and you can afford to run two environments temporarily. Use it for your most important releases.
Use canary when you are making changes that affect how users behave, you have good monitoring, and you want real production feedback before committing. Use it when you are not sure how users will respond to a change.
Many mature teams combine all three. They use rolling for routine deployments. They use blue-green for major releases. They use canary for product changes where user behaviour is the unknown.
This Week’s Challenge
✅ Look at how your team currently deploys. Which strategy are you using — even if you never called it by name?
✅ Think about the last time a deployment caused a problem. Which strategy would have caught it earlier or made the rollback faster?
✅ Pick one application you own. Write down which strategy you would choose for it and why. Consider: how stateful is it, how large is your user base, how fast do you need to roll back?
You do not have to change everything today. But knowing which tool to reach for — and why — is what separates engineers who deploy with confidence from engineers who deploy with anxiety.
Final Thoughts
Blue-green, canary, rolling — these are not just deployment patterns. They are risk management tools.
Every time you deploy, you are making a bet. These strategies are how you control the size of that bet. Rolling lets you deploy gradually. Blue-green lets you escape instantly. Canary lets you test on real users without risking all of them.
The best engineers do not pick one strategy and use it for everything. They understand all three deeply enough to know which one fits the situation in front of them.
That judgment is what makes deployment feel less like a gamble and more like a craft.
Deploy with a plan. Recover with confidence. Ship without fear.
Ship confidently. Deploy quietly. Keep production alive.
PS:
At CloudOps Academy, we help engineers make this exact transition — from uncertainty to clarity — through hands-on training, real systems, and structured mentorship.
If you’re ready to move beyond theory and start building real DevOps skills, reach out:
📞 +237 653 583 000
📧 [email protected]
P.S. If you found this helpful, share it with a friend or colleague who’s on their DevOps or Software engineering journey. Let’s grow together!
Got questions or thoughts? Reply to this newsletter; we’d love to hear from you!
See you next week.
Looking for structured, expert-led mentorship to accelerate your Cloud or DevOps career?
Visit consult.akumblaiseacha.com — where I work 1:1 with aspiring and experienced tech professionals to help them build real skills, grow their career, and land the opportunities they deserve.
From personalized career roadmaps and hands-on project guidance, to interview prep, LinkedIn positioning, and job search strategy — everything is tailored to your specific goals and timeline.
No cohorts. No pre-recorded content. Just direct, focused mentorship from a Senior DevOps Engineer with years of real-world, production experience.
👉 Book your session today → consult.akumblaiseacha.com
Join the WhatsApp Community here:
Weekly Backend and DevOps Engineering Resources
The DevOps Career Roadmap: A Guide to Becoming a World Class DevOps Engineer by Akum Blaise Acha
API Versioning 101 for Backend Engineers by Akum Blaise Acha
System Design 101: Understanding Database Sharding by Akum Blaise Acha
Why Engineers Should Embrace the Art of Writing by Akum Blaise Acha
From Good to Great: Backend Engineering by Akum Blaise Acha
System Design 101: Understanding Caching by Akum Blaise Acha