Kanban for Infrastructure team

paul.cutting · 2017-11-13 21:04:23 UTC

I’m currently running a Kanban board for an Infrastructure team and wanted to share techniques I’ve found useful, but equally to get other people’s advice and experience on running a team of DevOps engineers who keep infrastructure alive while also encouraging true DevOps and self-service.

The Challenges

Support

Everyone’s favourite disruption to a nice planned board, but a necessary evil. Support for the team in question ranges from critical production issues to handover problems and general “how to” questions.

Large Issues

Compared to other development teams, an infrastructure team’s issues will be quite large, often taking several days or even weeks to complete, and there is definitely a preference to work on them one person at a time.

Understanding

With cloud-based infrastructure the work seems to be more complex and to require much more explanation, certainly if you want to achieve my personal goal (usually for Scrum stories) of “anyone can understand any story on a backlog”.

Ownership and Priority

Infrastructure does have lots of owners, from the business-focussed engineer, to the Head of Infrastructure, even the CTO overall. This can prove a problem in maintaining an organised backlog (before we reach To Do), with different stakeholders wanting different tickets done sooner. The business is all about getting environments set up, while the Head of Infrastructure wants heavy (and costly) technical debt resolved, often to help improve the former.

Techniques

Expedite track

To aid with support we have an expedite process available for urgent issues, but in reality all normal work stops if there is a support emergency.

Strict Work In Progress Limit

To avoid the temptation to work on other things, a low WIP limit has proved useful, along with strong messages to engineers to finish what they are working on. This also leads to good daily stand-up discussion and task trimming (see below).

Impact Mapping

One of the biggest wins (for Understanding) has been to use Impact Mapping to translate the quite technical work from this team and show where it will have an impact. This could range from improving the speed at which development teams can deploy to making cost savings on the infrastructure. Maintaining an impact map is time-consuming but valuable to help show and discuss with business stakeholders (sometimes without heavy infrastructure knowledge) where time is being spent.

Task trimming

A few Kanban purists might flinch at this one, but I am finding it useful to sometimes reflect on remaining work for a story and take a team decision to “trim off” the remaining work into a new ticket (or tickets) and complete the original task. This prevents stories staying in progress when they are effectively blocked by a situation unforeseen at the start. For example, a business decision (a change of mind) not to risk deploying a new DB to the Customer X project in Production.

andycleff · 2017-11-13 22:12:55 UTC

@Colleen paging ms colleen, white courtesy (kanban) phone please…

andycleff · 2017-11-13 22:16:14 UTC

Do you ever track the cost of an expedite class of service? It can be dramatic… Daniel Vacanti dives in deep in his book on the topic: https://www.amazon.com/Actionable-Agile-Metrics-Predictability-Introduction-ebook/dp/B013ZQ5TUQ

andycleff · 2017-11-13 22:17:58 UTC

Um, we don’t allow those sorts of people around our campfire…

Pragmatists however are always welcome

paul.cutting · 2017-11-14 10:51:11 UTC

We do report how long overall BAU work takes vs Tech Debt and Improving Platform, but we do not track Expedite specifically. I’ll take a look at the book recommendation…thanks!

andycleff · 2017-11-14 11:48:31 UTC

One of the interesting topics that Vacanti discusses is the impact that expedited class of service work has on all the standard class work in the queue.

plot spoiler: teams should seriously consider refusing expedited work by policy and principle.

kschlabach · 2017-11-14 14:01:09 UTC

We use Scrum for our product teams and Kanban for our support and performance testing teams (I’m pushing to embed these folks into the product teams… but until then…). In those scenarios, we use a different ticket type (Jira system) for support fires than for our “normal planned” work. This makes it really easy to reflect back and sort on the ratio of disruptions vs. strategic work and facilitate both retrospectives and conversations upward as needed.
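As a rough illustration of the disruption-vs-planned ratio described above, here is a minimal sketch. It assumes tickets have already been exported as (key, type) pairs; the keys and the type names "Support" and "Planned" are made up for the example and are not Jira defaults.

```python
from collections import Counter

# Hypothetical export of closed tickets as (key, ticket_type) pairs.
# The type names "Support" and "Planned" are assumptions for illustration.
tickets = [
    ("INF-1", "Planned"),
    ("INF-2", "Support"),
    ("INF-3", "Planned"),
    ("INF-4", "Support"),
    ("INF-5", "Planned"),
]


def disruption_ratio(tickets, disruption_type="Support"):
    """Return the fraction of tickets whose type is the disruption type."""
    counts = Counter(ticket_type for _, ticket_type in tickets)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no tickets closed, so nothing to report
    return counts[disruption_type] / total


ratio = disruption_ratio(tickets)
print(f"{ratio:.0%} of closed tickets were support fires")
```

Counting tickets is the crudest version of this; weighting by time spent per ticket would probably tell a more honest story if that data is available.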

Colleen · 2017-12-02 20:33:05 UTC

@paul.cutting do you have a limit for the number of items that can be expedited at any given time or does this count against your WIP? I often do a +1 (so only one item can ever be expedited at a time) but it still has a big impact as @andycleff mentions. You could argue that reprioritizing each column every day would ultimately have the same effect.

paul.cutting · 2017-12-04 15:06:40 UTC

@Colleen Yes, just one issue is allowed in expedite, which has worked for us so far, but we may have to look at this again as more projects (which need supporting) go live.
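For anyone who wants to play with the policy, here is a minimal sketch of a WIP limit with a single expedite slot. It assumes the "+1" variant, where the expedite slot sits outside the normal WIP limit; the `Board` class, its method names, and the ticket names are all hypothetical and not taken from any tool.

```python
class WipLimitExceeded(Exception):
    """Raised when pulling a ticket would break a board policy."""


class Board:
    """Toy board: a WIP-limited standard lane plus a separate expedite lane."""

    def __init__(self, wip_limit, expedite_limit=1):
        self.wip_limit = wip_limit
        self.expedite_limit = expedite_limit  # assumption: +1 slot, outside WIP
        self.in_progress = []  # standard-class tickets
        self.expedited = []    # expedite-class tickets

    def pull(self, ticket, expedite=False):
        """Start work on a ticket, refusing if its lane is already full."""
        if expedite:
            lane, limit = self.expedited, self.expedite_limit
        else:
            lane, limit = self.in_progress, self.wip_limit
        if len(lane) >= limit:
            raise WipLimitExceeded(f"cannot pull {ticket!r}: limit of {limit} reached")
        lane.append(ticket)

    def finish(self, ticket):
        """Complete a ticket and free its slot."""
        for lane in (self.in_progress, self.expedited):
            if ticket in lane:
                lane.remove(ticket)
                return
        raise ValueError(f"{ticket!r} is not in progress")


board = Board(wip_limit=2)
board.pull("upgrade-db")
board.pull("rotate-certs")
board.pull("prod-outage", expedite=True)  # allowed: uses the expedite slot
try:
    board.pull("new-environment")         # refused: standard lane is full
except WipLimitExceeded as err:
    print(err)
```

If expedited work should count against WIP instead, the check in `pull` would compare the combined length of both lanes to `wip_limit`.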