I’m currently running a Kanban board for an Infrastructure team and wanted to share techniques I’ve found useful, but equally to get other people’s advice and experience on running a team of DevOps engineers who keep infrastructure alive while also encouraging true DevOps and self-service.
Support is everyone’s favourite disruption to a nicely planned board, but a necessary evil. Support for the team in question ranges from critical production issues to handover problems and general “how to” questions.
Compared to other development teams, an infrastructure team’s issues tend to be quite large, often taking several days or even weeks to complete, and there is definitely a preference to work on them one person at a time.
With cloud-based infrastructure the work seems to be more complex and to require much more explanation, certainly if you want to achieve my personal goal (usually for Scrum stories) of “anyone can understand any story on a backlog”.
Ownership and Priority
Infrastructure does have lots of owners, from the business-focussed engineer, to the Head of Infrastructure, even the CTO overall. This can prove a problem in maintaining an organised backlog (before we reach To Do), with different stakeholders wanting different tickets done sooner. The business is all about getting environments set up, while the Head of Infrastructure wants heavy (and costly) technical debt resolved, often to help improve the former.
To aid with support we have an expedite process available for urgent issues, but in reality all normal work stops if there is a support emergency.
Strict Limit Work In Progress
To avoid the temptation to work on other things, a low WIP limit has proved useful, along with strong messages to engineers to finish what they are working on. This also leads to good daily stand-up discussion and task trimming (see below).
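As a rough illustration of how a low WIP limit and an expedite route for urgent support can coexist, here is a minimal sketch in Python. The `Board` class, its method names and the ticket names are all hypothetical and not taken from any real Kanban tool; it simply assumes that normal tickets are refused once the limit is reached, while expedited tickets are always allowed through.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Board:
    """Hypothetical board model: one In Progress column with a WIP limit."""
    wip_limit: int
    in_progress: List[str] = field(default_factory=list)

    def can_pull(self) -> bool:
        # Normal work may only start while we are under the limit.
        return len(self.in_progress) < self.wip_limit

    def pull(self, ticket: str, expedite: bool = False) -> bool:
        # Expedited (urgent support) tickets bypass the limit;
        # everything else has to wait for something to finish.
        if expedite or self.can_pull():
            self.in_progress.append(ticket)
            return True
        return False

    def finish(self, ticket: str) -> None:
        # Completing a ticket takes it out of In Progress.
        self.in_progress.remove(ticket)


board = Board(wip_limit=2)
board.pull("Upgrade CI runners")
board.pull("Rotate TLS certificates")
started = board.pull("Tidy up DNS records")            # refused: limit reached
urgent = board.pull("Production outage", expedite=True)  # allowed through
```

Letting expedited tickets exceed the limit, rather than reserving a slot for them, is just one possible design choice; it means the board is temporarily over its limit until something is finished.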
One of the biggest wins (for Understanding) has been to use Impact Mapping to almost translate the quite technical work from this team and show where it will have an impact. This could range from improving the speed at which development teams can deploy to making cost savings on the infrastructure. Maintaining an impact map is time consuming but valuable, as it helps to show and discuss with business stakeholders (sometimes without heavy infrastructure knowledge) where time is being spent.
A few Kanban purists might flinch at this one, but I am finding it useful to sometimes reflect on the remaining work for a story and take a team decision to “trim off” that remaining work into a new ticket (or tickets) and complete the original task. This prevents stories staying in progress when they are effectively blocked by a situation unforeseen at the start. For example, a business decision (a change of mind) not to risk deploying a new DB to the Customer X project in Production.
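The trimming step can be sketched in a few lines of Python. This is only an illustration under assumed names: the `Ticket` structure, the `trim` function and the status values are hypothetical, and a real tracker would have its own way of splitting tickets.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Ticket:
    """Hypothetical ticket: a title plus sub-tasks mapped to a done flag."""
    title: str
    tasks: Dict[str, bool] = field(default_factory=dict)
    status: str = "In Progress"


def trim(ticket: Ticket, follow_up_title: str) -> Ticket:
    # Unfinished sub-tasks move to a new ticket that goes back to the backlog.
    remaining = {name: False for name, done in ticket.tasks.items() if not done}
    # The original keeps only what was actually completed and is closed.
    ticket.tasks = {name: True for name, done in ticket.tasks.items() if done}
    ticket.status = "Done"
    return Ticket(title=follow_up_title, tasks=remaining, status="Backlog")


story = Ticket(
    title="New DB for Customer X",
    tasks={"Deploy to Staging": True, "Deploy to Production": False},
)
follow_up = trim(story, "New DB for Customer X: Production rollout")
```

After the call, the original story is closed with only its completed work, and the blocked Production step sits in a separate ticket that can be prioritised again once the business decision changes.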