Lean Enterprise - by Joanne Molesky, Barry O'Reilly and Jez Humble - Part III
Posted on August 19th, 2016
This book covers a lot about transforming a traditional enterprise into a more modern, Lean-based company. It goes from finance to portfolio management, passing through adoption, technical practices and a lot more.
The book is long. And tiring. But it is full of genuinely useful bits of knowledge. I’ve covered Part I and Part II in previous posts.
This post is about Part III, which covers the exploit phase. Now that opportunities have been identified and evolved, they need to be prepared to become mainstream. The general theme here is continuous “on-the-fly” learning, be that for the process and the leadership team or for the actual application/system being developed.
Improvement Kata
The main learning of Part III is what the authors call the Improvement Kata. The idea is fairly simple: you execute four steps repeatedly to improve your organization, and you do it at all levels of the organization. The four steps are:
- Understand the direction or challenge
- Grasp the current condition
- Establish the next target condition
- Iterate toward the target condition
You can learn more about the improvement Kata at http://bit.ly/11iBzlY.
The goal is to use the Improvement Kata for both your organization and your product or service, and to work on improving constantly. If you’re talking about your organization, you’ll be thinking about the process goals/challenges/directions and where to go. If you’re thinking about your product or service, you’ll be looking at what problems it is trying to solve for your customers, why you’re not solving them right now and how to assess how close you’re getting to actually solving them.
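Read as code, the kata is essentially a loop. Here is a loose sketch (my own framing, not the authors’), with a number standing in for the condition so the example runs end to end:

```python
def improvement_kata(challenge, current, pick_next_target, experiment):
    """Grasp where you are, set the next target condition, iterate toward it."""
    while current < challenge:                         # understand the direction/challenge
        target = pick_next_target(current, challenge)  # establish the next target condition
        while current < target:
            current = experiment(current)              # iterate with small experiments
    return current

# Toy usage: improving "deploys per month" from 2 toward a challenge of 30.
print(improvement_kata(
    challenge=30,
    current=2,
    pick_next_target=lambda cur, goal: min(cur * 2, goal),
    experiment=lambda cur: cur + 1,
))  # 30
```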
Cost of Delay
When it comes to judging how to prioritize what should be worked on and when, the authors use the now famous Cost of Delay popularized by Donald Reinertsen in his book The Principles of Product Development Flow (which will earn a blog post in the coming month or so). The idea behind Cost of Delay is not that hard to grasp but it is hard to evaluate in practice.
The idea is easier to understand with an example:
Let’s say you have three tasks, A, B and C.
Task A takes 3 weeks to be completed, costs USD 10 000, and will bring USD 1000 per week for the remaining weeks of the next 50 weeks.
Task B takes 6 weeks to be completed, costs USD 15 000, won’t bring anything but the company will incur USD 1500 of fine every week if not completed in 10 weeks.
Task C takes 5 weeks to be completed, costs USD 25 000, and will bring USD 7500 per week for the remaining weeks of the next 12 weeks.
If you consider only revenue, you’d choose task C first as it will bring you USD 52 500 (12 minus 5 weeks to complete leaves 7 weeks × USD 7 500). Task A would bring USD 47 000 (50 minus 3 weeks to complete leaves 47 weeks × USD 1 000) and task B won’t bring anything.
If you consider profit, task A profits USD 37 000, B avoids a loss of USD 1 500 per week after week 10, and C profits USD 27 500 if completed as soon as possible. So task A would be the choice.
Obviously something is missing; Cost of Delay is Reinertsen's answer. He argues we should look at what it costs us to delay each task and suggests we analyze the outcome of completing the tasks in every possible order.
If we completed tasks A, B and C in order, A would profit USD 37 000, B wouldn’t incur any losses (but cost USD 15 000) and C wouldn’t bring any revenue (but cost USD 25 000) therefore leaving us with a loss of USD 3000.
If we execute A, C then B, A gives us USD 37 000, C has a profit of USD 5000 and B incurs USD 6000 loss (plus USD 15 000 cost) leaving us with a profit of USD 21 000.
If we execute B, A, C, B doesn’t incur any extra loss (but costs USD 15 000), A profits USD 31 000 and C doesn’t bring anything (but costs USD 25 000) leaving us with a loss of USD 9 000.
If we execute B, C, A, B doesn’t incur any extra loss (but costs USD 15 000), C incurs a loss of USD 17 500, A profits USD 26 000 so we end up with a loss of USD 6500.
If we run C, A, B, C profits USD 27 500, A profits USD 32 000 and B incurs an extra loss of USD 6000 leaving us with a USD 38 500 profit.
Finally, running C, B, A: C still brings USD 27 500, B incurs an extra USD 1500 loss, and A profits USD 26 000, so we end up with a profit of USD 37 000.
As a consequence, we should execute task C first, then A, then B, as this maximizes our profit at USD 38 500. Had we not taken the time to calculate the cost of delaying each task, we would have made the wrong choice on what task to handle first.
Having said that, we still don't know what the Cost of Delay is. And it's not a single, easy-to-get number in every scenario. For Task B, the cost of delay is obvious: USD 1500 per week after week 10. For Task A, it's USD 1000 per week starting immediately. For C, it's USD 7500 per week until week 12, after which it drops to zero and you're simply left with the full cost. My suggestion is to put as much effort as you can into understanding the costs of delaying your tasks. If you cannot fully quantify them right now, estimate them and move along. As you think about them more often, quantifying them gets easier.
Note that we have not calculated the options in which we don't execute task C at all. In some cases that is a viable option and might even be the best one. In our example, if we only executed A and then B, we would end up with a USD 22 000 profit, and doing B and then A would result in a USD 16 000 profit, so neither of those options is better than our C, A, B alternative. But you can probably find a scenario in which A, B is the best option (and, a little harder, one where B then A is the best).
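If you want to play with the numbers, here is a minimal sketch (my own code, not the book's; the figures are the ones from the example above) that brute-forces the profit of each ordering:

```python
# Brute-force the profit of every ordering of the three example tasks.
from itertools import permutations

tasks = {
    "A": {"duration": 3, "cost": 10_000, "revenue_per_week": 1_000, "revenue_until": 50},
    "B": {"duration": 6, "cost": 15_000, "fine_per_week": 1_500, "deadline": 10},
    "C": {"duration": 5, "cost": 25_000, "revenue_per_week": 7_500, "revenue_until": 12},
}

def profit(order):
    week, total = 0, 0
    for name in order:
        t = tasks[name]
        week += t["duration"]           # tasks run one after the other
        total -= t["cost"]
        if "revenue_per_week" in t:     # A and C earn money for the weeks left in their window
            total += max(t["revenue_until"] - week, 0) * t["revenue_per_week"]
        if "fine_per_week" in t:        # B pays a fine for every week past its deadline
            total -= max(week - t["deadline"], 0) * t["fine_per_week"]
    return total

for order in permutations("ABC"):
    print("".join(order), profit(order))  # C, A, B comes out on top at USD 38 500
```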
Cost of Delay divided by duration (CD3)
The calculation method I described above for cost of delay is a great tool to understand how to prioritize work, but the reality is that there are better and easier ways to calculate the cost of delay. When you use those alternatives, Cost of Delay just tells you, for a given task, how much it costs to postpone that task. Unfortunately, that doesn't take into account how much time the task takes to complete (which affects the outcome).
That's why the authors talk about Cost of Delay divided by duration (CD3). This composite metric takes into consideration the time it takes to complete a given task and therefore makes it obvious how to choose between two tasks that have the same cost of delay but where one takes less time than the other to complete. In that case, you always want to complete the shortest task first so that you start earning value sooner rather than later.
CD3 is, therefore, the preferred prioritization metric used to decide what should be explored first and why.
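As a rough illustration (my own sketch, nothing beyond the division itself comes from the book), reusing the example tasks and their simplified weekly cost-of-delay rates, ranking by CD3 looks like this:

```python
# Rank tasks by CD3 = cost of delay per week / duration in weeks.
# The weekly rates below are simplifications: B's fine only starts after week 10
# and C's revenue window closes at week 12, so treat this as an approximation.
tasks = {
    "A": {"cod_per_week": 1_000, "duration_weeks": 3},
    "B": {"cod_per_week": 1_500, "duration_weeks": 6},
    "C": {"cod_per_week": 7_500, "duration_weeks": 5},
}

def cd3(task):
    return task["cod_per_week"] / task["duration_weeks"]

for name, t in sorted(tasks.items(), key=lambda kv: cd3(kv[1]), reverse=True):
    print(f"{name}: CD3 = {cd3(t):.0f}")
# C (1500), then A (333), then B (250) -- the same C, A, B order the brute force found.
```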
Value Stream Map
Given a couple of tasks, thanks to CD3, you can now prioritize the most valuable tasks to be worked on. However, that doesn't help you understand which tasks you should even be looking at. A Value Stream Map (VSM) is a useful tool to help you understand, at an organization level, where you can start looking for tasks.
Creating a VSM is not that hard. The first step is to work collaboratively with multiple groups in your organization and identify every step needed by the organization to deliver value to its customers. Once you have a map that shows you how customer value is delivered in your organization, work to add three metrics to each step of your map:
- Lead Time (LT) - This measures how long it takes between the moment an idea/piece of work reaches your organization/step and the moment that idea's value is delivered to customers/the next step
- Process Time (PT) - Process Time shows you how long it takes for a piece of work to be completed after it reaches a given step
- Percent complete and accurate (%C/A) - This gives you an understanding of, for any given step, how often work exits that step but then returns to it because of a problem further down the line
A VSM should help you understand where your process and system bottlenecks are and where to focus your attention to identify improvement work.
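To make the three metrics a bit more tangible, here is a toy sketch with made-up step names and numbers (nothing from the book): the steps with the biggest gap between lead time and process time, or the worst %C/A, are usually good places to start digging.

```python
# Hypothetical value stream: (step, lead time in days, process time in days, %C/A)
steps = [
    ("Idea intake", 10, 1, 0.90),
    ("Development", 15, 8, 0.70),
    ("Testing",     12, 2, 0.60),
    ("Deployment",   7, 1, 0.95),
]

for name, lt, pt, pca in steps:
    wait = lt - pt  # time spent waiting rather than being worked on
    print(f"{name:12s} LT={lt:2d}d PT={pt:2d}d wait={wait:2d}d %C/A={pca:.0%}")

longest_wait = max(steps, key=lambda s: s[1] - s[2])
most_rework = min(steps, key=lambda s: s[3])
print("Longest wait:", longest_wait[0], "| Most rework:", most_rework[0])
```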
Continuous Delivery
A VSM is great at identifying what parts of your process need to be worked on in order to achieve better outcomes. Often, a big chunk of that work lies in your ability to reduce the friction between writing software (code) and ensuring that this software meets the actual needs of your customers and is available for them to use.
Continuous Delivery helps address this common bottleneck. Your organization might be a purely software-oriented organization or one that provides software as a way to address non-software related needs. Regardless of your situation, it is very likely that your delivery process is a considerable bottleneck in achieving the results needed. More so, delivering whenever it makes sense from a business perspective allows your organization to provide its customers with partial changes in order to validate that what has been built actually addresses their needs.
There are many pieces needed for Continuous Delivery to become a reality, and the book of the same name (which also deserves a post to come within the year) provides a lot more when it comes to why and how to get it done. One of the authors of this book also authored the Continuous Delivery one, so the synergies are pretty clear.
Getting to results
A VSM allows us to understand where in the process the value lies. CD helps us remove the friction between that understanding and actually delivering it. Now we need to make sure what we deliver has the impact we hoped it would. To do so, the authors offer two techniques.
The first one is super simple and well known. Using why questions, navigate to the root cause of the issue and ensure that whatever solution is found addresses that root cause. If the answer to a why question is not beyond the scope of your organization (as in, your organization can still change it), keep asking why. When you find an answer that goes beyond your organization's power/scope, take the previous answer: it represents your actionable root cause. Changing that root cause's state to your desired condition becomes your target condition. How that condition is achieved should be irrelevant to the organization so long as it is achieved (and is legal).
The second technique is less known but comes from a famous author in the field: Gojko Adzic. Gojko recommends an activity where all stakeholders work together on a mind map in which the center represents the problem as originally described by the stakeholders, and which is then slowly grown (collaboratively) to answer why that problem should be solved, who can solve it, how that may happen and what will solve it.
Hypothesis Driven Development
Now that we understand the root cause of the problem we're trying to solve, we need to understand whether what we're actually thinking of building will, in fact, address that root cause. The approach proposed in the book is what is now called Hypothesis Driven Development. It is highly similar to the scientific method so commonly used in physics, chemistry and so many other "actual" sciences. Overall, the idea is that you describe a cause-effect relationship and set yourself up to test whether you only get the effect if the cause is present and whether the cause alone is sufficient to get the effect.
When it comes to software development, Jeff Gothelf and Josh Seiden offer a template that works for them in their book Lean UX, which the authors reference:
We believe that [building this feature]
[for these people]
Will achieve [this outcome].
We will know we are successful when we see [this signal from the market].
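As a small aside, the template is easy to turn into a record so experiments get written down consistently. A minimal sketch, assuming nothing beyond the template itself (the field names and the example values are mine):

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    feature: str   # [building this feature]
    audience: str  # [for these people]
    outcome: str   # [this outcome]
    signal: str    # [this signal from the market]

    def statement(self) -> str:
        return (
            f"We believe that {self.feature} for {self.audience} "
            f"will achieve {self.outcome}. We will know we are successful "
            f"when we see {self.signal}."
        )

print(Hypothesis(
    feature="building a one-click reorder button",
    audience="returning customers",
    outcome="more repeat purchases",
    signal="a 10% increase in reorders within 30 days",
).statement())
```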
One of the easiest ways to eliminate hypotheses is to rely on other people's work, or to understand your customers without having to talk to every single one of them. That's where user research becomes key. The authors present a quadrant of types of user research from Janice Fraser that I found very useful.
An important reminder at this stage of the process is that you're still trying your best to learn as much as possible. Regardless of the level of the organization you're working at, you need to keep in mind (yours and others') that the goal of any experiment is to reduce uncertainty as much as possible at the smallest possible cost. That means you're looking for experiments that balance the amount of information obtained (both its quality and quantity) against the cost of preparation to obtain that information.
A few good ideas to reduce experiment costs while maximizing amount of information obtained are:
- 80/20 rule: handle 80% of the cases with 20% of the work and ignore the corner cases
- Don't build for scale. When it comes to information, you'll capture most of the variability within your first 1000 data points. No need to prepare for millions or billions of data points.
- A consequence of the 80/20 rule is not handling cross-browser needs. Identify your target audience and build only for them. You can handle the rest once you've proven it is worth investing in.
- Another one that is a bit more controversial is not to worry about having significant test coverage. You want the common target case to work but it's fine if there are a few scenarios in which you get an error. You're not trying to actually develop the functionality; you just want to verify it yields the outcome you expected.
Handling operations
Now that you've managed to get experiments running and you know why you're pursuing them, you're bound to find some things that work and should be kept running. The authors mention Amazon a lot here as a good example on how to make operations work in a Lean Enterprise.
The recommendations are quite famous by now. The first one is the "two pizzas" team rule. You should be able to feed any single team with two pizzas and no more. This ensures that the team is small enough that they are very aligned and communication overhead is kept to a minimum.
With such small teams, you might end up with a lot of teams, and it all falls apart if those teams cannot accomplish anything on their own. To avoid this scenario, Amazon enforces that every team must provide APIs for their systems that allow anyone (within or outside Amazon) to integrate with them. More so, such integration should be possible without the need for a discussion between teams: APIs should be documented well enough that anyone can find out how to integrate on their own.
Finally, Werner Vogels (Amazon's CTO at the time of writing) enforces that "You build it, you run it". This means that each of those teams is also responsible for ensuring that the services they have built are available, responsive and achieve the desired business outcome.
If we look at those three recommendations, they make heavy use of the effects of Conway's Law to ensure that teams match the desired system boundaries. Instead of letting the teams shape the system boundaries, those rules make the system boundaries shape the teams (which in turn reinforce those boundaries) in a way that ensures the desired business outcome.
The authors point out that there are a couple more prerequisites for this to actually work. The first one is to enable teams to thrive. Daniel Pink's, now famous, work on what enables high-performing individuals and teams is core to the argument here. The teams that are shaped to match desired system boundaries need to be able to decide on their own how to achieve their goals (autonomy). They also need to have the skills to do it (mastery) so that the tasks are achievable. Finally, they need to really understand what the goal is. Not how to get to that goal, which they can find out on their own if they have autonomy and mastery, but where they really want to go (purpose).
This last point is crucial and hard to get right. This is now the most important responsibility of the senior team: to align rewards with the desired outcomes. Similarly to experiments, it should not matter how the outcome is achieved. The idea is that the teams are most likely going to find smarter, faster ways to reach the desired outcome than a plan laid out by leadership.
Moving from current state to new architecture
Hopefully all the arguments the authors have laid out make sense. If they do, you will find yourself with a problem: how do we transition from our current monolithic system and rigid processes to this new way of architecting organizations and systems?
The pattern is common when it comes to software and will sound scary if you think about people and organizations, but the idea still holds. The pattern is called the Strangler. When it comes to systems, the idea is that you build a new version, in a new architecture, that replaces a piece of the old system. Once happy with the new version, you slowly direct users to it instead of the old one until you have fully diverted the usage to the new system. You can then remove the unused piece of the old system.
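Here is a toy sketch of that routing idea (hypothetical function names, not from the book): a growing fraction of requests goes to the new implementation while the old one keeps handling the rest, until the old path can be deleted.

```python
import random

def handle_with_old_system(request):
    return f"old system handled {request}"

def handle_with_new_service(request):
    return f"new service handled {request}"

def handle(request, new_traffic_share=0.1):
    """Send roughly `new_traffic_share` of requests to the new service."""
    if random.random() < new_traffic_share:
        return handle_with_new_service(request)
    return handle_with_old_system(request)

# Start small, then raise new_traffic_share as confidence grows; at 1.0 the old
# code path no longer receives traffic and can be removed.
for i in range(5):
    print(handle(f"request-{i}", new_traffic_share=0.25))
```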
A similar approach works for organizations. You slowly change a piece of the process and a group of people to work differently. Once they start showing results, you expand the change to other groups slowly until everyone has moved to the new processes.
Part III is pretty dense because it handles one of the longest periods in an organization's life. The majority of the work happens in the exploit phase and that's where actual sizable value returns to the organization. Ultimately, Part III probably describes what daily life in a Lean Enterprise will feel like once you get there. In a few weeks, I'll tackle the last part of the book, Part IV, which covers how to transform your whole organization: what needs to change, where and when.