Complexity and Problem in the IT world

I have been going through the Beyond the Goal audio series by Eliyahu Goldratt on the Theory of Constraints and the Linux Foundation’s Introduction to DevOps course by John Willis. Both are excellent materials on process improvement, and I highly recommend either one if you want to learn more about enterprise transformation.

In both of these materials, large, complicated systems are a common theme, and the two authors have similar perspectives on how to approach problems as they arise in these systems.

In the Beyond the Goal series, Goldratt talks about the definitions of complexity and problem. He uses these words to set the stage for how to view and interpret the obstacles that get in the way of throughput and effectiveness. Once the definitions are laid out, Goldratt’s approach to transformation begins to make sense.

Complexity, as defined by Goldratt, is quite different from how I had understood it. Normally, complexity would be defined by how many points we have in a system: the more stages, phases, handoffs, or steps there are, the more complex something is. For Goldratt, however, complexity is a question of how deterministic the effects of a change to a system are. The less deterministic the effects of a change, the more complex the system is.

For example, if I am playing with a toy car and a wheel pops off, I can easily say what the effects are: the car is going to scrape against the ground on one side. On the other hand, if I take a random bolt out of the engine of a real car, the specific symptoms are difficult to predict, and at best I can offer a generalization. If I take out an engine mount, for instance, the engine harmonics will be off and the car will shake at 2,000 rpm (this actually happened to my car). But I cannot tell you exactly how it will shake, what it will feel like on the road, or what long-term damage will result. A car is a complex system with many internal and external dependencies.

This is all to say, a system is “complex” if the effects of a change to said system are difficult to predict.

Goldratt goes on to say that problems don’t exist. In reality there are no problems; the only problems in real life come from us not understanding reality. The molecules in your laptop don’t have a problem being a laptop. Electrons flow in there, your monitor is on, and the only problems you might have are the result of faulty human design. The universe exists without any real problems. The only problems we face are our own misunderstandings of reality. The same is true of human-made systems and organizations: we simply don’t understand our own systems well enough, we don’t adapt to them, and so we cause problems.

In the Intro to DevOps course, Willis introduces John Allspaw’s thesis on the heuristics of resolving outages. In the thesis, Allspaw talks about a problem in which a specific employee’s blog post caused an error that turned into a serious disaster for Etsy at the time. This thesis is an excellent example of what complexity really means.

Most of the delay in resolving the issue came not from the issue itself but from how the issue was understood. Large systems are often too big for any one person to understand, so synchronization between team members is vital. Every time we take an action, we change the system, so simply looking at the end state of the system after a change can point to different root causes. The environment becomes less predictable with each change, and therefore more complex.

What adds difficulty is that if a system is truly complex as Goldratt defines it, finding the root cause is extremely difficult: the same actions can lead to different results, and as you try to correct the system your actions may be making it worse. And even knowing the root cause does not necessarily mean that reversing the action will correct the problem.

If any of you have been in an IT outage, you know that people jump from idea to idea without really understanding the problem, and maybe even change the problem without realizing it.

Allspaw breaks down the heuristics of solving IT outages like this (this is in my own words, so take it with a grain of salt):

1.) Look for recent changes in the system first. Since there is a lot of noise, evidence that leads to a conclusion may not be easy to find. It is better to start with what variance was introduced and work from there.

2.) If you can’t tie a change to the system’s irregular behavior, begin widening the search. Remember that systems are non-deterministic and finding the next step in the chain of events is not simple. You’ll often have to look in places you didn’t expect.

3.) If you still can’t tie changes to the irregular behavior, start verifying past diagnoses that you remember. Or start ruling out common, general diagnoses, or recent ones that seem similar.

4.) Because no single set of tests and no single person can fully understand the system as a whole, use your peers to verify ideas and solutions. Together we understand reality; alone we cannot. It’s feedback, good feedback that is, that leads us to the right answer.
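As a rough illustration of steps 1 and 2 only, here is a minimal Python sketch of "start with recent changes, then widen the search". Everything in it is my own assumption and not something from Allspaw's thesis: the `Change` record, the `changes_to_review` function, the time windows, and the sample data are all hypothetical. Steps 3 and 4 depend on human memory and peer review, so I have not tried to code them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Change:
    """A hypothetical record of something that changed in the system."""
    description: str
    applied_at: datetime


def changes_to_review(changes, incident_start, windows):
    """Group changes into review batches, widening the time window each step.

    Each batch holds the changes that fall inside the current window but were
    not already listed in a narrower one, most recent first.
    """
    batches = []
    already_listed = set()
    for window in sorted(windows):
        earliest = incident_start - window
        batch = [
            c for c in changes
            if earliest <= c.applied_at <= incident_start
            and c not in already_listed
        ]
        batch.sort(key=lambda c: c.applied_at, reverse=True)
        already_listed.update(batch)
        batches.append(batch)
    return batches


# Made-up sample data, purely for illustration.
incident_start = datetime(2024, 1, 10, 12, 0)
changes = [
    Change("config push", datetime(2024, 1, 10, 11, 45)),
    Change("library upgrade", datetime(2024, 1, 9, 18, 0)),
    Change("feature flag flipped", datetime(2024, 1, 10, 11, 30)),
    Change("database migration", datetime(2024, 1, 5, 9, 0)),
]
windows = [timedelta(hours=1), timedelta(days=1), timedelta(days=7)]

batches = changes_to_review(changes, incident_start, windows)
for window, batch in zip(sorted(windows), batches):
    print(window, [c.description for c in batch])
```

The point of the sketch is the ordering, not the tooling: you review the narrowest window first and only move on to older changes when nothing in the current batch explains the behavior.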

Large systems are complex, just as Goldratt describes, but by building understanding through collaboration we can tackle any problem and turn it into a non-problem.

John Willis often talks about “non-deterministic thinking”. I am still working out the exact definition of this, but I believe he is saying something similar to Goldratt. The idea is that you don’t look at the world with just your own understanding; you accept that most of life is a living, moving, complex system. It’s understanding that buying roses for your wife/girlfriend today does not mean the same thing as buying roses tomorrow (because women are most certainly complex systems). It’s kinda like knowing that you don’t know anything, despite what experience and bias might tell you, and that your experience is…biased.

Once you accept that nothing is set in stone, you can begin to adapt to change. I think this is what Bruce Lee meant when he said: “Be like water”.






