Stability And Crisis Management

Stewart Brand [BibRef-Brandt1995] describes "shearing layers" of change in a building that evolve at different rates. The foundation changes rarely; the plumbing and wiring also change seldom; the wallpaper and paint change quite a bit more frequently; and the interior decor is almost always in flux. Each of these layers is part of the system we call a building. A crisis is perhaps a change that happens deep enough to go beyond routine experience and touch the structure of the building (or organization), yet shallow enough that it doesn't stop the organization dead in its tracks. Moving the furniture wouldn't be construed as a crisis; fixing a leak in the plumbing would.

A crisis is almost always a surprise: an unforeseen glitch in the stability of the organization. What makes a crisis a crisis is that it upsets stability, and that it does so precipitously. (We sometimes talk of the "software development crisis," but something that has gone on for 30 years can hardly be called a crisis!)

In the same sense that you want to build a new organization on the stable core inside the existing organization, you want to hold the environment stable while you are making change. You don't want to be constantly changing the foundations. If the environment is noisy, and if the organization exhibits arbitrary behaviors, then you can never know whether a given change resulted in an improvement, made things worse, or did neither! This is a fundamental principle of organizational change; it is one of the deepest principles of Deming's approaches to organization management [BibRef-Deming1986], and one finds it at the foundations of ISO process improvement methods.

The pattern approach to organizational improvement is attentive to the stable parts of an organization and attempts to detach itself from noisy, day-to-day variations. Part of this stability comes from attentiveness to the deep structure that ties to values and relationships; these tend to change less frequently than practices, policies, and processes. Part of this stability comes from role normalization.

Crises can and will arise, and some of the patterns (e.g., SacrificeOnePerson, DayCare) specifically address contexts with a crisis component. However, these crises are small relative to the overall organizational structure and to the goals of the enterprise. Most of the patterns instead strive to head off crises. Most of the scheduling patterns (e.g., CompletionHeadroom) are of this nature, as are some of the structural patterns (FireWalls, EngageCustomers).

Software development, like mountain climbing, is an inherently risky undertaking. Yet there are two kinds of risk. The first is the risk that you won't reach the summit, or that your product will be a flop in the marketplace. These are risks we must take; in fact, we not only take them, we enjoy them! We view them as opportunities rather than risks. The second is the risk that the whole undertaking will founder because we didn't plan for the weather. More significantly, it may founder because the team doesn't function well in the face of unforeseen difficulty (see, for example, [BibRef-Krakauer1997]). These are the true risks we must avoid.

There is no pattern, or pattern language, for risk management. Risk averseness is an emergent property of healthy organizations. In the same way, Alexander offers no pattern for a "safe house." Safety is an emergent property of houses built around concepts of appropriately joined spaces that draw on human context. We include this one section on Crisis Management to tie together this popular concern and to draw out the principles we believe address it. Most of the solutions to so-called crisis management are distributed throughout the patterns:
  • CompensateSuccess talks about the problem of rewarding people who excel in crisis situations
  • TheOpenClosedPrincipleOfTeams, above, talks about the danger of a crisis becoming a way of life (because in a culture where everything is a crisis, nothing is a crisis, and that's not healthy!)
  • The same section talks about the psychology of burnout, which is closely tied to crisis management styles.
  • StandUpMeeting takes a current crisis as its context.
  • TeamPerTask is one way of isolating crises.

There are some things to be aware of in crisis situations.

  • Contraction of the organization and stronger management influence during crises. Consider this sociogram of a regression testing organization. It shows the positions of roles under "normal" operations (the connections between roles have been removed for clarity):
[Figure: sociogram of a regression testing organization under normal operations; arrows show how roles are displaced during a crisis]

The arrows depict how the roles are displaced in the social network diagram when the organization goes into a crisis. Note that roles with local control take over: Project Management, the Program Change Committee (PCC), and the Team Leader. Meanwhile, technical roles such as the Lab Architect, and even Line Management, get out of the way. The process gurus (the Process Management Team, or PMT) are among the first to go! Coupling between roles skyrockets under stress, and the organization's "diameter" sharply decreases. We find this to be a typical pattern. It is a great expedient as long as it does not become the norm for day-to-day business. One way this contraction is accomplished is through a StandUpMeeting, where managers get frequent status from everyone. That pattern highlights the danger of letting such meetings go on indefinitely. They are fine for redressing short-term crises, but very frequent status meetings that persist over the long term breed a crisis mentality. That is a problem every developer can relate to.
  • Suspension of the normal organization, replaced by an artificial, temporary one. For example, a "firefighting" team might be organized to deal with a sudden, serious quality problem in the software. Note that firefighting can easily become a way of life: firefighting teams usually get special rewards, which make firefighting desirable. Before long, the team is lurching from one crisis to another. Instead, you want to isolate firefighting; see TeamPerTask.

Last, a crisis can be a good thing. We don't believe that organizations should seek crises, but neither should they be so risk-averse that crises create fear. Crises create opportunities for learning; a postmortem of a crisis can sow the seeds of great organizational learning.
  • Crises can create opportunities for large-scale cultural change. A perceived (and probably real) crisis at ParcPlace Systems precipitated a focused introspection and postmortem exercise that led to organizational renewal [BibRef-Gabriel1996].
  • Crises create learning opportunities. A crisis in a healthy organization provides an "opportunity" for a retrospective [BibRef-Kerth2001].

In summary: crises can and will happen, and they provide opportunities for learning. The learning should drive the organization into a better sense of order and stability, but not to the point where other changes can't surprise the organization into learning again!