CHAPTER 2 Succeeding and Failing at Constraint Management
… Then, yielding to our intellectual onset, the gates of the Sixth Dimension shall fly open; after that a Seventh …
—A. Square, in Flatland: A Romance of Many Dimensions,
by Edwin A. Abbott (1884)
Skeptics and reluctant learners abound—and your authors have more than once been included in that number. Organizations ask, “What’s the ROI?” and team members ask, “What’s the WIIFM?” (what’s in it for me?). These are both good questions. We begin, therefore, with two case studies illustrating the payoff to be had by understanding a project’s hierarchy of constraints.
These two case studies both involve NASA. The differing outcomes have little to do with luck, as some might contend (although being lucky matters—Napoleon demanded it of his generals). Rather, they have a great deal more to do with how the project teams understood their missions and how they leveraged what flexibility they could find.
EFFECTIVE LEVERAGE: HOUSTON, WE HAVE A PROJECT
May you live in interesting times.
—Eric Frank Russell (as Duncan H. Munro)
in Astounding Science Fiction (1950)
(often cited incorrectly as an ancient Chinese curse)
Case 1: Apollo 13 CO2 Exchanger, NASA Houston, Project Managers
Sudden, drastic changes in project constraints—loss of a resource, change in a delivery date, or changes in client expectations—will affect the hierarchy of constraints on a project. Stark changes, caused when a project migrates from one dimension to another, require an immediate and wholesale change in thinking at all levels of the project team. Consider how the constraints hierarchy drove the project before the change and how it was rethought postimpact.
Failure Is Not an Option
In discussing preparation of the movie Apollo 13, Flight Dynamics Officer Jerry C. Bostick had this to say about his meeting with the script writers:
In preparation for the movie [Apollo 13], the script writers, Reinert and Broyles, came … to interview me. One of their questions was, “Weren’t there times when everybody, or at least a few people, just panicked?” My answer was, “No, when bad things happened, we just calmly laid out all the options, and failure was not one of them. We never panicked, and we never gave up on finding a solution.” I immediately sensed that Broyles wanted to leave and assumed that he was bored with the interview. Only months later did I learn that when they got in their car to leave, he started screaming, “That’s it! That’s the tag line for the whole movie, ‘Failure is not an option.’ Now we just have to figure out who to have say it.” Of course, they gave it to the [Gene] Kranz [Flight Director, played by Ed Harris] character, and the rest is history.
“Oops” must be the word we least want to hear on high-risk projects. What follows will almost never be good news. Such was the case at exactly 55 hours, 54 minutes, and 53.555 seconds into the mission of Apollo 13. This project underwent an instantaneous transfer from the Performance/Time/Cost dimension into the Time/Cost/Performance dimension when oxygen tank number 2 on board the spacecraft exploded.
You probably won’t see the American Movie Classics channel run a festival of “Great Project Management Movies” any time soon, but if they did, Ron Howard’s motion picture Apollo 13, based on the real-life story, would be a natural candidate. Faced with a potentially disastrous accident, project teams overcome one potentially fatal barrier after another to bring the crew safely back to Earth (see Figure 2-1), guided by flight director Gene Kranz’s mantra, “Failure is not an option.”
But of course failure is an option. Sometimes, it looks like the most likely option of all. The odds in the actual Apollo 13 disaster were stacked against a happy outcome, and everyone—including Gene Kranz—had to be well aware of that fact. At the same time, however, letting the idea of failure into your mind can be a psychological trap that leads you to premature surrender. We’ve heard the fable of the two frogs who fell into a pail of milk. One, realizing he could not jump out, surrendered to the inevitable and drowned. The other, refusing to admit defeat, continued to paddle and in the process churned the milk into an island of butter, where he rested until the farmer came, found him, and threw him out. How do you balance the value of realism against the value of optimism in solving problems? As we’ll shortly see, one way is to reject the false dilemma the question poses. To see how we do this, let’s break down the problem.
Fig. 2-1. Apollo 13 Mission Control, Houston, 1970. Project management team hard at work.
Within the overall project “get the astronauts home safely,” there are a number of subprojects, including the following:
•Develop a power-up sequence that draws fewer than 20 amps
•Calculate a burn rate to get the reentry angle within tolerance using the Earth in the capsule window as the sole reference point
•Design a way to fit the square command module CO2 [carbon dioxide] scrubber filter into the round Lunar Excursion Module (LEM) filter socket
That last subproject was vital, because the LEM’s CO2 scrubbers are meant to take care of the needs of two people for a day and a half, not three people for three days. And nobody ever imagined that the command module scrubbers would need to be used in the LEM, so they were not designed to be compatible. They’re square, and the necessary holes are round. Meanwhile, the CO2 levels have gone up past 8; at 15 things become dangerous, and eventually deadly. Gene Kranz assigns a project team, saying, “I suggest you gentlemen invent a way to put a square peg in a round hole—rapidly.”
As the engineers gather in a conference room, boxes of miscellaneous junk—everything that’s loose on board the spacecraft—are being dumped onto tables. The project engineer gestures at it and says, “We have to make this [square filter] fit into the hole for this [cylindrical filter] using nothing but that [miscellaneous junk].” The engineers dive in with the right attitude, but for all they know, there isn’t even a solution present on the table. If they’re one 20¢ screw short of what they need, it might as well be a $20 million screw, because either way, they can’t have it.
Now we face the psychological dilemma of “failure is not an option.” We understand that a commitment to success is a way to improve the likelihood of achieving it—giving up too easily increases the risk of failure—but that’s not quite enough. How else can we increase the odds of our success? Our suggestion is to reject the false dilemma, and to do this, we use the triple constraints. Failure is not only an option—it’s a gateway to success … if you fail in the right dimension.
Why “Good Enough” Matters
How good is “good enough”? For a lot of project managers, the answer is to dismiss the question by saying, “good enough isn’t!” By rejecting the very concept of “good enough,” these managers strive to take projects to a higher order of excellence, to bring the goal of quality front and center into the discipline of project management, and to motivate people to achieve their absolute best. These are noble goals.
Nevertheless, the question demands an answer. If good enough isn’t, then clearly “good enough” is incorrectly defined. It’s a serious question: How good is good enough? In this project, three lives are riding on a performance outcome, and the clock is ticking. If perfect performance takes too long, can we afford perfection? What level of performance will be satisfactory as long as we can achieve it by our deadline?
This doesn’t imply that we intend to squeeze by with minimum performance—not at all. We plan to do our absolute best. In fact, the reason for rejecting the idea of good enough is to demonstrate our commitment to excellence and quality. But every golfer needs to know what par is, even if he happens to be Tiger Woods.
Defining Quality
Pontius Pilate asked, “What is truth?” and arguments still rage among philosophers. The project manager’s equivalent question is “What is quality?” The PMBOK® Guide definition, taken from the American Society for Quality, is “the degree to which a set of inherent characteristics fulfills requirements.”
In general, ideas about quality fall into five classes:
•Judgmental. Synonymous with superiority or excellence; also known as a “transcendent” view of quality.
•Product-based. Linked to specific and measurable variables, such as the chip speed of a computer.
•User-based. Determined by what the customer wants, or fitness for intended use—if you’re off-roading, a Jeep is superior to a Cadillac, but if you plan to run a luxury limousine service, the Cadillac is superior.
•Value-based. The ratio of usefulness, satisfaction, and other factors to price.
•Manufacturing-based. Conformance to requirements or specifications, Six Sigma defect rates, low allowable variation.
Each of these definitions has value and legitimacy depending on context, so definitions of “good enough” and quality must be reached by an understanding of the individual project environment.
Performance Criteria
Apollo 13’s CO2 filter project, like all projects, is bounded by the triple constraints. It is always vital at the beginning of the project to ensure that you have a good understanding of the project goal and the project context. The initial mission statement, as you’ll recall, is “invent a way to put a square peg in a round hole—rapidly.” In other words, take the square CO2 scrubber and figure out a way to make it do its job adequately in the round socket of the LEM. That’s the performance criterion, and it’s one of the three legs of the triple constraints.
Why define the baseline as “adequate”? Why not “perfect”? Let’s explore the question for a moment. We think it’s vital to define the performance criteria for triple constraints purposes as the minimum acceptable, not the best possible. We need to know what par is. Once we’ve defined par, we can then define superior performance. Remember, superior performance—quality—is only superior if it adds value. Let’s look at some quality metrics that won’t add value in this particular situation:
•Standardization. In general, having parts conform to standard designs, templates, and tooling is good practice. Here, it adds no value. This is a one-shot effort.
•Durability. Making it good enough to last 10 years adds no value. If it breaks 10 minutes after splashdown, there’s no harm done.
•Industrial design. Its attractiveness, visual design, and aesthetic qualities add no value. If it keeps them alive, it’ll be beautiful enough in the eyes of the beholders.
Conversely, some quality elements would add value and might be worth a bit of extra time, if there is some to spare:
•Ease of assembly. Especially as CO2 levels build up and the astronauts begin to suffer mental impairment, an easy-to-assemble design would lower risk and might stretch the deadline.
•Fewer parts. Given the possibility of other breakdowns and the need to improvise, consuming fewer scarce resources would add a safety margin to other projects, both known and potential.
•Efficiency. If it does a better filtering job or consumes less power, this adds a safety margin that might lower risk elsewhere or mitigate risk for problems that might yet occur.
Time Constraint
The stated deadline is “before the CO2 level reaches 15.” We have an approximate idea how long that will take, but there’s some variability. Additionally, the level-15 criterion itself isn’t an absolute. That is someone’s best guess as to the level at which the astronauts will be too impaired to be able to build what the engineers come up with. Possibly, the astronauts will still be able to do it at 15.2. Possibly, on the other hand, they might not be able to do it at 14.8. If the latter, it does no good whatsoever to argue, “but you told me I had until 15!” The deadline isn’t just what they tell you it is, it’s the point at which events move forward irrevocably. The clock is ticking and the CO2 is accumulating. At some moment the astronauts will begin to suffer from impaired judgment followed eventually by unconsciousness and death. We can only approximate how long we have, but it isn’t long and it isn’t subject to negotiation.
If there is a trade-off to be made between the time constraint and the performance criteria, we know that ultimate failure—the death of the Apollo 13 astronauts—comes most rapidly from failure to meet the time constraint. That is, if we build a perfect CO2 filter, but we finish it too late, we’ve still failed. Perfect performance does not compensate for a failed deadline.
But wait! Why isn’t the reverse equally true? If you fail to meet the performance criteria, isn’t it irrelevant how quickly you fail to do so? Actually, it depends on the extent of the failure.
To illustrate, let’s look at this scenario: You’ve managed to come up with an inefficient partial solution that will last only half as long as it’s going to take to get the astronauts back home, but you’ve done so within the original time constraint. Do you take this solution? Absolutely! Although you have failed to make the performance goal for the project within the original time constraint, you’ve reset the game clock and given yourself a whole new window of time in which to attack the problem anew. With a day or more to work instead of mere hours, your chance of finding a solution to the remainder of the problem is that much greater.
In other words, the right kind of failure is not only an option, but sometimes a desirable one. We can’t accept a failure to meet the time constraint, but we can live with a partial performance failure and stay in this game.
Cost Constraint
In the movie, the cost constraint is represented in the scene in which one of the technicians staggers in under the weight of a box filled with all the items available on the spacecraft. This project has a zero dollar budget, but it has a budget nevertheless, and it’s a highly restrictive one. It’s a resource availability budget. In triple constraint terms, the cost constraint consists of anything you can spend on your project: money, material, even man-hours. There’s a limit to what you can spend, sometimes imposed by organizational decisions and sometimes imposed by the environment.
Here, the cost constraint is environmentally imposed, and it’s an absolute. We have what we have—whether or not it’s adequate. It has nothing to do with how much we value the astronauts or how much it’s worth to us to bring them home. The fact remains that we don’t have the option to send up as much as a gram extra within the available time.
Think (ahem) outside the box, however. The box represents the cost constraint. It is someone’s best judgment, someone who is under pressure, as to what is available on the spacecraft for this project. That judgment may not be infallible. Again, the constraint is exactly what it is, not what someone says it is.
Some sample questions to probe resource flexibility:
•Have we found everything we can possibly use? The final resource tally to build the CO2 filter included someone’s sock.
•Have we been as creative as possible in thinking of all possible uses for each item? Can the cover of the flight plan be used as a stiffener board to hold filter material in place?
•Can we make do without some component that we don’t have? Even if it’s customary to screw the device to the wall for security, can we hold it on with a bungee cord instead?
Summary of Flexibility
The only flexibility in the time constraint has to do with how long the astronauts can actually hold out against the effects of hypoxia. We see it as a probability range centering on when the CO2 reaches 15, and when exactly it reaches 15 is itself a probability range. The flexibility in the cost constraint is a function of our creativity. The flexibility in the performance constraint involves the utility of partial solutions as deadline extenders, widening the range of acceptable answers.
The deadline could be earlier than estimated as well as later. That means there’s no real flexibility in the time constraint. We’d better be conservative, because we have no real control over the actual hypoxia level. And for any given expenditure of resources, we’ll gain a bigger boost if there happens to be a wider range of acceptable outcomes (performance). In order of flexibility, our hierarchy follows the order of driver/middle constraint/weak constraint or, in this case, time/cost/performance.
The Hierarchy of Failure
The order of flexibility is a “hierarchy of acceptable failure.” First, failures grow in damage to the project as they climb from the weak constraint to the driver. A temporary rig that keeps the astronauts breathing (performance) does minor but acceptable damage to the project (see Figure 2-2). Slicing through a plastic bag during final assembly (cost) causes our hearts to beat faster as the engineers check to see whether they have a replacement or work-around. Missing the deadline (time) means curtains for movie and mission alike.
Second, and useful to project managers, is that exploiting the more flexible constraints—that is, choosing to fail where failure actually is an option—may be a creative way to solve more pressing or challenging problems where failure is not an option. Apollo 13 failed to land on the moon. The crew failed to keep the command module powered up for the return flight. They failed to use the computer during a critical burn. But each failure opened a new door.
Fig. 2-2. The CO2 Filter Bungeed to the LEM Wall. The product of the real-life project: not pretty, but effective.
As a check on our methodology, let’s look at the options for failure for our CO2 filter project. Interestingly, the widest options for failure lie within the performance criteria. We can fail to make something that will last long enough; we can fail to make something that meets many of the standard specifications for CO2 filters; we can fail to make something out of standard materials—our failure options are numerous.
In the cost arena, we can fail to use all our resources, fail to use resources in the standard and approved fashion, fail to use resources for the purpose for which they were originally intended, and fail to conform to the standard bill of materials, among other types of failures.
With the time constraint, much as we regard the actual deadline as uncertain, the deadline itself isn’t fuzzy at all. It occurs at a very definite point; we just don’t know exactly when that point is.
Triple Constraints for CO2 Filter Project
PROJECT: Build CO2 filter adapter to allow square CM filters to be used in cylindrical LEM sockets.
—DRIVER: Time Constraint (complete before the CO2 levels overwhelm the astronauts, which occurs at approximately level 15)
—MIDDLE: Cost Constraint (use only what’s available on the spacecraft, more or less what’s in the box)
—WEAK: Performance Criteria (works well enough to allow the astronauts to live and continue performing their duties for the abbreviated mission duration)
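For readers who like to see a framework made concrete, here is a minimal sketch of how the hierarchy above might be captured in code. It is our own illustration, not anything NASA produced: the Constraint class, the dimension_to_give_ground_in function, and the flexibility scores are hypothetical names and values chosen to mirror the listing above.

```python
# Hypothetical sketch: recording a project's hierarchy of constraints and
# consulting it when a trade-off becomes unavoidable. Names and values are
# illustrative only, paraphrasing the CO2 filter project described above.
from dataclasses import dataclass

@dataclass
class Constraint:
    dimension: str     # "time", "cost", or "performance"
    description: str   # the real constraint, not merely what we were told
    flexibility: int   # 0 = driver (least flexible) ... 2 = weak constraint (most flexible)

hierarchy = [
    Constraint("time", "finish before CO2 reaches roughly level 15", 0),            # driver
    Constraint("cost", "use only the materials already aboard the spacecraft", 1),  # middle
    Constraint("performance", "keeps the crew alive for the trip home", 2),         # weak
]

def dimension_to_give_ground_in(constraints):
    """When something must give, give ground in the most flexible dimension."""
    return max(constraints, key=lambda c: c.flexibility).dimension

print(dimension_to_give_ground_in(hierarchy))  # -> "performance"
```

The value of the exercise is not the code but the discipline it forces: you cannot fill in the three records until you have decided, explicitly, which constraint is the driver and what each constraint really is.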
When the modified CO2 filter was bungeed to the wall and the levels were dropping to a safe level once again, one of the engineers who’d designed the workaround—a man with the look of someone who in a subsequent generation would own a complete set of the Advanced Dungeons & Dragons rule books—was voicing his relief when a console operator looked at him and said, “You, sir, are a steely-eyed missileman.”
One secret of being a “steely-eyed project manager,” then, is taking the time to understand the triple constraints on your project. Which of the six dimensions does your current project call home? What sorts of problems and challenges should you expect? Where are you going to find the flexibility and extra resources you need to face those challenges?
And what happens if you fail to get the triple constraints right?
Three Levels of Apollo Constraints
We now understand that after oxygen tank number 2 exploded, the hierarchy of constraints for Apollo 13 changed to time/cost/performance. What did it change from, and what makes us so sure we’re right? Earlier, we stated that the hierarchy of the Apollo 13 mission started as performance/time/cost. In addition, the Apollo program as a whole is an example of the time/performance/cost hierarchy. How can both be true?
Program versus Project Constraints. Oddly, there is no requirement that project goals and program goals have the same hierarchy of constraints, although intuition suggests they should match. The quick proof is to look at a project’s constraints compared with those of a work package within the project. Imagine the project’s driver is the time constraint. Now consider any task not on the critical path. Notice that time can’t possibly be the driver of a noncritical task, because you always have the flexibility of its slack or float.
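A quick numeric illustration of that proof may help. The sketch below is our own, with made-up tasks and durations; it runs the standard forward and backward passes of the critical path method, and the noncritical task ends up with positive float—exactly the flexibility that keeps time from being its driver. Task names, durations, and the helper functions are all hypothetical.

```python
# Hypothetical example: why a task off the critical path always has flexibility.
# Four made-up tasks; A-B-D is the critical path, C is not.
durations = {"A": 3, "B": 5, "C": 2, "D": 4}                      # days
predecessors = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

early_finish = {}
def ef(task):
    """Forward pass: earliest finish of a task."""
    if task not in early_finish:
        start = max((ef(p) for p in predecessors[task]), default=0)
        early_finish[task] = start + durations[task]
    return early_finish[task]

project_length = max(ef(t) for t in durations)

successors = {t: [s for s in durations if t in predecessors[s]] for t in durations}
late_finish = {}
def lf(task):
    """Backward pass: latest finish that doesn't delay the project."""
    if task not in late_finish:
        late_finish[task] = min((lf(s) - durations[s] for s in successors[task]),
                                default=project_length)
    return late_finish[task]

for t in sorted(durations):
    print(t, "float =", lf(t) - ef(t))   # A, B, D: 0 days; C: 3 days of float
```

Because task C can slip three days without moving the project finish date, time cannot be the driver for C even on a project whose overall driver is time.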
In the same way that a project is composed of work packages, a program is a collection of projects managed together. Some programs have a planned end point, and are therefore also projects (e.g., the Apollo program), while other programs have no planned end point and fall outside the realm of the triple constraints, because their constraints are normally reset at the beginning of each funding cycle (e.g., the National Cancer Institute). Both types of programs have projects within them, but only the first type is relevant here.
Because priorities at the program level and at the individual project level can differ, so can the hierarchy of constraints. Just because you know the right answer at one level doesn’t mean you can be confident you know the right answer at the other.
Hierarchy of Constraints at the Program Level. The Apollo program, the culminating chapter of the Destination Moon trilogy, was set in motion by John F. Kennedy’s May 1961 Special Message to the Congress on Urgent National Needs: “I believe that this nation should commit itself to achieving the goal, before this decade is out, of landing a man on the moon and returning him safely to the earth.”
The time constraint, clearly, is “before this decade is out.” The performance constraint is “return him safely to the earth.” The project objective is “landing a man on the moon.” The cost constraint, although not specified, can be analyzed. If you don’t know the funding amount, determine who must authorize and can stop further spending. In this case, it’s Congress, making the initial cost constraint “whatever we can get out of Congress.”
For the hierarchy of constraints, we need to figure out the real purpose of the project. Why are we going to the moon? Different reasons lead to different hierarchies. The amount of funding, the sense of urgency, and the level and type of performance all change depending on whether the motive is science, colonization, or a Mount Everest expedition (that is, “because it’s there”). In 1961, the big reason was “to beat the Soviet Union.”
This makes the driver time, the weak constraint cost, and, by default, the middle constraint performance.
Hierarchy of Constraints for the Project. The Apollo 13 mission, however, was flown after the program had achieved its stated objective. The space race was over, and the United States had won the gold. Time couldn’t very well be the driver of Apollo 13. Cost was a candidate to be the driver, but most of the major costs had already been incurred. Remaining costs were quite flexible. (Apollo missions 18 through 20, however, were cancelled, because enough cost savings were possible to make cancellation worthwhile.) This leaves performance as the driver.
Although cost considerations overall were of growing significance to NASA, the relatively small amounts at stake to make safety-related performance adjustments for the Apollo 13 mission were not at issue. Flexibility was great. The cost pressure expressed itself in the form of time: it was important to wrap up the remaining Apollo missions because future NASA budget cycles were sure to be reduced. The reason for a constraint and the way in which the constraint is expressed don’t have to be related. Budgets can put pressure on deadlines; deadlines can alter performance. Watch out for this; if you get confused, you can easily make wrong decisions.
This places the original Apollo 13 project constraints of performance/time/cost within the Apollo program’s time/performance/cost environment.
TRIPLE CONSTRAINTS PRINCIPLES
1. The hierarchy of constraints tends to be stable on projects unless there is a big change in the project environment. Whenever a big change occurs, check to see whether the “why” of the project has changed. If it has, there’s a good chance the hierarchy of constraints has changed as well.
2. Failure is not only an option, it’s also a gateway to success … if you fail in the right dimension.
3. In defining the performance criteria for triple constraints purposes, you should specify the minimum acceptable performance, not the best possible. We need to know what par is. Once we’ve defined par, we can then define superior performance. Superior performance—quality—is only superior if it adds value. It ain’t dog food if the dog don’t eat it.
4. The constraints aren’t what they tell you, but what is real. Pay attention to what you’re told, but it may not be the whole story.
5. You don’t have a really complete operational definition of the project unless you can define it in terms of the triple constraints, and you don’t have a good understanding of the objective unless you can express it in the hierarchy of constraints—that is, the constraints in the order of flexibility.
6. There is no requirement that project goals and program goals have the same hierarchy of constraints.
7. The reason for a constraint and the way in which the constraint is expressed don’t have to be related. Budgets can put pressure on deadlines; deadlines can alter performance. Watch out for this; if you get confused, you can easily make wrong decisions.
INEFFECTIVE LEVERAGE: WHEN FAILURE IS FAILURE
What we have here is a failure to communicate.
—Cool Hand Luke (1967)
Case 2: The Challenger Disaster, NASA Houston, Project Managers
The right kind of failure, as we’ve just learned, can lead to success. On the other hand, the wrong kind of failure leads directly to failure (do not pass GO, do not collect $200).
If the advantage of understanding the triple constraints and their proper hierarchy on your project is that they yield knowledge of the right priorities and the hidden options available for your exploitation as a project manager, what would be the corresponding disadvantages of a failure to understand, or worse, of an incorrect understanding?
Wishful thinking, the effects of corporate culture, time pressure, multiple stakeholders, and other circumstances can conspire to make it extremely difficult for project managers and the project team to make good decisions, and the consequences can be catastrophic.
What Caused Challenger?
Do you remember where you were the moment you heard about the Challenger disaster (see Figure 2-3)?
Common cultural moments come maybe once a generation: like that cold Florida morning in January 1986 when 73 seconds separated takeoff from tragedy. Michael was standing at a trade show booth at Chicago’s McCormick Place, staring unbelieving at the network replay on a black-and-white portable television some salesman had smuggled in so he could watch Sunday’s Super Bowl. Heidi was walking down a hallway in St. Paul, Minnesota, and saw the disaster through the window on a TV that had been set up for the special screening.
What caused the Challenger explosion? From a technical perspective, it was caused by the failure of the O-rings. From a process perspective, it was caused by various failures in the risk management and overall program management processes. Leadership failure. Political interference. Poor graphic sense.
Fig. 2-3. The shuttle Challenger explodes shortly after takeoff.
Challenger also shows what happens on a project when the hierarchy of constraints is not well understood, shared, or communicated among project stakeholders.
The Organizational Environment
One of the most concise views of NASA’s post-Apollo organizational environment comes from physicist/raconteur/genius Richard Feynman, named to the presidential commission investigating the Challenger accident. As he ambles Columbo-like through a Dilbert-meets-Kafka NASA landscape asking innocent questions of frontline managers and technical personnel, he observes, “When I left the meeting [with NASA engineers], I had the definite impression that I had found the same game as with the [O-ring] seals: management reducing criteria and accepting more and more errors that weren’t designed into the device, while the engineers are screaming from below, ‘HELP!’ and ‘This is a RED ALERT!’”
Background: Monday Morning, 9 AM
Challenger mission STS-51-L had originally been scheduled to lift off at 3:43 p.m. (EST) on January 22. Delays in the previous mission, bad weather in Senegal (the transoceanic abort landing site), the selection of a new alternate abort landing site, launch processing delays to meet the new early morning launch time, and bad weather predicted for Kennedy Space Center forced five separate launch delays. Each delay received press coverage with an increasingly negative tone. On January 27 a ground-servicing equipment hatch closing fixture couldn’t be removed from the orbiter hatch, triggering yet another 24-hour delay—the sixth—while maintenance crews sawed off and drilled out the attaching bolt. The press was merciless.
The purpose of the countdown process is to ensure all items are checked before launch is initiated. It’s not unusual for numerous issues to be escalated, discussed, worked on, or remediated. Part of the process included a discussion of potential O-ring failure at the anticipated low temperature. A teleconference among the Morton-Thiokol engineers responsible for the solid rocket boosters and their O-ring seals (essentially oversized gaskets), shuttle engineers at Marshall Space Flight Center, and personnel at Kennedy Space Center was held on the evening of January 27. The recommendation resulting from that meeting had a huge effect on the decision to launch.
What caused Challenger? Feynman argues that when the Apollo project ended, NASA needed to invent a mission that only they could do, and to do so, “it is…apparently necessary…to exaggerate [how often, how safe, how much would be discovered]. … Meanwhile, I would guess, the engineers at the bottom are saying, ‘No, no!’ … Maybe [the big cheeses] don’t say explicitly ‘Don’t tell me,’ but they discourage communication, which amounts to the same thing.”
But do we have at NASA—or at most organizations, no matter how frustrated we may get with our own management’s actions on occasion—a leadership cadre willing to sacrifice shuttle-loads of astronauts or other personnel so we can all keep our cushy government jobs? Oh, we hear the cries of “Yes!” in the background, but few people truly function without a conscience. They—and we, to be honest—have a thousand and one ways to live within our moral envelopes, no matter how others may see it.
Edward Tufte’s Graph
Not to be overlooked in a discussion of the Challenger disaster is the fierce debate over how data were communicated. Edward Tufte, internationally known expert in visual presentation, believes Feynman’s demonstration of O-ring weakness is flawed. So an O-ring pulled from a glass of ice water was brittle. Compared with what? A control—a second piece of O-ring rubber in a glass of room-temperature water—would have been more correct, argues Tufte.
Tufte continues by pointing out a flaw more serious than the brittle O-ring: the analytical process broke down and failed to make it crystal clear to management that at 28° or 29° Fahrenheit, the chance of catastrophic failure was overwhelming. Here, management failures extend from the top well down into the layers of middle management and Morton-Thiokol engineering: management was convicted of the crime of “overriding intellectual failure” and a “scandalous discrepancy between the intellectual tasks at hand and the images created to serve those tasks.”
Satisfying as it may be to conclude that somehow Mad magazine’s proverbial Usual Gang of Idiots ended up in command of the once-great agency that took us to the Moon, that answer seems somehow too pat. It is no mystery how stupid people can behave stupidly, but it is far more interesting to contemplate how smart people can experience these lapses in judgment that seem so blindingly obvious to everyone else after the fact. Perhaps they aren’t always as stupid as they seem, and the mistakes not as obvious in foresight as in hindsight.
It seems, for example, that Tufte’s insightful visual may not have accurately reflected the actual position of the engineers, nor the data available at the time. Faculty and students at the Rochester Institute of Technology (RIT) cite these flaws in Tufte: he evidently believed the engineers knew the temperatures at launch of all the shuttles, when they did not, although they tried to obtain that information; he “misunderstands and misrepresents” the engineers’ argument and evidence; and his vertical axis tracks the wrong effect while his horizontal axis mixes O-ring temperature with ambient air temperature (none of which militates against Tufte’s ideas about the importance of effective visual design and presentation of information). The conclusion of the RIT analysis is that the engineers were not at fault as Tufte argues.
In either case, Tufte is not primarily interested in explaining the reason behind any management or engineering bias in favor of flawed information presentation or flawed decision making. Given that persuasively and powerfully presenting information makes it easier to get buy-in, how do we determine what the right goal is? What should the engineers be trying to say? Why are they encountering resistance? What is the nature of management’s objection? The answers to those questions help determine the right approach to presenting one’s data.
The Mission and the Project
Feynman is onto something when he argues that NASA needed to invent a mission following Apollo, but he’s wrong: NASA has always had a mission—in essence “to boldly go where no man has gone before.” What NASA was lacking post-Apollo was a project.
NASA projects have always had somewhat of a red herring element to them, because the underlying mission of NASA is not one with universal stakeholder acceptance. Not everyone believes that space exploration, pure science, or eventual colonization is a worthwhile goal, especially where their taxpayer dollars are concerned. As a result, NASA sells its projects to the stakeholder community based on other values: creating technology spin-offs, pursuing commercial or military applications, beating the Soviets in a space race, or even selling Tang®.
The space shuttle, marketed as “the DC-3 of outer space,” never caught the public imagination the way Apollo did—there was no overwhelming goal of national pride, no enemy to beat. As a result, funding was tight from the beginning. If NASA admitted that instead of the “DC-3 of outer space” they were building something more like the “Curtiss Condor of outer space,” they would not be commended for honesty, but rather be punished by budget cuts.
If a Job’s Worth Doing, It’s Worth Doing Badly
There’s the scene in Butch Cassidy and the Sundance Kid in which Butch and Sundance are at the top of a cliff, the river so far below it looks like the Road Runner’s Coyote is about to fall in.
“Jump!” Butch says. After sputtering for a few moments, Sundance blurts out the truth. “I can’t swim!”
“Don’t worry, the fall will probably kill you,” Butch replies cheerfully.
Why are they contemplating such a drastic jump? Because the “Who are those guys?” posse is closing in. The choice is simple: jump with the probability of dying, or stay and fight with the certainty of being killed. It’s an unpleasant choice, but fundamentally an easy one. You can’t very well call it a “good” decision, because there isn’t a really good alternative. One choice is simply less bad than the other.
Most people believe the purpose of senior management is to make good decisions. It’s not. The purpose of senior management is to make bad decisions. Anybody can make a good decision. All you need is a minimum of one good alternative. If there are multiple good alternatives, even a sub-optimal decision isn’t that terrible. But there’s a military abbreviation, “AOS,” which translates to All Options [Lousy]. The worse the set of options, the more likely the decision will be kicked up the ladder to a higher level of management. If you’re high enough, all you ever see are AOS choices.
When your choice is between “bad” and “worse,” it’s not always difficult to choose, but it’s certainly not pleasant. And if somebody wants to point out what’s wrong with your decision, they aren’t being responsive. You know there’s a lot wrong with your decision. But the other alternatives appear worse. To make a legitimate argument for you to change your mind, someone has to show you that your decision is worse, not merely bad. And the last thing you need is some Morton-Thiokol engineer telling you something that you already know—thank you very much—but you can’t do anything about.
Working with a none-too-perfect technology to execute its mission, NASA now confronted fundamentally different choices than it previously had with Apollo. The essential decision is whether the current mission under the current circumstances is worth the current risk, and if it is, then you proceed. The choice isn’t between perfect conditions and rotten conditions; it’s between arguably compromised conditions and no project at all.
Unfortunately, you can’t always explain the situation in public. Sometimes we have to pretend the dilemma doesn’t exist, or that we aren’t really the ones making the choice when we really are. Or maybe we can tell the truth … if we spin it very, very carefully.
Truth versus Fiction
The truth is out there, but it’s wearing camouflage, and those who want to believe things are not as they really are have every opportunity to buy into a happier dimension. The truth was NASA was no longer in the days of Apollo, even if they wanted to believe that they were. The ugliness of the inexorable triple constraint pressure intruded on the faux reality. The language of the day became Newspeak (think Orwell’s 1984), and cognitive disconnect settled over the space program.
Cognitive disconnect results from a paradox: cut costs without compromising safety or schedule. Then do it again. When people perceive an irreconcilable dilemma, the resultant adjustment can have unintended negative consequences. So, too, it is with organizations. The NASA dilemma may not be as stark as it seems. But when your resources are being cut and performance expectations remain the same, what else are you to think?
Principle of Linguistic Relativity
Language plays a huge role in how people perceive reality, and every organization is aware of this. The principle of linguistic relativity, also known as the Sapir-Whorf Hypothesis, claims a relationship between the language that you speak and how you understand the world around you. Benjamin Whorf, a linguistics expert who worked as a fire-prevention inspector for an insurance company, observed that language-influenced misunderstandings were a leading cause of fires. For example, “inflammable” was thought to be the opposite of “flammable,” rather than a synonym. The fire started, people would explain in insurance claims, because an empty gas can had been stored next to oily rags. The opposite of “full” is “empty,” but when you pour the gas out of a gas can, you’re left with a can filled with fumes that are more flammable (or inflammable) than the gas itself.
Who Defines the Project?
The question of who defines the project—indeed, if anyone really “defines” it in a volitional sense—is critical to understanding NASA’s Challenger dilemma. Although some projects are “defined” in a traditional sense, it’s better to think of other projects as “discovered.” It’s not up to you, the project manager, or even up to the customer or project sponsor, to decide what the project is, or when it’s appropriate to hang out the “Mission Accomplished” banner and swoop proudly down onto the carrier deck. The project is what the project is, and you don’t have a choice in the matter.
The space shuttle was supposed to be the DC-3 of outer space, but it actually was a compromise to move space exploration and space science forward in the absence of a compelling public drive. All the wishing in the world wouldn’t produce the extra funding or modify the environment to allow NASA to pursue the project they wanted to pursue.
In your project environment, it’s important to realize that you aren’t there necessarily to do what the boss says or what the customer says. If a disconnect exists between what you’re told and the reality of your project environment, the earlier you become aware of the problem, the greater the options available to you to solve the dilemma. We’ll explore variations on this problem throughout this book.
The internal language of NASA, rooted in the successes of Apollo, was slow to change for reasons both inertial and political, even though the shuttle program was a radically different effort, located in a different project management dimension altogether. This tended to encourage people toward a deeply flawed understanding of their own organizational environment and equally flawed decision making.
Moving from Apollo to the Shuttle
In the Apollo dimension, there was a race. Time was the driving force: NASA had to beat the Soviets to the moon. In second place came issues of safety and technical performance. Risk was acceptable, but there were limits. Backup missions and redundancy were the order of the day. To speed the schedule and reduce the risk, flexibility was to be found in the area of cost: money and manpower.
The Space Shuttle Program is a logical follow-up to the Apollo Program, if you think in terms of space exploration, space research, and space colonization. It is not a logical follow-up, however, if you think of the Apollo Program as part of our national effort to compete against the Soviet Union. Maybe the 1980 U.S. Men’s Olympic Hockey Team is a more appropriate next step, or possibly the Strategic Defense Initiative. Certainly, the space shuttle failed to capture the public imagination (and Congressional and White House support) the way the lunar program did. There was no compelling public purpose to make Congress back the money truck up to the NASA loading dock.
Cost, not schedule, and not safety and technical performance, was the driving force behind the new shuttle project (see Figure 2-4). This is not what management (not to mention engineering) would have chosen; it’s what they were given. The project isn’t what you want it to be, it’s what it is. Worse yet, the appropriations came with a clear understanding that not everyone was supportive of NASA’s unadorned mission. If the agency couldn’t deliver, there were plenty of other people who could use those funds.
But Apollo had permanently raised the bar for NASA. It could not turn around and deliver small-time results. The space shuttle had to be, in Apple™-speak, “insanely great,” the DC-3 of outer space, even though there had not yet been a DC-2 … or -1. The most logical approach, it would seem, would be to maximize quality by stretching out the schedule. Time, after all, appears more flexible than safety and performance, at least from an engineering perspective.
But there are stakeholders to consider. The funding cycle is influenced by public relations and media coverage, and this program has negative stakeholders, people whose interests are advanced when your project fails. If you drag things out too long, wait too long between missions, or take too long in the development process, you risk your funding—and the very existence of the program. You must consider time pressure, sometimes even at the cost of compromises in safety and performance. You cannot allow risk to a single mission to jeopardize the program as a whole.
Fig. 2-4. Shifting Hierarchy of Constraints. The shifting priorities at NASA from Apollo to the shuttle program.
As Figure 2-4 shows, no priority for the shuttle program remains in its original Apollo position. And in the dimension in which the new program lives, no priority is where management or engineering staff would wish it to be. Ideally, flexibility in cost would be the first tool for fighting problems, followed by flexibility in time. Instead, we are forced by circumstances and our environment to accept greater flexibility in performance.
Further complicating the project is the fact that, in the case of Apollo, the ultimate reason for the program (“We’ve got to beat the Soviets!”) may be spoken aloud, while in the case of the shuttle, the reason for cost being the driving force (“We’re building the Curtiss Condor of Outer Space because that’s all we can afford!”) cannot be spoken aloud. Telling the truth will result in the same punitive consequences we’re trying to avoid.
The result is an organizational culture that finds it internally hard to shake off the previous (successful) program mind-set, and in Sapir-Whorf fashion, the language of time and performance is the language that is spoken. Cost language is whispered, and then often only among senior managers. Communication, as Feynman correctly observes, is compromised—sometimes fatally (literally). You may want the truth, but you can’t handle the truth.
Why do managers in this situation eventually stop listening to the engineers? Because the engineers aren’t telling them anything (1) they don’t know and (2) that’s useful to them. Risk decisions are management decisions, not engineering decisions. Engineering data are input to a management decision: What are the probability and impact of a given event? The management decision is made based on whether, under the totality of circumstances, the risk is worth taking. Such decisions are based on values.
Do we make certain decisions knowing there is a significant likelihood of loss of human life? Of course we do. Soldiers, public safety workers, and others put themselves in harm’s way all the time. The question before us is whether the risks posed to the Challenger crew were worth taking. (We note that it was extremely difficult for the engineers to gather data that would quantify actual risk on the project. Because the numbers in all likelihood would be something quite opposed to what management could afford to recognize officially, why would management want them gathered in the first place?)
It’s important not to impugn the motives of the managers unfairly here. When Feynman assumes that these motives are merely job preservation and the desire of a bureaucracy to preserve itself, he misses the mark. Certainly such self-preservation drives are present, but substantial idealism is at work as well. There is a NASA mission, and the management believes in it. Keeping the dream alive in hard times forces unattractive decisions. Moreover, the managers are in a box not entirely of their own making. They don’t, after all, set the level of their funding, and although they must find the strongest arguments possible, there are limits (legal and practical) to their ability to lobby and sell NASA programs to Congress and the public.
In moving from Apollo to the shuttle, NASA went through the looking glass from one project hierarchy (time/performance/cost) to an opposing project hierarchy (cost/time/performance). This creates a project environment so completely different that it influences the way you think about and behave in the context of that environment. Yet, here at NASA, one project language is spoken aloud, and another is followed in secret—by management.
Assault on Morton-Thiokol
Let’s set the Wayback machine to the evening of January 27, 1986, to see how the conflict between NASA’s historical time/performance/cost orientation and its then-current real, but unacknowledged, cost/time/performance environment influenced thinking about the launch decision.
The potential hazard of O-ring failure at the anticipated low temperature was discussed in a teleconference with Morton-Thiokol engineers, Marshall Space Flight Center engineers, and Kennedy Space Center personnel on the evening of January 27. According to engineer Roger Boisjoly, a participant in the conference, Morton-Thiokol recommended against the launch. One NASA official responded that, although he was “appalled” at Thiokol’s recommendation, he would not launch over the contractor’s objection. Another NASA official conducted his own assessment of the data and found it to be “inconclusive.”
Boisjoly says:
NASA’s very nature since early space flight was to force contractors and themselves to prove it was safe to fly. The statement by [the NASA official] about our data being inconclusive should have been enough all by itself to stop the launch according to NASA’s own rules, but we all know that was not the case.
Subsequently, the four senior Morton-Thiokol executives present conducted an off-line five-minute caucus and made a launch recommendation. The engineers, according to Boisjoly, were excluded from the final discussion and from the vote. He concludes:
NASA [through intense customer intimidation] placed [Morton-Thiokol] in the position of proving that it was not safe to fly instead of proving that it was safe to fly. Also, note that NASA immediately accepted the new decision to launch because it was consistent with their desires and please note that no probing questions were asked.
That’s how it looks from an operational perspective. Unethical, cowardly managers fold under pressure and people die. But from the perspective of the triple constraints, the picture isn’t exactly the good guy/bad guy scenario Boisjoly describes. Organizational life, no matter how it may seem, isn’t a Dilbert cartoon.
The management perspective in this situation isn’t irrational, illogical, or arbitrary. It may in fact be wrong, but that’s not the same thing. The operational perspective might be wrong, too.
Management’s perspective is based on the program environment, in which cost and time both trump safety and performance, for the reasons discussed. Clearly, the decision to launch had to be based on such a perspective, because otherwise it makes no sense to have switched the “go” standard from “prove it’s safe to fly” to “prove it’s not safe to fly.” Boisjoly’s telling phrase “NASA’s very nature since early space flight” reveals the contradiction—that is, NASA’s “very nature” has in fact changed. The engineering perspective assumes that the NASA perspective is still what it used to be in the Apollo age.
When technical professionals and management continue to argue, it’s not the technical validity of the technical opinions that’s in dispute, it’s the relevance. With risk levels so high and the consequence of failure so dramatic, human beings would rather not stare too directly into the abyss. Don’t rub my nose in it, please. I understand your position; why don’t you understand mine?
Let’s return to the Challenger launch, which is now scheduled for 9:37 a.m. (EST) January 28. President Ronald Reagan is scheduled to give his State of the Union address that night. Challenger requires another two-hour delay for a failure in the equipment that monitors the fire detection system. The decision-making apparatus has put substantial pressure on “go,” but a last-minute abort is still possible.
The State of the Union
President Ronald Reagan was scheduled to give the State of the Union address the night of January 28, 1986. The address was delayed because of the Challenger disaster. One often-repeated theory about the Challenger disaster is that the decision to launch was driven by Reagan’s desire to speak live to Christa McAuliffe during the State of the Union address. No direct evidence of this, of course, has ever been found, and it is highly unlikely that any such evidence exists. Any Reagan aide high enough to give such an order to NASA would also have had the political savvy to know the risks involved.
Joshua Gilder, a senior speechwriter to President Reagan who worked on that particular State of the Union address, laughed. “NASA and the shuttle flight weren’t even on our radar,” he said. “Maybe NASA management cared, but none of us were thinking about it.”
In any event, you need not assume any White House pressure placed on NASA to understand that, yes, indeed, the State of the Union address very likely played a role in the pressure to launch. The pressure NASA officials (or any operations-level managers) felt need not be related to any pressure actually applied by the White House (or any executive team, major customer, or big boss). There is a tendency to amplify any messages (or even the illusion of messages) transmitting down a hierarchy.
From the White House perspective, a live Challenger phone call during the speech is at most a gimmick, a minor enhancement, not worth the additional risk if something should go wrong. But to NASA officials, a live broadcast from Challenger during the State of the Union is priceless publicity, an opportunity to earn credit with the administration and a definite mission enhancer. Where do you think most of the pressure is likely to come from?
No one needs to make an official statement. No one needs to go on the record with an updated version of “Will no one rid me of this meddlesome priest?” No one needs to give an order. We can all figure out the importance of the mission on our own, and it’s worth some increase in acceptable risk to make it happen.
On your projects, be careful when you’re told “the boss wants …” or “the customer wants …” in the absence of evidence. That doesn’t mean you should ignore requests or orders or fail to perform as required. Try to figure out the origin of these goals. Who wants these things? Why do they want them? How will the project be affected? What is the impact on risk?
It’s not agreement that gets you into trouble. It’s blind agreement.
And the State of the Union address is scheduled for that night.
The pressure to launch is almost irresistible.
Houston, We Have Another Problem
The cost-driven project environment is the least pleasant place for technically driven professionals, because it forces the greatest compromises and adjustments in the values they take most seriously. How, technical professionals ask, can performance not be uppermost in everyone’s mind?
Nobody chooses a cost-driven project environment, and there’s always something wrong when your project has cost as its driving force. Either the resources don’t exist, or there are substantial negative political forces opposing your project goals. The latter is the case here. What if NASA management and engineers had understood and accepted that moving from Apollo to the shuttle meant moving from a time/performance/cost dimension to a cost/time/performance dimension? Usefulness is, of course, the acid test for any model. Our essential questions are: What would have been different at NASA with this understanding? Would the Challenger disaster have been averted?
Meeting Redux
Let’s replay the critical teleconference between Morton-Thiokol and NASA the night before the launch. Morton-Thiokol recommends against launching. NASA is “appalled,” making it clear they consider launching a high priority. In practice, this translates as an increase in the level of acceptable risk. Now the ball is back in Morton-Thiokol’s court, but the question is different: Is the actual chance of catastrophic failure higher than the risk threshold NASA is willing to assume?
If for political reasons no one can name an actual allowable risk number, we still have an idea of what it is. Boisjoly and other engineers are asked to look at their data again, but they know it’s their job to determine whether the failure risk is unacceptably high based on an understanding of customer risk tolerance, and as a result, the engineering input is far more welcome and appropriate.
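To make the reframed question concrete, here is a minimal sketch of the decision rule just described. It is our illustration only: the probabilities and the threshold are hypothetical numbers, not NASA data, and launch_recommendation is an invented function name.

```python
# Hypothetical sketch: engineering estimates a probability of catastrophic
# failure under the launch conditions; management owns the risk threshold it
# is willing to accept under the totality of circumstances.
def launch_recommendation(estimated_failure_probability: float,
                          acceptable_risk_threshold: float) -> str:
    """Recommend no-go when estimated risk exceeds the customer's stated tolerance."""
    if estimated_failure_probability > acceptable_risk_threshold:
        return "no-go: estimated risk exceeds the stated tolerance"
    return "go: estimated risk is within the stated tolerance"

# Purely illustrative numbers:
print(launch_recommendation(estimated_failure_probability=0.02,
                            acceptable_risk_threshold=0.01))   # -> no-go
```

The value of the rule is not its arithmetic but its division of labor: engineering supplies the estimate, and management supplies—and is accountable for—the threshold.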
It’s possible that the launch decision might still have been “yes.” Because the original risk assessment was never properly done, and the allowable risk threshold never determined, it’s impossible to know. But regardless, the launch decision would have been more correct—that is, made for the right reasons and with the right methods—and might well have saved the lives of those seven astronauts.
TRIPLE CONSTRAINTS PRINCIPLES
1. Communicating outside your profession or to people with different interests or values meets resistance. Don’t rely on a single medium to get your message across. Think visual, auditory, kinesthetic—whatever works. (But get your data straight first.) Above all, ask WIIFM from the other party’s perspective.
2. Marketing is an unavoidable and important part of a project manager’s function. Understand how to sell your project by showing the specific benefits to different groups of stakeholders.
3. Making “bad” decisions is a key element of leadership. When there are no good choices, finding the least bad choice and acting on it is the best you can do. Messier problems tend to escalate. The higher you are, the fewer easy decisions come your way.
4. Language plays a huge role in how people perceive reality. You can’t always explain the situation in public. Some things may be true, but they aren’t official. If you can’t tell the world, you can tell the core group. And don’t put it in writing unless you are prepared to have it read.
5. Although some projects are “defined” in a traditional sense, it’s better to think of other projects as “discovered.” It’s not always up to you, the project manager, or even up to the customer or project sponsor, to decide what the project is. The project is what the project is, and you don’t have a choice in the matter.
6. Inertia is a business concept. Old perspectives hang around even when reality has moved on. The result is an organizational culture that finds it internally hard to shake off the previous mind-set.
7. When the technical authorities and managers continue to argue, it’s not the technical validity of their opinions that’s in dispute, it’s the relevance.
8. Pressure doesn’t always have to come from a single person. Sometimes it simply appears and becomes part of the general understanding. On your projects, be careful with such statements as “the boss wants …” or “the customer wants …” in the absence of evidence.