Don't Go Chasing Waterfalls: A More Agile Healthcare.gov

October 28, 2013

In 1945, Richard Feynman, the Nobel Prize-winning physicist, was a junior member of the Manhattan Project, working to develop the world’s first nuclear bomb, in Los Alamos. Feynman, a graduate student, was in charge of the computers. At the time, “computer” described a job: a person who performed calculations, either by hand or using a mechanical calculator. Feynman was in charge of supervising and organizing the staff who performed the calculations to model the potential explosive power of the weapon.

Feynman’s computers used Marchant calculators to do their work. In “Genius,” his biography of Feynman, James Gleick describes the Marchant calculator as “a clattering machine nearly as large as a typewriter, capable of adding, subtracting, multiplying, and with some difficulty dividing numbers of up to ten digits.” But as the date of the nuclear test drew closer, the project needed to calculate exactly how much energy would be released. The computers using Marchant calculators were working too slowly, so Stan Frankel, another graduate student in the computation group, recommended they try new I.B.M. tabulating machines, which used punch cards to perform calculations. Feynman agreed, but the I.B.M. machines and the technicians to maintain them were slow to arrive. In his 1975 lecture “Los Alamos From Below,” Feynman described his solution:

In this particular case, we worked out all the numerical steps that the machines were supposed to do—multiply this, and then do this, and subtract that. Then we worked out the program, but we didn’t have any machine to test it on. So we set up this room with girls in it. Each one had a Marchant. But she was the multiplier, and she was the adder, and this one cubed, and we had index cards, and all she did was cube this number and send it to the next one.
We went through our cycle this way until we got all the bugs out. Well, it turned out that the speed at which we were able to do it was a hell of a lot faster than the other way, where every single person did all the steps. We got speed with this system that was the predicted speed for the I.B.M. machine.

By organizing the human computers at Los Alamos as though they were I.B.M. tabulators, Feynman was able to match the speed of the I.B.M. machines. His “numerical steps” were literally a computer program, albeit a simple one carried out by people cranking noisy calculating machines. In 1945, Richard Feynman discovered something that remains true today: the problem of producing software is, first and foremost, a problem of organizing people.
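Feynman's arrangement can be sketched as a short program. What follows is a minimal illustration, not a reconstruction of the Los Alamos calculations: the three operations, the constants, and the names STAGES and run_assembly_line are invented for the example, and it assumes every step takes one "tick" of time. It shows only the mechanics of the room, in which a card moves from desk to desk and each desk can start on the next card as soon as it hands one off.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stations, loosely modeled on Feynman's description.
# The actual operations and numbers used at Los Alamos are not given in the text.
STAGES: List[Callable[[int], int]] = [
    lambda x: x * 3,   # the "multiplier"
    lambda x: x + 7,   # the "adder"
    lambda x: x ** 3,  # the one who cubes
]


def run_assembly_line(cards: List[int]) -> Tuple[List[int], int]:
    """Pass every card through every stage, one stage per tick.

    Returns the finished results and the number of ticks elapsed.
    Assumes (for illustration) that each stage takes exactly one tick.
    """
    desks: List[Optional[int]] = [None] * len(STAGES)  # card waiting at each desk
    pending = list(cards)
    results: List[int] = []
    ticks = 0

    while pending or any(card is not None for card in desks):
        # A new card arrives at the first desk if that desk is free.
        if desks[0] is None and pending:
            desks[0] = pending.pop(0)

        # Work from the last desk backward so each handoff lands on an empty desk.
        for i in reversed(range(len(STAGES))):
            if desks[i] is None:
                continue
            output = STAGES[i](desks[i])
            desks[i] = None
            if i + 1 < len(STAGES):
                desks[i + 1] = output
            else:
                results.append(output)
        ticks += 1

    return results, ticks


if __name__ == "__main__":
    cards = [1, 2, 3, 4]
    results, ticks = run_assembly_line(cards)
    one_at_a_time = len(cards) * len(STAGES)  # a lone worker doing every step per card
    print(results, ticks, one_at_a_time)
```

Under these assumptions, four cards clear the line in six ticks, against twelve for a single worker carrying each card through all three steps before starting the next. The sketch does not capture the comparison Feynman actually made, which was with a room in which every person did all the steps; it only shows how the handoffs keep every desk busy.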

The people being organized in modern software projects, of course, are the programmers and engineers writing the code, not human calculators performing it. But the development model for a software project can have just as large an impact on its success as Feynman’s system had on his room full of calculators. Since the difficult launch of Healthcare.gov on October 1st, the process used to develop it, as well as other large and troubled government software projects, has been sharply criticized. This process, sometimes called a sequential-design process, is better known by its nickname: the “waterfall model.”

Though he did not invent it, the software pioneer Winston Royce provided one of the clearest descriptions of waterfall development in his 1970 paper “Managing the Development of Large Software Systems.” Royce lists seven steps for software development: begin with a full description of what the software needs to do, create detailed specifications, analyze these specifications, create a program design, write the code, test it, and operate it.

Royce presents this method in the form of a diagram, with each of the seven steps arranged diagonally, from “System Requirements,” in the top left, to “Operations,” in the bottom right. Each box is linked to the next by a curved arrow, implying that the whole process should flow continuously from one step to the next, like a waterfall. It is worth noting that, immediately after presenting this diagram, Royce explains that while you can analyze requirements and describe how you expect the software to work, until the testing phase, you can never be sure how it will actually work. He goes on to predict the fate of many projects that would eventually use this development model, writing that, if testing fails, “the required design changes are likely to be so disruptive that … the development process has returned to the origin and one can expect up to a 100-percent overrun in schedule and/or costs.”

Royce’s own recommendation is to plan for the likelihood of doing the entire thing a second time. In a section called “Do it Twice,” Royce writes that “if the computer program in question is being developed for the first time,” that is, if the problem being solved is a new problem, as Healthcare.gov certainly was, “arrange matters so that the version finally delivered to the customer for operational deployment is actually the second version.” His recommendation is to build the software once, learn what you did wrong, throw the first draft away, and build it again correctly. In practice, waterfall-style software projects are virtually never run this way; time and budget constraints always leave participants gambling that the first draft will work well enough, and that anything that goes wrong can be fixed on the fly.

On October 24th, the C.G.I. Federal senior vice-president Cheryl Campbell testified before the House Committee on Energy and Commerce about the problems with Healthcare.gov. She said, “C.G.I. Federal, and the many other contractors selected to develop the Federal Exchange, perform under the direction and supervision of C.M.S.” C.M.S. is the Centers for Medicare and Medicaid Services, which Campbell described as the “systems integrator, or ‘quarterback,’ on this project, and … the ultimate responsible party for the end-to-end performance of the overall Federal Exchange.” She then noted that full testing of the complete Healthcare.gov site only occurred “in the last two weeks of September.”

Healthcare.gov involved fifty-five different contractors that each delivered a piece of the final system to C.M.S. for integration and testing. Whatever development processes those contractors used internally, the overall project was run according to waterfall principles. If the first testing of the complete system occurred in the last two weeks of September, then Healthcare.gov was only halfway through what Royce would have described as a complete waterfall development process. The experience for millions of Americans in the weeks since, predictably, has not been good.

In a Times editorial, the former Presidential Innovation Fellow Clay Johnson and the former Obama for America chief technology officer Harper Reed recommend the “adoption of modern, incremental software development practices, like a popular one called Agile, already used in the private sector” but rarely, if ever, used in government projects. Agile software development was first named in a brief document called the Agile Manifesto, which emerged from a meeting of seventeen programmers in February, 2001, at Snowbird Resort in Utah. The entire manifesto reads:

We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:
Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
That is, while there is value in the items on the right, we value the items on the left more.

This concise statement sought to unify a number of so-called lightweight software-development methods that had been developed, in the nineteen-nineties, in opposition to heavyweight software-development techniques, particularly the waterfall model. In practice, agile development methods emphasize rapid iterations of planning, coding, and releasing software in close consultation with the end user or client. An agile version of the Healthcare.gov project, for example, might have first released just the log-in component, or “front door,” for public use, before developing any of the tools to find and buy insurance plans. After the public had interacted with this one small component, the next piece would be added. An agile Healthcare.gov would have evolved, over time, in incremental steps that were always subjected to real-world use and evaluation. Instead of the twenty-two-month timeline of the project as it was built, an agile Healthcare.gov would have had release cycles measured in weeks or days. The first version of the site was purely informational, but could have served as the initial iteration in an agile development process. Instead, it was replaced by the October 1st launch.

The waterfall development process derives from physical industries like construction and mechanical engineering, where a failure of design or implementation would be extremely costly to fix, life-threatening, or both, resulting in a strong emphasis on detailed specifications at the start of any project. But underlying agile software development is the fact that building software is fundamentally not like building skyscrapers. Software is imaginary (it is text), and agile development treats software more like a story being written by the developer and the client than like a mechanical engineering project.

The irony is that the principles espoused in agile development are the very same principles Richard Feynman embraced in his work with the human computers at the Manhattan Project, a giant multinational, government-run technical program. His idea to run the I.B.M. programs on a “machine” made of people operating Marchant calculators encapsulates all four of the Agile Manifesto’s more-valued items: individuals and interactions, working software, collaboration, and responding to change. Feynman’s time at Los Alamos, and indeed the Manhattan Project in general, was marked by this kind of flexible response to technical challenges. With all the attention now focussed on the failure of Healthcare.gov, we could learn a lot from Feynman’s approach.

Rusty Foster is a computer programmer and writer who lives in Maine.

Photograph by Andrew Harrer/Bloomberg via Getty
