A Hyperproductive Telecommunications Development Team

This chapter is distilled from a paper prepared in the spring of 1994 shortly after Jim Coplien studied the team in question.

The exact identity of the team is withheld for two reasons. One relates to the proprietary nature of information about the product at the time the study was done. The other is that the team asked not to be identified. They were concerned at the time that if they were identified as having built a better mousetrap, the world would beat a path to their door asking them for process advice. They didn't want to be in that business; they didn't want to be distracted from doing what they enjoyed doing. But other than omitting the name of the product, we've included many particulars that we hope will make the group more tangible, and that will answer questions about the viability of the group's approaches.

The report, as originally written, follows:

I had the pleasure of meeting with the entire development team for a small network platform being built by a Network Systems organization in AT&T Bell Laboratories on February 17, 1994. This project is among the most interesting I have studied. The organization has some of the best team dynamics of any I have observed anywhere. The people find their work challenging, stimulating, and rewarding. This organization is likewise productive, with 200 KNCSL to their credit at the hands of six developers over 15 months. That interval includes conceptualization and design. That code count does not include a similar number of lines purchased externally or reused from existing internal projects.
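As a rough check on the scale of that figure, the arithmetic can be sketched as follows. This is only an illustration: it assumes that 1 KNCSL means 1,000 non-commentary source lines and that all six developers were engaged for the full 15 months, neither of which the report states outright. The function name is invented for the example.

```python
# Figures quoted in the report.
NCSL = 200_000      # 200 KNCSL, assuming 1 KNCSL = 1,000 lines
DEVELOPERS = 6
MONTHS = 15         # includes conceptualization and design


def ncsl_per_developer_month(ncsl, developers, months):
    """Average lines delivered per developer per calendar month.

    Assumes (hypothetically) that every developer worked the whole interval.
    """
    return ncsl / (developers * months)


rate = ncsl_per_developer_month(NCSL, DEVELOPERS, MONTHS)
print(f"{rate:.0f} NCSL per developer-month")  # prints "2222 NCSL per developer-month"
```

The result is an average over the whole interval, so it understates the rate during pure coding periods and says nothing about the purchased or reused code that the count excludes.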

Many of the tenets, practices, and characteristics of this project are eerily reminiscent of Borland's Quattro Pro for Windows (QPW) team, the most highly productive organization I have studied [BibRef-Coplien1994]. The project is unique in many of its own ways, too: unique, perhaps, in the sense that the experience could not be easily reproduced elsewhere. Nonetheless, this project provides another data point in our study of hyperprogramming (very productive) organizations (for a current total of two such data points). We noted that the two organizations resemble each other in many ways, ways that perhaps portend high productivity and quality of work life. These factors are worth exploring.

Might their development process have something to do with all this? Contemporary management thinking holds process to be a dominant factor in quality and productivity. The group's process and organization are indeed the source of their power, but the process is off the beaten path. Our research was attracted to the organization because of its emphasis on parallelism, taken almost to extremes, with astounding results.

This organization has been around in one form or another for about 15 years. They have a long history of prototyping and building small systems. About four years ago, the organization started working on trials to prove in their product concepts. Development started in earnest about two years ago. The development team currently has about 8 people. Most (all but two at the debriefing) have families. The group is demographically diverse.

The project has an excellent history of meeting impossible delivery dates, owing to much hard work. The team typically works 50- to 60-hour weeks, some working 60- to 70-hour weeks over a five-month spurt. The team is egalitarian in the sense that everybody writes code, but is non-egalitarian in that everybody brings their own realm of expertise to the table (which is an important factor we explore later under "Code Ownership").

People do much of their own risk management. As the meeting got started, Peter told how he was going to add line splitting to the architecture. That was going to make more work for Pat. Pat was playfully unhappy about the change, but it was a design change the team had decided some time ago that it had wanted. The project had been granted a one-month extension in its schedule, and Peter had taken the initiative to do a redesign of the part of the system that had been causing them to use resources inefficiently.

Parallelism is key to the organization's success. The organization got its start when presented with an ambitious scenario:

Conceptualize and deliver a system prototype in four to five months. Almost coincidentally, the requirements, testing, and design all converged on the same date. It worked, and converged faster than anyone imagined possible. The small team size, the excellence in systems engineering, and lack of dogmatism among team members were major factors in the success of the prototype. The organization became more introspective about their concurrent engineering approach to development, and turned it into a way of life for themselves. The technique that made the prototype successful was carried into the development of the product itself. It is this way of life that I had come to study.

And the introspection isn't complete. There is still a feeling that part of what makes them successful is purely instinctive. Peter even worried that if I surfaced process understanding into their consciousness, it would affect the way they worked--because it would establish a new introspection framework--and potentially damage the delicate balance of magic that propelled them to success. While there is a slight chance that the Heisenberg phenomenon could take them in that direction, there is equal or greater probability that such a discussion could open their eyes to possibilities for improvement.

The programming language is C, the development environment is UNIX. (Note that they use neither object-oriented approaches nor C++, staples that have become stereotypically associated with high productivity and best current practices.) The product has performed well in the field. Three installations have been running for 7 months, with only 3 unplanned outages to their credit totaling less than 8 hours: better than 99.94% uptime. The total number of faults found in the field has been about 25, out of which 20 have been addressed at this writing.
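The uptime figure can be reproduced under stated assumptions. The sketch below assumes 30-day months and assumes the "less than 8 hours" of outage is pooled across all three installations; the report does not say which way the figure was computed, so the single-installation reading is shown for comparison. The function and constant names are invented for the example.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: 30-day months


def availability(installations, months, outage_hours):
    """Fraction of installation-hours during which the system was up."""
    total_hours = installations * months * HOURS_PER_MONTH
    return 1.0 - outage_hours / total_hours


# Reading 1: 8 outage hours pooled over three installations, 7 months each.
pooled = availability(3, 7, 8)
# Reading 2: all 8 outage hours charged to a single installation.
single = availability(1, 7, 8)

print(f"pooled: {pooled:.3%}")  # prints "pooled: 99.947%"
print(f"single: {single:.3%}")  # prints "single: 99.841%"
```

Only the pooled reading clears the quoted 99.94%, which suggests (but does not prove) that the figure is an aggregate over the three sites.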

The organization's culture, self-image, and process have a rich human element that precipitates from the small team environment. These issues merit their own section later in this chapter.

Figure 1: RAD of the Process

Figure 1 shows a structured flowchart of the process, called a Role-Activity Diagram or RAD. The diagram misses many of the interesting interactions between people enacting the Developer role. It also misses the richness of interaction between the Ambassador and his 50 or so contacts external to the project. The Ambassador, like Allen's gatekeeper [BibRef-Allen1977], handles most of the external project technical interfaces.

Parallelism can be seen throughout. Design might start before system engineering. Requirements continue to change and accumulate after coding has started, and sometimes after performance verification.

Figure 2: Adjacency Diagram (Natural Force-Based Placement)

I analyzed the interview data using the Pasteur organizational analysis tools [BibRef-CainCoplien1993]. Figure 2 shows the adjacency diagram, or force-based communication network diagram, for the development organization. The graph has two communication "hubs" at the Developer and Ambassador roles, respectively. We generate coupling metrics from the same model used to build the adjacency diagram. Coupling per role is 41%, about at the median but far above the mode and mean for all processes we have studied. It is about half the value of 89% for QPW.

There is an amazingly even distribution of work across the project. The Mad Artichoke, Ambassador, Manager, and Service Development roles all share the same degree of coupling to the process as a whole. Hacker, Domain Experts, Service Management, Product Management and Performance Verification are slightly less coupled. As we have found in most organizations, the Developer role is more tightly coupled to the process as a whole than any other single role.

It is rare that we find an organization with an architect, and rarer still that the architect occupies a central position. In this network platform development, the Mad Artichoke (architect) role is more coupled to the process as a whole than any role except Developer, which links every role in the communications model. (This, again, is reminiscent of QPW.) Much of the communication burden that normally falls on the developer's shoulders is taken on by Ambassador, which is a secondary hub in the organization structure. This role fits Allen's description of the "gatekeeper" role exactly: again, reminiscent of QPW.

The centrality of the architect is reminiscent of QPW. In QPW, Quality Assurance was more central than we find in this organization.

Figure 3: Interaction Grid

Figure 3 shows the interaction grid for the project. The picture is curiously asymmetric. The large blank space at the top occurs because roles outside the process are not approached to do work; they supply work, constraints, and input to the project. That anomaly aside, communication patterns are distributed evenly across the organization. Such an even spread of connectivity is rare in the processes we have studied, but it was a characteristic of the Borland QPW organization.

The process and culture have a richly human side. This shows up in how the group talks about itself, as well as in its organization and process. I explore three aspects of the human issue here: how people issues are integrated into the process, the anthropomorphizing of code, and management practices.

Engineering people issues into the process

The "high touch" flourish in this "high tech" environment became clear from the outset [BibRef-Naisbitt1984]. People invented outrageous role names to describe themselves: Mad Artichoke for the architect; Agitator; Code Police; Damage Control.

The "person-ality" of the project goes deeper. Consider the perceived responsibilities of role Agitator:

    • Keep team from getting too comfortable

    • Trigger discussions

    • Say things nobody wants to say

"Say things nobody wants to say"? Many conservative development organization cultures are loaded with "unmentionables" that plague progress by cutting off painful avenues of progress with taboos. In this organization, everything is open to criticism by anyone. As Steve Bauman (then a director at Bell Laboratories) once said, it is not "warm and fuzzy"; it is "open and productive." This behavior can be found in patterns like WiseFool and PublicCharacter. The Damage Control role has the following responsibility: "Repair inter-organizational and inter-personal damage".

The first-level manager plays a less authoritarian role in the process than in a typical corporate development setting. Most of his role is to provide support and to track project status, but "twisting arms" of people outside the team is also among his responsibilities. He sees his job as ensuring that the team has the best people possible, that they have the resources and time necessary to do a good job, and that outside interference and roadblocks are kept out of the developers' way. He also fills the Damage Control role.

Code Ownership and Programming Anthropomorphism

The project has strong code ownership that transcends release cycles. Everybody knows what everybody else is working on. Nobody changes anyone else's code, except in an emergency. If one programmer finds a bug in another's code, the person finding the bug asks the owner to make the change.

Code ownership creates an interesting project mentality that is difficult to codify, but which might be summed up in a wry comment from one developer: "We don't use ECMS or Sablime [source management tools], so we need code ownership." Code ownership makes job responsibilities visible in the culture, rather than burying them in a tool.

Code ownership goes so deep that the project has anthropomorphized their software. Software anthropomorphizing is something taught in some analysis techniques, including the popular CRC technique for object-oriented analysis [BibRef-Beck1991], but this project takes this to an unparalleled extreme. During scenario walk-throughs, you don't hear them saying, "the X module sends this message to the Y module," but rather, "A message comes in from Dara and goes over to Roman" or something analogous. "Now, Peter kills Pat" describes a signal sent between processes. One can go by the lab at night, and hear a programmer scream "Oh, Peter! Why did you do this?" as "Peter's" code reaches out and creates some system atrocity that makes the tester's life difficult. The code is strongly identified with the individual owning it.

Responsibility is a deep underlying value in this project. Ownership exists for its own sake, but if you own something, and you make a change, you have responsibility for it. Code ownership and the associated culture raise everybody's awareness, expectation, and assurance that such responsibility will be carried forward. It seems more powerful than having a tool to track down a change to an accountable individual: there is a mind-set that transcends the need for such version management tools in a small project. This makes site support and FOA activities easier: when a problem is found in the field, it is usually clear who needs to be brought in to fix it.

Will the project need version management as it grows? Possibly, but the market is constrained enough that multi-featurism may not become a serious problem. If they can coordinate releases for all sites (the market is believed able to bear 11 systems) then versioning may never be needed.

Growing a Garden

After the group session, I stopped by to debrief the department head on my findings. She briefly described her management philosophy, which she likened to gardening. Her main job, however, is to "keep the pests away." That, she said, is what a good project manager should do as well. Curiously enough, the role we ended up calling "project manager," we were initially going to call "smoke screen," because it distanced the development community from surrounding organizations.

On the way out, I ran into a manager from another project who in an unrelated context talked about "controlling the people who sit in the bleachers and throw rocks." Insulation appears to be an important and successful management strategy in our culture.

Rewarding Excellence

Traditional rewards like money and promotions are in short supply--but that doesn't constrain the intrinsic motivators which can be equally direct and even more effective. People enjoy their work here. Their talents are appreciated and the people are respected as individuals. The people are trusted: they are given much latitude, much responsibility, and are trusted to talk directly to customers. The issue of trust was also central to the Borland QPW team.

Their department head had this to say:

[M]uch of the reward is intangible...not something I as a manager give, but something I allow them to achieve. I give lots of personal attention to them mostly and try to create a fun, creative environment with challenging assignments. I try to personalize the whole set of interactions so that everyone thinks they are doing this to better ourselves and better our chances of getting more challenging, fun, creative work.

This approach is reminiscent of the "getting one's ticket punched" concept described in The Soul of a New Machine [BibRef-Kidder1981], a mentality common to Silicon Valley companies as well. The close coupling between influential management (in this case, a widely respected department head) and their reports is also reminiscent of the Borland environment.

As was true for Borland's QPW, this product is developed by a small team. Small teams can achieve results that would be impossible in a traditional organization. A small team makes anthropomorphism feasible. It gives everyone a feeling of connectedness. It smooths communication, and in fact enables communication dynamics that may lie at the heart of concurrent engineering.

"I think I'm going to need this soon," Bryan yells down the hall to Dave about a module with changes that must be coordinated with Bryan's fixes. It's mid-morning, and he knows that before the morning is over, he'll need Dave's module so he can test his own work out. Dave knows that unless he stops what he's doing and turns to the module Bryan asked for, that Bryan will become blocked. Dave drops what he's doing and moves to finish up the work he needs to do to support Bryan. At about 2:00, Dave yells down the hall, "Here it is." Bryan is now in shape to test after only a short delay, and Dave goes back to what he's doing, without ever having been idle.

An exceptional instance? No: interrupt mode is the modus operandi of the whole group, in the interest of minimizing wait states. It is the InterruptsUnjamBlocking pattern. Wait states can add substantially to product interval. The micro-parallelism of this process alleviates much of the blocking one finds in large projects. Just as in processor scheduling, interrupts reduce the latency to service a request. If the context-switch overhead is low enough, an interrupt-driven development's throughput will be about the same as for any other approach. It takes close-knit communications to make it work.
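The scheduling analogy can be made concrete with a toy model. This is a minimal sketch, not a description of how the team measured anything: one worker has a task of their own in progress when a colleague's request arrives, and all durations, the switch cost, and the function names are hypothetical. It assumes the request arrives while the worker's own task is still unfinished.

```python
def run_to_completion(own_work, arrival, service):
    """The worker finishes their own task before touching the request.

    Returns (latency seen by the requester, time until both tasks are done).
    Assumes arrival <= own_work.
    """
    latency = (own_work - arrival) + service
    makespan = own_work + service
    return latency, makespan


def interrupt_driven(own_work, arrival, service, switch_cost):
    """The worker drops their own task, serves the request, then resumes.

    One context switch is paid on the way out and one on the way back.
    """
    latency = switch_cost + service
    makespan = own_work + service + 2 * switch_cost
    return latency, makespan


# Hypothetical numbers, in hours: an 8-hour task, a request arriving at
# hour 2 that takes 3 hours to serve, and a 15-minute context switch.
print(run_to_completion(8, 2, 3))        # prints "(9, 11)"
print(interrupt_driven(8, 2, 3, 0.25))   # prints "(3.25, 11.5)"
```

In this model the requester waits 3.25 hours instead of 9, while the total time to finish both pieces of work grows only by the two context switches. When the switch cost is small the throughput is nearly unchanged; as it grows, the latency benefit is bought with real lost capacity, which is why close-knit communication matters.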

How do these communications take place? I asked if they held periodic team meetings. "Not if we can help it," was the reply. Team members have a small number of small three- or four-person meetings during the day. Yelling up and down the hallway is de rigueur. It is unlike Borland, where much more of the dialogue seems to have taken place at a round table under the banner of architecture. But the underlying principle — close-knit communication — is the same.

This approach leads to an unconventional view of time and schedule. Most software development projects are monochronic societies: They believe time adds up algebraically. This organization seems to be more polychronic: with parallelism and task shuffling, time becomes fluid and can be manipulated. The interrupt-driven nature can be somewhat nerve-racking, and carrying on in parallel with people outside the team (e.g., in front-end and back-end processes) can be uncomfortable. But the resulting productivity gains are high.

Code ownership can be maintained in the long term only if there is a solid high-level architecture with clear, explicit interfaces. This project should work to make their architecture more explicit, and to better formalize the interfaces. This will become increasingly important as development moves from initial product formulation to ongoing evolution.

Right now, there is no clearly identified role in this project to conduct arm's-length black-box testing for faults. They are aware of this problem and are addressing it.

Bell Laboratories' modular building construction may not be the most conducive to the team interactions that seem to nourish this team. Alternative architectures and room configurations might support the necessary interworking while maintaining the sense of "space" and privacy that has long been a valued aspect of the Bell Labs culture.

On a person-for-person basis, this organization is one of the most productive organizations we've studied. Such high productivity usually comes not only from good development and management practices, but from a high commitment of time and energy from its developers. Such behavior should be encouraged through the reward system, and by recognition, as it was at Borland.

The small team dynamics of this organization have been the dominant factor in its prodigious success: the high degree of parallelism, the interrupt-driven development, and the use of concurrent engineering are all related to the team size. Other similarities to Borland QPW include the high degree of trust between members of the project; the tight coupling with respected and influential management; the centrality of the architecture function; tight code ownership and software anthropomorphism; and the even distribution of communication across all roles in the organization. These latter factors characterize a true team. Such distinguishing characteristics of organization and process should be carefully considered as key factors that differentiate highly productive organizations from most contemporary software development efforts, and the mature practices they use.