Key Players
You and I work for Amalgamated Mechanical Incorporated. The company is hot to create a new medical-procedure robot. We're on the reliability team.
It's critical that our robot get out there quickly, because our #1 competitor is on the verge of introducing a similar product. To get a jump on them, our product needs to hit the market in 10 months' time.
According to the program plan, Accelerated Life Testing (ALT) of the robot's arm should start next week. As far as we know, though, the arm samples won't arrive for another three months. If ALT doesn't begin until then, there's little chance we'll have accurate predictions of how and when the product may wear out once it hits the market.
For all practical purposes, even if we provide that test-based prediction for the arm's wear-out failure rate a few months before release, it will be of little value. There's no alarm big enough to sound that would delay release based on premature wear-out. Even if we discovered that the product wore out, not in the promised five years of normal use but in one solitary year, the leaders would still release on schedule. (After all, they'd reason, we have to beat the competition to market.)
It's been said before by our VP: "We'll release it now and get a fix out there quickly. We already have a punch list for version 2.0."
In many ways, the project has been designed to fail. For me, it feels like trying to stop a freight train that's built up a head of steam. Stepping in front of it creates a mess, and the train still pulls into the station on time.
Half the purpose of the program's reliability testing was to provide input for key program decisions:
- "How confident are we that the arm will reach its life-and-reliability goals?"
- "What's the robot's end-of-life failure mode?"
- "Should we create a preventive maintenance cycle, or shorten the robot's promised life to customers?"
This is critical information in a product development program, and we don't have it yet.
Why does it always go this way? It's actually made me think about changing disciplines to something other than "reliability engineering." Why have a career focus that doesn't improve products and is often just a check-the-box nicety?
The product did indeed release on time. The reliability growth (RG) testing showed low statistical confidence in the goal reliability. The arm is the most critical assembly, we don't believe it will work as it should, and we released it anyway. This is crazy! The ALT testing was never finished, because there was a design change and we never received new arms to test. So we don't know when it'll wear out. How scary is that?
These arms could start failing in large numbers in the customers' hands because of a predictable wear-out failure mode. Statistically, most of the population will fail around the mean wear-out time, with the failure times of the full population tracing a bell curve. More than half of the Failure Mode and Effects Analysis (FMEA) high-risk actions weren't addressed. Some of those actions had "user harm" in the severity ranking.
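As a rough illustration of what that bell curve implies, here is a minimal sketch; the five-year mean life, the one-year spread, the fleet size, and the normal distribution itself are assumptions chosen for illustration, not figures from the program:

```python
# Illustrative sketch of a wear-out failure mode: failures cluster
# around a mean life, so the field failure rate climbs sharply as the
# fleet ages. All numbers here are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

MEAN_LIFE_YEARS = 5.0   # the promised life from the story, used as the mean
SPREAD_YEARS = 1.0      # assumed standard deviation of wear-out times
FLEET_SIZE = 10_000     # assumed number of fielded arms

# Simulate a wear-out time for every arm in the fleet.
failure_times = rng.normal(MEAN_LIFE_YEARS, SPREAD_YEARS, size=FLEET_SIZE)

# Fraction of the fleet that has worn out by a given age.
for age in (3.0, 4.0, 5.0, 6.0):
    failed_fraction = np.mean(failure_times <= age)
    print(f"By year {age:.0f}: {failed_fraction:6.1%} of units have worn out")
```

In practice a reliability team would fit a Weibull or lognormal model to actual ALT data rather than assume a normal distribution; the point here is simply that a wear-out mode concentrates failures around a predictable age, which is exactly what makes an unfinished ALT program so dangerous.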
We released on time. About four months after product release, the field failure rate began to spike. Two specific failure types were dominant: the linear X-axis bearing, and a plunger that penetrates a consumable. Both see high cycle counts in use, and both were known to be high risk due to changes in the most recent design.
These spiking field failures were the main topic in every Friday steering meeting and hallway conversation. If someone had information, the CEO wanted to hear it. If there were no updates on the root causes and fixes, he yelled about wanting to know what everyone was doing all day.
For a reliability manager, the entire process was depressing. No real value was delivered from our team's work. As a matter of fact, we were usually seen as a nuisance, almost as if we were an outside regulatory organization, but without authority. Something akin to a kid on a Big Wheel pulling over highway motorists and issuing traffic tickets in crayon.
Now that there are high field-failure rates, people are murmuring, "Well, there's no single person to blame for all these failures… but… aren't you the reliability team? Why did you let this happen?"
As I said, I find myself thinking of abandoning the discipline I love, because it can feel pointless.
Taking a step back, I thought about the program's reliability experience from the perspective of the other roles involved. The project managers received bonuses for releasing the product on time. The R&D engineers were promoted and assigned to top-notch programs because of the features they developed. This was all celebrated at a fancy hotel with an offsite party and a band.
OK, that all happened at release. But what happened when the robotic arm assembly began to fail early in life? Surely, that was the moment of reckoning, wasn't it?
This is the team's experience when the failure rate spiked four months after release: they were called together as a "Tiger Team." This means they were borrowed from their new programs, because they were supposedly the only people who could save us: our "heroes!"
During this recovery phase, the Tiger Team got regular facetime with the CEO. Facetime with the CEO is a key element in someday climbing into upper management. Many of us would take a CEO face-to-face over a 20% raise. For the Tiger Team, this experience was pure gold.
Then, when the field issues were solved and the company was saved, there was a celebration with a festive banner and a big ice-cream cake.
As the legendary management leader W. Edwards Deming said: "One gets a good rating for fighting a fire. The result is visible; can be quantified. If you do it right the first time, you are invisible. You satisfied the requirements. That is your job. Mess it up, and correct it later, you become a hero" [1].
So, in summary, they were rewarded for casting reliability aside in order to meet the one goal associated with their role: time to market, new features, or cost point. They were then rewarded again for fixing the field problems they themselves created.
Remarkably, the team was doubly incentivized to deliver a product that was unreliable. How could this be? Why did the program's executives engineer things this way? After all, it hurt them most.
But if I'm making it seem like everyone involved in this failed program was rewarded, I'm confusing the issue. People were indeed punished. Who? Those who really wanted to create a product that was reliable. What happens to those people? The next story shows that they have one of two paths.
Follow the Carrot or Get Out of the Race
The leadership of a large, 90-year-old company asked me to evaluate their culture. In my report, I included a story. It had passion, reward, and, most importantly, punishment: all the elements of a Greek tragedy. The response was better than I'd ever hoped for. Here's the story:
I began my investigation with the question: "Why is reliability missing from most engineers' work, even though we promote it so assertively as a core value?"
Walking around the company's halls, I saw posters that underlined how seriously they took quality and reliability. These posters bore slogans like "Our customers count on us for reliable products" and "Our product reliability is YOUR legacy."
The hierarchy even jammed the word "reliability" into their speeches as many times as they could. For instance, at the annual R&D off-site the CEO delivered an 11-minute talk and used "reliability" eight times. That's almost one "reliability" per minute. I'd already been there long enough to understand the hypocrisy. That's why I was counting.
The company liked to hand out reliability awards. But these awards were largely empty. They rarely included bonus money or anything that resembled actual career growth.
It was easy to see management's true motivation. The late quality guru Philip Crosby said, "An organization is a reflection of its management team." There's no hiding what the boss truly values; it shows up most clearly in things like sizable bonuses and meaningful promotions.
At this company, I saw engineers and developers rewarded when their product was on budget and on schedule. There's nothing wrong with handing out rewards for these accomplishments. Unfortunately, these were the only things for which the engineers and developers were rewarded. More notable accomplishments, like excelling at the full set of program objectives, were ignored.
To the team, upper management's reward-incentivized message was clear: "This is what we really value." The next level of management down had no choice, then, but to prioritize these same on-budget, on-schedule metrics above all others.
The written report I would later send to the organization's hierarchy had only two characters, Engineer #1 and Engineer #2. (Those weren't their real names. If they had been, it would have shown some amazing foresight by their parents.) These characters were a composite of the actual engineers on that team.
While they were both good engineers, they differed from each other in one very important way:
- Engineer #1 focused on budget and schedule, because she wanted the bonus and was ambitious enough to crave a promotion. She was attuned to what her management team valued, so she behaved accordingly.
- Engineer #2 followed her inna...