Sunday, September 27, 2009

Peer review, Lehman Brothers, and the fate of semiconductor R/D

EDN Executive Editor Ron Wilson

Sep 21 2009 12:32PM

This weekend I was talking with some friends and the topic of Lehman Brothers came up. I suppose it was timely, given our culture's morbid fascination with anniversaries of disasters. But being an engineering-inclined group, we moved fairly quickly from recrimination to diagnosis.

It seems clear, in the virtual reality of hindsight, that the economic collapse had several mechanisms. Some of these were underlying causes, which banking executive William White, one of the few to warn ahead of time about the crisis, likened to accumulated fuel in a forest fire in a very interesting 16 September essay in the Financial Times. These causes included inappropriately loose monetary policy and structural trade imbalances.

But other causes were more proximate, acting like detonators rather than fuel. Among these was a profound error in the statistical models used to price derivative securities, especially those based on pools of mortgages. As I understand it, these models often assumed that the behaviors of the individuals on the hook for the mortgage payments were statistically independent: the probability of one homeowner defaulting was treated as uncorrelated with the number of other homeowners who had recently defaulted.

This assumption vastly simplifies the math in the model, and, tragically, it makes mortgage-backed securities appear far more valuable than they have turned out to be. So the error stayed in the models, even when some insiders pointed out that it was obviously silly and historically wrong. The result was that when a counter-example arose, the market value of these securities suddenly dropped to about zero. Everyone stopped trusting anyone to tell them what the contracts were worth. That in turn undermined the solvency of institutions such as Lehman Brothers that both held a lot of these now-toxic things and were exposed to even more of them through various kinds of counterparty and buyback agreements. And that led to panic.
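A toy calculation shows how much the independence assumption can hide. The Python sketch below is illustrative only and is not a reconstruction of any bank's pricing model. The pool size, default rates, and the two-state "good economy / bad economy" mixture are all assumptions chosen for the example, and the function names are hypothetical. Both models give each homeowner the same average default probability; they differ only in whether defaults move together.

```python
from math import comb


def binomial_tail(n, p, k):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def tail_independent(n, p, k):
    """Chance of at least k defaults if all n loans default independently."""
    return binomial_tail(n, p, k)


def tail_two_state(n, p, k, q_bad, p_bad):
    """Chance of at least k defaults when defaults share a common driver.

    Assumed toy model: with probability q_bad the economy is in a bad state
    and every loan defaults with probability p_bad; otherwise every loan
    defaults with probability p_good. p_good is chosen so that the average
    default probability per loan is still p.
    """
    p_good = (p - q_bad * p_bad) / (1 - q_bad)
    if not 0.0 <= p_good <= 1.0:
        raise ValueError("q_bad and p_bad are inconsistent with the average p")
    return (q_bad * binomial_tail(n, p_bad, k)
            + (1 - q_bad) * binomial_tail(n, p_good, k))


if __name__ == "__main__":
    n, p, k = 100, 0.05, 20  # illustrative values, not market data
    print("independent:", tail_independent(n, p, k))
    print("correlated: ", tail_two_state(n, p, k, q_bad=0.1, p_bad=0.32))
```

With these made-up numbers, the independent model puts the chance of 20 or more defaults in a pool of 100 loans at roughly one in ten million, while the correlated model puts it at just under ten percent, even though the average default rate is 5 percent in both.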

Enough finance. The reason I find this interesting is that a simple peer review process would have exposed the error in the models and made it so public that no investment-banking executive could have gotten away with telling his employees to forget about it and keep selling junk. But because these models were developed within competing companies, in secret, peer review had to be done by hint and rumor rather than open publication, and was not strong enough to prevent the catastrophe.

The same situation exists today in our world, albeit with the potential for a much slower catastrophe. Much of the fundamental research that goes into physical models of devices, process models, and the mathematics within EDA tools has moved from places like universities, government laboratories, and open institutions such as Bell Labs or IBM Research into private R/D departments. That move has put the work behind a security fence, where it cannot receive the benefit of peer review. So when errors happen, and they will, the probability of spotting them before they deliver wrong results to a customer is reduced. Diagnosing them is necessarily slower and more expensive, and is itself error-prone.

This isn't a theoretical concern. For years, the EDA industry suffered so much from mathematical models that simply didn't apply to real-world silicon that it was nearly a standing joke. Individual companies moved to fix this not by opening their basic research to peer review, but by hiring silicon implementation people into their R/D teams. That puts one more pair of better-informed eyes on the job, but it doesn't do much to help the long tail of the error distribution. In this connection it's worth noting that almost the same statistical problem comes up in statistical static timing analysis as in securities pricing: you have to decide what to do about variations that are correlated, rather than uncorrelated, across the devices in a timing path. How you deal with that issue can have a big influence on the final timing distribution you come up with. And of course how each company chooses to approach the issue is a secret.
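The size of that influence is easy to see in a deliberately simplified model. The Python sketch below is not how any particular timing tool works; production statistical timing analysis uses far more elaborate correlation models. It assumes a path of identical stages, each with the same delay sigma, and a single correlation coefficient `rho` between every pair of stages, so the standard equicorrelation formula for the variance of a sum applies. The function names and the numbers in the usage note are assumptions made for illustration.

```python
from math import sqrt


def path_sigma(n_stages, stage_sigma, rho):
    """Standard deviation of total path delay for equally correlated stages.

    Var(sum) = n * sigma^2 * (1 + (n - 1) * rho), which reduces to
    n * sigma^2 when rho = 0 and to (n * sigma)^2 when rho = 1.
    """
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    lowest_rho = -1.0 / (n_stages - 1) if n_stages > 1 else -1.0
    if not lowest_rho <= rho <= 1.0:
        raise ValueError("rho is outside the valid range for this many stages")
    return stage_sigma * sqrt(n_stages * (1 + (n_stages - 1) * rho))


def three_sigma_delay(n_stages, stage_mean, stage_sigma, rho):
    """Mean path delay plus three standard deviations, a common sign-off margin."""
    return n_stages * stage_mean + 3 * path_sigma(n_stages, stage_sigma, rho)


if __name__ == "__main__":
    # Illustrative path: 16 stages, 50 ps mean and 5 ps sigma per stage.
    for rho in (0.0, 0.5, 1.0):
        print(rho, path_sigma(16, 5.0, rho), three_sigma_delay(16, 50.0, 5.0, rho))
```

For a 16-stage path with a 5 ps sigma per stage, the path sigma is 20 ps if the stages vary independently and 80 ps if they vary in lockstep. The choice of correlation model alone moves the three-sigma margin by a factor of four in this toy case.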

In conclusion, the withering away of open, peer-reviewed research in the semiconductor industry is not just the problem of long-term basic research that won't get done. It is the much more immediate problem that without adequate publication and peer review, the research that does get done has a higher probability of producing wrong results. And the first party to find out is likely to be the customer, who may feel a lot like those investment managers holding huge piles of mortgage-backed securities.

