Monday, June 13, 2016

On Empirical Studies of Judicial Opinions

I've always found it odd that we (and I include myself in this category) perform empirical studies of outcomes in judicial cases. There's plenty to be gleaned from studying the internals of opinions - citation analysis, judge voting, issue handling, and so on - but outcomes are what they are: measuring them should simply be a matter of tallying up what happened. Further, modeling those outcomes on the internals becomes the realest of realist pursuits.

And, yet, we undertake the effort, in large part because someone has to. Otherwise, we have no idea what is happening out there in the real world of litigation (and yes, I know there are detractors who say that even this isn't sufficient to describe reality because of selection effects).

But as data has become easier to come by, studies have become easier to do. When I started gathering data for Patent Troll Myths in 2009, there was literally no publicly aggregated data about NPE activity. By the time my third article in the series, The Layered Patent System, hit the presses last month (it had been on SSRN for 16 months, mind you), there was a veritable cottage industry of litigation reporting - studies published by my IP colleagues at other schools, annual reports by firms, and more.

Even so, they all measure things differently, even when they are measuring the same thing. This is where Jason Rantanen's new paper comes in. It's called Empirical Analyses of Judicial Opinions: Methodology, Metrics and the Federal Circuit, and the abstract follows:

Despite the popularity of empirical studies of the Federal Circuit’s patent law decisions, a comprehensive picture of those decisions has only recently begun to emerge. Historically, the literature has largely consisted of individual studies that provide just a narrow slice of quantitative data relating to a specific patent law doctrine. Even studies that take a more holistic approach to the Federal Circuit’s jurisprudence primarily focus on their own results and address only briefly the findings of other studies. While recent developments in the field hold great promise, one important but yet unexplored dimension is the use of multiple studies to form a complete and rigorously supported understanding of particular attributes of the court’s decisions.

Drawing upon the empirical literature as a whole, this Article examines the degree to which the reported data can be considered in collective terms. It focuses specifically on the rates at which the Federal Circuit reverses lower tribunals — a subject whose importance is likely to continue to grow as scholars, judges, and practitioners attempt to ascertain the impact of the Supreme Court’s recent decisions addressing the standard of review applied by the Federal Circuit, including in the highly contentious area of claim construction. The existence of multiple studies purportedly measuring the same thing should give a sense of the degree to which researchers can measure that attribute.

Surprisingly, as this examination reveals, there is often substantial variation of reported results within the empirical literature, even when the same parameter is measured. Such variation presents a substantial hurdle to meaningful use of metrics such as reversal rates. This article explores the sources of this variability, assesses its impact on the literature and proposes ways for future researchers to ensure that their studies can add meaningful data (as opposed to just noise) to the collective understanding of both reversal rate studies and quantitative studies of appellate jurisprudence more broadly. Although its focus is on the Federal Circuit, a highly studied court, the insights of this Article are applicable to virtually all empirical studies of judicial opinions.
I liked this paper. It provides a very helpful overview of the different types of decisions researchers make that can affect how their empirical "measurement" (read: counting) is done, and thus why it winds up inconsistent with others. It also provides some suggestions for solving this problem in the future.

My final takeaway is mixed, however. On the one hand, Rantanen is right that the different methodologies make it hard to combine studies to get a complete picture. More consistent measures would be helpful. On the other hand, many folks count the way they do because they see deficiencies with past methodologies. I know I did. For example, when counting outcomes, I was sure to count how many cases settled without a merits ruling either way (almost all of them). Why? Because "half of patents are invalidated" and "half of the 10% of patents ever challenged are invalidated" are two very different claims.
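
To make the denominator point concrete, here is a minimal sketch - with hypothetical numbers, not figures from my study or anyone else's - showing how the same raw count produces very different headline rates depending on which universe of cases you divide by:

    # Hypothetical numbers for illustration only -- not results from any study.
    patents_in_force = 1000       # all patents one might care about
    patents_challenged = 100      # suppose 10% ever face a merits challenge
    challenges_with_ruling = 40   # the rest settle with no merits ruling either way
    patents_invalidated = 20      # half of the merits rulings go against the patent

    # Same count, three different denominators, three different "invalidation rates":
    rate_of_rulings = patents_invalidated / challenges_with_ruling   # 0.50
    rate_of_challenged = patents_invalidated / patents_challenged    # 0.20
    rate_of_all_patents = patents_invalidated / patents_in_force     # 0.02

    print(f"share of merits rulings: {rate_of_rulings:.0%}")      # 50%
    print(f"share of challenged:     {rate_of_challenged:.0%}")   # 20%
    print(f"share of all patents:    {rate_of_all_patents:.0%}")  # 2%

Each study that picks a different denominator is, in effect, reporting a different one of these numbers, even though the underlying counting is the same.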

Thus, I suspect one reason we see inconsistency is that each later researcher has improved on the methodology of those who went before, at least in his or her own mind. If that's true, the only way we get to consistency now is if we are in some sort of "post-experimental" world of counting. And if that's true, then I suspect we won't see multiple studies in the first place (at least not for the same time period). Why bother counting the same thing the same way a second time?