Rethinking Patent Citations

Patent citations are one of the coins of the economic analysis realm. Many studies have used which patents cite which others to determine value, technological relatedness, or other opaque information about a batch of patents. The approach has drawbacks, of course, and recent work questions the role of citations in calculating value or in predicting patent validity.

But what if citing itself has changed over the years? What if easier access to search engines, strategic behavior, or other factors have changed citing patterns? This would mean that citation analysis from the past might yield different answers than citation analysis today.

This is the question tackled by Jeffrey Kuhn and Kenneth Younge in Patent Citations: An Examination of the Data Generating Process, now on SSRN. Their abstract:
Existing measures of innovation often rely on patent citations to indicate intellectual lineage and impact. We show that the data generating process for patent citations has changed substantially since citation-based measures were validated a decade ago. Today, far more citations are created per patent, and the mean technological similarity between citing and cited patents has fallen significantly. These changes suggest that the use of patent citations for scholarship needs to be re-validated. We develop a novel vector space model to examine the information content of patent citations, and show that methods for sub-setting and/or weighting informative citations can substantially improve the predictive power of patent citation measures.
I haven't read the methods for improving predictive power carefully enough yet to comment on them, so I'll limit my comments to the factual predicate: that citation patterns are changing.

As I read the paper, they find that there is a subset of patents that cite significantly more patents than others, and that those citations are attenuated from the technology listed in those patents -- they are filler.

On the one hand, this makes perfect intuitive sense to me, for a variety of reasons. Indeed, in my own study of patents in litigation, I found that more citations were associated with invalidity findings. The conventional wisdom is the opposite: that more backward citations mean the patent is strong, because the patent surmounted all that prior art. But if the prior art is filler, then there is no reason to expect a validity finding.

On the other hand, I wonder about the word matching methodology used here. While it's clever, might it be picking up patentee wordsmithing? People often think that patent lawyers use complex words to express simple ideas (mechanical interface device = plug). Theoretically this shouldn't matter if patentees wordsmith at the same rate over time, but if newer patents add filler words in addition to more cited patents, then perhaps the lack of matching words also reflects changes in the data over time.
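To make the wordsmithing worry concrete, here is a minimal sketch of word-overlap similarity using cosine similarity over bag-of-words counts. It is not the vector space model from the paper, whose details I have not reproduced. The function names, the tiny stop-word list, and the example sentences are all made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for this illustration only.
STOP_WORDS = {"a", "an", "the", "to", "of"}

def tokenize(text):
    """Lowercase the text, split it into alphabetic words, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a = Counter(tokenize(text_a))
    b = Counter(tokenize(text_b))
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with anything
    return dot / (norm_a * norm_b)

# Invented sentences: the same idea in plain words, in lawyerly words,
# and in plain words reordered.
plain = "a plug connects the cable to the socket"
wordsmithed = "a mechanical interface device couples the conductor to the receptacle"
reordered = "the plug connects a cable to a socket"

sim_wordsmithed = cosine_similarity(plain, wordsmithed)
sim_reordered = cosine_similarity(plain, reordered)
print(f"plain vs. wordsmithed: {sim_wordsmithed:.2f}")
print(f"plain vs. reordered:   {sim_reordered:.2f}")
```

In this toy case the wordsmithed sentence scores far lower against the plain one than the reordered sentence does, even though all three describe the same thing. A measure built on shared words could therefore read a change in drafting style as a drop in technological similarity.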

These are just a few thoughts. The data in the paper is both fascinating and illuminating, and there are plenty of nice charts that illustrate it well, along with ideas for better analyzing citations that I think will deserve some close attention.

