
Wednesday, January 13, 2016

A New Source for Using Patent Application Data for Empirical Research

Getting detailed patent application data is notoriously difficult. Traditionally, such information was available only via Public PAIR, the PTO's interface for retrieving application data, which is useful for individual lookups but clunky for bulk research. As a result, there haven't been many papers using such data. Sampat & Lemley was an early and well-known paper from 2009, which looked at a cross-section of 10,000 applications. That was surely daunting work at the time.

Since then, FOIA requests and bulk downloads have allowed for more comprehensive papers. Frakes & Wasserman have several papers built on a larger dataset of this kind, as does Tu.

But now the PTO has released an even more comprehensive dataset, available to the masses. This is a truly exciting day for people who have yearned for better patent application data but lacked the resources to obtain it. Here's an abstract introducing the dataset, by Graham, Marco & Miller -- The USPTO Patent Examination Research Dataset: A Window on the Process of Patent Examination:

A surprisingly small amount of empirical research has been focused on the process of obtaining a patent grant from the United States Patent and Trademark Office (PTO). The purpose of this document is to describe the Patent Examination Dataset (PatEX), make a large amount of information from the Public Patent Application Information Retrieval system (Public PAIR) more readily available to researchers. PatEX includes records on over 9 million US patent applications, with information complete as of January 24, 2015 for all applications included in Public PAIR with filing dates prior to January 1, 2015. Variables in PatEX cover most of the relevant information related to US patent examination, including characteristics of inventions, applications, applicants, attorneys, and examiners, and status codes for all actions taken, by both the applicant and examiner, throughout the examination process. A significant section of this documentation describes the selectivity issues that arise from the omission of “nonpublic” applications. We find that the selection issues were much more pronounced for applications received prior to the implementation of the American Inventors Protection Act (AIPA) in late 2000. We also find that the extent of any selection bias will be at least partially determined by the sub-population of interest in any given research project.

That's right: data on 9 million patent applications, covering both the patents granted and the applications not granted (the latter available since applications began to be published in 2000). The paper compares the public data with internal PTO records (which include non-public applications) to determine whether there is any bias in the data. There are a few areas where the alignment isn't perfect, but the data is generally representative. That said, be sure to read the paper to make sure the subset of applications you study is representative (much older applications, for example, align less well with USPTO internal data).

The data isn't completely straightforward: each "tab" in Public PAIR is a separate data file, so users will have to merge them as needed (easily done in any statistics package, SQL, or even with Excel lookup functions).
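To give a feel for what that merge looks like, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and toy rows are my own illustrative assumptions, not the dataset's actual schema; check the documentation for the real file and variable names. The only point is that the per-tab files can be joined on a shared application identifier.

```python
import sqlite3

# Toy rows standing in for two of the per-tab files. All table names,
# column names, and values are hypothetical, chosen only for illustration.
applications = [
    ("A1", "2010-03-01", "Patented"),
    ("A2", "2011-07-15", "Abandoned"),
    ("A3", "2012-01-20", "Pending"),
]
transactions = [
    ("A1", "2011-05-02", "Non-final rejection"),
    ("A1", "2012-02-10", "Notice of allowance"),
    ("A2", "2012-09-30", "Final rejection"),
]


def merge_tabs(applications, transactions):
    """Left-join transaction rows onto application rows by application number."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE application_data ("
            "application_number TEXT PRIMARY KEY, filing_date TEXT, status TEXT)"
        )
        conn.execute(
            "CREATE TABLE transaction_history ("
            "application_number TEXT, event_date TEXT, event TEXT)"
        )
        conn.executemany(
            "INSERT INTO application_data VALUES (?, ?, ?)", applications
        )
        conn.executemany(
            "INSERT INTO transaction_history VALUES (?, ?, ?)", transactions
        )
        # LEFT JOIN keeps applications that have no transaction rows at all.
        rows = conn.execute(
            """
            SELECT a.application_number, a.status, t.event_date, t.event
            FROM application_data AS a
            LEFT JOIN transaction_history AS t
              ON t.application_number = a.application_number
            ORDER BY a.application_number, t.event_date
            """
        ).fetchall()
    finally:
        conn.close()
    return rows


merged = merge_tabs(applications, transactions)
for row in merged:
    print(row)
```

I used a left join so that applications with no recorded events still show up (with empty event fields) instead of silently dropping out of the sample, which matters if you care about the selection issues the paper discusses.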

Thanks to Alan Marco, Chief Economist at the PTO, as well as anyone else involved in getting this project done. I believe it will be of great long-term research value.

In my next post, I'll highlight a recent paper that uses granular examination data to useful ends.
