Valuing Wikimedia Commons Images

Several years ago, both Lisa and I wrote about Heald, et al.'s study that attempted to value public domain photographs as used on Wikipedia. While I liked the study a lot, two of my chief critiques were the small sample size and the unclear value of hits on Wikipedia pages.

A new paper extends their study and provides even more evidence of the extensive use of Wikimedia Commons photos. In "What is the Commons Worth? Estimating the Value of Wikimedia Imagery by Observing Downstream Use," Kris Erickson (University of Leeds), Felix Rodriguez Perez (Independent), and Jesus Rodriguez Perez (University of Glasgow) have attempted to generalize the findings from the prior study. The paper is published in an ACM conference proceeding, but is available without a paywall on SSRN. The abstract is here:
The Wikimedia Commons (WC) is a peer-produced repository of freely licensed images, videos, sounds and interactive media, containing more than 45 million files. This paper attempts to quantify the societal value of the WC by tracking the downstream use of images found on the platform. We take a random sample of 10,000 images from WC and apply an automated reverse-image search to each, recording when and where they are used ‘in the wild’. We detect 54,758 downstream uses of the initial sample and we characterize these at the level of generic and country-code top-level domains (TLDs). We analyze the impact of specific variables on the odds that an image is used. The random sampling technique enables us to estimate overall value of all images contained on the platform. Drawing on the method employed by Heald et al (2015), we find a potential contribution of USD $28.9 billion from downstream use of Wikimedia Commons images over the lifetime of the project.
In one fell swoop, the authors have answered my two concerns. The random sample is much larger, and their search went far beyond Wikipedia, to commercial and non-commercial uses. It turns out that the images were used a whopping 5.5 times each on average (54,758 detected uses across 10,000 sampled images), which is a lot of usage when extrapolated to the millions of images in the Commons.
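For readers who want to see the back-of-the-envelope arithmetic, here is a rough sketch in Python. The sample size, use count, file count, and dollar figure come from the abstract; everything else is my own assumption. In particular, multiplying the sample average by 45 million treats every file as if it were an image like those sampled, and the "implied value per use" is just the headline figure divided by the extrapolated uses. It is not the authors' valuation method, which draws on Heald et al.

```python
# Figures quoted in the paper's abstract.
SAMPLE_IMAGES = 10_000          # random sample drawn from Wikimedia Commons
DETECTED_USES = 54_758          # downstream uses found by reverse-image search
TOTAL_FILES = 45_000_000        # "more than 45 million files" (a lower bound)
ESTIMATED_VALUE_USD = 28.9e9    # headline lifetime value estimate


def mean_uses_per_image(uses: int, images: int) -> float:
    """Average number of detected downstream uses per sampled image."""
    return uses / images


def extrapolate_uses(mean_uses: float, total_files: int) -> float:
    """Scale the sample average to the whole repository.

    Assumption (mine, not the paper's): every file behaves like a
    sampled image, even though the Commons also holds video and sound.
    """
    return mean_uses * total_files


mean_uses = mean_uses_per_image(DETECTED_USES, SAMPLE_IMAGES)
total_uses = extrapolate_uses(mean_uses, TOTAL_FILES)

# Illustrative only: the dollar value each extrapolated use would need
# to carry to reach the headline figure.
implied_value_per_use = ESTIMATED_VALUE_USD / total_uses

print(f"Mean uses per image: {mean_uses:.2f}")
print(f"Extrapolated uses:   {total_uses:,.0f}")
print(f"Implied USD per use: {implied_value_per_use:,.2f}")
```

The point of the exercise is only to show scale: a handful of uses per image, multiplied across tens of millions of files, gets to hundreds of millions of uses quickly.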

As with the prior study, estimating the value is a bit back of the envelope. Assuming that every commercial (and non-commercial) user would have paid the Getty Images fee is a big assumption, as many might have substituted homegrown photos or maybe no photo at all. The authors acknowledge as much. Another issue is that not every item in the Commons is within copyright, and some may have been findable by other means.

That said, I do not think the assumption detracts from the value of the Wikimedia Commons, for two reasons. First, they report Getty having revenues of nearly $1 billion per year, so finding nearly $29 billion in value over the lifetime of the WC is perhaps not far-fetched. Second, even if people would not pay the full amount, they might have been willing to pay something less than the Getty fee (which also covers some public domain items). In the absence of WC, the difference between what they would have paid and what they get (either nothing or homegrown or search costs) is deadweight loss.

I frankly had no idea that Wikimedia Commons was used so much, but I'm glad that there's competition in the stock photo market. I'll finally note that the discussion about which images get used is an interesting one. It turns out that, just like on Netflix, Facebook, and Twitter, the stuff that gets curated for you is the stuff you wind up seeing and using.

