I find this dispiriting--I know that I'm not really paying attention to most of the research out there, and it makes me suspect that no one else is either. As I contemplate another day in the lab, I consider the strong probability that my next paper, like all my previous papers, will have the same impact and significance as a raindrop falling in the middle of the Atlantic. It's at this point that I often find myself walking out of the lab to fritter my day away on the internet. Why bother?
This dour line of thought got me thinking--are we already approaching the completion of Borges' library? Could we be so awash in science that we've actually produced some measurable fragment of all possible science articles? I decided to crunch some numbers, hoping it would cheer me up.
First, the depressing numbers. In a fantastic article, Arif E. Jinha estimates that in 2009 we reached the mark of 50 million scholarly journal articles in existence (Jinha, 2010). Incredibly, this number is growing at an ever-increasing rate, with an estimated 1.3 million articles published in 2006 (Björk, Roos & Lauri, 2009), representing a 2.6% growth rate. Combining these two studies, I'd estimate about 52 million journal articles in existence here in the middle of 2011.
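For anyone who wants to check the back-of-envelope arithmetic, here is one plausible reconstruction in Python. The flat publication rate and the 1.5-year gap between the 2009 mark and mid-2011 are simplifying assumptions for illustration, not figures taken from either study.

```python
# Figures quoted above.
ARTICLES_2009 = 50_000_000      # Jinha (2010): total articles as of 2009
ARTICLES_PER_YEAR = 1_300_000   # Björk, Roos & Lauri (2009): output in 2006

# Assumptions for this sketch: output stays flat at the 2006 rate,
# and about 1.5 years separate the 2009 mark from mid-2011.
YEARS_ELAPSED = 1.5

growth_rate = ARTICLES_PER_YEAR / ARTICLES_2009
estimate_2011 = ARTICLES_2009 + ARTICLES_PER_YEAR * YEARS_ELAPSED

print(f"annual growth: {growth_rate:.1%}")
print(f"mid-2011 estimate: {estimate_2011 / 1e6:.0f} million")
```

Since real output is rising each year, a flat rate if anything undercounts, so 52 million is a conservative round number.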
How well would these 52 million articles fill Borges' library? I've been thinking of some different ways to answer this question, and figured I'd start by narrowing the library down to titles and abstracts. The APA limits titles to 13 words and abstracts to between 150 and 250 words (the APA has a fetish for minutiae like this; apparently this is what psychologists do instead of science). This is extremely stingy, but seems to be a good starting point.
First, titles. The OED estimates that there are at least 250,000 words in the English language. So, the number of total possible (not to say grammatical) titles would be 250,000^13 = 1.49 x 10^70. The current crop of 52 million articles would thus represent 3.5 x 10^-63 of the problem space for titles. In other words, the bulk of all the world's collective scientific wisdom represents only a ludicrously small fraction of all the possible scientific output allowed by the APA's 13-word title rule.
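The title arithmetic is easy to reproduce, since Python integers don't overflow. This is just a sketch of the calculation described above; it counts only titles of exactly 13 words, which is an approximation (shorter titles add a negligible amount to the total).

```python
VOCABULARY_SIZE = 250_000   # OED lower bound on English words
TITLE_WORDS = 13            # APA title limit
ARTICLES = 52_000_000       # estimated articles in existence, mid-2011

# Every position in the title can hold any word, so the space is V^13.
title_space = VOCABULARY_SIZE ** TITLE_WORDS

# Share of that space covered if every article had a unique title.
title_fraction = ARTICLES / title_space

print(f"possible titles: {title_space:.2e}")     # 1.49e+70
print(f"fraction covered: {title_fraction:.1e}")  # 3.5e-63
```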
Obviously, the numbers for abstracts are even more impossible to fathom. With the lower limit of 150 words, there would be 250,000^150, or about 4.9 x 10^809, possible abstracts, a number MS Excel refuses to even contemplate. The current crop of 52 million articles would thus represent roughly 1 x 10^-800 % of all possible abstracts.
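Numbers this size overflow ordinary floating point (presumably why the spreadsheet balks), so one way to check the order of magnitude is to work in base-10 logarithms instead. A minimal sketch; the helper name `sci_notation` is made up for this example.

```python
import math

VOCABULARY_SIZE = 250_000   # OED lower bound on English words
ABSTRACT_WORDS = 150        # APA lower limit for abstracts
ARTICLES = 52_000_000       # estimated articles in existence, mid-2011

def sci_notation(log10_value):
    """Format 10**log10_value as 'm x 10^e' without computing the huge number."""
    exponent = math.floor(log10_value)
    mantissa = 10 ** (log10_value - exponent)
    return f"{mantissa:.1f} x 10^{exponent}"

# log10(V^150) = 150 * log10(V)
log10_space = ABSTRACT_WORDS * math.log10(VOCABULARY_SIZE)

# log10(articles / space) = log10(articles) - log10(space)
log10_fraction = math.log10(ARTICLES) - log10_space

print("possible abstracts:", sci_notation(log10_space))     # 4.9 x 10^809
print("fraction covered:", sci_notation(log10_fraction))    # 1.1 x 10^-802
```

Multiply the fraction by 100 to express it as a percentage, which shifts the exponent to -800.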
There are all kinds of caveats to these numbers, most of which are pretty boring (problems with repeats in titles/abstracts, many scientific terms are not in the OED, yadda, yadda, yadda). In turning this over in my mind, however, a couple of points seemed interesting to me:
- Borges imagined, as I've calculated here, a combinatorial library, with every possible permutation of a language expressed. How much smaller, however, would a grammatical library be? That is, what percentage of the possible word space is actually sensible? Moreover, how would you go about determining this number? I've thought of two strategies, which I may try. One would be randomly sampling some subset of the title space and reading the results. The other is doing a part-of-speech analysis on current titles to see if I can constrain the set of likely titles by existing patterns in word usage.
- When we've figured it all out, what will be the title of the final science paper (in the APA-mandated 13 words or fewer)? Will an ultimate theory of the universe be expressible in English (or any other existing language)?
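The first strategy, random sampling, could be sketched roughly as follows. The tiny `VOCAB` list is a hypothetical stand-in for a real 250,000-word list, and the function names are invented for this example; the "is it sensible?" judgment still has to come from a human reader.

```python
import random

# Hypothetical stand-in vocabulary; a real run would load a full word list.
VOCAB = ["the", "of", "in", "and", "neural", "effects", "model", "mice",
         "quantum", "analysis", "a", "on", "growth", "theory", "purple"]

def sample_titles(vocab, k, n_words=13, seed=0):
    """Draw k titles of n_words each, uniformly at random with repetition."""
    rng = random.Random(seed)  # seeded so a sample can be reproduced
    return [" ".join(rng.choice(vocab) for _ in range(n_words))
            for _ in range(k)]

def sensible_fraction(judgments):
    """Share of sampled titles that a reader marked as sensible (True)."""
    return sum(judgments) / len(judgments)

titles = sample_titles(VOCAB, k=5)
for title in titles:
    print(title)

# After reading the sample, record one True/False judgment per title, e.g.:
# sensible_fraction([False, False, True, False, False])
```

If sensible titles are as rare as I suspect, a uniform sample of any readable size will probably contain none at all, which is one argument for the part-of-speech approach.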
References:
Björk, B-C., Roos, A. & Lauri, M. (2009). "Scientific journal publishing: yearly volume and open access availability" Information Research, 14(1) paper 391. [Available from 12 January, 2009 at http://InformationR.net/ir/14-1/paper391.html]
Jinha, A. E. (2010). "Article 50 million: an estimate of the number of scholarly articles in existence" Learned Publishing, 23(3), 258-263. http://openurl.ingenta.com/content/xref?genre=article&issn=0953-1513&volume=23&issue=3&spage=258