On getting the xkcd about Wikipedia pages leading to philosophy
The blogosphere, twitterverse, hacker community, and social media sites have been abuzz with discussion about yet another xkcd comic. The image simply hints at the perceived "increase" in IQ from surfing Wikipedia. It is the comic's hover text that is generating the spike in discussion, with its claim that repeatedly following the first link on Wikipedia pages will eventually lead wiki surfers to the philosophy page.
However, this bit of Wikipedia trivia lore, which is really just another version of Wiki Golf, has been known for some time. For instance, the Wikipedia page devoted to the philosophy link-surfing game has existed since 2008. This game is also only one of many Wikipedia games, including the image-guessing game suggested recently on Thought Puzzle. The philosophy game often appears on social media sites, such as reddit (at least twice), Digg, FunnyCrave, and a Google Profile. One commenter also made the astute, tongue-in-cheek observation that "an infinite Greek loop *is* philosophy."
In the spirit of Kevin Bacon and Erdős numbers, this game is also referred to as determining the philosophy number of a Wikipedia page. Wiki editing wars also appear to have spiked briefly as people try to skew philosophy numbers: some edits attempt to force yet another link [eventually] leading to philosophy, while others attempt to disrupt the so-called phenomenon. That said, numerous examples have survived the editing wars, such as:
- God → English → West Germanic languages → Germanic languages → Indo-European languages → Language family → Languages → Language → Human → Extant taxon → Biology → Natural science → Science → Knowledge → Fact → Information → Sequence → Mathematics → Quantity → Property (philosophy) → Modern philosophy → Philosophy (21)
- Kevin Bacon → Animal House → Comedy film → Film → Recording → Process → Mortgage loan → Loan → Debt → Asset (ignoring owed, which goes off-site) → Financial Accountancy → Accountancy → Business → Organization → Social group → Social cohesion → Social policy → Quality of life → International development → Concept → Cognitive → Science → Knowledge → Fact → Information → Sequence → Math → Quantity → Property (philosophy) → Modern philosophy → Philosophy (31)
- Paul Erdős → Hungary → Landlocked country → Country → Geography → Earth → Planet → Orbit → Physics → Natural science → Science → Knowledge → Fact → Information → Sequence → Mathematics → Quantity → Property (philosophy) → Modern philosophy → Philosophy (20)
- Rome → Capital city → Seat of local government → Local government → State → Social sciences → List of academic disciplines → Academia → Community → Interaction → Causality → Events → Philosophy (13)
Once at philosophy, the reader enters a loop that currently consists of Philosophy → Metaphysics → Philosophy. Some pages lead to other loops instead; one current example is Numerary → Supernumerary → Numerary. Many macro-pointers also exist; for instance, any page leading to Science eventually leads to Mathematics, which eventually leads to Philosophy.
Some folks have implemented the philosophy game, such as this code base at GitHub. Another enterprising individual graphed the distance from philosophy for 50 random Wikipedia pages, and again for 200 random Wikipedia pages. Perhaps the most enterprising implementation is a site that automatically traverses the first-link paths to determine the path to philosophy. A similar capability also appears to be available from Ryan Elmquist's site. We note that these examples often implement a variant of the philosophy game, in that they do not always appear to follow the first link on each page.
So why does the philosophy game appear to work? Intuitively, the general writing style for Wikipedia pages is to describe a given topic as an instance of another [broader] topic. So the general theory percolating around the web is that most topics fall hierarchically under some broad topic umbrella, such as philosophy. For instance, the limit set topic begins: "in mathematics, especially in the study of dynamical systems, a limit set…" Similarly, the comedy film topic begins: "comedy film is a genre of film in which the main emphasis is on humour…"
More rigorously stated, an attractor is a set toward which a dynamical system evolves, e.g., the loops containing philosophy and numerary. Similarly, a limit cycle describes the behavior in which a surfer arrives at philosophy, follows the first link to metaphysics, and then loops back to philosophy. Numerary → Supernumerary → Numerary is also a limit cycle: arriving at numerary or supernumerary causes the surfer to oscillate between those pages ad infinitum [unless perturbed by a wiki edit].
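To make the limit-cycle idea concrete, here is a minimal sketch that follows first links from a starting page until it either hits a page with no first link or revisits a page, and then reports the cycle it fell into. The `FIRST_LINK` map is a hypothetical toy assembled from the Rome chain and the two loops quoted above, not live Wikipedia data, and `first_link_walk` is an illustrative name.

```python
# Hypothetical toy "first link" map built from chains quoted in this post.
FIRST_LINK = {
    "Rome": "Capital city",
    "Capital city": "Seat of local government",
    "Seat of local government": "Local government",
    "Local government": "State",
    "State": "Social sciences",
    "Social sciences": "List of academic disciplines",
    "List of academic disciplines": "Academia",
    "Academia": "Community",
    "Community": "Interaction",
    "Interaction": "Causality",
    "Causality": "Events",
    "Events": "Philosophy",
    "Philosophy": "Metaphysics",
    "Metaphysics": "Philosophy",
    "Numerary": "Supernumerary",
    "Supernumerary": "Numerary",
}


def first_link_walk(first_link, start):
    """Follow first links from `start` until a dead end or a repeated page.

    Returns (path, cycle): `path` lists every distinct page visited in order;
    `cycle` is the limit cycle reached, or [] if the walk hit a dead end.
    """
    path = []
    seen = {}  # page -> index of its first appearance in `path`
    page = start
    while page is not None and page not in seen:
        seen[page] = len(path)
        path.append(page)
        page = first_link.get(page)  # None when the page has no first link
    if page is None:
        return path, []
    # The walk re-entered `page`, so everything from its first visit onward
    # repeats forever: that tail is the limit cycle.
    return path, path[seen[page]:]


path, cycle = first_link_walk(FIRST_LINK, "Rome")
print(cycle)  # ['Philosophy', 'Metaphysics']
```

Because each page has exactly one outgoing first link, every walk is deterministic, so remembering the visited pages is enough to detect the cycle.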
As with most web link traversal analyses, we can interpret the philosophy game as a restricted Markov chain, in which the next state is determined by following the first link from each page. Feeding this chain to the venerated PageRank algorithm, and allowing our random surfer to escape a given limit cycle by randomly jumping to any other page, we can determine which Wikipedia page(s) we are most likely to visit, i.e., compute the stationary distribution with respect to following the first link from each page.
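A minimal power-iteration sketch of that idea follows, assuming the standard PageRank setup: with probability `damping` the surfer follows the current page's first link, otherwise it jumps to a uniformly random page, and pages with no first link spread their mass uniformly. The damping value of 0.85, the function name, and the toy `FIRST_LINK` map (a fragment of the chains quoted above) are all illustrative assumptions.

```python
# Hypothetical toy map: the tail of the Rome chain plus the two quoted loops.
FIRST_LINK = {
    "Interaction": "Causality",
    "Causality": "Events",
    "Events": "Philosophy",
    "Philosophy": "Metaphysics",
    "Metaphysics": "Philosophy",
    "Numerary": "Supernumerary",
    "Supernumerary": "Numerary",
}


def first_link_pagerank(first_link, damping=0.85, tol=1e-12, max_iter=1000):
    """Stationary distribution of a random surfer that follows first links.

    With probability `damping` the surfer follows the page's first link;
    otherwise it teleports to a uniformly random page, which lets it escape
    limit cycles. Pages without a first link teleport uniformly.
    """
    pages = sorted(set(first_link) | set(first_link.values()))
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        dangling = sum(rank[p] for p in pages if p not in first_link)
        base = (1.0 - damping) / n + damping * dangling / n
        new = {p: base for p in pages}
        for src, dst in first_link.items():
            new[dst] += damping * rank[src]  # each page has one out-link
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


rank = first_link_pagerank(FIRST_LINK)
print(max(rank, key=rank.get))  # Philosophy
```

In this toy graph, Philosophy ends up ahead of the Numerary loop because a whole chain of pages drains into it, which is exactly the kind of ranking the stationary distribution is meant to expose.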
Determining the stationary distribution for the Wikipedia data set would require extracting the first link on each page and then constructing a sparse matrix of those links. Given that the English version of Wikipedia contains over 3,645,000 articles at last count, enterprising folks may want to start with a subset of this version, or with a version of Wikipedia in a different language. Those undeterred can access archived database dumps of these sites in English or in other languages.
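The two steps above might be sketched as follows, assuming each page's raw wikitext (as found in a database dump) is already in hand. The extractor is a rough heuristic: it strips `{{...}}` templates, skips File/Image/Category links, and returns the first remaining `[[...]]` target. It does not skip links in parentheses or italics, which some versions of the game ignore, nor links nested inside image captions. The matrix is stored in CSR form (`indptr`, `indices`, with every implied value equal to 1), and links to pages outside the subset are dropped. All function names and sample pages are hypothetical.

```python
import re

TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")  # innermost {{...}} template
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")
SKIP_NAMESPACES = ("File:", "Image:", "Category:")


def first_link(wikitext):
    """Return the first article link in `wikitext`, or None (heuristic)."""
    text = wikitext
    while True:  # remove nested templates from the inside out
        stripped = TEMPLATE.sub("", text)
        if stripped == text:
            break
        text = stripped
    for match in WIKILINK.finditer(text):
        target = match.group(1).strip().replace("_", " ")
        if not target.startswith(SKIP_NAMESPACES):
            # MediaWiki titles are case-insensitive in their first letter.
            return target[:1].upper() + target[1:]
    return None


def first_link_matrix(pages):
    """Build a CSR-style first-link matrix from {title: wikitext}."""
    titles = sorted(pages)
    index = {t: i for i, t in enumerate(titles)}
    indptr, indices = [0], []
    for title in titles:
        target = first_link(pages[title])
        if target in index:  # keep only links that stay inside the subset
            indices.append(index[target])
        indptr.append(len(indices))
    return titles, indptr, indices


PAGES = {  # hypothetical wikitext snippets
    "Rome": "{{Infobox settlement|name=Rome}}'''Rome''' is the [[capital city]] of [[Italy]].",
    "Capital city": "A '''capital city''' is the [[Seat of local government|seat]] of government.",
    "Seat of local government": "No links here.",
}
print(first_link_matrix(PAGES))
```

If SciPy is available, the resulting arrays could be handed to `scipy.sparse.csr_matrix((data, indices, indptr), shape=(n, n))` with `data` set to a list of ones.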
We can also estimate this ranking empirically, at least for the top k pages at which we are most likely to arrive, as done by Kevin Stock, who reports that 93.39% of the pages he tested yield a path that reaches philosophy. These are reachability statistics, however; he does not appear to have applied the PageRank algorithm, i.e., computed a stationary distribution with respect to each page's "first link".
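For contrast with the stationary distribution, the reachability statistic can be computed by assigning every page to the limit cycle (attractor) its first-link walk ends in, and then counting the pages whose attractor contains Philosophy. The sketch below memoizes each page's outcome so that every page is walked only once. It is an illustration under the assumption of a complete first-link map, not a reconstruction of Stock's method. The toy map, including the dead-end pair `Page A` → `Page B`, is hypothetical.

```python
# Hypothetical toy map; "Page B" has no first link, so it is a dead end.
FIRST_LINK = {
    "Interaction": "Causality",
    "Causality": "Events",
    "Events": "Philosophy",
    "Philosophy": "Metaphysics",
    "Metaphysics": "Philosophy",
    "Numerary": "Supernumerary",
    "Supernumerary": "Numerary",
    "Page A": "Page B",
}


def attractor_of_every_page(first_link):
    """Map each page to the frozenset of pages in the cycle its walk ends in.

    Walks that reach a page with no first link map to an empty frozenset.
    Results are memoized, so each page is visited once overall.
    """
    result = {}
    pages = set(first_link) | set(first_link.values())
    for start in pages:
        path, on_path = [], {}
        page = start
        while page is not None and page not in result and page not in on_path:
            on_path[page] = len(path)
            path.append(page)
            page = first_link.get(page)
        if page is None:
            outcome = frozenset()  # dead end
        elif page in result:
            outcome = result[page]  # joined a walk resolved earlier
        else:
            outcome = frozenset(path[on_path[page]:])  # closed a new cycle
        for p in path:
            result[p] = outcome
    return result


def share_reaching(first_link, target="Philosophy"):
    """Fraction of pages whose first-link walk eventually reaches `target`."""
    basins = attractor_of_every_page(first_link)
    return sum(1 for cycle in basins.values() if target in cycle) / len(basins)


print(share_reaching(FIRST_LINK))  # 5 of the 9 toy pages reach Philosophy
```

A page reaches Philosophy exactly when its attractor contains Philosophy, since a deterministic walk that arrives there can only continue around Philosophy's own cycle.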
Of course, the philosophy game can be played elsewhere around the web, e.g., by following the first word in the definition of a word at Wiktionary (for example, gable roof eventually yields a limit cycle containing course and racecourse). This version of the game should be familiar to those folks who spent time chasing circular definitions in a dictionary or thesaurus. This game can also be played by simply surfing the first link on sites around the web.
Such restricted link traversals might add an interesting signal for search engines such as Google to measure. However, such link games also highlight why link traversal and link position should not be the only signals used to determine whether the "next page" is relevant to your current search. Bottom line: just because a link is first doesn't mean it's important.
– browse Social Media channel or 5/30/2011 entries
entry written by Chris Augeri
