Suppose we have these ten “pages”: A, B, …, J. They are listed in the table as both rows and columns. The links are the entries in the table.
First we suppose that they link to each other consecutively: A → B → C → … → J.
The PageRank scores, as well as the positions, of the pages are computed and displayed on the right. The scores for A, …, J are increasing in the obvious way: from 0 to .1. In particular, F is #5.
Next, suppose F adds a link to A: F → A. You can see that 1 has appeared in the first column of the table:
Now, suddenly F ranks first!
Yes, adding an outbound link has brought this page from #5 to #1.
The result certainly goes against our intuition. It shouldn’t be surprising though considering other problems with the math of PageRank.
This isn’t a new discovery of course. These are just a couple of references for you:
- The effect of new links on Google PageRank by Konstantin Avrachenkov and Nelly Litvak, Stochastic Models, 22:319–331, 2006.
- Maximizing PageRank via outlinks by Cristobald de Kerchove, Laure Ninove, and Paul van Dooren, Linear Algebra and its Applications 429 (2008) 1254–1276.
The conclusion is that the appearance of a cycle is what seems to be the reason for this behavior…
You won’t get a straight answer from Google on this. When asked to explain what PageRank is, they say it’s hardly a secret and refer you to the 1998 paper (or rehash its contents like here). However, if you criticize those ideas, the answer is: it’s irrelevant because PageRank isn’t that anymore, it is now composed of 200 various “signals”. So, what’s PageRank anyway? Glad you asked, it’s all explained in that paper from 1998…