July 22, 2011

Calculus 1 lectures finished

Filed under: education,mathematics,site — Peter Saveliev @ 10:41 pm

The multi-stage procedure is completed:

  1. prepare for the lecture by creating the initial notes;
  2. deliver the lecture (with a tablet), which always ends up very different from the textbook or the notes;
  3. save the notes as a Journal file (and also in pdf, for the web);
  4. edit the Journal notes: rewrite in full sentences to make the text more readable, clean up illustrations, proofread;
  5. transcribe the notes into TeX, put the result online with illustrations inserted;
  6. edit this text: make it more narrative, improve formatting, add links, proofread again.

The whole procedure is more time-consuming than I initially expected. Stage 5 is especially hard and has required help (thanks Tom!). Stage 7 should be further polishing and more proofreading. I hope one day to be able to add stage 8: make the illustrations more professional.

Besides Calculus 1, other courses are at various stages of completion:

July 15, 2011

To attend the Canadian Conference on Computational Geometry

Filed under: computer vision/machine vision/AI,mathematics,news — Peter Saveliev @ 6:25 pm

I will be attending the 23rd Canadian Conference on Computational Geometry held in Toronto, August 10-12, 2011.

I’ll give a talk called Robustness of topology of digital images and point clouds.

Abstract. Such modern applications of topology as digital image analysis and data analysis have to deal with noise and other uncertainty. In this environment, the data structures often appear “filtered” into a sequence of cell complexes. We introduce the homology group of the filtration as the group of all possible homology classes of all elements of the filtration, without double counting. The second step of analysis is to discard the features that lie outside the user’s choice of the acceptable level of noise.

July 5, 2011

PageRank is bad math: discussion

Filed under: data analysis,mathematics,reviews — Peter Saveliev @ 10:48 pm

My previous post on the subject was “PageRank is an abomination (mathematically)”. My thesis was:

PageRank, as described by Google, is bad math.

Why? It appears that initially they made an arbitrary choice of the damping constant even though the choice affects the rankings.

The parameter is made-up and hidden from the user.

What has happened since, I don’t know. It’s a secret. I certainly haven’t heard an alternative story so far. And I can rely only on how it is described by Google – in the original papers and on their site. Below is a summary of the reaction to my post.

Re-reading the discussion, I can see now how many times I was distracted from my thesis by the “Google search works great” argument. I should’ve just replied: “PageRank is bad math. Please comment.” I admit though, I went beyond my main thesis and conjectured that PageRank’s problems are the cause of the problems of the whole search algorithm. I’ll try to make a case for that in a future post.

The response at Reddit was mostly along the lines of: “The article is terrible, because Google search works great.” A surprising reaction to my thesis. Digging a bit deeper I realized that these people:

  1. love Google;
  2. assume that Google search == PageRank (or think that I do);
  3. think that “bad math” means 1+1=3.

The crowd is “technical”, I suppose. So, #1 is understandable, but #2 is not, while #3 is very, very common.

The praise for Google was very homogeneous:

“PR-based algorithms seem to work (very) well in the real world”; “yield very good search results”; “Google’s search algorithm works amazingly well”; “Google’s algorithm works extremely well”; “The page rank algorithm is actually extremely impressive, and obviously works well.”

What I see is that this attitude has shown cracks recently. There have been public complaints about Google search results: the JCPenney story, content farms, scrapers, Google’s own properties ranked above others, etc. These are recent complaints from site owners about the Panda update. And the fact that Google dominates other search engines, which are also based on the PageRank idea, doesn’t make PageRank good math.

There were other arguments against my thesis.

“[H]is main concern is over the fact that Google arbitrarily picked a constant for the decay factor, but that’s not actually a bad thing, and is done all the time in mathematical modeling… in no way upsets me as a mathematician.”

Reddit is anonymous by default, so I don’t know… A mathematician would ask about the effects of such a choice and try to prove (or disprove) that there aren’t any. Unfortunately, the choice does affect the order of pages that you get, as the referenced paper indicated. As for the “all the time” comment, so what? If you do that, to me it means that you can’t come up with a better model (I think there is a way).
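Here is a minimal sketch of how the damping factor can flip the order of two pages. The graph below is a made-up toy example (it is not the one from the referenced paper, and it is deliberately contrived), the page names are hypothetical, and the iteration is the commonly stated form p = (1 − d)/N + d · Σ p_j / outdegree(j), not whatever Google actually runs.

```python
def pagerank(links, d, iterations=200):
    """Power iteration for p = (1 - d)/N + d * sum(p_j / outdegree(j))."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every page starts each round with the same "teleport" share.
        new_rank = {v: (1.0 - d) / n for v in nodes}
        for v, targets in links.items():
            # A page splits its damped rank evenly among the pages it links to.
            share = d * rank[v] / len(targets)
            for w in targets:
                new_rank[w] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: page -> pages it links to.
# Every page has at least one outgoing link, so no dangling-page fix is needed.
# A is linked directly by two minor pages; B is linked by a hub H
# that itself collects links from five minor pages.
links = {
    "s1": ["A"], "s2": ["A"],
    "t1": ["H"], "t2": ["H"], "t3": ["H"], "t4": ["H"], "t5": ["H"],
    "H": ["B", "Z"],
    "A": ["Z"], "B": ["Z"],
    "Z": ["A", "B"],
}


def leader(d):
    """Return whichever of A and B has the higher PageRank for damping d."""
    rank = pagerank(links, d)
    return "A" if rank["A"] > rank["B"] else "B"


for d in (0.5, 0.85):
    print(d, leader(d))
```

In this toy graph the difference between the two scores is proportional to 1.5 − 2.5d, so A ranks above B for d < 0.6 and below it for d > 0.6. Nothing here says how often this happens on real-world data; it only shows that the order is not independent of d in general.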

“[T]he graph used in the paper [that shows dependence of the rankings on the damping factor] is fairly contrived, and we’re mostly interested in what the algorithm does on real-world data.”

A fair point, but one still has to ask for some evidence. Is there a case study where a large number of websites are analyzed and it is shown that changing the damping factor DOES NOT significantly affect the ranking of these sites? Well, at least this person didn’t claim to be a mathematician… I have to add then that even such a study wouldn’t resolve the bad math issue. What would? You’d need to

  1. define a class of graphs, let’s call them non-contrived;
  2. prove that the PageRank of a non-contrived graph is independent of the damping factor (or at least the top results aren’t);
  3. provide evidence that the Internet, now and in the future, is non-contrived.

“Pagerank is a starting point; it provides a rough sketch of page importance which is fine tuned by other more specific algorithms”.

Let’s consider this analogy: π = 3 is bad math, but it’s “a very good starting point” for solving many real-life problems. For example, you can build a hut, no problem. But what if you want to do something more sophisticated like building an airplane? With π = 3 your plane will drop like a brick. And this will keep happening, no matter how much you fine-tune your engineering. Suppose now that you replace π = 3 with π = 3.14159265358979. OK, you’ve replaced bad math with better math, or maybe even good enough math (you can build your plane now). But π = 3.14159265358979 is a time bomb! Sooner or later it will fail you when it’s not accurate enough anymore. Sooner or later you will need to understand what π is. Sooner or later you will need good math… (Is this what’s happened to Google?)
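To put rough numbers on the analogy, here is a small sketch that measures both approximations against Python's math.pi. The helper name is made up for illustration, and math.pi is itself only a double-precision approximation, which is rather the point.

```python
import math


def relative_error(approx):
    """Relative error of an approximation, measured against math.pi."""
    return abs(math.pi - approx) / math.pi


# math.pi is used as the reference, although it is also a finite approximation.
err_crude = relative_error(3)
err_better = relative_error(3.14159265358979)

print(f"pi = 3:                {err_crude:.3%}")
print(f"pi = 3.14159265358979: {err_better:.1e}")
```

The first error is about 4.5%, which a hut will tolerate. The second is on the order of 1e-15, which is tiny but not zero.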

More to come

This stuff will end up in the main site under PageRank and, yes, Bad math.