January 25, 2010

Topological data analysis

Filed under: computer vision/machine vision/AI, mathematics — Peter @ 1:11 pm

Below is the abstract of a paper I am working on.

Suppose we have conducted 1000 experiments with a set of 100 various measurements in each. Then each experiment is a string of 100 numbers or simply a vector of dimension 100. The result is a collection of disconnected 1000 points (aka point cloud) in the 100-dimensional Euclidean space.

It is impossible to visualize this data as any representation that one can see is lim

ited to dimension 3 (by using colors one gets 6, time – 7). Yet we still need to answer the same questions about the object behind the point cloud: is it one piece or more? Is there a tunnel or a void? And what about possible 100-dimensional topological features?

This is a common approach to the problem.

For a point cloud in a euclidean space, suppose we are given a threshold r so that any two points within r from each other are to be considered “close”. Then each pair of such points is connected by an edge. If three points are “close”, we add a face, etc. The result is a cell complex (more precisely, simplicial complex) that approximates the manifold M behind the point cloud.

   

We want to count the number of topological features in M by means of the Betti numbers: the number of connected components in M, the number tunnels, the number of voids, etc. This information is contained in the homology of the complex.

Further, to deal with noise and other uncertainty one needs to evaluate the significance of these topological features. For each value of the threshold r we build a separate cell complex, then combine the homology groups of these complexes in a single structure, and count the features with a high measure of robustness. This measure, called persistence, is the length of the interval of values of r for which each of the topological features is present.

Even more important than these “global” properties may be the local topology of the data. For example, in both of the images above the datasets are 3-dimensional but what’s behind is 2-dimensional (surfaces). This is called dimensionality reduction.

Most of the links here are dead but the article will be fixed by the time I am done with the topology course.

A more detailed outline is here: Homological methods in manifold learning (warning: heavy math).

One Response to “Topological data analysis”

  1. Have you heard about Locally Linear Embedding? It does some kind of non-linear dimentionality reduction, though I’m not sure it can help to investigate topological properties. Hope it will help you!