Tuesday, July 27, 2010

An algorithm for sample quantiles in map reduce

A simple but often occurring problem is computing sample quantiles, sometimes named top $k$ elements, in a large data set. Here I show a solution for the MapReduce model of computation.

Monday, July 19, 2010

A map reduce algorithm for connected components

In a recently published book about algorithms for the map reduce model of computation, a simple connected components algorithm based on lablel propagation is proposed, but its complexity depends on the diameter of the graph, which can be very large. It turns out we can get rid of that dependency with a completely different algorithm, ported from the PRAM model.

Friday, July 16, 2010

Rapleaf Array Absurdity or On streaming problems in disguise

From the interview challenges of an up and coming web startup, three problems that range from the trivial to the impossible. The key to the the solution is to recognize that the setting is close to that of streaming algorithms, which allows for very limited space resources compared to the size of the input. This challenge assumes the data is in an array, an additional benefit not assumed in the streaming literature, but it's close enough to use some of those ideas, since random access to the array doesn't seem to help for the specific questions.