A simple but often occurring problem is computing sample quantiles, sometimes named top $k$ elements, in a large data set. Here I show a solution for the MapReduce model of computation.
Tuesday, July 27, 2010
Monday, July 19, 2010
In a recently published book about algorithms for the map reduce model of computation, a simple connected components algorithm based on lablel propagation is proposed, but its complexity depends on the diameter of the graph, which can be very large. It turns out we can get rid of that dependency with a completely different algorithm, ported from the PRAM model.
Posted by Antonio Piccolboni on Monday, July 19, 2010
Friday, July 16, 2010
From the interview challenges of an up and coming web startup, three problems that range from the trivial to the impossible. The key to the the solution is to recognize that the setting is close to that of streaming algorithms, which allows for very limited space resources compared to the size of the input. This challenge assumes the data is in an array, an additional benefit not assumed in the streaming literature, but it's close enough to use some of those ideas, since random access to the array doesn't seem to help for the specific questions.
Posted by Antonio Piccolboni on Friday, July 16, 2010