CS

Cheaper by the Dozen: Batched Algorithms

Report ID: TR-627-00

Date: 2000-11-00

Pages: 10

Download Formats: |Postscript|

Abstract:

In this paper we develop the idea of batching, processing several queries at a time, for more efficient algorithms. The prominence of massive data sets in computation has necessitated new ideas in algorithms. Consider a webpage that answers queries on a large data set. Instead of answering these queries one at a time, which can result in a substantial bottleneck, we wait for several queries to accumulate, and then apply a batched algorithm that can answer them significantly faster. We give an example of the power batched algorithms by showing an O(nbd^{.3}) time batched algorithm which answers b d-dimensional Hamming nearest neighbor queries on a dataset of size n, given that b \ge d.

To better understand the power of batching we give batched algorithms for the classical problems of: string matching, range searching, selection, 1D and 2D nearest neighbor, convex hull membership, and halfplane intersection. Our batched algorithms answer b queries on a dataset of size n >> b in 0(nlogb) time, using o(n) storage and at most 2n/B I/O's (where B is the block size). We use two techniques, query data structures and sampling, in the design of our batched algorithms. The advantages of these batched algorithms algorithms, over putting the massive dataset into a classical data structure, are threefold: improved asymptotic performance, significantly smaller data structures, and a number of I/O's which is linear in the size of the massive dataset.