Description: This talk gives an overview of how a web search engine is organized, then describes one key component of the AltaVista search engine, its indexing library, in more depth. The library manages a set of inverted files and provides mechanisms to construct and optimize complex queries on them. The design goals were to support efficient queries on bodies of text up to a few hundred gigabytes in size (e.g. AltaVista) without sacrificing too much generality, and without giving up on small applications (e.g. mail directories).
Speaker(s):
Mike Burrows, Compaq SRC
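To make the term "inverted file" in the description concrete, here is a minimal, generic sketch in Python. It is not AltaVista's indexing library or its API. The names build_index, intersect and and_query are hypothetical. The sketch assumes whitespace tokenization and an in-memory index that maps each term to a sorted list of document IDs (a posting list). It answers a conjunctive (AND) query by merging posting lists, intersecting the shortest lists first as one simple example of query optimization.

```python
from collections import defaultdict


def build_index(docs):
    """Hypothetical helper: map each lowercase term to a sorted list of doc IDs."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():  # assumption: naive whitespace tokenizer
            postings[term].add(doc_id)
    # Keep posting lists sorted so two lists can be merged in linear time.
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(a, b):
    """Merge-intersect two sorted posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def and_query(index, terms):
    """Return IDs of documents that contain every term.

    Simple optimization (an assumption, not AltaVista's planner):
    intersect the shortest posting lists first so intermediate
    results stay small, and stop early once the result is empty.
    """
    lists = sorted((index.get(t.lower(), []) for t in terms), key=len)
    if not lists:
        return []
    result = lists[0]
    for plist in lists[1:]:
        if not result:
            break
        result = intersect(result, plist)
    return result


docs = [
    "the quick brown fox",
    "the lazy dog",
    "quick brown dogs and a lazy fox",
]
index = build_index(docs)
print(and_query(index, ["quick", "fox"]))  # [0, 2]
```

A production engine would store posting lists compressed on disk and support richer operators (OR, NOT, phrases). The sorted-list merge shown here is only the basic idea behind querying an inverted file.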