Scholarship and Professional Work - LAS

On Searching Compressed String Collections Cache-Obliviously

Ankur Gupta, Butler UniversityFollow
Paolo Ferragina
Roberto Grossi
Rahul Shah
Jeffrey Vitter

Document Type

Article

Publication Date

May 2008

Abstract

Current data structures for searching large string collections either fail to achieve minimum space or cause too many cache misses. In this paper we discuss some edge linearizations of the classic trie data structure that are simultaneously cache-friendly and compressed. We provide new insights on front coding [24], introduce other novel linearizations, and study how close their space occupancy is to the information-theoretic minimum. The moral is that they are not just heuristics. Our second contribution is a novel dictionary encoding scheme that builds upon such linearizations and achieves nearly optimal space, offers competitive I/O-search time, and is also conscious of the query distribution. Finally, we combine those data structures with cache-oblivious tries [2, 5] and obtain a succinct variant whose space is close to the information-theoretic minimum.

Recommended Citation

Gupta, Ankur; Ferragina, Paolo; Grossi, Roberto; Shah, Rahul; and Vitter, Jeffrey, "On Searching Compressed String Collections Cache-Obliviously" / (2008): -.
Available at https://digitalcommons.butler.edu/facsch_papers/1209

Link to Full Text

COinS

Scholarship and Professional Work - LAS

On Searching Compressed String Collections Cache-Obliviously

Document Type

Publication Date

Abstract

Recommended Citation

Search

Links

Browse

Participate

Scholarship and Professional Work - LAS

On Searching Compressed String Collections Cache-Obliviously

Authors

Document Type

Publication Date

Abstract

Recommended Citation

Share

Search

Links

Browse

Participate