Title: Graph-based Extension of FM-index for Domain-specific Compression Boosting
FM-index is one of the popular data structure for string matching in compressed space. It uses the Burrows-Wheeler transform of the given text string in order to reduce the space occupancy of the index while maintaining the functionality of the suffix array. Compression boosting exploits a low-order compression technique on the partition of the Burrows-Wheeler transformed text in order to compress the FM-index further more. Motivated from the observation that a careful pre-partitioning based on the prior knowledge of the text domain can reduce the space occupancy in practice, in this thesis a graph-based extension of the FM-index is proposed. The proposed method uses a graph representation that captures the characteristics of the text string in order to compress the text in the context-aware manner. We also show that the proposed method is also a general and flexible framework that can give a illustrative description for many existing FM-index variants and related data structures.