Abstract
The Burrows-Wheeler Transform is a text permutation that has revolutionized the fields of pattern matching and text compression, bridging the gap existing between the two. In this paper we approach the BWT-construction problem generalizing a well-known algorithm—based on backward search and dynamic strings manipulation—to work in a context-wise fashion, using automata on words. Let \(n\), \(\sigma \), and \(H_k\) be the text length, the alphabet size, and the \(k\)-th order empirical entropy of the text, respectively. Moreover, let \(H_k^*=\min \{H_k+1,\lceil \log \sigma \rceil \}\). Under the word RAM model with word size \(w\in \Theta (\log n)\), our algorithm builds the BWT in average \(\mathcal {O}(nH_k^*)\) time using \(nH_k^* + o(nH_k^*)\) bits of space, where \(k=\log _\sigma (n/\log ^2 n)-1\). We experimentally show that our algorithm has very good performances (essentially linear time) on DNA sequences, using about 2.6 bits per input symbol in RAM.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Belazzougui, D.: Linear time construction of compressed text indices in compact space. arXiv preprint arXiv:1401.0936 (2014)
Beller, T., Zwerger, M., Gog, S., Ohlebusch, E.: Space-efficient construction of the burrows-wheeler transform. In: Kurland, O., Lewenstein, M., Porat, E. (eds.) SPIRE 2013. LNCS, vol. 8214, pp. 5–16. Springer, Heidelberg (2013)
de Bruijn, N.G., Erdos, P.: A combinatorial problem. Koninklijke Nederlandse Akademie v. Wetenschappen 49(49), 758–764 (1946)
Burrows, M., Wheeler, D.J.: A block-sorting lossless data compression algorithm (1994)
Ferragina, P., Gagie, T., Manzini, G.: Lightweight data indexing and compression in external memory. Algorithmica 63(3), 707–730 (2012)
Ferragina, P., Giancarlo, R., Manzini, G., Sciortino, M.: Boosting textual compression in optimal linear time. Journal of the ACM (JACM) 52(4), 688–713 (2005)
Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proceedings of the 41st Annual Symposium on Foundations of Computer Science, pp. 390–398. IEEE (2000)
Fredman, M., Saks, M.: The cell probe complexity of dynamic data structures. In: Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing, pp. 345–354. ACM (1989)
Gog, S., Beller, T., Moffat, A., Petri, M.: From theory to practice: plug and play with succinct data structures. In: Gudmundsson, J., Katajainen, J. (eds.) SEA 2014. LNCS, vol. 8504, pp. 326–337. Springer, Heidelberg (2014)
González, R., Navarro, G.: Rank/select on dynamic compressed sequences and applications. Theoretical Computer Science 410(43), 4414–4422 (2009)
He, M., Munro, J.I.: Succinct representations of dynamic strings. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 334–346. Springer, Heidelberg (2010)
Huffman, D.A., et al.: A method for the construction of minimum redundancy codes. Proc. IRE 40(9), 1098–1101 (1952)
Lippert, R.A., Mobarry, C.M., Walenz, B.P.: A space-efficient construction of the burrows-wheeler transform for genomic data. Journal of Computational Biology 12(7), 943–951 (2005)
Manzini, G.: An analysis of the burrows-wheeler transform. Journal of the ACM (JACM) 48(3), 407–430 (2001)
Navarro, G.: Wavelet trees for all. Journal of Discrete Algorithms 25, 2–20 (2014)
Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Computing Surveys (CSUR) 39(1), 2 (2007)
Navarro, G., Nekrich, Y.: Optimal dynamic sequence representations. SIAM Journal on Computing 43(5), 1781–1806 (2014)
Navarro, G., Sadakane, K.: Fully functional static and dynamic succinct trees. ACM Transactions on Algorithms (TALG) 10(3), 16 (2014)
Nong, G., Zhang, S., Chan, W.H.: Linear suffix array construction by almost pure induced-sorting. In: Data Compression Conference, DCC 2009, pp. 193–202. IEEE (2009)
Raman, R., Raman, V., Rao, S.S.: Succinct dynamic data structures. In: Dehne, F., Sack, J.-R., Tamassia, R. (eds.) WADS 2001. LNCS, vol. 2125, pp. 426–437. Springer, Heidelberg (2001)
Tischler, G.: Faster average case low memory semi-external construction of the burrows-wheeler transform. In: CEUR Workshop Proceedings, vol. 1146, pp. 61–68 (2014)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Policriti, A., Gigante, N., Prezza, N. (2015). Average Linear Time and Compressed Space Construction of the Burrows-Wheeler Transform. In: Dediu, AH., Formenti, E., MartÃn-Vide, C., Truthe, B. (eds) Language and Automata Theory and Applications. LATA 2015. Lecture Notes in Computer Science(), vol 8977. Springer, Cham. https://doi.org/10.1007/978-3-319-15579-1_46
Download citation
DOI: https://doi.org/10.1007/978-3-319-15579-1_46
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-15578-4
Online ISBN: 978-3-319-15579-1
eBook Packages: Computer ScienceComputer Science (R0)