Skip to main content

Average Linear Time and Compressed Space Construction of the Burrows-Wheeler Transform

  • Conference paper
  • First Online:
Language and Automata Theory and Applications (LATA 2015)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 8977))

Abstract

The Burrows-Wheeler Transform is a text permutation that has revolutionized the fields of pattern matching and text compression, bridging the gap existing between the two. In this paper we approach the BWT-construction problem generalizing a well-known algorithm—based on backward search and dynamic strings manipulation—to work in a context-wise fashion, using automata on words. Let \(n\), \(\sigma \), and \(H_k\) be the text length, the alphabet size, and the \(k\)-th order empirical entropy of the text, respectively. Moreover, let \(H_k^*=\min \{H_k+1,\lceil \log \sigma \rceil \}\). Under the word RAM model with word size \(w\in \Theta (\log n)\), our algorithm builds the BWT in average \(\mathcal {O}(nH_k^*)\) time using \(nH_k^* + o(nH_k^*)\) bits of space, where \(k=\log _\sigma (n/\log ^2 n)-1\). We experimentally show that our algorithm has very good performances (essentially linear time) on DNA sequences, using about 2.6 bits per input symbol in RAM.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Belazzougui, D.: Linear time construction of compressed text indices in compact space. arXiv preprint arXiv:1401.0936 (2014)

  2. Beller, T., Zwerger, M., Gog, S., Ohlebusch, E.: Space-efficient construction of the burrows-wheeler transform. In: Kurland, O., Lewenstein, M., Porat, E. (eds.) SPIRE 2013. LNCS, vol. 8214, pp. 5–16. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  3. de Bruijn, N.G., Erdos, P.: A combinatorial problem. Koninklijke Nederlandse Akademie v. Wetenschappen 49(49), 758–764 (1946)

    MATH  Google Scholar 

  4. Burrows, M., Wheeler, D.J.: A block-sorting lossless data compression algorithm (1994)

    Google Scholar 

  5. Ferragina, P., Gagie, T., Manzini, G.: Lightweight data indexing and compression in external memory. Algorithmica 63(3), 707–730 (2012)

    Article  MATH  MathSciNet  Google Scholar 

  6. Ferragina, P., Giancarlo, R., Manzini, G., Sciortino, M.: Boosting textual compression in optimal linear time. Journal of the ACM (JACM) 52(4), 688–713 (2005)

    Article  MathSciNet  Google Scholar 

  7. Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proceedings of the 41st Annual Symposium on Foundations of Computer Science, pp. 390–398. IEEE (2000)

    Google Scholar 

  8. Fredman, M., Saks, M.: The cell probe complexity of dynamic data structures. In: Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing, pp. 345–354. ACM (1989)

    Google Scholar 

  9. Gog, S., Beller, T., Moffat, A., Petri, M.: From theory to practice: plug and play with succinct data structures. In: Gudmundsson, J., Katajainen, J. (eds.) SEA 2014. LNCS, vol. 8504, pp. 326–337. Springer, Heidelberg (2014)

    Chapter  Google Scholar 

  10. González, R., Navarro, G.: Rank/select on dynamic compressed sequences and applications. Theoretical Computer Science 410(43), 4414–4422 (2009)

    Article  MATH  MathSciNet  Google Scholar 

  11. He, M., Munro, J.I.: Succinct representations of dynamic strings. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 334–346. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  12. Huffman, D.A., et al.: A method for the construction of minimum redundancy codes. Proc. IRE 40(9), 1098–1101 (1952)

    Article  Google Scholar 

  13. Lippert, R.A., Mobarry, C.M., Walenz, B.P.: A space-efficient construction of the burrows-wheeler transform for genomic data. Journal of Computational Biology 12(7), 943–951 (2005)

    Article  Google Scholar 

  14. Manzini, G.: An analysis of the burrows-wheeler transform. Journal of the ACM (JACM) 48(3), 407–430 (2001)

    Article  MathSciNet  Google Scholar 

  15. Navarro, G.: Wavelet trees for all. Journal of Discrete Algorithms 25, 2–20 (2014)

    Article  MATH  MathSciNet  Google Scholar 

  16. Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Computing Surveys (CSUR) 39(1), 2 (2007)

    Article  Google Scholar 

  17. Navarro, G., Nekrich, Y.: Optimal dynamic sequence representations. SIAM Journal on Computing 43(5), 1781–1806 (2014)

    Article  MATH  MathSciNet  Google Scholar 

  18. Navarro, G., Sadakane, K.: Fully functional static and dynamic succinct trees. ACM Transactions on Algorithms (TALG) 10(3), 16 (2014)

    MathSciNet  Google Scholar 

  19. Nong, G., Zhang, S., Chan, W.H.: Linear suffix array construction by almost pure induced-sorting. In: Data Compression Conference, DCC 2009, pp. 193–202. IEEE (2009)

    Google Scholar 

  20. Raman, R., Raman, V., Rao, S.S.: Succinct dynamic data structures. In: Dehne, F., Sack, J.-R., Tamassia, R. (eds.) WADS 2001. LNCS, vol. 2125, pp. 426–437. Springer, Heidelberg (2001)

    Chapter  Google Scholar 

  21. Tischler, G.: Faster average case low memory semi-external construction of the burrows-wheeler transform. In: CEUR Workshop Proceedings, vol. 1146, pp. 61–68 (2014)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Nicola Prezza .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Policriti, A., Gigante, N., Prezza, N. (2015). Average Linear Time and Compressed Space Construction of the Burrows-Wheeler Transform. In: Dediu, AH., Formenti, E., Martín-Vide, C., Truthe, B. (eds) Language and Automata Theory and Applications. LATA 2015. Lecture Notes in Computer Science(), vol 8977. Springer, Cham. https://doi.org/10.1007/978-3-319-15579-1_46

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-15579-1_46

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-15578-4

  • Online ISBN: 978-3-319-15579-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics