Skip to main content

Novel Results on the Number of Runs of the Burrows-Wheeler-Transform

  • Conference paper
  • First Online:
SOFSEM 2021: Theory and Practice of Computer Science (SOFSEM 2021)

Abstract

The Burrows-Wheeler-Transform (BWT), a reversible string transformation, is one of the fundamental components of many current data structures in string processing. It is central in data compression, as well as in efficient query algorithms for sequence data, such as webpages, genomic and other biological sequences, or indeed any textual data. The BWT lends itself well to compression because its number of equal-letter-runs (usually referred to as r) is often considerably lower than that of the original string; in particular, it is well suited for strings with many repeated factors. In fact, much attention has been paid to the r parameter as measure of repetitiveness, especially to evaluate the performance in terms of both space and time of compressed indexing data structures.

In this paper, we investigate \(\rho (v)\), the ratio of r and of the number of runs of the BWT of the reverse of v. Kempa and Kociumaka [FOCS 2020] gave the first non-trivial upper bound as \(\rho (v) = O(\log ^2(n))\), for any string v of length n. However, nothing is known about the tightness of this upper bound. We present infinite families of binary strings for which \(\rho (v) = \varTheta (\log n)\) holds, thus giving the first non-trivial lower bound on \(\rho (n)\), the maximum over all strings of length n.

Our results suggest that r is not an ideal measure of the repetitiveness of the string, since the number of repeated factors is invariant between the string and its reverse. We believe that there is a more intricate relationship between the number of runs of the BWT and the string’s combinatorial properties.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    Note that this is in general not the same as the suffix array SA, since here we have the conjugates and not the suffixes.

References

  1. Bannai, H., Gagie, T.: Online LZ77 parsing and matching statistics with RLBWTs. In: Annual Symposium on Combinatorial Pattern Matching (CPM 2018), vol. 105, pp. 7:1–7:12 (2018)

    Google Scholar 

  2. Belazzougui, D., Cunial, F., Gagie, T., Prezza, N., Raffinot, M.: Composite repetition-aware data structures. In: 26th Annual Symposium on Combinatorial Pattern Matching (CPM 2015), pp. 26–39 (2015)

    Google Scholar 

  3. Berstel, J., de Luca, A.: Sturmian words, Lyndon words and trees. Theoretical Comput. Sci. 178(1–2), 171–203 (1997)

    Article  MathSciNet  Google Scholar 

  4. Blumer, A., Blumer, J., Haussler, D., McConnell, R.M., Ehrenfeucht, A.: Complete inverted files for efficient text retrieval and analysis. J. ACM 34(3), 578–595 (1987)

    Article  MathSciNet  Google Scholar 

  5. Borel, J., Reutenauer, C.: On christoffel classes. RAIRO Theor. Inf. Appl. 40(1), 15–27 (2006)

    Article  MathSciNet  Google Scholar 

  6. Burrows, M., Wheeler, D.J.: A block-sorting lossless data compression algorithm. Technical report, DIGITAL System Research Center (1994)

    Google Scholar 

  7. Castiglione, G., Restivo, A., Sciortino, M.: Hopcroft’s algorithm and cyclic automata. In: Martín-Vide, C., Otto, F., Fernau, H. (eds.) LATA 2008. LNCS, vol. 5196, pp. 172–183. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88282-4_17

    Chapter  MATH  Google Scholar 

  8. Castiglione, G., Restivo, A., Sciortino, M.: Circular Sturmian words and Hopcroft’s algorithm. Theor. Comput. Sci. 410(43), 4372–4381 (2009)

    Article  MathSciNet  Google Scholar 

  9. de Luca, A.: A combinatorial property of the Fibonacci words. Inf. Process. Lett. 12(4), 193–195 (1981)

    Article  MathSciNet  Google Scholar 

  10. Luca, A.: Combinatorics of standard sturmian words. In: Mycielski, J., Rozenberg, G., Salomaa, A. (eds.) Structures in Logic and Computer Science. LNCS, vol. 1261, pp. 249–267. Springer, Heidelberg (1997). https://doi.org/10.1007/3-540-63246-8_15

    Chapter  Google Scholar 

  11. de Luca, A.: Sturmian words: structure, combinatorics, and their arithmetics. Theor. Comput. Sci. 183(1), 45–82 (1997)

    Article  MathSciNet  Google Scholar 

  12. de Luca, A., Mignosi, F.: Some combinatorial properties of sturmian words. Theor. Comput. Sci. 136(2), 361–385 (1994)

    Article  MathSciNet  Google Scholar 

  13. Gagie, T., Navarro, G., Prezza, N.: Fully functional suffix trees and optimal text searching in BWT-runs bounded space. J. ACM 67(1), 1–54 (2020)

    Article  MathSciNet  Google Scholar 

  14. Kempa, D.: Optimal construction of compressed indexes for highly repetitive texts. In: Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2019), pp. 1344–1357 (2019)

    Google Scholar 

  15. Kempa, D., Kociumaka, T.: Resolution of the burrows-wheeler transform conjecture. CoRR, abs/1910.10631, Accepted to the 61st Annual Symposium on Foundations of Computer Science (FOCS 2020) (2019)

    Google Scholar 

  16. Kempa, D., Prezza, N.: At the roots of dictionary compression: string attractors. In: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing (STOC 2018), pp. 827–840 (2018)

    Google Scholar 

  17. Kieffer, J.C., Yang, E.: Grammar-based codes: a new class of universal lossless source codes. IEEE Trans. Inf. Theory 46(3), 737–754 (2000)

    Article  MathSciNet  Google Scholar 

  18. Knuth, D., Morris, J., Pratt, V.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)

    Article  MathSciNet  Google Scholar 

  19. Kociumaka, T., Navarro, G., Prezza, N.: Towards a definitive measure of repetitiveness. In: Proceedings of the 14th Latin American Symposium on Theoretical Informatics (LATIN 2020) (2020)

    Google Scholar 

  20. Lagarde, G., Perifel, S.: Lempel-Ziv: a “one-bit catastrophe" but not a tragedy. In: Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2018), pp. 1478–1495 (2018)

    Google Scholar 

  21. Lempel, A., Ziv, J.: On the complexity of finite sequences. IEEE Trans. Inf. Theory 22(1), 75–81 (1976)

    Article  MathSciNet  Google Scholar 

  22. Lothaire, M.: Algebraic Combinatorics on Words. Cambridge University Press, Cambridge (2002)

    Book  Google Scholar 

  23. Mantaci, S., Restivo, A., Sciortino, M.: Burrows-wheeler transform and sturmian words. Inf. Process. Lett. 86(5), 241–246 (2003)

    Article  MathSciNet  Google Scholar 

  24. Ohno, T., Sakai, K., Takabatake, Y., Tomohiro, I., Sakamoto, H.: A faster implementation of online RLBWT and its application to LZ77 parsing. J. Discrete Algorithms 52, 18–28 (2018)

    Article  MathSciNet  Google Scholar 

  25. Policriti, A., Prezza, N.: From LZ77 to the run-length encoded Burrows-Wheeler transform, and back. In: 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017), vol. 78, pp. 1–10 (2017)

    Google Scholar 

  26. Storer, J.A., Szymanski, T.G.: Data compression via textual substitution. J. ACM 29(4), 928–951 (1982)

    Article  MathSciNet  Google Scholar 

Download references

Acknowledgements

Zs.L. and M.S. wish to thank Dominik Kempa for getting them interested in the problem treated in this paper. We thank Gabriele Fici and Daniele Greco for interesting discussions, Akihiro Nishi for preliminary experiments, and the anonymous referees for their careful reading of the paper. Finally, we thank the Leibniz Zentrum für Informatik for the possibility of participating at Dagstuhl Seminar no. 19241 in June 2019, where some of the authors started collaborating on this problem.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Sara Giuliani .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Giuliani, S., Inenaga, S., Lipták, Z., Prezza, N., Sciortino, M., Toffanello, A. (2021). Novel Results on the Number of Runs of the Burrows-Wheeler-Transform. In: Bureš, T., et al. SOFSEM 2021: Theory and Practice of Computer Science. SOFSEM 2021. Lecture Notes in Computer Science(), vol 12607. Springer, Cham. https://doi.org/10.1007/978-3-030-67731-2_18

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-67731-2_18

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67730-5

  • Online ISBN: 978-3-030-67731-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics