Skip to main content

Space-Time Trade-Offs for the LCP Array of Wheeler DFAs

  • Conference paper
  • First Online:
String Processing and Information Retrieval (SPIRE 2023)

Abstract

Recently, Conte et al. generalized the longest-common prefix (LCP) array from strings to Wheeler DFAs, and they showed that it can be used to efficiently determine matching statistics on a Wheeler DFA [DCC 2023]. However, storing the LCP array requires \( O(n \log n) \) bits, n being the number of states, while the compact representation of Wheeler DFAs often requires much less space. In particular, the BOSS representation of a de Bruijn graph only requires a linear number of bits, if the size of alphabet is constant.

In this paper, we propose a sampling technique that allows to access an entry of the LCP array in logarithmic time by only storing a linear number of bits. We use our technique to provide a space-time trade-off to compute matching statistics on a Wheeler DFA. In addition, we show that by augmenting the BOSS representation of a k-th order de Bruijn graph with a linear number of bits we can navigate the underlying variable-order de Bruijn graph in time logarithmic in k, thus improving a previous bound by Boucher et al. which was linear in k [DCC 2015].

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 59.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 74.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Alanko, J., D’Agostino, G., Policriti, A., Prezza, N.: Regular languages meet prefix sorting. In: Proceedings of the 31st Symposium on Discrete Algorithms, (SODA 2020), pp. 911–930. SIAM (2020). https://doi.org/10.1137/1.9781611975994.55

  2. Alanko, J., D’Agostino, G., Policriti, A., Prezza, N.: Wheeler languages. Inf. Comput. 281, 104820 (2021)

    Article  MathSciNet  MATH  Google Scholar 

  3. Bankevich, A., et al.: SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19(5), 455–477 (2012). https://doi.org/10.1089/cmb.2012.0021, pMID: 22506599

  4. Boucher, C., Bowe, A., Gagie, T., Puglisi, S.J., Sadakane, K.: Variable-order de Bruijn graphs. In: 2015 Data Compression Conference, pp. 383–392 (2015). https://doi.org/10.1109/DCC.2015.70

  5. Bowe, A., Onodera, T., Sadakane, K., Shibuya, T.: Succinct de bruijn graphs. In: Raphael, B., Tang, J. (eds.) WABI 2012. LNCS, vol. 7534, pp. 225–235. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33122-0_18

    Chapter  Google Scholar 

  6. Burrows, M., Wheeler, D.J.: A block-sorting lossless data compression algorithm. Technical report 124, Digital Equipment Corporation (1994)

    Google Scholar 

  7. Conte, A., Cotumaccio, N., Gagie, T., Manzini, G., Prezza, N., Sciortino, M.: Computing matching statistics on Wheeler DFAs. In: 2023 Data Compression Conference (DCC), pp. 150–159 (2023). https://doi.org/10.1109/DCC55655.2023.00023

  8. Cotumaccio, N.: Graphs can be succinctly indexed for pattern matching in \(O(\vert E\vert ^{2}+\vert V\vert ^{5/2})\) time. In: 2022 Data Compression Conference (DCC), pp. 272–281 (2022). https://doi.org/10.1109/DCC52660.2022.00035

  9. Cotumaccio, N., Prezza, N.: On indexing and compressing finite automata. In: Proceedings of the 32nd Symposium on Discrete Algorithms, (SODA 2021), pp. 2585–2599. SIAM (2021). https://doi.org/10.1137/1.9781611976465.153

  10. Cotumaccio, N., D’Agostino, G., Policriti, A., Prezza, N.: Co-lexicographically ordering automata and regular languages - part I. J. ACM 70, 1–73 (2023). https://doi.org/10.1145/3607471

  11. Díaz-Domínguez, D., Gagie, T., Navarro, G.: Simulating the DNA overlap graph in succinct space. In: Pisanti, N., Pissis, S.P. (eds.) 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), vol. 128, pp. 26:1–26:20. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany (2019). https://doi.org/10.4230/LIPIcs.CPM.2019.26, http://drops.dagstuhl.de/opus/volltexte/2019/10497

  12. Ferragina, P., Luccio, F., Manzini, G., Muthukrishnan, S.: Structuring labeled trees for optimal succinctness, and beyond. In: proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2005), pp. 184–193 (2005). https://doi.org/10.1109/SFCS.2005.69

  13. Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS 2000), pp. 390–398 (2000). https://doi.org/10.1109/SFCS.2000.892127

  14. Ferragina, P., Luccio, F., Manzini, G., Muthukrishnan, S.: Compressing and indexing labeled trees, with applications. J. ACM 57(1) (2009). https://doi.org/10.1145/1613676.1613680

  15. Fischer, J.: Optimal succinctness for range minimum queries. In: López-Ortiz, A. (ed.) LATIN 2010. LNCS, vol. 6034, pp. 158–169. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12200-2_16

    Chapter  Google Scholar 

  16. Fischer, J., Heun, V.: Space-efficient preprocessing schemes for range minimum queries on static arrays. SIAM J. Comput. 40(2), 465–492 (2011). https://doi.org/10.1137/090779759

  17. Gagie, T., Manzini, G., Sirén, J.: Wheeler graphs: a framework for BWT-based data structures. Theor. Comput. Sci. 698, 67–78 (2017). https://doi.org/10.1016/j.tcs.2017.06.016, https://www.sciencedirect.com/science/article/pii/S0304397517305285, algorithms, Strings and Theoretical Approaches in the Big Data Era (In Honor of the 60th Birthday of Professor Raffaele Giancarlo)

  18. Idury, R.M., Waterman, M.S.: A new algorithm for DNA sequence assembly. J. Comput. Biol. 2(2), 291–306 (1995)

    Article  Google Scholar 

  19. Li, R., et al.: De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20, 265–72 (2009). https://doi.org/10.1101/gr.097261.109

  20. Liu, M., Yu, H.: Lower bound for succinct range minimum query. In: Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pp. 1402–1415. STOC 2020, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3357713.3384260

  21. Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993). https://doi.org/10.1137/0222058

    Article  MathSciNet  MATH  Google Scholar 

  22. Navarro, G.: Spaces, trees, and colors: the algorithmic landscape of document retrieval on sequences. ACM Comput. Surv. 46(4) (2014). https://doi.org/10.1145/2535933

  23. Navarro, G.: Compact Data Structures - A Practical Approach. Cambridge University Press (2016). http://www.cambridge.org/de/academic/subjects/computer-science/algorithmics-complexity-computer-algebra-and-computational-g/compact-data-structures-practical-approach?format=HB

  24. Peng, Yu., Leung, H.C.M., Yiu, S.M., Chin, F.Y.L.: IDBA – a practical iterative de bruijn graph de novo assembler. In: Berger, B. (ed.) RECOMB 2010. LNCS, vol. 6044, pp. 426–440. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12683-3_28

    Chapter  Google Scholar 

  25. Pevzner, P.A., Tang, H., Waterman, M.S.: An Eulerian path approach to DNA fragment assembly. Proc. Nat. Acad. Sci. 98(17), 9748–9753 (2001). https://doi.org/10.1073/pnas.171285098

    Article  MathSciNet  MATH  Google Scholar 

  26. Sadakane, K.: Compressed suffix trees with full functionality. Theor. Comp. Sys. 41(4), 589–607 (2007). https://doi.org/10.1007/s00224-006-1198-x

    Article  MathSciNet  MATH  Google Scholar 

  27. Simpson, J., Wong, K., Jackman, S., Schein, J., Jones, S., Birol, I.: ABySS: a parallel assembler for short read sequence data. Genome Res. 19, 1117–1123 (2009). https://doi.org/10.1101/gr.089532.108

    Article  Google Scholar 

  28. Weiner, P.: Linear pattern matching algorithms. In: Proceedings of the 14th IEEE Annual Symposium on Switching and Automata Theory, pp. 1–11 (1973). https://doi.org/10.1109/SWAT.1973.13

Download references

Acknowledgements

Travis Gagie: funded by National Institutes of Health (NIH) NIAID (grant no. HG011392), the National Science Foundation NSF IIBR (grant no. 2029552) and a Natural Science and Engineering Research Council (NSERC) Discovery Grant (grant no. RGPIN-07185-2020). Dominik Köppl: supported by JSPS KAKENHI with No. JP21K17701 and JP23H04378. Nicola Prezza: funded by the European Union (ERC, REGINDEX, 101039208). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Nicola Cotumaccio .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Cotumaccio, N., Gagie, T., Köppl, D., Prezza, N. (2023). Space-Time Trade-Offs for the LCP Array of Wheeler DFAs. In: Nardini, F.M., Pisanti, N., Venturini, R. (eds) String Processing and Information Retrieval. SPIRE 2023. Lecture Notes in Computer Science, vol 14240. Springer, Cham. https://doi.org/10.1007/978-3-031-43980-3_12

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-43980-3_12

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43979-7

  • Online ISBN: 978-3-031-43980-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics