From LZ77 to the Run-Length Encoded Burrows-Wheeler Transform, and Back

Authors Alberto Policriti, Nicola Prezza





Cite As

Alberto Policriti and Nicola Prezza. From LZ77 to the Run-Length Encoded Burrows-Wheeler Transform, and Back. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 17:1-17:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/LIPIcs.CPM.2017.17

Abstract

The Lempel-Ziv factorization (LZ77) and the Run-Length encoded Burrows-Wheeler Transform (RLBWT) are two important tools in text compression and indexing, since their sizes z and r are closely related to the amount of self-repetition in the text. In this paper we consider the problem of converting each representation into the other within working space proportional to the input and the output. Let n be the text length. We show that the RLBWT can be converted to LZ77 in O(n log r) time and O(r) words of working space. Conversely, we provide an algorithm to convert LZ77 to the RLBWT in O(n(log r + log z)) time and O(r+z) words of working space. Note that r and z can be constant if the text is highly repetitive, so our algorithms can operate with (up to) exponentially less space than naive solutions based on full decompression.
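
To make the two size measures concrete, the following is a minimal sketch that computes r (the number of equal-letter runs in the BWT) and z (the number of LZ77 phrases) by brute force on a small, highly repetitive text. It is not the paper's algorithm: it works on the fully decompressed text, which is exactly the naive approach the abstract contrasts with. The function names, the `$` terminator, and the LZ77 variant used here (longest earlier-occurring factor followed by one explicit character, overlaps allowed) are assumptions made for illustration.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT via sorted rotations (illustration only, not space-efficient).

    Assumes `sentinel` does not occur in `text` and sorts before every
    character of `text`.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal equal-letter runs in s (this is r when s is a BWT)."""
    return sum(1 for i in range(len(s)) if i == 0 or s[i] != s[i - 1])


def lz77_phrases(text: str) -> list[str]:
    """Greedy LZ77 parse (assumed variant).

    Each phrase is the longest prefix of the remaining text that also occurs
    starting at an earlier position (source and phrase may overlap), followed
    by one explicit character.
    """
    phrases = []
    i, n = 0, len(text)
    while i < n:
        length = 0
        # Extend the copied part while it still has an occurrence that starts
        # before position i, and one character remains to close the phrase.
        while i + length < n - 1:
            k = length + 1
            if text.find(text[i:i + k], 0, i + k - 1) == -1:
                break
            length = k
        phrases.append(text[i:i + length + 1])
        i += length + 1
    return phrases


if __name__ == "__main__":
    for reps in (8, 64):
        t = "ab" * reps
        r = count_runs(bwt(t))
        z = len(lz77_phrases(t))
        print(f"n={len(t)} r={r} z={z}")
```

On texts of the form (ab)^k, both r and z stay fixed while n grows, which is the regime in which working space of O(r+z) words is far smaller than the space needed to hold the decompressed text.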
Keywords
  • Lempel-Ziv
  • Burrows-Wheeler transform
  • compressed computation
  • repetitive text collections


References

  1. Hideo Bannai, Paweł Gawrychowski, Shunsuke Inenaga, and Masayuki Takeda. Converting SLP to LZ78 in almost linear time. In Johannes Fischer and Peter Sanders, editors, Proceedings of the 24th Annual Symposium on Combinatorial Pattern Matching (CPM 2013), volume 7922 of LNCS, pages 38-49. Springer, 2013. URL: http://dx.doi.org/10.1007/978-3-642-38905-4_6.
  2. Hideo Bannai, Shunsuke Inenaga, and Masayuki Takeda. Efficient LZ78 factorization of grammar compressed text. In Liliana Calderón-Benavides, Cristina N. González-Caro, Edgar Chávez, and Nivio Ziviani, editors, Proceedings of the 19th International Symposium on String Processing and Information Retrieval (SPIRE 2012), volume 7608 of LNCS, pages 86-98. Springer, 2012. URL: http://dx.doi.org/10.1007/978-3-642-34109-0_10.
  3. Djamal Belazzougui, Fabio Cunial, Travis Gagie, Nicola Prezza, and Mathieu Raffinot. Composite repetition-aware data structures. In Ferdinando Cicalese, Ely Porat, and Ugo Vaccaro, editors, Proceedings of the 26th Annual Symposium on Combinatorial Pattern Matching (CPM 2015), volume 9133 of LNCS, pages 26-39. Springer, 2015. URL: http://dx.doi.org/10.1007/978-3-319-19929-0_3.
  4. Michael Burrows and David J. Wheeler. A block-sorting lossless data compression algorithm. Technical report, Systems Research Center, 1994. URL: http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-124.pdf.
  5. Ho-Leung Chan, Wing-Kai Hon, Tak-Wah Lam, and Kunihiko Sadakane. Compressed indexes for dynamic text collections. ACM Trans. Algorithms, 3(2):21, 2007. URL: http://dx.doi.org/10.1145/1240233.1240244.
  6. Moses Charikar, Eric Lehman, Ding Liu, Rina Panigrahy, Manoj Prabhakaran, Amit Sahai, and Abhi Shelat. The smallest grammar problem. IEEE Trans. Inf. Theory, 51(7):2554-2576, 2005. URL: http://dx.doi.org/10.1109/TIT.2005.850116.
  7. Wing-Kai Hon, Tak-Wah Lam, Kunihiko Sadakane, Wing-Kin Sung, and Siu-Ming Yiu. A space and time efficient algorithm for constructing compressed suffix arrays. Algorithmica, 48(1):23-36, 2007. URL: http://dx.doi.org/10.1007/s00453-006-1228-8.
  8. Sebastian Kreft and Gonzalo Navarro. On compressing and indexing repetitive sequences. Theor. Comput. Sci., 483:115-133, 2013. URL: http://dx.doi.org/10.1016/j.tcs.2012.02.006.
  9. Veli Mäkinen, Gonzalo Navarro, Jouni Sirén, and Niko Välimäki. Storage and retrieval of highly repetitive sequence collections. J. Comput. Biol., 17(3):281-308, 2010. URL: http://dx.doi.org/10.1089/cmb.2009.0169.
  10. Alberto Policriti and Nicola Prezza. Computing LZ77 in run-compressed space. In Ali Bilgin, Michael W. Marcellin, Joan Serra-Sagristà, and James A. Storer, editors, Proceedings of the 2016 Data Compression Conference (DCC 2016). IEEE, 2016. URL: http://dx.doi.org/10.1109/DCC.2016.30.
  11. Nicola Prezza. A framework of dynamic data structures for string processing, 2017. URL: http://arxiv.org/abs/1701.07238.
  12. Wojciech Rytter. Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci., 302(1-3):211-222, 2003. URL: http://dx.doi.org/10.1016/S0304-3975(02)00777-6.
  13. Jouni Sirén. Compressed full-text indexes for highly repetitive collections. PhD thesis, University of Helsinki, June 2012. URL: http://urn.fi/URN:ISBN:978-952-10-8052-4.
  14. Jouni Sirén, Niko Välimäki, Veli Mäkinen, and Gonzalo Navarro. Run-length compressed indexes are superior for highly repetitive sequence collections. In Amihood Amir, Andrew Turpin, and Alistair Moffat, editors, Proceedings of the 15th International Symposium on String Processing and Information Retrieval (SPIRE 2008), volume 5280 of LNCS, pages 164-175. Springer, 2008. URL: http://dx.doi.org/10.1007/978-3-540-89097-3_17.
  15. Yuya Tamakoshi, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. From run length encoding to LZ78 and back again. In Ali Bilgin, Michael W. Marcellin, Joan Serra-Sagristà, and James A. Storer, editors, Proceedings of the 2013 Data Compression Conference (DCC 2013), pages 143-152. IEEE, 2013. URL: http://dx.doi.org/10.1109/DCC.2013.22.
  16. Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory, 23(3):337-343, 1977. URL: http://dx.doi.org/10.1109/TIT.1977.1055714.
  17. Jacob Ziv and Abraham Lempel. Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory, 24(5):530-536, 1978. URL: http://dx.doi.org/10.1109/TIT.1978.1055934.