Hashing and Indexing: Succinct Data Structures and Smoothed Analysis

  • Conference paper
Algorithms and Computation (ISAAC 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8889)

Abstract

We consider the problem of indexing a text \(T\) of length \(n\) with a lightweight data structure that supports efficient search for patterns \(P\) of length \(m\) while allowing errors under the Hamming distance. We propose a hash-based strategy that employs two classes of hash functions, dubbed Hamming-aware and de Bruijn, to drastically reduce the search space and the memory footprint of the index, respectively.
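To make the problem statement concrete, the following is a minimal sketch of the search task itself: a naive quadratic-time scan that reports every position where the pattern matches the text with at most \(k\) mismatches. It is not the hash-based index proposed in the paper; the function names `hamming_distance` and `k_mismatch_occurrences` are hypothetical and chosen only for illustration.

```python
from typing import List


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def k_mismatch_occurrences(text: str, pattern: str, k: int) -> List[int]:
    """Return all start positions i such that text[i:i+m] is within
    Hamming distance k of pattern (naive O(n*m) baseline, no index)."""
    n, m = len(text), len(pattern)
    # If the pattern is longer than the text, the range is empty.
    return [
        i
        for i in range(n - m + 1)
        if hamming_distance(text[i:i + m], pattern) <= k
    ]


if __name__ == "__main__":
    # Windows at 0 and 8 match exactly; the window at 4 ("ACGA") has one mismatch.
    print(k_mismatch_occurrences("ACGTACGAACGT", "ACGT", 1))
```

A baseline of this kind inspects every text window; an index such as the one described here aims to avoid that full scan by restricting the candidate positions.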

We use our succinct hash data structure to solve the \(k\)-mismatch search problem in \(2n\log \sigma + o(n\log \sigma )\) bits of space with a randomized algorithm having smoothed complexity \(\mathcal {O}((2\sigma )^k(\log n)^k(\log m+\xi ) + (occ+1)\cdot m)\), where \(\sigma \) is the alphabet size, \(occ\) is the number of occurrences, and \(\xi \) is a term depending on \(m\), \(n\), and on the amplitude \(\epsilon \) of the noise perturbing text and pattern. Significantly, we obtain that for any \(\epsilon >0\), for \(m\) large enough, \(\xi \in \mathcal {O}(\log m)\): our results improve upon previous linear-space solutions of the \(k\)-mismatch problem.
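As a worked reading of the bound above, assume the stated regime in which \(\xi \in \mathcal {O}(\log m)\). Then \(\log m+\xi \in \mathcal {O}(\log m)\), and the smoothed complexity simplifies to

\[
\mathcal {O}\big((2\sigma )^k(\log n)^k\log m + (occ+1)\cdot m\big).
\]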

Author information

Corresponding author

Correspondence to Nicola Prezza.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Policriti, A., Prezza, N. (2014). Hashing and Indexing: Succinct Data Structures and Smoothed Analysis. In: Ahn, HK., Shin, CS. (eds) Algorithms and Computation. ISAAC 2014. Lecture Notes in Computer Science, vol 8889. Springer, Cham. https://doi.org/10.1007/978-3-319-13075-0_13

  • DOI: https://doi.org/10.1007/978-3-319-13075-0_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13074-3

  • Online ISBN: 978-3-319-13075-0

  • eBook Packages: Computer Science, Computer Science (R0)
