Hashing and Indexing: Succinct Data Structures and Smoothed Analysis

Policriti, Alberto; Prezza, Nicola

doi:10.1007/978-3-319-13075-0_13

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8889)

Included in the following conference series:

International Symposium on Algorithms and Computation

Abstract

We consider the problem of indexing a text $T$ (of length $n$) with a light data structure that supports efficient search of patterns $P$ (of length $m$) allowing errors under the Hamming distance. We propose a hash-based strategy that employs two classes of hash functions—dubbed Hamming-aware and de Bruijn—to drastically reduce search space and memory footprint of the index, respectively.
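
To make the problem statement concrete, the following is a minimal brute-force sketch of $k$-mismatch search under the Hamming distance. It is only a reference baseline that scans the text in $\mathcal{O}(nm)$ time without any index; it is not the hash-based strategy proposed in the paper, and the function names are illustrative assumptions.

```python
from typing import List


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def k_mismatch_search(text: str, pattern: str, k: int) -> List[int]:
    """Return every start position i where text[i:i+m] is within
    Hamming distance k of the pattern.

    Brute-force reference (hypothetical helper, not the paper's index):
    it compares the pattern against all n - m + 1 text windows.
    """
    n, m = len(text), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        if hamming_distance(text[i : i + m], pattern) <= k
    ]


if __name__ == "__main__":
    # Illustrative input only: exact occurrences plus one 1-mismatch occurrence.
    print(k_mismatch_search("ACGTACGAACGT", "ACGT", 1))
```

An index such as the one described here aims to return the same set of positions while avoiding the full scan over the text.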

We use our succinct hash data structure to solve the $k$-mismatch search problem in $2n\log \sigma + o(n\log \sigma )$ bits of space with a randomized algorithm having smoothed complexity $\mathcal {O}((2\sigma )^k(\log n)^k(\log m+\xi ) + (occ+1)\cdot m)$, where $\sigma $ is the alphabet size, $occ$ is the number of occurrences, and $\xi $ is a term depending on $m$, $n$, and on the amplitude $\epsilon $ of the noise perturbing text and pattern. Significantly, we obtain that for any $\epsilon >0$, for $m$ large enough, $\xi \in \mathcal {O}(\log m)$: our results improve upon previous linear-space solutions of the $k$-mismatch problem.
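
As a rough aid for reading these bounds, the sketch below evaluates their leading terms for given parameters. It assumes base-2 logarithms, drops the $o(n\log \sigma )$ term and all hidden constants, and takes $\xi $ as an input supplied by the caller; the function names are hypothetical and the numbers it produces are orders of magnitude, not measurements.

```python
import math


def index_space_bits(n: int, sigma: int) -> float:
    """Leading term 2 * n * log2(sigma) of the space bound.

    Assumption: base-2 logarithm; the o(n log sigma) term is dropped.
    """
    return 2 * n * math.log2(sigma)


def smoothed_time_bound(
    n: int, m: int, k: int, sigma: int, occ: int, xi: float
) -> float:
    """Evaluate (2*sigma)^k * (log n)^k * (log m + xi) + (occ + 1) * m.

    Assumption: base-2 logarithms and constant factor 1. xi is passed in
    by the caller because its exact form is not given in the abstract.
    """
    search_term = (2 * sigma) ** k * math.log2(n) ** k * (math.log2(m) + xi)
    report_term = (occ + 1) * m
    return search_term + report_term


if __name__ == "__main__":
    # Illustrative parameters only: 4-letter alphabet, n = 2^10, m = 16.
    print(index_space_bits(1024, 4))
    for k in (1, 2):
        # xi is set to log2(m), mirroring the regime xi in O(log m).
        print(k, smoothed_time_bound(1024, 16, k, 4, occ=0, xi=math.log2(16)))
```

Increasing $k$ by one multiplies the search term by $2\sigma \log n$, which shows why the bound is intended for small numbers of mismatches.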

Author information

Authors and Affiliations

Department of Mathematics and Informatics, University of Udine, Udine, Italy
Alberto Policriti & Nicola Prezza
Istituto di Genomica Applicata, Udine, Italy
Alberto Policriti

Corresponding author

Correspondence to Nicola Prezza.

Editor information

Editors and Affiliations

Pohang University of Science and Technology, Pohang, Korea, Republic of (South Korea)
Hee-Kap Ahn
Hankuk University of Foreign Studies, Yongin-si, Korea, Republic of (South Korea)
Chan-Su Shin

About this paper

Cite this paper

Policriti, A., Prezza, N. (2014). Hashing and Indexing: Succinct Data Structures and Smoothed Analysis. In: Ahn, HK., Shin, CS. (eds) Algorithms and Computation. ISAAC 2014. Lecture Notes in Computer Science, vol 8889. Springer, Cham. https://doi.org/10.1007/978-3-319-13075-0_13

DOI: https://doi.org/10.1007/978-3-319-13075-0_13
Published: 08 November 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13074-3
Online ISBN: 978-3-319-13075-0
eBook Packages: Computer Science, Computer Science (R0)
