Abstract
The problem of clustering fingerprint vectors with missing values is an interesting problem in Computational Biology that has been proposed in Figueroa et al. (J. Comput. Biol. 11(5):887–901, 2004). In this paper we show some improvements in closing the gaps between the known lower bounds and upper bounds on the approximability of variants of the biological problem. Moreover, we have studied two additional variants of the original problem proposed in Figueroa et al. (Proc. 11th Computing: The Australasian Theory Symposium (CATS), CRPIT, vol. 41, pp. 57–60, 2005). We prove that all such problems are APX-hard even when each fingerprint contains only two unknown positions and we present a greedy algorithm that has constant approximation factors for these variants. Despite the hardness of these restricted versions of the problem, we show that the general clustering problem on an unbounded number of missing values such that they occur for every fixed position of an input vector in at most one fingerprint is polynomial time solvable.
Article PDF
Similar content being viewed by others
References
Alimonti, P., Kann, V.: Some APX-completeness results for cubic graphs. Theor. Comput. Sci. 237(1–2), 123–134 (2000)
Ausiello, G., Crescenzi, P., Gambosi, V., Kann, G., Marchetti-Spaccamela, A., Protasi, M.: Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties. Springer, Berlin (1999)
Chlebík, M., Chlebíková, J.: Complexity of approximating bounded variants of optimization problems. Theor. Comput. Sci. 354(3), 320–338 (2006)
Drmanac, R.: cDNA screening by array hybridization. Methods Enzymol. 303, 165–178 (1999)
Drmanac, S., Drmanac, R.: Processing of cDNA and genomic kilobase-size clones for massive screening mapping and sequencing by hybridization. Biotechniques 17, 328–336 (1994)
Drmanac, S., Stavropoulos, N., Labat, I., Vonau, J., Hauser, B., Soares, M., Drmanac, R.: Gene-representation cDNA clusters defined by hybridization of 57 419 clones from infant brain libraries with short oligonucleotite probes. Genomics 37, 29–40 (1996)
Figueroa, A., Borneman, J., Jiang, T.: Clustering binary fingerprint vectors with missing values for dna array data analysis. J. Comput. Biol. 11(5), 887–901 (2004)
Figueroa, A., Goldstein, A., Jiang, T., Kurowski, M., Lingas, A., Persson, M.: Approximate clustering of fingerprint vectors with missing values. In: Proc. 11th Computing: The Australasian Theory Symposium (CATS). CRPIT, vol. 41, pp. 57–60 (2005)
Papadimitriou, C.H.: Computational Complexity. Addison–Wesley, Reading (1993)
Valinsky, L., Della Vedova, G., Jiang, T., Borneman, J.: Oligonucleotide fingerprinting of rRNA genes for analysis of fungal community composition. Appl. Environ. Microbiol. 68(12), 5999–6004 (2002)
Valinsky, L., Della Vedova, G., Scupham, A., Alvey, S., Figueroa, A., Yin, B., Hartin, R., Chrobak, M., Crowley, D., Jiang, T., Borneman, J.: Analysis of bacterial microbial community composition by oligonucleotide fingerprinting of rRNA genes. Appl. Environ. Microbiol. 68(7), 3243–3250 (2002)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Bonizzoni, P., Della Vedova, G., Dondi, R. et al. Fingerprint Clustering with Bounded Number of Missing Values. Algorithmica 58, 282–303 (2010). https://doi.org/10.1007/s00453-008-9265-0
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00453-008-9265-0