Inference about the number of contributors to a DNA mixture: Comparative analyses of a Bayesian network approach and the maximum allele count method
Introduction
“Given typing results on one or several loci, what assumptions should be made about the number of contributors?” This question arises recurrently in the DNA typing of biological stains, in particular when the observed allelic configurations cannot be explained by a single contributor. Strictly speaking, the true number of contributors to a given sample cannot, with the STR polymorphisms currently in use, be known with certainty. Because of possible masking effects [1], the fact that no more than two alleles are observed at any locus in a profile does not imply that a stain could not be a mixture.
Notwithstanding, a widely followed conventional approach to mixture assessment makes explicit assumptions about the unknown (untyped) contributors to an evidential mixture under each of the competing propositions. Most often, assessing a mixture consists of comparing the probabilities of obtaining the typing results given that a specified individual (that is, a named suspect) is, or is not, the source of the crime stain, along with a certain number of additional untyped individuals (e.g., [2]). The result of such a comparison is a likelihood ratio, which expresses the degree of discrimination between the competing propositions.
In this context, the literature has also reported procedures that yield an upper bound for the number of unknown persons that need to be considered as contributors to a mixed stain [3]. Much of the current discussion of the ‘number of contributors’ issue involves, however, an inconsistency in the argument. Terms such as ‘determining’ the number of contributors are regularly encountered, but they are at odds with the practical impossibility for the scientist to establish the number of contributors accurately. They also suggest a deterministic view of the unknown number of contributors. Part of this paper takes a closer look at the relative performance of a typical categorical approach of this kind, namely the method known as the maximum allele count. Strictly speaking, this method is not a proper inference procedure, but merely a rule that sets the lower bound on the number of contributors to the minimum required to explain the set of alleles observed in a mixture.
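The maximum allele count rule can be illustrated with a minimal Python sketch (the study itself used R; Python is chosen here purely for illustration). It assumes that each contributor carries at most two alleles per locus, so that the lower bound is the largest number of distinct alleles seen at any locus, divided by two and rounded up. The function name `min_contributors` and the allele designations in the example are hypothetical and are not taken from the paper.

```python
from math import ceil


def min_contributors(profile):
    """Return the maximum allele count lower bound on the number of contributors.

    `profile` maps a locus name to the alleles observed at that locus.
    Assumption: every contributor shows at most two alleles per locus.
    """
    # Largest number of distinct alleles observed at any single locus.
    max_alleles = max(len(set(alleles)) for alleles in profile.values())
    # Each contributor explains at most two alleles; at least one person is needed.
    return max(1, ceil(max_alleles / 2))


# Hypothetical three-locus mixture, for illustration only.
mixture = {
    "D3S1358": ["14", "15", "16", "17"],
    "vWA": ["16", "17", "18"],
    "FGA": ["20", "21", "22", "23", "24"],
}

print(min_contributors(mixture))  # 3, because five alleles at FGA need three people
```

The rule uses only the number of distinct alleles at the most informative locus. It ignores allele frequencies and returns a single value rather than a statement of uncertainty.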
Yet other contributions emphasise the idea of ‘estimating’ the number of contributors. There are frequentist procedures, for instance, that output a single number of contributors. An example of this is the recently reported maximum likelihood estimator [4], [5]. This procedure selects the number of contributors for which the probability of the observed allelic configuration is maximal. This approach is not pursued here because it does not lead to a genuine expression of uncertainty about the number of contributors to a mixed stain (i.e., a probability distribution).
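In symbols, the selection rule of the maximum likelihood estimator can be written as below. The notation is an assumption introduced here for illustration and is not taken from [4], [5]: E denotes the observed allelic configuration, n a candidate number of contributors and \mathcal{N} the set of candidate values considered.

```latex
\hat{n} = \operatorname*{arg\,max}_{n \in \mathcal{N}} \Pr(E \mid n)
```

The output is the single value \hat{n} and not a probability distribution over \mathcal{N}.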
Approaches that output a fixed number of contributors have some appeal because of their ease of application. In particular, they allow scientists to calculate the probability of an allelic configuration in a single step. This makes it unnecessary to account for several different numbers of contributors along with their respective probabilities, as required, for example, by the likelihood ratio approach of Brenner et al. [6]. It is questionable, however, whether ease of application alone is sufficient to justify use in practice. Forensic scientists may be required to inform recipients of expert evidence about how well a chosen procedure performs. To be of value, such information requires a comparison with the performance of an alternative procedure.
This paper addresses this aspect by investigating and comparing the potential of two procedures, based on simulated mixtures with varying numbers of contributors. The first is the maximum allele count method, chosen as an example of a deterministic procedure. The second, Bayesian inference, is chosen as a probabilistic alternative. Bayesian inference is used here as a method for revising beliefs about different numbers of contributors on the basis of a mixture's allelic configuration and the relative rarity of the observed alleles. Such beliefs are needed to weight the probability of observing a mixture's allelic configuration according to various numbers of contributors [6]. As an aside, this also serves as an argument in support of the feasibility of an informed specification of a probability distribution for the number of contributors. This is of interest because practitioners sometimes criticise, or do not use, the approach of Brenner et al. [6] on the grounds that it involves probabilities (for numbers of contributors) that are claimed to be difficult to specify.
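The idea of revising beliefs about the number of contributors can be sketched in Python as a direct application of Bayes' theorem. This is a simplified illustration and not the Bayesian network used in the paper. It assumes that the 2n alleles of n contributors are independent draws from the population allele frequencies, that loci are independent, and that there is no drop-out or drop-in. All function names, locus names, allele labels, frequencies and prior values are hypothetical.

```python
from itertools import combinations


def prob_allele_set(observed, freqs, n):
    """Probability that 2n independent allele draws show exactly the set `observed`.

    Computed by inclusion-exclusion over the non-empty subsets of the observed set.
    """
    alleles = sorted(set(observed))
    draws = 2 * n
    if len(alleles) > draws:
        return 0.0  # n diploid contributors cannot show this many distinct alleles
    total = 0.0
    for k in range(1, len(alleles) + 1):
        sign = (-1) ** (len(alleles) - k)
        for subset in combinations(alleles, k):
            total += sign * sum(freqs[a] for a in subset) ** draws
    return max(total, 0.0)  # guard against tiny negative rounding errors


def posterior_contributors(profile, freqs, prior):
    """Posterior distribution over the number of contributors, given the profile.

    `prior` maps each candidate number n to its prior probability.
    """
    unnormalised = {}
    for n, prior_n in prior.items():
        likelihood = 1.0
        for locus, alleles in profile.items():
            likelihood *= prob_allele_set(alleles, freqs[locus], n)
        unnormalised[n] = prior_n * likelihood
    norm = sum(unnormalised.values())
    if norm == 0.0:
        raise ValueError("profile has zero probability under every candidate n")
    return {n: value / norm for n, value in unnormalised.items()}


# Hypothetical allele frequencies and mixture, for illustration only.
freqs = {
    "locus_1": {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4},
    "locus_2": {"x": 0.05, "y": 0.25, "z": 0.70},
}
mixture = {"locus_1": ["a", "b", "c", "d"], "locus_2": ["x", "y"]}
prior = {2: 1 / 3, 3: 1 / 3, 4: 1 / 3}  # equal prior weight on two, three or four

for n, p in sorted(posterior_contributors(mixture, freqs, prior).items()):
    print(n, round(p, 3))
```

In this sketch the allele frequencies enter the likelihood directly, so the rarity of the observed alleles affects the result. The output is a probability for each candidate number of contributors rather than a single value.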
This paper is structured as follows. Section 2 presents the general methodology and (computational) procedures (based, in part, on graphical models) used for (i) the simulation of DNA mixture profiles, (ii) the revision of beliefs about various numbers of contributors (for each sampled mixture profile) and (iii) the scoring of these inferences. The results of these analyses are presented in Section 3. A discussion with reference to a case example and conclusions are given in Section 4.
Section snippets
Software
The general computational environment chosen for this study is R, a widely used free software environment for statistical computing and graphics [7]. R was used to simulate STR profiling data and to combine these data into DNA mixture profiles. R was also used to write and apply a routine that processes the simulated DNA mixture data to find the maximum allele count and, thus, the minimum number of contributors that, in combination, could have produced each conceptual DNA
Ten loci DNA mixture profiles
Four different sets of 100 DNA mixture profiles at 10 loci were simulated. For the first set of 100 profiles, the number of contributors for each mixture, that is two, three or four, was sampled with equal probability. For the three other sets of 100 DNA mixtures, the number of contributors was sampled with, respectively, the vectors of probabilities {0.4, 0.4, 0.2}, {0.5, 0.4, 0.1} and {0.1, 0.45, 0.45}. Applying the maximum allele count and Bayesian inference to each mixture profile, and
Need for a balanced approach
The widely used maximum allele count method owes much of its popularity to its ease and rapidity of application. Although there is evidence that suggests that this procedure has some appealing performance when the mixture involves a low number of contributors (e.g., [4]), this cannot be put forward as a justification for using this method in practice because in actual casework, the true number of contributors is typically unknown. According to the observations in this study, it may be generally
Conflict of interest
None of the authors A. Biedermann, S. Bozza, K. Konis, F. Taroni has a financial or personal relationship with other people or organisations that could inappropriately influence or bias the paper entitled “Inference about the number of contributors to a DNA mixture: Comparative analyses of a Bayesian network approach and the maximum allele count method”.
References (23)
- et al. Towards understanding the effect of uncertainty in the number of contributors to DNA stains. Forensic Sci. Int.: Genet. (2007)
- et al. Bounding the number of contributors to mixed DNA stains. Forensic Sci. Int. (2002)
- et al. The predictive value of the maximum likelihood estimator of the number of contributors to a DNA mixture. Forensic Sci. Int.: Genet. (2011)
- et al. The interpretation of low level DNA mixtures. Forensic Sci. Int. (2012)
- et al. DNA mixtures in forensic casework: a 4-year retrospective study. Forensic Sci. Int. (2003)
- et al. Identification and separation of DNA mixtures using peak area information. Forensic Sci. Int. (2007)
- et al. Object-oriented Bayesian networks for complex forensic DNA profiling problems. Forensic Sci. Int. (2007)
- et al. Bayesian networks for evaluating forensic DNA profiling evidence: a review and guide to literature. Forensic Sci. Int.: Genet. (2012)
- et al. Interpreting DNA Evidence. (1998)
- et al. Estimating the number of contributors to forensic DNA mixtures: does maximum likelihood perform better than maximum allele count? J. Forensic Sci. (2011)