Inference about the number of contributors to a DNA mixture: Comparative analyses of a Bayesian network approach and the maximum allele count method

https://doi.org/10.1016/j.fsigen.2012.03.006

Abstract

In the forensic examination of DNA mixtures, the question of how to set the total number of contributors (N) presents a topic of ongoing interest. Part of the discussion gravitates around issues of bias, in particular when assessments of the number of contributors are not made prior to considering the genotypic configuration of potential donors. Further complications may stem from the observation that, in some cases, there may be numbers of contributors that are incompatible with the set of alleles seen in the profile of a mixed crime stain, given the genotype of a potential contributor. In such situations, procedures that take a single, fixed number of contributors as their output can lead to inferential impasses. Assessing the number of contributors within a probabilistic framework can help avoid such complications. Using elements of decision theory, this paper analyses two strategies for inference on the number of contributors. One procedure is deterministic and focuses on the minimum number of contributors required to ‘explain’ an observed set of alleles. The other procedure is probabilistic: it uses Bayes’ theorem and provides a probability distribution for a set of numbers of contributors, based on the set of observed alleles as well as their respective rates of occurrence. The discussion concentrates on mixed stains of varying quality (i.e., different numbers of loci for which genotyping information is available). A so-called qualitative interpretation is pursued since quantitative information such as peak area and height data is not taken into account. The competing procedures are compared using a standard scoring rule that penalizes the degree of divergence between a given agreed value for N, that is the number of contributors, and the actual value taken by N.
Using only modest assumptions and a discussion with reference to a casework example, this paper reports on analyses using simulation techniques and graphical models (i.e., Bayesian networks) to point out that setting the number of contributors to a mixed crime stain in probabilistic terms is, for the conditions assumed in this study, preferable to a decision policy that uses categoric assumptions about N.

Introduction

“Given typing results on one or several loci, what assumptions should be made about the number of contributors?” This is a recurrent question that arises in the context of DNA typing of biological staining, in particular when allelic configurations are observed that cannot be explained by a single contributor. In a strict sense, the true number of contributors to a given sample cannot – in view of the currently used STR polymorphisms – be known with certainty. Because of possible effects of masking [1], the fact that no more than two alleles are observed at any locus in a profile does not imply that a stain could not be a mixture.

Notwithstanding, a widely followed conventional approach to mixture assessment considers explicit assumptions about the unknown (untyped) contributors to an evidential mixture under each of the competing propositions. Most often, assessing a mixture consists of a comparison between the probabilities of obtaining the typing results given that a specified individual, that is a named suspect, is (or is not) the source of the crime stain, along with a certain number of additional untyped individuals (e.g., [2]). The result of such a comparison is a likelihood ratio that provides an expression of the degree of discrimination among the competing propositions.

In this context, the literature has also reported on procedures that allow one to obtain an upper bound for the number of unknown persons that need to be considered as contributors to a mixed stain [3]. Much of the current discussion on the ‘number of contributors’ issue involves, however, a breach of argument. Terms such as ‘determining’ the number of contributors are regularly encountered, but this is at odds with the practical impossibility of setting the number of contributors accurately. Besides, such terms also suggest a deterministic view of the unknown number of contributors. Part of this paper takes a closer look at the relative performance of a typical categoric perspective of this kind, namely a method known as the maximum allele count. In a strict sense, this method is not a proper inference procedure, but merely a rule that sets the lower bound on the number of contributors to the minimum required to explain the set of alleles observed in a mixture.
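The maximum allele count rule can be illustrated with a short sketch. Python is used here purely for illustration (the study itself reports using R), and the function name, the profile layout and the locus and allele values are hypothetical. Since each contributor carries at most two alleles per locus, the minimum number of contributors is the largest per-locus allele count divided by two, rounded up.

```python
from math import ceil


def min_contributors(profile):
    """Lower bound on the number of contributors by maximum allele count.

    profile: dict mapping a locus name to the set of alleles observed there.
    Each contributor carries at most two alleles per locus, so the locus with
    the most distinct alleles fixes the minimum number of contributors.
    """
    max_alleles = max(len(alleles) for alleles in profile.values())
    return max(1, ceil(max_alleles / 2))


# Hypothetical two-locus mixture profile (values are illustrative only).
profile = {
    "D3S1358": {14, 15, 16, 17},
    "vWA": {16, 17, 18, 19, 20},
}
print(min_contributors(profile))  # five alleles at vWA, so at least 3
```

The rule returns a single number and takes no account of how rare the observed alleles are, which is why it yields a lower bound rather than an inference about N.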

Yet other contributions emphasise the idea of ‘estimating’ the number of contributors. There are frequentist procedures, for instance, that have a given number of contributors as their output. An example of this is the recent report on the maximum likelihood estimator [4], [5]. This procedure selects the number of contributors for which the probability of the observed allelic configuration is maximal. This approach is not pursued here because it does not lead to a genuine expression of uncertainty about the number of contributors to a mixed stain (i.e., in terms of a probability distribution).
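In symbols, and with notation that is assumed here rather than taken from [4], [5]: writing A for the observed allelic configuration and N for a candidate number of contributors, the selection rule described above can be sketched as

```latex
\hat{N} = \arg\max_{N \in \{1, 2, \dots\}} \Pr(A \mid N)
```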

Approaches that have a fixed number of contributors as their output have some appeal because of their ease of application. In particular, they allow scientists to calculate the probability of an allelic configuration in a single step. This makes it unnecessary to account for several different numbers of contributors along with their respective probabilities, as required, for example, by the likelihood ratio approach of Brenner et al. [6]. It is questionable, however, whether ease of application alone is sufficient to justify practical use. In fact, forensic scientists may be required to inform recipients of expert evidence about how well a chosen procedure performs. To be of value, however, such information requires a comparison with the performance of an alternative procedure.

This paper approaches this aspect by investigating and comparing the potential of two procedures, based on simulated mixtures with varying numbers of contributors. One approach is the maximum allele count method, chosen as an example of a deterministic procedure. As a second approach, Bayesian inference is chosen as a probabilistic alternative. Bayesian inference is retained here as a method for belief revision about different numbers of contributors based on a mixture's allelic configuration as well as the relative rarity of the various observed alleles. It is such beliefs that are needed to weight the probability of observing a mixture's allelic configuration according to various numbers of contributors [6]. As an aside, this will also serve as an argument in support of the feasibility of an informed specification of a probability distribution for the number of contributors. This is of interest because practitioners sometimes criticise or do not use the approach of Brenner et al. [6] on the grounds that it involves probabilities (for numbers of contributors) that are claimed to be difficult to specify.
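The belief revision described above can be sketched as a direct application of Bayes' theorem. The sketch below is in Python for illustration only and is not the Bayesian network used in this study. It assumes that the 2N allele copies at a locus are independent draws from the population frequencies (Hardy-Weinberg equilibrium, unrelated contributors), that there is no drop-out or drop-in, and that loci are independent. The function names, the allele frequencies and the uniform prior over two, three and four contributors are all hypothetical.

```python
from itertools import combinations


def locus_likelihood(freqs, n):
    """P(exactly the observed allele set is seen at one locus | n contributors).

    freqs: population frequencies of the alleles observed at the locus.
    Inclusion-exclusion over subsets of the observed alleles, treating the
    2n allele copies as independent draws (assumption, see lead-in).
    """
    k = len(freqs)
    total = 0.0
    for size in range(k + 1):
        sign = (-1) ** (k - size)
        for subset in combinations(freqs, size):
            total += sign * sum(subset) ** (2 * n)
    # Clamp tiny negative values caused by floating-point cancellation.
    return max(total, 0.0)


def posterior_n(profile, prior):
    """Posterior distribution over the number of contributors.

    profile: list with one list of observed-allele frequencies per locus.
    prior: dict mapping a candidate number of contributors to its prior.
    """
    joint = {}
    for n, p_n in prior.items():
        likelihood = 1.0
        for freqs in profile:
            likelihood *= locus_likelihood(freqs, n)
        joint[n] = p_n * likelihood
    norm = sum(joint.values())
    if norm == 0.0:
        raise ValueError("profile is incompatible with every candidate n")
    return {n: value / norm for n, value in joint.items()}


# Hypothetical two-locus profile: frequencies of the observed alleles.
profile = [[0.10, 0.20, 0.15, 0.05], [0.30, 0.25]]
prior = {2: 1 / 3, 3: 1 / 3, 4: 1 / 3}
for n, p in posterior_n(profile, prior).items():
    print(n, round(p, 3))
```

Numbers of contributors that cannot explain the observed alleles (for example, one contributor at a locus showing four alleles) receive a likelihood of zero and therefore a posterior probability of zero, while the remaining candidates are weighted by how well they account for the observed alleles and their rarity.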

This paper is structured as follows. Section 2 presents the general methodology and (computational) procedures (based, in part, on graphical models) used for (i) the simulation of DNA mixture profiles, (ii) the revision of beliefs about various numbers of contributors (for each sampled mixture profile) and (iii) the scoring of these inferences. The results of these analyses are presented in Section 3. A discussion with reference to a case example and conclusions are given in Section 4.

Section snippets

Software

The general computational environment chosen for this study is R, a widely used free software for statistical computing and graphics [7]. R was used for simulating STR profiling data and combining these in order to produce DNA mixture profiles. The program R was also used to write and apply a routine for processing the simulated DNA mixture data for finding the maximum allele count and, thus, the minimum number of contributors that, in combination, could have produced each conceptual DNA

Ten loci DNA mixture profiles

Four different sets of 100 DNA mixture profiles at 10 loci were simulated. For the first set of 100 profiles, the number of contributors for each mixture, that is two, three or four, was sampled with equal probability. For the three other sets of 100 DNA mixtures, the number of contributors was sampled with, respectively, the vectors of probabilities {0.4, 0.4, 0.2}, {0.5, 0.4, 0.1} and {0.1, 0.45, 0.45}. Applying the maximum allele count and Bayesian inference to each mixture profile, and

Need for a balanced approach

The widely used maximum allele count method owes much of its popularity to its ease and rapidity of application. Although there is evidence that suggests that this procedure has some appealing performance when the mixture involves a low number of contributors (e.g., [4]), this cannot be put forward as a justification for using this method in practice because in actual casework, the true number of contributors is typically unknown. According to the observations in this study, it may be generally

Conflict of interest

None of the authors A. Biedermann, S. Bozza, K. Konis, F. Taroni has a financial or personal relationship with other people or organisations that could inappropriately influence or bias the paper entitled “Inference about the number of contributors to a DNA mixture: Comparative analyses of a Bayesian network approach and the maximum allele count method”.

References (23)

  • C.H. Brenner et al. Likelihood ratios for mixed stains when the number of donors cannot be agreed. Int. J. Legal Med. (1996)