The database search problem: A question of rational decision making

https://doi.org/10.1016/j.forsciint.2012.05.023

Abstract

This paper applies probability and decision theory in the graphical interface of an influence diagram to study the formal requirements of rationality which justify the individualization of a person found through a database search. The decision-theoretic part of the analysis studies the parameters that a rational decision maker would use to individualize the selected person. The modeling part (in the form of an influence diagram) clarifies the relationships between this decision and the ingredients that make up the database search problem, i.e., the results of the database search and the different pairs of propositions describing whether an individual is at the source of the crime stain. These analyses evaluate the desirability associated with the decision of ‘individualizing’ (and ‘not individualizing’). They point out that this decision is a function of (i) the probability that the individual in question is, in fact, at the source of the crime stain (i.e., the state of nature), and (ii) the decision maker's preferences among the possible consequences of the decision (i.e., the decision maker's loss function). We discuss the relevance and argumentative implications of these insights with respect to recent comments in specialized literature, which suggest points of view that are opposed to the results of our study.

Introduction

The ‘classical’ database search problem, as it is known throughout forensic and legal theory and practice, relates to a question of the following kind: “What is the strength of the evidence against a given individual found through a database search, when that individual is the only person in the database who presents the same analytical characteristics (such as a DNA profile) as those observed on a crime stain?” After intense and controversial debates that started in the mid-1990s and seemed to have been settled during the last decade, the database search problem has once more become the object of several publications [1], [2], [3], [4]. In particular, Schneider et al. [1] and Fimmers et al. [3] recently claimed that a single matching profile found through a database search has a lower evidential value than a match found by other investigational means (i.e., a situation in which no database search was conducted). However, ample counterarguments now demonstrate that this claim is a misconception [5], [6], [7], [8], [9], [10], one that dates back to the NRC reports [11], [12] and writings by Stockmarr [13]. These latter accounts are constructed around a conceptually unsuitable pair of propositions, defined as follows:

  • Hdb: the source of the crime stain is in the database;

  • ¬Hdb: the source of the crime stain is not in the database.

This contradicts probabilistic arguments that demonstrate an increase in the evidential value of a single database match when one considers the conventional and procedurally appropriate pair of propositions, which take the form of:

  • Hi: the crime stain comes from individual i;

  • ¬Hi: the crime stain comes from someone else unrelated to individual i;

where i = 1, …, N, and N is the size of the population of individuals who could have been at the source of the crime stain. The increase for this pair of propositions is due to the exclusion of n − 1 non-matching profiles (where n denotes the size of the database searched). This argument is now covered to a great extent in existing literature [5], [6], [7], [8], [9], [10], [14], [15], and currently appears to accumulate the most widespread support.
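The effect of excluding the n − 1 non-matching profiles can be made explicit with a short worked derivation. This is only a sketch under simplifying assumptions that are not stated in the text above: a uniform prior probability of 1/N for each member of the population, a match probability γ that applies independently to every individual who is not the source, a source who matches with certainty, and no relatives or laboratory errors. The labels E_pc (a single comparison without a database search) and E_db (a single match among n database profiles) are introduced here for illustration only.

```latex
% Sketch under assumed conditions: uniform prior 1/N, independent chance
% matches with probability \gamma, no relatives, no laboratory error.
% E_pc: one comparison, which matches (no database search).
% E_db: individual i matches, the other n-1 database profiles do not.
\begin{align*}
\Pr(H_i \mid E_{\mathrm{pc}})
  &= \frac{\tfrac{1}{N}}{\tfrac{1}{N} + \tfrac{N-1}{N}\,\gamma}
   = \frac{1}{1 + (N-1)\,\gamma}, \\[4pt]
\Pr(H_i \mid E_{\mathrm{db}})
  &= \frac{\tfrac{1}{N}\,(1-\gamma)^{n-1}}
          {\tfrac{1}{N}\,(1-\gamma)^{n-1} + \tfrac{N-n}{N}\,\gamma\,(1-\gamma)^{n-1}}
   = \frac{1}{1 + (N-n)\,\gamma}.
\end{align*}
% The n-1 excluded individuals contribute zero likelihood as alternative
% sources, so only N-n alternatives remain. Since N-n <= N-1, the second
% probability is never smaller than the first.
```

Under these assumptions, the posterior probability after the database search is at least as large as in the case without a search, which is the direction of the increase described above.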

In their recent publication, Fimmers et al. [3] seek to take their argument in support of a decrease in the value of a database match a step further: they address the act of convicting a suspect and the probability that this conviction is false. That is, they pass from a purely probabilistic discourse to an argument invoking the act of choosing a particular option among several possible options. Their argumentation consists of a hypothetical case in which investigators search for the individual at the source of a biological stain recovered at a crime scene. The investigators in this case consider a population of 100 million individuals (N = 10⁸) as the population of potential sources, and possess a database containing the profiles of one million of these individuals (n = 10⁶). In this population, the DNA profile of the crime stain has a match probability of γ = 10⁻⁶. However, in their example, Fimmers et al. [3] assume that the true source of the crime stain is not in the population considered by the investigators, that is, not among the N = 10⁸ individuals, and consequently, not among the n = 10⁶ profiles in the database (since the database contains profiles taken from the population of the N = 10⁸ individuals). Assuming that “a suspect will certainly be convicted in every case in which the rarity of the corresponding DNA profile is [at least] one in a million” [3, p. 4], that is, when γ ≤ 10⁻⁶, they then compare the probability of a false conviction in a probable cause case to the probability of a false conviction in a database search case for an incriminating profile with a match probability of 10⁻⁶:

  • Probable cause case: “There is a suspect. The DNA profile of that person is determined and found to correspond to that of the crime stain. The person is going to be convicted on the sole basis of this correspondence. Given the assumptions in this example, we know that the conviction is erroneous, because the true author has escaped. How high are the odds, in our scenario, of this to happen by chance? The probability of the DNA profile is 1:1,000,000 and this is the probability for a correspondence by chance with the stain. The probability for a false decision is thus 0.000001.” [3, p. 4]

  • Database search case: “There is no suspect. A search in the database is conducted, and exactly one person is found. That person is convicted on the basis of the same argument as that in scenario 1 [the probable cause case]. The conviction is of course, again, false, because the data of the true author are not stored in the database. What is the probability for such a false decision? The answer is somewhat more complicated than that in scenario 1 [the probable cause case]. An error occurs notably when exactly one person is found in the database. (…) We will find exactly one person with a probability of 0.368 (that is in approximately every third similar case), and this person will subsequently be convicted, even though the true author is not in the database. The probability for an error in scenario 2 [the database search case] is therefore considerably greater than in scenario 1 [the probable cause case].” [3, p. 4]

Based on this reasoning, Fimmers et al. argue [3, p. 4]:

“The simple evaluation using a likelihood ratio, as proposed by Taroni et al. [21] is appropriate for the first scenario [the probable cause case], yet produces an unjustifiably high number of false decisions in the second scenario [the database search case]”.

This is questionable, however, because a likelihood ratio in no way amounts to a categorical conclusion with respect to the process of individualization (i.e., the attribution of the trace to a single source to the exclusion of all other potential sources) [4]. In Fimmers et al.'s framework [3], every match results in a wrong individualization. Since every comparison of the crime stain's profile with the profile of an individual in the population has a probability of 10⁻⁶ of leading to a match, every comparison has a probability of 10⁻⁶ of leading to a false individualization. It is therefore hardly surprising that Fimmers et al.'s probability of a false individualization increases with the number of comparisons performed. In other words, one comparison in the probable cause case has a probability of γ = 10⁻⁶ of matching the crime stain's profile, whereas one million comparisons in the database search case have a probability of nγ(1 − γ)ⁿ⁻¹ ≈ 0.368 of leading to exactly one match with the crime stain's profile. This reasoning process consists of an unrealistic deduction based solely on the evidence (i.e., the observed match and the match probability of the crime stain's profile). It is combined with an unusual definition of a population of potential sources, which does not contain the true source, and a definition of the decision as a categorical consequence of a match whenever γ ≤ 10⁻⁶.
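The two figures contrasted in this paragraph can be reproduced with a few lines of Python. This is a minimal sketch that assumes the comparisons are independent and that each has the same chance-match probability γ; the function name `prob_exactly_one_match` is hypothetical and chosen here only for illustration.

```python
def prob_exactly_one_match(n: int, gamma: float) -> float:
    """Binomial probability that exactly one of n independent comparisons
    yields a chance match, each with match probability gamma."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return n * gamma * (1.0 - gamma) ** (n - 1)


gamma = 1e-6   # match probability from the hypothetical case
n = 10**6      # database size from the hypothetical case

# Probable cause case: a single comparison.
p_probable_cause = prob_exactly_one_match(1, gamma)

# Database search case: one million comparisons, exactly one chance match.
p_database = prob_exactly_one_match(n, gamma)

print(f"probable cause:  {p_probable_cause:.6f}")
print(f"database search: {p_database:.3f}")
```

The second value is close to 1/e because nγ = 1 in this example. Both numbers are probabilities of observing a chance match when the true source is, by construction, absent. Neither is a probability that the matching individual is the source.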

There are many points to discuss regarding the arguments advanced by Fimmers et al. [3]. This paper treats the following three aspects:

  • (A) The decision of ‘individualizing’ an individual as the source of a crime stain having a match probability of γ = 10⁻⁶ in a population of 100 million potential sources (N = 10⁸) after obtaining a single hit with this individual in a database containing 1 million of these potential sources (n = 10⁶).

  • (B) The assumption that the true source of the crime stain is not in the population considered by the investigators.

  • (C) The conclusion that the probability of a false individualization is considerably greater in the database search case than in the probable cause case.

Throughout this paper, we will refer to these claims as points A, B and C.
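Point A can be explored numerically with a small expected-loss calculation. The sketch below rests on assumptions that are not taken from the text above and should be read as illustrative only: correct decisions carry a loss of 0, a false individualization carries a loss of 1, a missed individualization carries a loss of λ (here `lam`), and the probability that the matching individual is the source is 1 / (1 + (N − n)γ), which holds under a uniform prior over the N potential sources with independent chance matches. The function names are hypothetical.

```python
def expected_losses(p_source: float, lam: float) -> dict:
    """Expected loss of each decision, given the probability p_source that
    the matching individual is the source.

    Assumed loss function (illustrative): 0 for correct decisions,
    1 for a false individualization, lam for a missed individualization.
    """
    if not 0.0 <= p_source <= 1.0:
        raise ValueError("p_source must lie in [0, 1]")
    if lam < 0.0:
        raise ValueError("lam must be non-negative")
    return {
        "individualize": (1.0 - p_source) * 1.0,
        "do_not_individualize": p_source * lam,
    }


def rational_decision(p_source: float, lam: float) -> str:
    """Return the decision with the smaller expected loss."""
    losses = expected_losses(p_source, lam)
    return min(losses, key=losses.get)


# Values of the hypothetical case in point A.
N, n, gamma = 10**8, 10**6, 1e-6

# Assumption: uniform prior over the N potential sources.
p = 1.0 / (1.0 + (N - n) * gamma)

lam = 0.1  # assumed relative loss of a missed individualization
print(f"Pr(source) = {p:.4f}")
print(rational_decision(p, lam))
```

Under this assumed loss function, individualizing has the smaller expected loss only when the probability of being the source exceeds 1 / (1 + λ), which is about 0.909 for λ = 1/10. The probability obtained for the values in point A lies far below that threshold.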

Structure and contents of this paper

In this paper, we invoke decision theory to analyze the issue of how to decide to ‘convict’, or rather to ‘individualize’, the matching individual found in a database. The aim is to compare Fimmers et al.'s conclusions [3] (points A, B and C) with the results obtained from a decision-theoretic approach to the database search problem. Section 3 will present a decision-theoretic approach to the database search problem, using the visual representation of an influence diagram to clarify the

Preliminaries

Decision theory has provided a logical framework for solving several forensic decision problems [18], [19], [20], [21]. Here, we are interested in the process of ‘individualization’, that is, the attribution of a trace to a single source to the exclusion of all other potential sources. Notably, the act of ‘individualizing’, or ‘not individualizing’, a person or an object can be conceptualized as a decision made on the basis of the inferences resulting from the probabilistic evaluation of

Preliminaries

According to Figs. 3 and 4, where the loss function is specified for λ = 1/10, it is not rational to individualize a suspect found through a database search given the numerical values presented in Fimmers et al.'s hypothetical case (i.e., N = 10⁸, n = 10⁶ and γ = 10⁻⁶ as given in point A). If the assumption in point B holds (i.e., S in N = false), it is impossible for the crime stain to come from Mr. Smith, because the crime stain does not come from someone in the population considered by the

Probability of a false individualization

Fig. 8 extends the influence diagram presented in Section 3.3 to include the probability of a false individualization in a node labelled C, in the same way as was done in [4]. This node describes the event of a correct conclusion as a Boolean variable that takes the state true for (Hi, ai) and (¬Hi, ¬ai), and false otherwise. This influence diagram shows that, logically, the probability of a false individualization is equal to Pr(¬Hi | Mi, X1, …, Xi−1, Xi+1, …, Xn) in node Hi, in a situation where the

Discussion and conclusions

Fimmers et al. [3] concluded that the probability of a false individualization is greater in the database

Acknowledgements

This research was supported by the Swiss National Science Foundation grant no. 100014-135340. The authors also wish to thank the two anonymous reviewers for their helpful and constructive comments.

References (34)

  • A. Biedermann et al.

    Recent misconceptions about the ‘database search problem’: a probabilistic analysis using Bayesian networks

    Forensic Sci. Int.

    (2011)
  • R. Cook et al.

    A hierarchy of propositions: deciding which level to address in casework

    Sci. Justice

    (1998)
  • A. Biedermann et al.

    Decision theoretic properties of forensic identification: underlying logic and argumentative implications

    Forensic Sci. Int.

    (2008)
  • P.M. Schneider et al.

    Allgemeine Empfehlungen der Spurenkommission zur statistischen Bewertung von DNA–Datenbank–Treffern (Recommendations of the German Stain Commission regarding the statistical evaluation of matches following searches in the national DNA database)

    Rechtsmedizin

    (2010)
  • F. Taroni et al.

    Letter to the Editor with reference to Schneider et al. “Allgemeine Empfehlungen der Spurenkommission zur statistischen Bewertung von DNA-Datenbank-Treffern” (“Recommendations of the German Stain Commission regarding the statistical evaluation of matches following searches in the national DNA database”)

    Rechtsmedizin

    (2011)
  • R. Fimmers et al.

    Reply to the Letter to the Editor of Taroni et al. with reference to Schneider et al. “Allgemeine Empfehlungen der Spurenkommission zur statistischen Bewertung von DNA-Datenbank-Treffern” (“Recommendations of the German Stain Commission regarding the statistical evaluation of matches following searches in the national DNA database”)

    Rechtsmedizin

    (2011)
  • D.J. Balding et al.

    Evaluating DNA profile evidence when the suspect is identified through a database search

    J. Forensic Sci.

    (1996)
  • I.W. Evett et al.

    Interpreting DNA Evidence

    (1998)
  • P. Donnelly et al.

    DNA database searches and the legal consumption of scientific evidence

    Mich. Law Rev.

    (1999)
  • I.W. Evett et al.

    Letter to the editor of Biometrics

    Biometrics

    (2000)
  • A.P. Dawid

    Comment on Stockmarr's “Likelihood ratios for evaluating DNA evidence when the suspect is found through a database search”

    Biometrics

    (2001)
  • D.J. Balding

    The DNA database search controversy

    Biometrics

    (2002)
  • National Research Council Committee on DNA Technology in Forensic Science, DNA technology in forensic science,...
  • National Research Council Committee on DNA Forensic Science: An Update, The evaluation of forensic DNA evidence,...
  • A. Stockmarr

    Likelihood ratios for evaluating DNA evidence when the suspect is found through a database search

    Biometrics

    (1999)
  • D.J. Balding et al.

    Inference in forensic identification

    J. R. Stat. Soc. Ser. A

    (1995)
  • D.H. Kaye

    Rounding up the usual suspects: a legal and logical analysis of DNA trawling cases

    N. C. Law Rev.

    (2009)