The database search problem: A question of rational decision making
Introduction
The ‘classical’ database search problem, as it is known throughout forensic and legal theory and practice, concerns a question of the following kind: “What is the strength of the evidence against a given individual found through a database search, when that individual is the only person in the database who presents the same analytical characteristics (such as a DNA profile) as those observed on a crime stain?” Intense and controversial debates began in the mid-1990s and seemed to have been settled during the last decade. The database search problem has now once more become the subject of several publications [1], [2], [3], [4]. In particular, Schneider et al. [1] and Fimmers et al. [3] recently claimed that a single matching profile found through a database search has a lower evidential value than a match found by other investigative means (i.e., a situation in which no database search was conducted). However, ample counterarguments [5], [6], [7], [8], [9], [10] demonstrate that this is a misconception, one that dates back to the NRC reports [11], [12] and to writings by Stockmarr [13]. These latter accounts are constructed around a conceptually unsuitable pair of propositions, defined as follows:
$H_{db}$: the source of the crime stain is in the database;
$\neg H_{db}$: the source of the crime stain is not in the database.
Conclusions drawn from this pair contradict probabilistic arguments demonstrating that the evidential value of a single database match increases when one considers the conventional and procedurally appropriate pair of propositions, which take the form:
$H_i$: the crime stain comes from individual $i$;
$\neg H_i$: the crime stain comes from someone else, unrelated to individual $i$.
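The following Python sketch contrasts the two pairs of propositions numerically, using the figures of Fimmers et al.'s example introduced below ($N = 10^8$, $n = 10^6$, $\gamma = 10^{-6}$). It rests on simplifying assumptions that are ours, not the paper's: each of the $N$ individuals is a priori equally likely to be the source, the true source is among them, and profiles of unrelated individuals match independently. For $H_{db}$ versus $\neg H_{db}$ it uses the NRC II rule of multiplying the match probability by the database size, i.e. the approximation $1/(n\gamma)$, rather than an exact derivation. All function names are hypothetical.

```python
"""Likelihood ratios for the two pairs of propositions (illustrative sketch; assumptions in comments)."""


def lr_source_in_database(n: int, gamma: float) -> float:
    """H_db vs not-H_db, approximated by the NRC II rule of multiplying gamma by the database size n."""
    return 1.0 / (n * gamma)


def lr_individual_probable_cause(gamma: float) -> float:
    """H_i vs not-H_i when only individual i is compared: Pr(M_i | H_i) = 1, Pr(M_i | not-H_i) = gamma."""
    return 1.0 / gamma


def lr_individual_database(N: int, n: int, gamma: float) -> float:
    """H_i vs not-H_i when i matches and the other n - 1 database members are excluded.

    Assumes a uniform prior over the N potential sources, so that under not-H_i each of
    the N - 1 other individuals is equally likely to be the source.
    """
    # Pr(E | H_i): i matches with certainty and the n - 1 others fail to match.
    p_e_given_hi = (1.0 - gamma) ** (n - 1)
    # Pr(E | not-H_i): a source inside the database would itself have matched, so only the
    # N - n individuals outside it remain; i then matches by chance and the n - 1 others do not.
    p_e_given_not_hi = (N - n) / (N - 1) * gamma * (1.0 - gamma) ** (n - 1)
    return p_e_given_hi / p_e_given_not_hi


if __name__ == "__main__":
    N, n, gamma = 10**8, 10**6, 1e-6
    print(f"LR(H_db)           = {lr_source_in_database(n, gamma):.4g}")
    print(f"LR(H_i), no search = {lr_individual_probable_cause(gamma):.4g}")
    print(f"LR(H_i), database  = {lr_individual_database(N, n, gamma):.4g}")
```

Under these assumptions the likelihood ratio for $H_i$ after the search equals $(N-1)/((N-n)\gamma) \approx 1.0101 \times 10^6$, slightly above the $10^6$ of the probable cause case, while the approximation for $H_{db}$ drops to 1. The two pairs of propositions answer different questions.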
In their recent publication, Fimmers et al. [3] seek to take their argument in support of a decrease in the value of a database match a step further: they address the act of convicting a suspect and the probability that this conviction is false. That is, they move from a purely probabilistic discourse to an argument invoking the act of choosing one option among several possible options. Their argument rests on a hypothetical case in which investigators search for the individual at the source of a biological stain recovered at a crime scene. The investigators in this case consider a population of 100 million individuals ($N = 10^8$) as the population of potential sources, and possess a database containing the profiles of one million of these individuals ($n = 10^6$). In this population, the DNA profile of the crime stain has a match probability of $\gamma = 10^{-6}$. However, in their example, Fimmers et al. [3] assume that the true source of the crime stain is not in the population considered by the investigators, that is, not among the $N = 10^8$ individuals, and consequently not among the $n = 10^6$ profiles in the database (since the database contains profiles taken from the population of the $N = 10^8$ individuals). Assuming that “a suspect will certainly be convicted in every case in which the rarity of the corresponding DNA profile is [at least] one in a million” [3, p. 4]¹, that is, when $\gamma \le 10^{-6}$, they then compare the probability of a false conviction in a probable cause case² with the probability of a false conviction in a database search case³ for an incriminating profile with a match probability of $10^{-6}$:
- Probable cause case: “There is a suspect. The DNA profile of that person is determined and found to correspond to that of the crime stain. The person is going to be convicted on the sole basis of this correspondence. Given the assumptions in this example, we know that the conviction is erroneous, because the true author has escaped. How high are the odds, in our scenario, of this to happen by chance? The probability of the DNA profile is 1:1,000,000 and this is the probability for a correspondence by chance with the stain. The probability for a false decision is thus 0.000001.” [3, p. 4]⁴
- Database search case: “There is no suspect. A search in the database is conducted, and exactly one person is found. That person is convicted on the basis of the same argument as that in scenario 1 [the probable cause case]. The conviction is of course, again, false, because the data of the true author are not stored in the database. What is the probability for such a false decision? The answer is somewhat more complicated than that in scenario 1 [the probable cause case]. An error occurs notably when exactly one person is found in the database. (…) We will find exactly one person with a probability of 0.368 (that is in approximately every third similar case), and this person will subsequently be convicted, even though the true author is not in the database. The probability for an error in scenario 2 [the database search case] is therefore considerably greater than in scenario 1 [the probable cause case].” [3, p. 4]¹
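Fimmers et al.'s figure of 0.368 can be reproduced with a few lines of Python. The sketch assumes, as their scenario does, that the true source is absent from the database, so that each of the $n = 10^6$ database profiles matches independently with probability $\gamma = 10^{-6}$; the number of hits is then binomial, and approximately Poisson with mean $n\gamma = 1$. Function names are illustrative.

```python
import math


def p_match_single_comparison(gamma: float) -> float:
    """Probable cause case: one comparison with a non-source matches with probability gamma."""
    return gamma


def p_exactly_one_hit(n: int, gamma: float) -> float:
    """Database search case: binomial probability that exactly one of n non-source profiles matches."""
    return n * gamma * (1.0 - gamma) ** (n - 1)


def p_exactly_one_hit_poisson(n: int, gamma: float) -> float:
    """Poisson approximation with mean n * gamma (small gamma, large n)."""
    mu = n * gamma
    return mu * math.exp(-mu)


if __name__ == "__main__":
    n, gamma = 10**6, 1e-6
    print(f"probable cause:             {p_match_single_comparison(gamma):.1e}")
    print(f"database search (binomial): {p_exactly_one_hit(n, gamma):.3f}")
    print(f"database search (Poisson):  {p_exactly_one_hit_poisson(n, gamma):.3f}")
```

Both database figures round to 0.368. This is the probability of obtaining a single adventitious hit before anyone is examined; it is not the probability that a particular matching individual is not the source.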
From this, Fimmers et al. [3] conclude: “The simple evaluation using a likelihood ratio, as proposed by Taroni et al. [21] is appropriate for the first scenario [the probable cause case], yet produces an unjustifiably high number of false decisions in the second scenario [the database search case]”.¹
This is questionable, however, because a likelihood ratio in no way amounts to a categorical conclusion with respect to the process of individualization (i.e., the attribution of the trace to a single source to the exclusion of all other potential sources) [4]. In Fimmers et al.'s framework [3], every match results in a wrong individualization. Since every comparison of the crime stain's profile with the profile of an individual in the population has a probability of $10^{-6}$ of leading to a match, every comparison has a probability of $10^{-6}$ of leading to a false individualization. It is therefore hardly surprising that Fimmers et al.'s probability of a false individualization increases with the number of comparisons performed. In other words, the single comparison in the probable cause case has a probability of $10^{-6}$ of matching the crime stain's profile, whereas the one million comparisons in the database search case have a probability of $1-(1-10^{-6})^{10^{6}} \approx 0.632$ of leading to at least one match with the crime stain's profile. This reasoning amounts to an unrealistic deduction based solely on the evidence (i.e., the observed match and the match probability of the crime stain's profile). It is combined with an unusual definition of the population of potential sources, which does not contain the true source, and with a definition of the decision as a categorical consequence of a match whenever $\gamma \le 10^{-6}$.
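The scaling described above can be checked directly. Assuming independent comparisons with profiles that do not come from the source, the probability of at least one match among $k$ comparisons is $1-(1-\gamma)^k$; the sketch below evaluates it with Python's `math.log1p` and `math.expm1` to limit rounding error for small $\gamma$. The function name is hypothetical.

```python
import math


def p_at_least_one_match(k: int, gamma: float) -> float:
    """Probability that at least one of k independent non-source comparisons matches."""
    # 1 - (1 - gamma)**k, computed as -expm1(k * log1p(-gamma)) for numerical accuracy.
    return -math.expm1(k * math.log1p(-gamma))


if __name__ == "__main__":
    gamma = 1e-6
    for k in (1, 10**3, 10**5, 10**6):
        print(f"k = {k:>9,}: {p_at_least_one_match(k, gamma):.6f}")
```

For $k = 1$ the result is $10^{-6}$; for $k = 10^6$ it is about 0.632. The increase reflects the number of comparisons performed, not any change in the value of the evidence against a particular matching individual.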
There are many points to discuss regarding the arguments advanced by Fimmers et al. [3]. This paper treats the following three aspects:
- (A)
The decision to ‘individualize’ an individual as the source of a crime stain having a match probability of $\gamma = 10^{-6}$ in a population of 100 million potential sources ($N = 10^8$), after obtaining a single hit with this individual in a database containing 1 million of these potential sources ($n = 10^6$).
- (B)
The assumption that the true source of the crime stain is not in the population considered by the investigators.
- (C)
The conclusion that the probability of a false individualization is considerably greater in the database search case than in the probable cause case.
Structure and contents of this paper
In this paper, we invoke decision theory to analyze the issue of how to decide to ‘convict’, or rather to ‘individualize’, the matching individual found in a database. The aim is to compare Fimmers et al.'s conclusions [3] (points A, B and C) with the results obtained from a decision-theoretic approach to the database search problem. Section 3 will present a decision-theoretic approach to the database search problem, using the visual representation of an influence diagram to clarify the
Preliminaries
Decision theory has provided a logical framework for solving several forensic decision problems [18], [19], [20], [21]. Here, we are interested in the process of ‘individualization’, that is, the attribution of a trace to a single source to the exclusion of all other potential sources. Notably, the act of ‘individualizing’, or ‘not individualizing’, a person or an object can be conceptualized as a decision made on the basis of the inferences resulting from the probabilistic evaluation of
Preliminaries
According to Figs. 3 and 4, where the loss function is specified with $\lambda = 1/10$, it is not rational to individualize a suspect found through a database search, given the numerical values presented in Fimmers et al.'s hypothetical case (i.e., $N = 10^8$, $n = 10^6$ and $\gamma = 10^{-6}$, as given in point A). If the assumption in point B holds (i.e., S in N = false), it is impossible for the crime stain to come from Mr. Smith, because the crime stain does not come from someone in the population considered by the
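This conclusion can be illustrated with a minimal expected-loss calculation. The loss table used here is an assumption for illustration and not necessarily the specification behind Figs. 3 and 4: correct decisions cost nothing, a false individualization costs 1, and failing to individualize the true source costs $\lambda = 1/10$. The posterior probability of $H_i$ is computed under an assumed uniform prior over the $N$ potential sources, as $1/(1+(N-n)\gamma)$ if the source is in the population and 0 otherwise (point B). The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Losses:
    """Hypothetical loss table; correct decisions are assumed to carry zero loss."""
    false_individualization: float = 1.0   # individualize, but i is not the source
    missed_individualization: float = 0.1  # do not individualize, but i is the source (lambda = 1/10)


def posterior_source(N: int, n: int, gamma: float, source_in_population: bool = True) -> float:
    """Pr(H_i | M_i, other n - 1 database members excluded) under a uniform prior over N."""
    if not source_in_population:
        return 0.0  # point B: no one among the N individuals can be the source
    return 1.0 / (1.0 + (N - n) * gamma)


def expected_losses(p_source: float, losses: Losses) -> dict[str, float]:
    """Expected loss of each decision given Pr(H_i | evidence) = p_source."""
    return {
        "individualize": (1.0 - p_source) * losses.false_individualization,
        "do not individualize": p_source * losses.missed_individualization,
    }


def optimal_decision(p_source: float, losses: Losses) -> str:
    """Return the decision with the smaller expected loss."""
    el = expected_losses(p_source, losses)
    return min(el, key=el.get)


if __name__ == "__main__":
    losses = Losses()
    for in_population in (True, False):
        p = posterior_source(10**8, 10**6, 1e-6, source_in_population=in_population)
        print(in_population, round(p, 4), optimal_decision(p, losses))
```

With this table, individualizing is optimal only when $\Pr(H_i \mid \ldots) > 1/(1+\lambda) \approx 0.909$, far above the value of about 0.01 obtained here; when the source lies outside the population the posterior is 0 and the same decision follows.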
Probability of a false individualization
Fig. 8 extends the influence diagram presented in Section 3.3 to include the probability of a false individualization in a node labelled C, in the same way as was done in [4]. This node describes the event of a correct conclusion as a Boolean variable that takes the state of (…). This influence diagram shows that, logically, the probability of a false individualization is equal to $\Pr(\neg H_i \mid M_i, X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n)$ in node $H_i$, in a situation where the
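Under simplifying assumptions of our own (a uniform prior over the $N$ potential sources, the true source among them, independent matches), the quantity in node $H_i$ has a closed form, which allows the two scenarios of point C to be compared. In the database search case, the $n-1$ excluded database members drop out as alternative sources; in the probable cause case, all $N-1$ other individuals remain. The function names are hypothetical.

```python
def p_false_individualization_database(N: int, n: int, gamma: float) -> float:
    """Pr(not-H_i | M_i, X_1..X_{i-1}, X_{i+1}..X_n) after a single database hit."""
    alt = (N - n) * gamma  # expected number of chance matches among the N - n non-excluded others
    return alt / (1.0 + alt)


def p_false_individualization_probable_cause(N: int, gamma: float) -> float:
    """Pr(not-H_i | M_i) when only individual i has been compared."""
    alt = (N - 1) * gamma  # all N - 1 other individuals remain possible sources
    return alt / (1.0 + alt)


if __name__ == "__main__":
    N, n, gamma = 10**8, 10**6, 1e-6
    print(f"database search: {p_false_individualization_database(N, n, gamma):.6f}")
    print(f"probable cause:  {p_false_individualization_probable_cause(N, gamma):.6f}")
```

Under these assumptions the database search yields 0.99 against about 0.990099 for the probable cause case: the exclusions slightly lower the probability of a false individualization rather than raising it. Both values are high because about 100 other members of a population of $10^8$ are expected to share a profile with $\gamma = 10^{-6}$. If, as in point B, the source lies outside the population, the probability is 1 in both cases.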
Discussion and conclusions
Fimmers et al. [3] concluded that the probability of a false individualization¹⁵ is greater in the database
Acknowledgements
This research was supported by the Swiss National Science Foundation grant no. 100014-135340. The authors also wish to thank the two anonymous reviewers for their helpful and constructive comments.
References
- et al., Recent misconceptions about the ‘database search problem’: a probabilistic analysis using Bayesian networks, Forensic Sci. Int. (2011).
- et al., A hierarchy of propositions: deciding which level to address in casework, Sci. Justice (1998).
- et al., Decision theoretic properties of forensic identification: underlying logic and argumentative implications, Forensic Sci. Int. (2008).
- et al., Allgemeine Empfehlungen der Spurenkommission zur statistischen Bewertung von DNA-Datenbank-Treffern (Recommendations of the German Stain Commission regarding the statistical evaluation of matches following searches in the national DNA database), Rechtsmedizin (2010).
- et al., Letter to the Editor with reference to Schneider et al. “Allgemeine Empfehlungen der Spurenkommission zur statistischen Bewertung von DNA-Datenbank-Treffern” (“Recommendations of the German Stain Commission regarding the statistical evaluation of matches following searches in the national DNA database”), Rechtsmedizin (2011).
- et al., Reply to the Letter to the Editor of Taroni et al. with reference to Schneider et al. “Allgemeine Empfehlungen der Spurenkommission zur statistischen Bewertung von DNA-Datenbank-Treffern” (“Recommendations of the German Stain Commission regarding the statistical evaluation of matches following searches in the national DNA database”), Rechtsmedizin (2011).
- et al., Evaluating DNA profile evidence when the suspect is identified through a database search, J. Forensic Sci. (1996).
- et al., Interpreting DNA Evidence (1998).
- et al., DNA database searches and the legal consumption of scientific evidence, Mich. Law Rev. (1999).
- et al., Letter to the editor of Biometrics, Biometrics (2000).