More bad press on the practice of trawling DNA databases to locate suspects is here. Writing in the Washington Monthly, Michael Bobelian reveals "DNA's Dirty Little Secret" -- "a tool renowned for exonerating the innocent may actually be putting a growing number of them behind bars." The centerpiece in the expose is the difficult case of People v. Puckett. To some extent, the article rehashes previous writing on Puckett in the Los Angeles Times, San Francisco Magazine, and the California Lawyer, but it also makes mincemeat of statistics. Having analyzed one aspect of the case elsewhere, I shall limit myself to commenting on a few choice excerpts from this story.
"[T]he jury was told that the
chance that a random person's DNA would match that found at the crime
scene was one in 1.1 million.
If Puckett's were an ordinary criminal case, this figure might have been
accurate. Indeed, when police use fresh DNA material to link a crime
directly to a suspect identified through eyewitness accounts or other
evidence, the chances of accidentally hitting on an innocent person are
extraordinarily slim. But when suspects are found by combing through
large databases, the odds are exponentially higher. In Puckett's case
the actual chance of a false match is a staggering one in three ... ."
One in 1.1 million was an accurate statement of "the
chance that a random person's DNA would match that found at the crime
scene" at the loci in question. There was a bona fide dispute at trial over which loci to count as matching and what the resulting random-match probability should have been, but the probability that a person plucked at random from the Caucasian population will have DNA that matches at the loci in question does not grow larger ("exponentially" or otherwise) because Puckett's name emerged from a trawl through the state database. The random-match probability is just the frequency of the profile in the population. This number is what it is.
Of course, there is a dispute over the use of the random-match probability to express the probative value of a DNA match arising from a database trawl. Many statisticians agree that, if anything, the fact of the search enhances the probative value of the match, primarily because it not only identifies a matching profile (as in the non-database-search case) but also eliminates as possible contributors thousands or even millions of individuals. ("Dirty Little Secret" keeps this fact secret.) A respectable minority of the statisticians who have written on the subject, however, maintain that the random-match probability (p) should be inflated by the size of the database (N) -- the Np rule. The Np statistic is an upper bound on the probability that there would be a match to one or more profiles in a database composed entirely of profiles from people innocent of the crime under investigation.
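As a rough numerical sketch of why Np is only an upper bound, the Python snippet below compares Np with the probability of at least one match, 1 - (1 - p)^N. The second formula rests on an assumption made here purely for illustration: that the N profiles are independent and all come from innocent, unrelated people. The function names are hypothetical, and the inputs are the Puckett figures discussed in this post (p = 1/1,100,000 and N = 338,000).

```python
import math


def np_bound(n_profiles: int, p: float) -> float:
    """Np: upper bound on the chance of one or more matches in an innocent database."""
    return n_profiles * p


def prob_at_least_one_match(n_profiles: int, p: float) -> float:
    """1 - (1 - p)^N, assuming independent profiles (an illustrative assumption)."""
    # log1p and expm1 preserve precision when p is very small.
    return -math.expm1(n_profiles * math.log1p(-p))


p = 1 / 1_100_000   # random-match probability reported at trial
N = 338_000         # size of the database that was trawled

print(f"Np bound:      {np_bound(N, p):.3f}")
print(f"1 - (1 - p)^N: {prob_at_least_one_match(N, p):.3f}")
```

Under these assumptions the second figure comes out somewhat below Np (about 0.26 versus about 0.31), which is what "upper bound" means here. Neither number is the probability that a particular matching individual is innocent.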
However one resolves the debate on the relevance of the innocent-database-match probability, it is misleading to suggest that "[i]n Puckett's case the actual chance of a false match is a staggering one in three ... ." The stubborn fact is that we do not know the actual chance that the match in Puckett's case was true or false. The probability of 338,000 x 1/1,100,000 = 0.31, or roughly 1/3, assumes that the database is innocent. To slide from the innocent-database-match probability to the probability that the match was to an innocent man named Puckett is to commit the transposition fallacy condemned in the recent Supreme Court case of McDaniel v. Brown (noted in a posting on January 18, 2010).
"In cases where a
suspect is found by searching through large databases, the chances of
accidentally hitting on the wrong person are orders of magnitude higher."
This statement is much better. For large N, Np is much greater than p. If there are many trawl cases with only a few loci to search, some hits to innocent people will occur. For example, if a database of profiles from 500,000 individuals is trawled 1,000 times for matches to crime-scene samples that each have a random-match probability of one in a million, and if all the contributors of the profiles and the crime-scene samples are unrelated, then the expected number of adventitious matches will be 500.
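The arithmetic behind the figure of 500 can be checked in a few lines of Python. This is only a sketch of the example in the text: it treats every profile-to-sample comparison as having the same match probability p and simply multiplies (expected values add across comparisons). The function name is hypothetical.

```python
from fractions import Fraction


def expected_adventitious_matches(database_size: int, n_trawls: int, p: Fraction) -> Fraction:
    """Expected coincidental hits: (number of comparisons) x (match probability)."""
    comparisons = database_size * n_trawls
    return comparisons * p


expected = expected_adventitious_matches(
    database_size=500_000,        # profiles in the database
    n_trawls=1_000,               # crime-scene samples searched
    p=Fraction(1, 1_000_000),     # random-match probability per comparison
)
print(expected)
```

Exact fractions are used so that the result is not blurred by floating-point rounding.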
Nonetheless, "hitting on the wrong person" will not always put people behind bars. For one thing, if the database is not innocent -- if it includes the culprit -- there will be more than one match if an "accidental" match occurs. As a rule, the "accident" is unlikely to be the one prosecuted. Moreover, even if only one suspect emerges, it often will be easy to eliminate an innocent one. The 2004 database used in Puckett, for example, contained profiles from many people who were not even alive over 30 years ago, when Diana Sylvester was sexually assaulted and stabbed. Hits to those wrong persons could not lead to false prosecutions. Thus, the Np statistic exaggerates the true danger to the criminal justice system of the practice of trawling databases to find investigative leads. However, a nonzero danger remains.
"[T]he little information that has come to light about the actual rate of
coincidental matches in offender databases suggests the chances of
hitting on the wrong person may be even higher than the Database Match
Probability suggests. In 2005, Barlow heard that an Arizona state
employee named Kathryn Troyer had run a series of tests on the state's
DNA database, which at the time included 65,000 profiles, and found
multiple people with nine or more identical markers. If you believe the
FBI's rarity statistics, this was all but impossible--the chances of any
two people in the general population sharing that many markers was
supposed to be about one in 750 million, while the Database Match
Probability for a nine-marker match in a system the size of Arizona's
is roughly one in 11,000."
These remarks are so confused that it is hard to know where to begin. A series of papers in the scientific and legal literature (reviewed in Trawling DNA Databases for Partial Matches: What Is the FBI Afraid Of?) have shown that the Arizona numbers of partial matches are roughly what one would expect if the theoretical random-match probabilities are accurate. Studies of offender databases in other countries also confirm the theoretical estimates for matches at moderate numbers of loci.
The effort to apply "the Database Match Probability" of Np = 65,000 x 1/750,000,000 = 1/11,538 in this context is nonsensical. The Np formula applies to a search involving a single nine-locus DNA profile as against N = 65,000 nine-locus DNA profiles in the database. The Arizona trawl was totally different. It was an all-pairs search of N(N-1)/2 = 2,112,467,500 pairs of 13-locus profiles for matches at any combination of 9 or more loci. There are 715 ways to get a nine-locus match between two 13-locus profiles. Instead of a mere 65,000 comparisons, this peculiar trawl (not representative of a real database search) involved 715 x 2,112,467,500, or roughly 1,500,000,000,000, comparisons! It is no wonder that matches at as many as nine loci were observed.
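The counts in the preceding paragraph can be reproduced with a short Python sketch. It only redoes the arithmetic stated above (the single-profile Np figure, the number of pairs, and the number of nine-locus subsets); it does not attempt to estimate how many partial matches should have been observed, which depends on further modeling choices. The variable names are illustrative.

```python
import math
from fractions import Fraction

N = 65_000                          # profiles in the Arizona database
p_nine = Fraction(1, 750_000_000)   # quoted nine-marker figure

# Ordinary trawl: one crime-scene profile against N database profiles.
single_profile_np = N * p_nine
print(f"Np for a single profile: 1 in {round(1 / single_profile_np):,}")

# All-pairs search: every profile compared with every other profile.
pairs = math.comb(N, 2)             # N(N-1)/2
print(f"pairs of profiles: {pairs:,}")

# Ways to pick which 9 of the 13 loci match.
locus_subsets = math.comb(13, 9)
print(f"nine-locus subsets: {locus_subsets}")

# Total nine-locus comparisons implied by the all-pairs search.
comparisons = pairs * locus_subsets
print(f"comparisons: {comparisons:,}")
```

The contrast between 65,000 comparisons and more than a trillion is the point: the Np figure of 1 in 11,538 describes the first kind of search, not the second.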
In short, the paragraph mixes two distinct issues: (a) whether expert witnesses should use Np instead of p to explain a match arising from an ordinary database trawl to a jury, and (b) whether the p as currently computed ("laughably" according to Mr. Bobelian's source) is a reasonable estimate of the random-match probability -- regardless of how the defendant was selected for prosecution.
Jurors told the Los Angeles Times
that the one-in-1.1-million statistic had been pivotal to their
decision. Asked whether the jury might have reached a different
conclusion if they had been presented with the one-in-three figure,
juror Joe Deluca replied, "Of course it would have changed things. It
would have changed a lot of things."
What did the Los Angeles reporters who interviewed the poor juror say that "1 in 3" meant? Their article mischaracterizes it as "the probability that the database search had hit upon an innocent person." As noted above, no one knows the probability that this search hit upon an innocent person. We know only that if Puckett and everyone in the database were innocent, then the chance that at least one person would have matched could have been no larger than about 1/3. Was it error to keep this information from the jury? Mr. Puckett's opening brief and the state's brief give different answers. They are better sources of information about the case than is DNA's Dirty Little Secret.
Michael Bobelian, DNA's Dirty Little Secret, Washington Monthly, Mar.-April 2010, available at http://www.washingtonmonthly.com/features/2010/1003.bobelian.html
Charles Brenner, Arizona DNA Database Matches, http://dna-view.com/ArizonaMatch.htm
Jason Felch and Maura Dolan, DNA Matches Aren't Always a Lock, Los Angeles Times, May 3, 2008, available at http://www.latimes.com/news/local/la-me-dna4-2008may04,0,6156934,full.story
David H. Kaye, Rounding Up the Usual Suspects: A Legal and Logical Analysis of DNA Database Trawls, North Carolina Law Review, Vol. 87, No. 2, January 2009, pp. 425-503.
-----, Trawling DNA Databases for Partial Matches: What Is the FBI Afraid Of?, Cornell Journal of Law and Public Policy, Vol. 19, No. 1, Fall 2009, pp. 145-171.
Appellant's Opening Brief, People v. Puckett, available at http://www.personal.psu.edu/dhk3/dhblog/AOB(Puckett-CA).pdf
Respondent's Brief, People v. Puckett, available at http://www.personal.psu.edu/dhk3/dhblog/ROB(Puckett-CA).pdf