Why Data Mining for Terrorists Doesn't Work
Bruce Schneier, writing in Wired magazine, does an excellent job explaining why data mining is not a reliable way to uncover terrorist plots.
The root problem he cites is the lack of a well-defined profile of terrorists and their activities. As we have seen, Al-Qaeda in particular has invented new tactics for each attack, has never concentrated on a single geographic location, and has provided no discernible pattern in the timing of its operations. In short, we have little to no idea what specifically characterizes terrorist activity, and so we end up with thousands of false alarms each month. Schneier runs the numbers:

We'll be optimistic. We'll assume [a hypothetical data mining system] has a 1 in 100 false positive rate (99% accurate), and a 1 in 1,000 false negative rate (99.9% accurate).
Assume one trillion possible indicators to sift through per year: that's about ten events -- e-mails, phone calls, purchases, web visits, whatever -- per person in the U.S. per day. Also assume that 10 of them are actually terrorists plotting.
This unrealistically accurate system will generate one billion false alarms for every real terrorist plot it uncovers. Every day of every year, the police will have to investigate 27 million potential plots in order to find the one real terrorist plot per month. Raise that false-positive accuracy to an absurd 99.9999% and you're still chasing 2,750 false alarms per day -- but that will inevitably raise your false-negative rate, and you're going to miss some of those ten real plots.
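The base-rate arithmetic behind those figures is easy to verify. Here is a rough sketch that reproduces Schneier's numbers, assuming (as the excerpt does) one trillion events per year and one real plot per month:

```python
# Rough check of the base-rate arithmetic in the excerpt above.
# Assumed figures (taken from the article, not exact measurements):
# one trillion events per year, twelve real plots per year.

EVENTS_PER_YEAR = 1_000_000_000_000  # ~10 events/person/day in the U.S.
REAL_PLOTS_PER_YEAR = 12             # one real terrorist plot per month

def false_alarms(fp_rate):
    """Expected false positives per day, and per real plot uncovered."""
    per_year = EVENTS_PER_YEAR * fp_rate
    return per_year / 365, per_year / REAL_PLOTS_PER_YEAR

per_day, per_plot = false_alarms(0.01)   # the "99% accurate" system
print(f"{per_day:,.0f} false alarms per day")    # ≈ 27 million
print(f"{per_plot:,.0f} false alarms per plot")  # ≈ 1 billion

per_day, _ = false_alarms(1e-6)          # the absurd 99.9999% system
print(f"{per_day:,.0f} false alarms per day")    # ≈ 2,750
```

This is the classic base-rate problem: even a tiny false-positive rate, multiplied by an enormous number of innocuous events, swamps the handful of true positives.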