A new search engine being developed by Darpa aims to shine a light on the dark web and uncover patterns and relationships in online data to help law enforcement and others track illegal activity.
The project, dubbed Memex, has been in the works for a year and is being developed by 17 contractor teams working with the military's Defense Advanced Research Projects Agency. Google and Bing, whose search results are shaped by popularity and ranking, capture only about five percent of the internet by some estimates. Memex aims to map far more of that content.
“The main issue we’re trying to address is the one-size-fits-all approach to the internet where [search results are] based on consumer advertising and ranking,” says Dr. Chris White, the program manager for Memex, who gave a demo of the engine to the 60 Minutes news program.
To achieve this goal, Memex will not only scrape content from the millions of ordinary web pages that commercial search engines ignore but will also chronicle thousands of sites on the so-called dark web, such as the former Silk Road drug emporium, that are part of the TOR network's Hidden Services.
These sites, which have .onion web addresses, are accessible only through the TOR browser and only to those who know a site's specific address. Some sites already index Hidden Services pages, often around a specific topic, and a search engine called Grams already exists for finding sites that sell illicit drugs and other contraband. Still, the majority of Hidden Services remain well under the radar.
White says part of the Memex project aims to determine just how much TOR traffic is related to Hidden Services sites. "The best estimates before were in the single digits, in the one-thousands," he says. "But we think there are, at any given time, between 30,000 and 40,000 Hidden Service Onion sites that have content on them that one could index."
The content on Hidden Services is public, in the sense that it's not password-protected, but it is not readily accessible through a commercial search engine. "We're trying to move toward an automated mechanism of finding [Hidden Services sites] and making the public content on them accessible," White says. The Darpa team also wants to better understand the turnover of such sites and the relationships between them, for example when one site goes down and a seemingly unrelated site pops up.