Analysis: As Large Language Model AI Hogs the Spotlight, Purpose-Built Surveillance AI Toils in Its Shadow
Posted on March 31, 2025
A new profile of Palantir, a US-based AI surveillance analytics company, published in El País raises urgent questions about the effect AI will have on digital surveillance. The tech company has seen an explosion in its stock valuation over the course of just a few years. Given that its client base is primarily government agencies, Palantir's practically overnight success prompts serious examination of the threats to privacy posed by the kinds of tools Palantir deals in.
This is by no means the first instance of private corporations providing government agencies with tools that enable surveillance. CCDBR has highlighted this on numerous occasions, particularly in the case of "stingrays" (also known as "cell-site simulators"). But as the article points out, Palantir's products do not themselves carry out digital monitoring; rather, they are applied to already-collected data to analyze it. In this way, Palantir does not draw the same scrutiny that purveyors of hacking tools, such as NSO Group, do.
However, as this analysis will endeavor to show, AI analysis tools for collected data are no less menacing to digital privacy than the collection tools they rely upon.
First, commercial surveillance analysis tools allow the government to bury its activities deeper in the shadows, yielding a more durable, reliable, and extensive surveillance regime. One way such products solidify government spying programs is by giving agencies more cover to hide behind. Since Freedom of Information Act (FOIA) laws have carve-outs to prevent the disclosure of vendors' trade secrets, whenever commercial products are entangled with government programs, the government has the option to invoke this exception in the hope of deflecting pesky FOIA requests. Some of CCDBR's closest civil liberties allies, such as Lucy Parsons Labs, have said as much in training workshops presented by CCDBR.
Another way is by helping the government leverage data mining at one more level of remove. Federal law enforcement and intelligence agencies alike have shown themselves to have no compunctions about outright buying data on Americans that they would otherwise need search warrants to obtain. While this has so far been ruled legal (see Smith v. Maryland and its attendant "third-party doctrine"), the practice has started to face increased scrutiny. But by using another company's analysis tools, which themselves incorporate troves of purchased data, the commercial acquisition of data is moved off the government's books. This would not be uncommon in the intelligence world, where agencies refer to the practice of supplementing their own collected material with third-party data as "enrichment." Thus, any dataset that can be purchased from data brokers could be used to augment the government's own data, and this additional data would not be available for review by the public, potentially not even in court.
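To make the mechanics of "enrichment" concrete, the sketch below shows how an analysis tool might merge agency-collected records with a commercially purchased data-broker feed keyed on the same identifier. The field names and data are entirely hypothetical, and nothing here describes any actual vendor's product; it is only an illustration of the practice described above.

```python
# Hypothetical illustration of "enrichment": merging agency-collected
# records with a purchased data-broker feed. All names and values invented.

# Records the agency collected itself (e.g., under legal process)
agency_records = [
    {"phone": "312-555-0100", "observed_at": "2024-06-01T14:30:00"},
    {"phone": "312-555-0199", "observed_at": "2024-06-02T09:15:00"},
]

# A commercially purchased feed keyed on the same identifier
broker_feed = {
    "312-555-0100": {"home_zip": "60601", "employer": "Acme Corp"},
    "312-555-0199": {"home_zip": "60614", "employer": "Widget LLC"},
}

def enrich(records, feed):
    """Attach broker attributes to each agency record."""
    enriched = []
    for record in records:
        extra = feed.get(record["phone"], {})
        enriched.append({**record, **extra})
    return enriched

for row in enrich(agency_records, broker_feed):
    print(row)
```

The point of the sketch is where the purchase happens: if the vendor buys and bundles the broker feed, the merged result lands in the government's hands without the acquisition ever appearing in a government procurement or warrant record.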
It should be noted that this practice is not confirmed in the case of Palantir's tools, but it is certainly within the technical and legal capabilities of a company in Palantir's position. And again, Palantir is held up here as an example of a class of companies and tools, not as a unique phenomenon.
And yet another way that analytics applications will aid the government in concealing its surveillance activities is by extending the practice of parallel construction, making intelligence gathering more invasive in the process. Revealed alongside the Snowden disclosures, "parallel construction" is a technique in which law enforcement devises and supplies its own legal justification for evidence it actually derived from warrantless surveillance. When intelligence is used for its intended purpose, military threat assessment and targeting, it is not subject to the usual limits imposed on law enforcement for the simple reason that it is not enforcing a law; it is (in theory) combating foreign threats on foreign soil. Applying military surveillance to domestic law enforcement is where constitutional guardrails come into force. The Fourth Amendment generally requires that searches and seizures be authorized by a judge-issued warrant supported by probable cause. So because the nature of military surveillance is to search without a warrant, even clear evidence of illegality obtained through this means would be inadmissible in court.
To circumvent this, law enforcement may engage in parallel construction by taking the inadmissible intelligence-derived evidence (sometimes passed to it by higher-level government agencies) and then fabricating its own pretext to locate the same evidence in ways that are admissible in court. As long as the defense never finds out that the search was precipitated by inadmissible intelligence, law enforcement can carry on its investigation as if it did all the legwork without resorting to snooping.
Up to this point, enumerating the hazards posed by surveillance analysis software has not accounted for the integration of AI. The type of AI harnessed in tools like Palantir's is not the large language model (LLM) generative AI dominating headlines for its ability to churn out thousands of words of text on demand. Rather, the AI behind data analysis tools tends to follow more traditional machine learning or deep learning designs. Think of these tools as akin to web search engine models (yes, those are AI), which take a body of existing data and tailor their conclusions to the user's stated and unstated preferences, as observed through the user's interactions with the system. Such a model would track not only the search terms entered but also the times at which various searches were executed, which results were selected (and which were not), whether a search was quickly followed by another with the same thrust but slightly different wording, and many other factors.
Intelligence analysis AI would function similarly. It would adjust its findings based on what targets the user searched for, the pattern of target searches, relationships (real or surmised) between recently queried targets, which leads were followed up on and what the outcome was, and many other patterns that speculation like this can scarcely imagine.
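As a rough illustration of the mechanism described in the last two paragraphs, the toy sketch below learns from implicit analyst feedback: features of leads that were followed up on get their weights nudged up, and features of ignored leads get nudged down, so future leads that resemble past pursuits rank higher. The feature names are invented for illustration, and this is not Palantir's algorithm or any real product's; it is a minimal sketch of learning from interaction signals.

```python
# Minimal sketch of a ranker that adapts to implicit analyst feedback.
# Feature names and feedback values are entirely hypothetical.
from collections import defaultdict

class InteractionRanker:
    def __init__(self, learning_rate=0.1):
        self.weights = defaultdict(float)  # one weight per feature
        self.lr = learning_rate

    def score(self, lead_features):
        """Higher score = surfaced earlier to the analyst."""
        return sum(self.weights[f] for f in lead_features)

    def record_feedback(self, lead_features, followed_up):
        """Reward features of pursued leads; penalize ignored ones."""
        delta = self.lr if followed_up else -self.lr
        for f in lead_features:
            self.weights[f] += delta

ranker = InteractionRanker()
# The analyst pursues one lead and ignores another...
ranker.record_feedback({"area:chicago", "link:phone_record"}, followed_up=True)
ranker.record_feedback({"area:detroit", "link:social_media"}, followed_up=False)
# ...and future leads sharing the pursued lead's traits now rank higher.
print(ranker.score({"area:chicago", "link:phone_record"}))  # positive
print(ranker.score({"area:detroit", "link:social_media"}))  # negative
```

Even this toy version shows how quickly an analyst's habits, biases included, get baked into what the system recommends next.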
With AI added to the equation, the dangers of surveillance analysis software all become more dire. Because of how lucrative AI products are, their secrets will be even more closely guarded, and therefore FOIA requests for AI-powered tooling are more likely to be denied. Due to the need for immense volumes of training data for the AI model (more on that later), the infusion of purchased data into the product becomes that much more incentivized. And as the AI analysis proposes more and more tangentially connected leads, law enforcement will have more prospective evidence for parallel construction—and due to the more distant relationship between data points, the signs of parallel construction will be harder for defense attorneys to detect.
Second, AI's hunger for data could give rise to unnerving, unprecedented privacy challenges. Because AI yields more insightful analysis, which in turn encourages more intelligence collection (which is then vigorously ingested by the AI), it creates a self-reinforcing cycle that would accelerate the expansion of government surveillance beyond anything Edward Snowden could have possibly feared. This snowballing would be limited only by the government's imagination and budget, neither of which is meager when it comes to arrogating surveillance powers.
Furthermore, beyond the manifold ways that AI surveillance processing can be intentionally abused by human users, AI learning systems also unsettlingly absorb every data point they are exposed to, integrating it into their models. Just as content creators are finding their intellectual property incorporated into training sets for large language model AIs, American citizens may object to their personal data being used to refine surveillance models without their knowledge or consent.
The extent to which surveillance data submitted to the AI tool is subsumed into the AI model itself is limited only by the vendor's terms and the government's willingness to accede to them. To date, companies have been best served by strip-mining every byte of data that touches their models, backpedaling only if the practice meets with sustained public objection. From the standpoint of where Americans' data resides, an embrace of AI surveillance analysis amounts to a merging of data between the public and private spheres: if the government "collects it all" and feeds it all to the AI, and the AI learns from it all (forever, and for all of its other customers), then the data is held entirely in common between the two.
Effectively combating AI analytics' novel attack on Americans' digital privacy will hinge on the foresight, thoroughness, and wisdom of government regulation. AI models are notoriously opaque; even their designers are unable to fully audit or articulate how their creations reach their conclusions. Measures seeking to impose limits will have to stipulate when AI surveillance analysis tools can be used and what data can be uploaded to them. Perhaps the last line of defense will be a general prohibition on using AI, or anything derived from its use, in the courtroom. While the government is certainly capable of abusing surveillance in ways short of criminally prosecuting targets, such a blanket ban would reduce the incentive to use these tools in investigations, lest prosecutions be dismissed.
Before any of that is viable, civil liberties defenders will need to take up the arduous task of educating themselves on the functional basics of AI technology. This is why organizations like the EFF (an ally of CCDBR through the former’s Electronic Frontier Alliance) are indispensable. Their expertise is more vital now than ever. Only through cooperation and dedication can a future of AI-enabled runaway surveillance be forestalled.