In September, Rebecca Slaughter, a Commissioner for the Federal Trade Commission (FTC), told C-SPAN that at least three Commissioners support an investigation of how the ad-tech industry collects, aggregates, and uses data. The statement came in reference to the FTC’s current re-investigation of child privacy rules and specifically the Children’s Online Privacy Protection Act (COPPA).
Slaughter’s statement suggests that when the FTC began looking into how companies collect data on children, it realized that it knows very little about how companies collect and use data on anyone, no matter their age. The technology at the center of this issue is called real-time bidding (RTB). Understanding what kinds of data companies collect in apps and on websites, how that data is used to serve ads, and whether it is inherently harmful may help us understand how legislators should move forward.
In July 2020, 10 federal legislators sent the FTC what has become known as the Wyden-Cassidy letter, which states:
The hundreds of participants in these auctions receive sensitive information about the potential recipient of the ad—device identifiers and cookies, location data, IP addresses, and unique demographic and biometric information such as age and gender. Hundreds of potential bidders receive this information, even though only one—the auction winner—will use it to deliver an advertisement.
Few Americans realize that companies are siphoning off and storing that “bidstream” data to compile exhaustive dossiers about them. These dossiers include their web browsing, location, and other data, which are then sold by data brokers to hedge funds, political campaigns, and even to the government without court orders.
Source: “Wyden, Cassidy and Bicameral Coalition Request FTC Investigate Advertisers Tracking Americans at Places of Worship and Protests”
One of the major concerns with government investigations of tech companies is that they take technical processes run mostly by algorithms and try to make that information understandable for our non-technical leaders – sometimes with hilarious results. To avoid those confusions, let’s talk first about what kinds of data apps and companies collect. Then we’ll talk about how RTB works, and how that may or may not affect individual privacy.
What data do companies collect?
When we say that data brokers have thousands of data points about an individual, many of those data points are publicly available or surrendered by the individual: IP address, place of work, home address, age, gender, and location are all readily available or extrapolated from public records. Data that the individual surrenders come from their search queries, the types of sites they click on, the news they follow, the products that they purchase, and any other behavior on the internet that’s either tracked by cookies or by apps.
Companies like Google, Apple, or Facebook, that can connect all of your search, click, and public data with your personal information through your login, have a far-reaching data set. This data is valuable to advertisers because it is tied to other personally-identifying information like phone numbers, names, and email addresses. Other companies want this data, so they can carefully target audiences that might be interested in their product.
The Senators and Representatives’ letter makes it sound like private data is being bought and sold on exchanges, as though the advertisers hand over reams of data in exchange for a single ad, but the tools are a lot more complicated than that. That means that very little individual information is exchanged outright.
How real-time bidding works
Publishers say “I’ve got this ad space, I need someone to find me advertisements to go in there.” So they take their ad space to advertising companies and describe the types of readers or website users that generally come to their websites. Advertising companies then find companies who are looking to put ads in front of the kinds of users that use the website. In the middle, where the advertising company is, is where the real-time bidding happens.
So, if ShoeCompany has a new sneaker that they think will sell well with males ages 12-25 who are interested in basketball, they build an ad for that audience. They then take their ad to an advertising company. The advertising company notes the target audience and enters the ad into its system with a standard bid. When a person clicks on an article on a partner publication that meets the targeting criteria, all of the eligible ads “bid” on the space, and the winning ad is shown.
The actual amount of information that gets passed from the publisher to the advertiser is very little. Most publishers don’t have a lot of personally identifying data on their readers. They use extrapolated data from cookies, IP addresses, location, browser, and any information you’re going to find in a Google Analytics tool along with their own knowledge of their websites to attract advertisers.
On the other side, the advertising companies ask their clients for key product targeting so they can match those clients’ ads with the publishers who have ad space. The clients provide targeting information about the types of people they need to get in front of, stating the age ranges, interests, job types, locations, and browsing information that match their desired audience. Again, all of this is extrapolated from known data sets.
The term “auction” makes it feel like the tools bring a bunch of advertisers together in a room to fight over your data. The reality is that software scans databases for ads that match a particular set of requirements and then chooses the ad of the highest bidder to show.
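To make the matching-and-bidding step concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any real ad exchange is built: the `Ad` and `Impression` classes, the field names, the bid amounts, and every advertiser other than ShoeCompany are hypothetical, and the sketch simply shows the highest matching bid winning, as described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ad:
    advertiser: str   # hypothetical company name
    bid_cents: int    # standard bid entered along with the ad
    genders: set      # genders the ad targets
    min_age: int      # youngest targeted age
    max_age: int      # oldest targeted age
    interests: set    # interests the ad targets


@dataclass
class Impression:
    # What the publisher's side guesses about the reader. These are
    # estimates extrapolated from cookies and similar signals, not a
    # name or a verified profile.
    gender: str
    age: int
    interests: set


def matches(ad: Ad, imp: Impression) -> bool:
    """Return True if the reader falls inside the ad's target audience."""
    return (
        imp.gender in ad.genders
        and ad.min_age <= imp.age <= ad.max_age
        and bool(ad.interests & imp.interests)  # at least one shared interest
    )


def run_auction(ads: list, imp: Impression) -> Optional[Ad]:
    """Pick the highest-bidding ad among those whose targeting matches."""
    eligible = [ad for ad in ads if matches(ad, imp)]
    if not eligible:
        return None
    return max(eligible, key=lambda ad: ad.bid_cents)


# Hypothetical inventory of ads waiting in the advertising company's system.
ads = [
    Ad("ShoeCompany", 250, {"male"}, 12, 25, {"basketball"}),
    Ad("GolfClubCo", 400, {"male", "female"}, 40, 70, {"golf"}),
    Ad("SportsDrinkCo", 180, {"male", "female"}, 16, 35, {"basketball", "running"}),
]

reader = Impression(gender="male", age=19, interests={"basketball", "music"})
winner = run_auction(ads, reader)
print(winner.advertiser if winner else "no ad")  # prints: ShoeCompany
```

Note that GolfClubCo has the highest bid but never competes for this reader, because the targeting filter runs before any bids are compared. Nothing in the sketch needs the reader’s name or email address, only a handful of estimated attributes.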
Why does it feel so invasive?
Ads often feel invasive because we don’t realize that we’ve given over the information that we have. Most of us have experienced the shoe ad that follows us around for weeks after we browsed or even bought shoes online, and many of us have probably seen an ad pop up for a company we were just talking about recently—offline.
A 2018 study of over 17,000 apps found that none of them accessed the microphone to listen to conversations. Even counting smart home devices that actively listen for you to trigger them, having a device “spy” on your conversations isn’t efficient or even the best way to get data about things that you want to purchase. Cookies, browsing data, and click activity are much more reliable than trying to understand human voices.
Advertising can feel invasive because we often don’t remember everything we scroll through during the day.
What sparked the most recent call for investigations
In a June article in BuzzFeed News, Sen. Elizabeth Warren called out a report by data broker Mobilewalla, saying,
“This report shows that an enormous number of Americans – probably without even knowing it – are handing over their full location history to shady location data brokers with zero restrictions on what companies can do with it,” Warren said. “In an end-run around the Constitution’s limits on government surveillance, these companies can even sell this data to the government, which can use it for law and immigration enforcement. That’s why I’ve opened an investigation into the government contracts held by location data brokers, and I’ll keep pushing for answers.”
Source: Senator Elizabeth Warren, as quoted in “Almost 17,000 Protesters Had No Idea A Tech Company Was Tracing Their Location” by BuzzFeed News
In December 2019, a New York Times Opinion investigation of location tracking showed that location data collected through weather, maps, and even social media apps can be assembled into a digital fingerprint of an individual, especially as that person moves from home to work and back again every day. This degree of data specificity is scary: apps are already tracking your movements. That could lead to all sorts of privacy concerns, not to mention the problematic governmental oversight that Sen. Warren fears.
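To see why a daily commute is so revealing, consider a rough sketch in Python. This is not the method used by the New York Times or by any data broker; it is a simple heuristic written for illustration. The function names, the time windows, and the coordinates are all made up. The idea is that the place a device spends most nights is probably home, and the place it spends most weekday working hours is probably work.

```python
from collections import Counter
from datetime import datetime


def round_cell(lat, lon, places=3):
    """Snap a coordinate to a coarse grid cell (roughly a city block)."""
    return (round(lat, places), round(lon, places))


def guess_home_and_work(pings):
    """Guess home and work cells from (timestamp, lat, lon) pings.

    Assumed heuristic: the most common overnight cell is home, and the
    most common weekday 9-to-5 cell is work.
    """
    night = Counter()
    workday = Counter()
    for ts, lat, lon in pings:
        cell = round_cell(lat, lon)
        if ts.hour >= 22 or ts.hour < 6:
            night[cell] += 1
        elif ts.weekday() < 5 and 9 <= ts.hour < 17:
            workday[cell] += 1
    home = night.most_common(1)[0][0] if night else None
    work = workday.most_common(1)[0][0] if workday else None
    return home, work


# Made-up pings from one anonymous device ID over a few days.
pings = [
    (datetime(2020, 6, 1, 23, 0), 40.7128, -74.0060),   # overnight
    (datetime(2020, 6, 2, 2, 0), 40.7129, -74.0061),    # overnight, same block
    (datetime(2020, 6, 2, 10, 0), 40.7580, -73.9851),   # weekday office hours
    (datetime(2020, 6, 2, 12, 30), 40.7300, -73.9900),  # lunch somewhere else
    (datetime(2020, 6, 2, 14, 0), 40.7581, -73.9852),   # back at the office
    (datetime(2020, 6, 2, 23, 15), 40.7128, -74.0061),  # overnight again
    (datetime(2020, 6, 3, 23, 30), 40.7000, -73.9500),  # one night elsewhere
]

home, work = guess_home_and_work(pings)
print(home, work)
```

The pings carry no name at all, yet a home block paired with a work block is a combination that very few people share, which is what makes this kind of data feel like a fingerprint.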
Apple will release an app store feature in iOS 14 that will list the types of information an app collects. This may help some people understand how their data may be collected before they download the app. But these features only protect people so far. If the service the app provides is more valuable than the perceived threat, people will use the app anyway. Apple’s privacy tool is meant to be more accessible than the terms of service screens we all agree to without reading. But as the Mobilewalla case shows, it’s not necessarily the app you should be afraid of.
What a Terms of Service page and even Apple’s clearer data tool don’t tell you is where your data gets sold. These disclosures are often cloaked in legalese that speaks of “our data partners” or “third-party contractors.” Terms like this could mean that they’re running your data through a business intelligence tool to better understand their demographics to help them market better, or it could mean that they’re selling your data to middlemen who then repackage, repurpose, and resell it to other companies.
Way down in the BuzzFeed article, we learn that even Mobilewalla, which released the report on the demographics of the Black Lives Matter protests that took place in June 2020, is to some extent guessing. The company uses “artificial intelligence to turn a stew of location data, device IDs, and browser histories to predict a person’s demographics — including race, age, gender, zip code, or personal interests. Mobilewalla sells aggregated versions of that stuff back to advertisers.” According to Mobilewalla, it exposed the extent of its data capabilities to warn others of what could be done with that data.
If they can, they probably will
The New York Times investigation showed that with enough time and effort, you can put together fairly accurate pictures of the daily lives of individuals. AI isn’t magic; it requires expertise, time, and data to train it. Right now, companies are collecting and aggregating data for all sorts of purposes, but mostly to market to you. It’s pretty clear that, at least at the moment, it takes a lot of detailed work to collect data on your interests, track your movements, or put together a personalized profile that’s correctly connected to your name. But that doesn’t mean it’s impossible.
What’s scary is that companies like Google and Facebook have a lot of personal data, like names, phone numbers, and email addresses, that we give to them freely because we log into their apps. Google and Facebook may have started out with helpful products, but in their bids to make better, more addictive features and connect you with the best commerce, they started gathering a lot of information. Other tech companies and app makers soon realized the value of the seemingly meaningless data they collected for legitimate reasons.
When I started writing this article, I was convinced that the data that apps collect is mostly benign: it’s aggregated and extrapolated from a few private but anonymous data points connected with publicly available information. But if we let companies collect and aggregate this data right now, someone will eventually build an algorithm under the guise of personalized marketing or better service that will bring together that data for nefarious purposes.
The FTC and Congress have the opportunity to get out ahead of this threat.