If you’ve spent any time online in the past decade, you’ve noticed that elements of the digital and physical world tend to follow you around in ways you might not expect. If you have a Facebook account, you may have noticed that after talking with a friend about ordering pizza, you’re seeing ads for delivery near you. Or you might notice that when reminiscing about high school memories, Twitter suggests someone from your old school as a friend or follower. These experiences can be unnerving, and they have led many people to believe that companies like Facebook and Google are using our devices to actively listen to our conversations and serve us advertisements based on what they hear.
With the addition of smart devices in our houses like Facebook’s Portal, Amazon’s Alexa, and Google’s Nest Home Hub, it’s easy to imagine that these devices are constantly transmitting audio data to servers for processing and storage for future advertising revenue and data analytics. And maybe there is truth to that in some cases. However, I’m here to tell you this: these companies don’t need to listen in on our conversations to know what we’re talking about at any given point. They’ll never need to. Tech giants are collecting enough information and data about us to predict and model human behavior so accurately that it seems they’re listening in. Here’s how.
Cookies
Let’s start with the basics. Cookies.
Cookies are small pieces of data that are generated by websites and stored on your computer, usually in a simple text format. A cookie often contains a unique ID that identifies your browser to the site, along with small bits of information about your browsing session. Sometimes this information contains things like your username or user ID, item IDs from a shopping cart, browsing history for the site, or text from a form you were filling out.
When you access a website that uses cookies, this data is stored on your computer to be retrieved the next time you visit the site or the next time the state of the website changes (like a refresh or clicking a new link). Cookies act as a way to preserve data between sessions and requests without storing all of it in a database on the server. The analogy I was taught for cookies is a coat check. When you walk into a fancy restaurant and check your coat, you hand the person at the desk your coat, and they hand you a ticket with your name and a number. They take your coat and attach a ticket with the same number as yours. When it is time to leave, the clerk can check the number on your ticket against the numbers on the coats in storage to retrieve the correct coat for you.
Cookies work in very similar ways. Data you generate on a website, for instance items you put in a shopping cart, is stored somewhere in a cookie. The full cart may not be stored in the cookie because that can take up too much space, but the IDs of the items in the cart can be stored along with your unique computer ID. The website then stores your unique computer ID, and the next time you visit the site, the browser and server check to see if you have a cookie saved. If you do, and the cookie ID matches the ID stored somewhere in the site database, it retrieves the cookie data and re-populates your shopping cart without requiring you to create an account or log in.
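To make the coat check concrete, here is a minimal sketch of that flow in Python. It is a simulation rather than a real web server: the `CARTS` dictionary stands in for the site database, and the `visitor_id` cookie name and `handle_request` function are names I made up for illustration.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for the site's database: visitor ID -> item IDs in the cart.
CARTS: Dict[str, List[str]] = {}


def handle_request(cookie_header: str, add_item: Optional[str] = None) -> Tuple[str, List[str]]:
    """Simulate one request; return (cookie to send back, current cart)."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)  # parse the Cookie header sent by the browser
    morsel = cookie.get("visitor_id")
    if morsel is not None and morsel.value in CARTS:
        visitor_id = morsel.value      # returning visitor: the "ticket" matches
    else:
        visitor_id = uuid.uuid4().hex  # new visitor: hand out a fresh "ticket"
        CARTS[visitor_id] = []
    if add_item is not None:
        CARTS[visitor_id].append(add_item)
    return "visitor_id=" + visitor_id, list(CARTS[visitor_id])


# First visit: no cookie yet, and one item goes into the cart.
cookie_value, cart = handle_request("", add_item="sku-123")
# Later visit: the browser sends the cookie back and the cart is restored.
_, restored_cart = handle_request(cookie_value)
```

The cookie itself only carries the ID. The cart contents stay on the server side, which matches the point above that the full cart may be too large to keep in the cookie.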
This is great for stateless storage of information, and it can help store data you input on a website without requiring you to have an account. But that’s not the only way cookies can be used. A third party can create a tracking script that can be loaded on a website. When you visit that website, the third-party script runs, creates a cookie, and stores it on your computer. Then, if you visit a new site that also happens to have this third-party script, the script runs again, detects that it already created a cookie from a previous website, and can then modify the cookie or send data to the third party’s servers to say “this user visited these two sites at these times”.
So even though you may have never created a Facebook account or a Google account, companies can still track your activity by giving websites these scripts to add to their sites. You may not have a Facebook account, but Facebook can use cookies to track your behavior and activity off the Facebook platform. Using this data, they can learn about your browsing habits, what you are interested in, and what sites you visit. This isn’t a speculative idea either. Facebook has a script called the Facebook Pixel, and Google Analytics tracking tags exist on a huge number of websites that want to track traffic, this site included.
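The cross-site part can be sketched in a few lines. This is a toy simulation under my own assumptions, not how any particular tracker is built: `TrackerServer` and `handle_script_request` are hypothetical names, the `.example` URLs are placeholders, and the `browser_cookie` variable plays the role of the cookie your browser keeps for the tracker’s domain.

```python
import uuid
from typing import Dict, List, Optional


class TrackerServer:
    """Hypothetical third-party tracker whose script is embedded on many sites."""

    def __init__(self) -> None:
        # Tracker ID -> list of pages where the script ran for that browser.
        self.visits: Dict[str, List[str]] = {}

    def handle_script_request(self, tracker_cookie: Optional[str], page_url: str) -> str:
        # Reuse the ID if the browser already holds the tracker's cookie,
        # otherwise mint a new one.
        tracker_id = tracker_cookie or uuid.uuid4().hex
        self.visits.setdefault(tracker_id, []).append(page_url)
        # The ID is returned so the browser can store it as the tracker's cookie.
        return tracker_id


tracker = TrackerServer()
browser_cookie = None  # the browser has never seen this tracker before
for page in ["https://shoes.example/boots", "https://news.example/story"]:
    browser_cookie = tracker.handle_script_request(browser_cookie, page)

# The tracker now holds one profile that spans two unrelated sites.
history = tracker.visits[browser_cookie]
```

Neither site shared anything with the other. The link between the two visits exists only because the same script, with the same cookie, ran on both.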
Data Aggregation
Tech companies don’t just collect your data from their own platform and apps or cookies. They purchase data from other companies. The data purchased from these companies (called commercial data brokers) can include information like the number of credit cards you have, your credit score, whether you are a property owner, your income, and your total liquid investable assets. With a multitude of companies working together to collect and sell or share data with each other, one platform doesn’t need to collect all of your data. They just need to collect enough to associate you with data from other companies. By combining these sources, platforms and technology companies can make more accurate predictions about your behavior and personality. A famous example of a company learning something about its customers using this kind of pattern recognition is the widely reported story of Target working out that a teen was pregnant before her parents did.
Technology companies don’t need to rely on their own data, and we have created an ecosystem that is perfect for collecting rich information about us every time we open an app. The biggest advantage companies have when tracking us isn’t a single piece of data from a single user; it is the aggregation of every user’s data with the data from other sources and companies. Once companies have access to user data they collected plus data purchased from brokers, they can feed this information into computer systems and algorithms to make predictions about your demographics and behavior.
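Here is a rough sketch of what “collect enough to associate you with data from other companies” could look like. I am assuming the shared identifier is an email address that both sides normalize and hash; that is one plausible matching key, not a claim about what any specific company uses. All of the records below are invented.

```python
import hashlib


def match_key(email: str) -> str:
    """Normalize an email address and hash it so two datasets can be joined on it."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


# Invented example data: what a platform observed itself...
platform_profiles = [
    {"email": "Pat@example.com", "interests": ["camping", "pizza"]},
]
# ...and what a hypothetical data broker sells.
broker_records = [
    {"email": "pat@example.com ", "homeowner": True, "credit_cards": 3},
]

# Index the broker data by the shared key, then enrich each platform profile.
broker_by_key = {match_key(r["email"]): r for r in broker_records}
merged = []
for profile in platform_profiles:
    extra = broker_by_key.get(match_key(profile["email"]), {})
    merged.append({**extra, **profile})  # platform fields win on conflicts
```

The two sources wrote the email differently, but after normalization they produce the same key, so the platform ends up with a single record that knows both your interests and your finances.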
Interactions and Algorithms
OK, so companies can use cookies and data from other companies to learn about you. What else can they collect, and how do these predictions work?
Every interaction on platforms like Facebook, Instagram, and Google is tracked. Instagram likely has the ability to track how long you look at an image just based on how slowly you scroll through the feed. Google can see what you look for in search results by analyzing mouse movements and clicks on links before sending you to the site you clicked on. YouTube and TikTok can track how long you watch a video before moving on to the next one or before leaving the platform altogether. This data, combined with data about you specifically (such as your likes, your friend list, your own content, and posts), can be pushed through algorithms that compare your interactions and data with the interactions and data of every other user on the app. To any human, this data is far too complex to process, but to a machine learning algorithm, this data is incredibly valuable.
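As a small illustration of how scroll speed turns into a signal, the sketch below estimates how long each post stayed on screen from a log of scroll events. The log format, the post IDs, and the `dwell_times` function are all assumptions of mine; I have no knowledge of how Instagram actually records this.

```python
from collections import defaultdict

# Hypothetical log: (timestamp in seconds, ID of the post that just came on screen).
scroll_events = [(0.0, "post_a"), (1.5, "post_b"), (9.5, "post_c"), (10.0, "post_d")]
session_end = 12.0  # when the app was closed


def dwell_times(events, end_time):
    """Return seconds spent on each post, measured until the next post appears."""
    totals = defaultdict(float)
    following = events[1:] + [(end_time, None)]
    for (start, item), (next_start, _) in zip(events, following):
        totals[item] += next_start - start
    return dict(totals)


times = dwell_times(scroll_events, session_end)
# post_b held attention for 8 seconds, far longer than anything else in the feed.
```

No tap, like, or comment was needed. Simply pausing on a post is enough to register interest.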
Take, for instance, YouTube and TikTok. The goal of the platforms is to get you to spend more time watching videos so that you watch more advertisements. By analyzing which videos you watch the most, which videos you skip, and which videos you share and interact with (like, comment, subscribe, etc.), they can feed this data to an algorithm that is specifically designed to recommend videos to you that are similar enough to pique your interest, while being different enough to keep you engaged. At the same time this is happening, another system is analyzing which advertisements you interact with on the platform. This system can track which ads you skip right away, which ones you watch all the way through, and which ones you click.
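Real recommendation systems are far more elaborate than anything I can show here, but the core idea of comparing your activity with everyone else’s fits in a short sketch. The technique below is a simple user-based collaborative filter using Jaccard similarity, which I picked for illustration; it is not a description of YouTube’s or TikTok’s actual algorithms, and the watch histories are invented.

```python
from collections import Counter

# Invented watch histories: user -> set of videos watched.
watch_history = {
    "you": {"bread baking", "knife sharpening", "pizza dough"},
    "user2": {"bread baking", "pizza dough", "wood-fired ovens"},
    "user3": {"knife sharpening", "cast iron care"},
    "user4": {"car repair"},
}


def jaccard(a, b):
    """Overlap between two sets: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, history, top_n=2):
    """Suggest unseen videos, weighted by how similar their viewers are to `user`."""
    seen = history[user]
    scores = Counter()
    for other, videos in history.items():
        if other == user:
            continue
        similarity = jaccard(seen, videos)
        for video in videos - seen:
            scores[video] += similarity
    return [video for video, score in scores.most_common(top_n) if score > 0]


suggestions = recommend("you", watch_history)
```

With this data, `suggestions` comes out as "wood-fired ovens" followed by "cast iron care": videos watched by people whose tastes overlap with yours, which is the “similar enough, but different enough” effect described above. The car repair video never appears because its only viewer shares nothing with you.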
By combining the data about your ad interactions with the data about your internet activity like shopping cart history, this system can display advertisements for products that you may have considered purchasing recently. It can also serve you advertisements for products that your friends have purchased recently that it thinks you might also be interested in. Additionally, third-party companies can pay for advertising time and space to try to reach a new audience. If the Casper mattress company wants to sell to recent high school graduates, it can buy ad space on YouTube for people between the ages of 16 and 20 who have recently searched for apartments, who make a certain income, and who live in a region where Casper sales are low.
This kind of filtering can become eerily specific, and none of the data required to make these filters needs to come from your camera or microphone. Based on your activity and collected data, companies can categorize you into useful groups to help advertisers better target you.
Facebook may know that one of your friends has a birthday coming up in a month, and it can use that information combined with your friend’s browsing history to recommend products to you to purchase as a gift for them. My own Facebook ad settings show that it knows I am friends with men who have birthdays in the next 7 to 30 days. Companies can also use your contact list to make associations between users. If I am friends with Sean and Sean is friends with Julie, but Julie and I are not friends, companies can use our mutual friend Sean to find common interests between me and Julie, and recommend us to each other.
It doesn’t have to happen with mutual friends either. Julie and I are not friends, but Julie’s friend Gabe can have similar interests to me. If Gabe and Julie spend a lot of time together (which can be tracked by location and IP addresses), Facebook might recommend Gabe as a friend to me before Julie, just by following the chain of associations from me to Sean, from Sean to Julie, and from Julie to Gabe.
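The Casper example boils down to a filter over user profiles. Here is a minimal sketch of one; the `Profile` fields, the thresholds, and the region name are all made up for illustration and do not correspond to any ad platform’s real targeting options.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Profile:
    user_id: str
    age: int
    recent_searches: List[str]
    estimated_income: int
    region: str


def in_audience(p, min_age=16, max_age=20, min_income=20000, regions=frozenset({"midwest"})):
    """Return True if the profile matches every targeting rule (all values hypothetical)."""
    searched_apartments = any("apartment" in s.lower() for s in p.recent_searches)
    return (
        min_age <= p.age <= max_age
        and searched_apartments
        and p.estimated_income >= min_income
        and p.region in regions
    )


profiles = [
    Profile("u1", 18, ["Apartments near campus", "cheap desks"], 24000, "midwest"),
    Profile("u2", 34, ["apartment listings"], 60000, "midwest"),
    Profile("u3", 19, ["pizza delivery"], 30000, "midwest"),
]
audience = [p.user_id for p in profiles if in_audience(p)]
```

Only `u1` passes every rule. Each condition is mundane on its own, but stacking four of them narrows a huge user base down to a very specific group of people.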
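The Sean, Julie, and Gabe example is a walk through a friendship graph. The sketch below finds everyone within a few hops of a user with a breadth-first search. It is an assumption-laden toy: a real system would presumably weigh shared interests and location as described above, and `candidates_by_distance` is a name I invented.

```python
from collections import deque

# The friendships from the example above.
friends = {
    "me": {"Sean"},
    "Sean": {"me", "Julie"},
    "Julie": {"Sean", "Gabe"},
    "Gabe": {"Julie"},
}


def candidates_by_distance(user, graph, max_hops=3):
    """Breadth-first search: map each non-friend within max_hops to its distance."""
    distance = {user: 0}
    queue = deque([user])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not explore beyond the hop limit
        for neighbor in graph.get(current, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    # Distance 0 is the user and distance 1 is an existing friend; skip both.
    return {name: hops for name, hops in distance.items() if hops >= 2}


suggestions = candidates_by_distance("me", friends)
```

Here Julie turns up two hops away (through Sean) and Gabe three hops away (through Sean and Julie). Which of the two gets shown first would then depend on the other signals the platform has, such as shared interests.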
So are they listening?
These kinds of associations are complex and difficult for individuals to make, which is one of the reasons they are computed by algorithms and machine learning programs and not people. It is also one of the reasons it can seem like these companies are actively listening. We might not think of these underlying connections between our behavior and the behavior of others, but they exist. When you stop to consider how much data is collected just by one company, and you consider how many companies and services you interact with every day, it becomes easy to see how much of our personal data is out in the world being analyzed. It can be overwhelming to think of every data point on every interaction we have every day. Simply put, I don’t think there is a need for companies to actively listen in on our conversations when we are already freely giving them more than enough data to make these associations.
Of course, there are devices that DO listen to us regularly, like the Google Home or Amazon Echo. But these devices don’t have to transmit audio 24/7 to get useful information about you. Every smart device you connect to a Google Home becomes linked to your data in a Google profile. Every search you do on Amazon and every show you watch on Prime Video gets fed into algorithms that link to your smart devices.
Tech companies don’t need to actively listen to us every day. We make it easier for them to learn about us just by existing and using their services.