Understanding Metadata

Earlier on the site, I cited a statistic that 87% of the web is encrypted. This means that when you visit, say, Facebook, your Internet Service Provider (ISP) can see that you visited and how long you stayed, but they can’t see your login credentials (username and password) or exactly which pages you went to. This is done with Transport Layer Security, or TLS, a powerful and increasingly popular encryption protocol used online. It’s quite effective and difficult to break.

So in effect even the average person has - generally speaking - a basic level of powerful security in their online lives (which is why I listed installing HTTPS Everywhere as "Most Important"). This raises the question that privacy enthusiasts everywhere have come to despise like nails on a chalkboard: “why should I care?” If your sensitive details such as your password and credit card number are safely encrypted, who cares if your ISP or the Starbucks IT guy can see what websites you visit? (Spoiler alert: the introduction.)

For starters, because TLS only protects data in transit and stops at the endpoint. When you connect to Amazon, your ISP can see that you visited Amazon, but not what you bought or your card number. Amazon, however, can see it all without restriction. But more importantly, you often don't need to see the content itself to start making powerful and dangerous inferences.
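To make the split a little more concrete, here is a rough sketch in Python of which parts of a web address a network observer can typically see and which parts TLS hides. The function name `split_visibility` and the example URL are my own inventions for illustration. The model is also simplified: in practice the hostname usually leaks through DNS lookups and the TLS handshake rather than through the URL itself.

```python
from urllib.parse import urlsplit


def split_visibility(url):
    """Roughly separate what an on-path observer sees from what TLS hides.

    Simplified assumption: the hostname and port are visible (via DNS and
    the TLS handshake), while the path and query string are encrypted.
    """
    parts = urlsplit(url)
    visible = {"hostname": parts.hostname, "port": parts.port or 443}
    hidden = {"path": parts.path, "query": parts.query}
    return visible, hidden


# Hypothetical URL, used only for illustration.
visible, hidden = split_visibility("https://www.example.com/orders/12345?item=book")
print("Observer (ISP, Wi-Fi operator) can see:", visible)
print("Encrypted in transit by TLS:", hidden)
```

Keep in mind that the website itself sits on the other side of the encryption, so it sees both halves.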

What is Metadata?

The information in question is called “metadata,” sometimes described as “data about the data.” Maybe I can’t see exactly what you said in your email, but I can see who you emailed, at what time, and the size of the email. And on the surface it doesn’t seem so bad. Who cares if you know that I emailed my mom at 7pm and the email was 7KB?
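As a small illustration, the sketch below uses Python's standard `email` module to pull exactly those details (who, when, and how big) out of a message without ever looking at the body. The sample message, the addresses, and the `extract_metadata` helper are all made up for this example, and a real observer would get this information from mail server logs or network traffic rather than from a tidy string.

```python
from email import message_from_string

# A made-up message. The body stands in for content that is encrypted
# or otherwise unreadable to whoever is watching.
raw = """From: me@example.com
To: mom@example.com
Date: Mon, 01 Jan 2024 19:00:00 -0500
Subject: Hi

This body stands in for content an observer could not read.
"""


def extract_metadata(raw_message):
    """Collect sender, recipient, timestamp, and size; never read the body."""
    msg = message_from_string(raw_message)
    return {
        "from": msg["From"],
        "to": msg["To"],
        "date": msg["Date"],
        "size_bytes": len(raw_message.encode("utf-8")),
    }


metadata = extract_metadata(raw)
for field, value in metadata.items():
    print(f"{field}: {value}")
```

Run this over months of messages instead of one and the pattern of who talks to whom, and when, starts to emerge on its own.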

As is the case with most privacy and security concerns in the modern era, the problem isn’t so much what’s collected but rather how it has the potential to be used. Take this excellent article from the Electronic Frontier Foundation, for example. A few of the examples they list of metadata that can be too revealing:

  • They know you called a gynecologist, spoke for a half hour, and then called the local Planned Parenthood's number later that day. But nobody knows what you spoke about.
  • They know you got an email from an HIV testing service, then called your doctor, then visited an HIV support group website in the same hour. But they don't know what was in the email or what you talked about on the phone.
  • They know you called the suicide prevention hotline from the Golden Gate Bridge. But the topic of the call remains a secret.

    (Lifted directly from EFF's Surveillance Self Defense page)

As you can see, metadata has the potential to be just as revealing as the content itself, and therefore should be protected just as much as the actual data. You might say to yourself, “You said potential abuse, do you really think that’s likely?” The answer is absolutely, 100% without a doubt, not-just-being-paranoid: "yes." China is already notorious for its incredibly invasive, 1984-like “Social Credit System.” The United States is starting to use your social network in the insurance industry. Oh, and the United States is working on its own “Social Credit System” too. So yeah, metadata is an important part of your attack surface that you need to consider as you protect your privacy and security.

So What to Do?

Certain metadata is impossible to avoid. You can't always leave your phone at home, so location-based metadata is inevitable. Most websites also collect information about what type of device you’re using when you visit. Even connecting to a VPN service or sending encrypted email requires a certain amount of metadata to communicate. I wish this section ended with a list of suggested services to help reduce or eliminate the amount of metadata you leak in your daily life, but the fact is no such thing exists yet. Instead, the goal of this page is to make you aware of metadata, how it exists and is collected, and what it says about you.

As you pick services to help keep you safe and private in the digital world, it’s important to consider who those services are talking to and what they’re saying. Wire messenger, for example, collects metadata about your initial sign-up so they can create your unique account. This data isn’t shared outside the company without a warrant, but it is a good reminder that Wire is not 100% anonymous from the company itself if that’s your goal. On the other hand, Mullvad VPN allows you to pay with Bitcoin without surrendering any personal information at all, so in theory - if done carefully and correctly - Mullvad can make you 100% anonymous on the internet.

Most of us probably don’t need to be 100% anonymous for any reason, but it's a good idea for us to protect our metadata just as much as our actual communications whenever possible. Again, I wish I had some concrete advice, but instead it simply comes down to asking yourself “what metadata am I giving up and to whom?” Using a VPN means you’re transferring a considerable amount of your metadata away from your ISP and over to your VPN provider. Assuming you use a reputable, trustworthy VPN provider, that’s a good strategy. Encrypted email works the same way. Many of these companies will surrender what they can if given a warrant, but these same companies rarely have much to turn over aside from a few login locations and times. It’s a multi-layered approach but it’s one worth considering until technology can catch up to protect our metadata by default.
