Understanding Metadata

Ninety-five percent of the web is encrypted. That means that if you visit Facebook, your Internet Service Provider (ISP) can see that you visited and how long you stayed, but they can’t see your login credentials (username and password) or which exact pages you went to. This is done with Transport Layer Security, or TLS, a powerful and increasingly popular encryption protocol used online.

There are two problems with relying strictly on the current TLS model of the internet, however. First, it only protects data in transit. When you connect to Amazon, your ISP can see only that you visited Amazon, but Amazon itself can see every page, click, and purchase without restriction. Second, and more importantly, you often don't need to see the content itself to start making powerful, accurate assumptions.

What is Metadata?

Metadata is often described as “data about the data.” For example, the content of an email is not metadata, but who you emailed, at what time, the subject line, and the size of the email are. On the surface this may not seem very revealing. However, an excellent article from the Electronic Frontier Foundation lists several examples of metadata that has the potential to be too revealing.

Metadata has the potential to be just as revealing as the content itself, and therefore should be protected just as carefully as the actual data. These are not hypothetical abuses or situations. A former NSA chief once said "[The US Government] kills people based on metadata," referring to how metadata can reveal so much information that it can be used to justify military strikes. In another instance, police were able to determine that a man murdered his wife based on the metadata from his smartwatch and CCTV cameras. I could list many more stories like these. Metadata matters.
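
To make the email example above concrete, here is a minimal Python sketch that separates a message's metadata from its content using the standard library's email parser. The sample message, the addresses, and the extract_metadata helper are hypothetical and exist only for illustration. Real mail also carries routing headers that this sketch ignores.

```python
from email import message_from_string

# Hypothetical sample message; the addresses and text are made up.
RAW_EMAIL = (
    "From: alice@example.com\n"
    "To: clinic@example.org\n"
    "Date: Tue, 05 Mar 2024 02:24:00 +0000\n"
    "Subject: Appointment request\n"
    "\n"
    "Hello, I would like to book an appointment.\n"
)


def extract_metadata(raw: str) -> dict:
    """Return the fields an observer could learn without reading the body."""
    msg = message_from_string(raw)
    return {
        "from": msg["From"],
        "to": msg["To"],
        "date": msg["Date"],
        "subject": msg["Subject"],
        # Size of the whole message in bytes, headers included.
        "size_bytes": len(raw.encode("utf-8")),
    }


if __name__ == "__main__":
    for field, value in extract_metadata(RAW_EMAIL).items():
        print(f"{field}: {value}")
```

Note that the body never appears in the result, yet the sender, recipient, timestamp, and subject already suggest a good deal about what the message was for.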

How to Deal with Metadata

Unfortunately, any digital action creates metadata. The best you can do when attempting to protect your privacy is to be mindful of what metadata may be created by the action you're about to take, and then determine how best to reduce or mitigate it. One approach is to choose services that minimize what they keep. For example, reputable VPN providers (and some messengers like Signal) do not log the sites you visit, your IP address, or other metadata for longer than needed to make the service work. This is desirable, but such claims should not be trusted blindly. Another approach is to fake your metadata when possible. For example, when you use a VPN or Tor Browser to access a website, the website sees the IP address of the VPN provider or Tor exit node instead of your own. Ideally you should find a way to combine these approaches for extra protection and redundancy.
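
As a rough illustration of the VPN example, the toy Python model below shows what a website's log might record with and without a VPN. It is only a sketch under simplifying assumptions: the Request class, the via_vpn and server_log_line helpers, the browser name, and the addresses (taken from ranges reserved for documentation) are all hypothetical, and no real networking takes place.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Request:
    """The metadata a website can observe about a visit (simplified)."""
    source_ip: str
    user_agent: str
    path: str


def via_vpn(request: Request, vpn_exit_ip: str) -> Request:
    """Model a VPN: the site sees the VPN's address instead of yours."""
    return replace(request, source_ip=vpn_exit_ip)


def server_log_line(request: Request) -> str:
    """Format the request the way a hypothetical access log might."""
    return f"{request.source_ip} {request.path} {request.user_agent}"


# Hypothetical values for illustration only.
direct = Request("203.0.113.7", "ExampleBrowser/1.0", "/account")
tunneled = via_vpn(direct, "198.51.100.42")

if __name__ == "__main__":
    print(server_log_line(direct))
    print(server_log_line(tunneled))
```

In this model only the source address changes. The browser identifier and the requested path are the same in both log lines, which is one reason a single tool rarely removes all of the metadata an action creates.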

Unfortunately, the amount of metadata created and recorded can be quite extensive. For example, one smart TV manufacturer was caught scanning the names of nearby Wi-Fi networks, as well as detecting every device on the local network and collecting detailed information about them. Protecting against that level of invasion requires more than just a reputable VPN. Fortunately, most of us don’t need to be 100% anonymous, and situations like these fall largely outside the threat model of most people reading this. However, whenever you change anything in your digital life, it's still a good idea to ask what metadata could potentially be leaked, what could be done to prevent that, and what your threat model requires.