This post was originally published on April 25, 2014.
Hopefully, Snowden’s revelations about the NSA’s surveillance of phone call metadata are still fresh in your minds. “But it’s just metadata!” some people still insist. “It’s not real data. Who cares?”
Everyone should care. Your metadata tells a startlingly full story about you, as data scientists Deepak Jagdish and Daniel Smilkov illustrate in their TEDx Cambridge talk, The Power of Metadata. If you have time today, I highly recommend that you take a few minutes to watch it.
What is metadata?
Metadata is information about interactions that you have with other people and organizations as you use technology. It’s not the actual content of these interactions, but rather the information about the content.
Examples of metadata
- Who did you call on the phone, and when?
- Who did you email, and when?
- Where and when did you use your credit card?
- Which websites did you visit from which computer and when?
These may sound harmless, but when you add all of these up over time and cross-reference them with each other, they can paint a very accurate portrait of you.
Immersion: your email metadata, visualized
Over at the MIT Media Lab, Jagdish, Smilkov, and their advisor, Cesar Hidalgo, created Immersion to visualize what can be learned from email metadata alone: only the From, To, Cc, and Timestamp fields of each email. (Immersion does not touch subject lines or message bodies.)
Using only this metadata, Immersion creates an informative illustration of your relationships with people, and how those relationships have evolved over time.
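To make "only this metadata" concrete, here is a minimal sketch of the kind of tally a tool like this could start from. It is not Immersion's actual code; the `EmailHeader` record, the helper names, and the sample addresses are all hypothetical, and it assumes the headers have already been parsed out of a mailbox. Note that nothing here ever reads a subject line or a message body.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class EmailHeader:
    """Hypothetical record holding only header metadata: no subject, no body."""
    sender: str
    to: tuple
    cc: tuple
    timestamp: datetime


def contact_strength(headers, me):
    """Count how many emails I exchanged with each contact (a rough tie strength)."""
    counts = Counter()
    for h in headers:
        participants = {h.sender, *h.to, *h.cc}
        if me not in participants:
            continue  # skip messages I was not part of
        for person in participants - {me}:
            counts[person] += 1
    return counts


def first_and_last_contact(headers, me):
    """Record when each relationship first appears and when it was last active."""
    spans = {}
    for h in headers:
        participants = {h.sender, *h.to, *h.cc}
        if me not in participants:
            continue
        for person in participants - {me}:
            start, end = spans.get(person, (h.timestamp, h.timestamp))
            spans[person] = (min(start, h.timestamp), max(end, h.timestamp))
    return spans


# Hypothetical sample mailbox.
me = "me@example.com"
inbox = [
    EmailHeader("alice@example.com", (me,), (), datetime(2008, 3, 1)),
    EmailHeader(me, ("alice@example.com", "bob@example.com"), (), datetime(2013, 6, 2)),
    EmailHeader("carol@example.com", (me,), ("bob@example.com",), datetime(2013, 7, 9)),
]
print(contact_strength(inbox, me).most_common())
print(first_and_last_contact(inbox, me))
```

Even three headers show who I talk to most and how long each relationship has lasted; thousands of them are enough to trace the evolution the visualization shows.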
I had to try it for myself. After Immersion finished processing thousands of my emails, I could see how shifts in my social network correlated with big life changes, such as moving cities and changing jobs. Immersion’s map divided my contacts into groups I could identify at a glance: a cluster of people I had worked with six years ago, and clusters of people I had spent time with in different cities.
I decided to do a thought experiment. “What could a third party figure out from my data?” I asked myself. “If I didn’t know myself, what could I figure out from this?”
Lots. It would be very easy, for example, to infer who my closest friends and family are: they’re the people I have emailed consistently ever since I got my email address. A third party could also infer which of my contacts know one another, based on who appears together on group email threads. In this sense, our metadata exposes information not only about ourselves but also about the people we associate with.
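The group-thread inference is simple enough to sketch. Assuming each message has been reduced to the set of addresses on it (From, To, and Cc), counting how often two of my contacts show up on the same message is a rough signal that they know each other. The function name and addresses below are hypothetical, and co-appearing on an email is only a hint, not proof, of a relationship.

```python
from collections import Counter
from itertools import combinations


def co_recipient_pairs(threads, me):
    """Count how often each pair of my contacts appears on the same message.

    Each entry in `threads` is the set of addresses on one message.
    I am removed first, so the result describes ties between other people.
    """
    pairs = Counter()
    for participants in threads:
        others = sorted(set(participants) - {me})
        for a, b in combinations(others, 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical sample data.
me = "me@example.com"
threads = [
    {me, "alice@example.com", "bob@example.com"},
    {me, "alice@example.com", "bob@example.com", "dan@example.com"},
    {me, "carol@example.com"},
]
for (a, b), n in co_recipient_pairs(threads, me).most_common():
    print(f"{a} <-> {b}: {n} shared emails")
```

Notice that Alice, Bob, and Dan never emailed each other in this data set; their connection is inferred entirely from my metadata.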
One of the things that struck me the most about the TEDx talk was when Smilkov made the point that email interfaces provide only the shallowest of glimpses at your email history, making it easy for us to forget that there are years and years of metadata hiding beneath our most recent emails. For example, when we’re logged into Gmail or Outlook, we usually only see the last 20-50 emails we received, and each day they get replaced by new ones. Consequently, many of us don’t think about the thousands upon thousands of emails in our accounts—and all the metadata associated with them.
Metadata is everywhere
Metadata isn’t just in emails. The Guardian’s interactive guide to metadata demonstrates how metadata is generated by a range of activities, from taking pictures with a digital camera to using a search engine. Students at Stanford showed that phone record surveillance, even for a short period of time, can reveal more than we want others to know:
“Phone metadata is unambiguously sensitive [blogger note: emphasis added], even over a small sample and short time window. We were able to infer medical conditions, firearm ownership and more, using solely phone metadata,” [Jonathan Mayer, a Stanford computer scientist] said.
Just how revealing can metadata be? According to New York Magazine’s Daily Intelligencer:
“When you take all those records of who’s communicating with who, you can build social networks and communities for everyone in the world,” mathematician and NSA whistle-blower William Binney — “one of the best analysts in history,” who left the agency in 2001 amid privacy concerns — told Daily Intelligencer. “And when you marry it up with the content,” which he is convinced the NSA is collecting as well, “you have leverage against everybody in the country.”
In a powerful thought experiment, Duke sociologist Kieran Healy demonstrates how the British could have stopped Paul Revere by performing social network analysis using only metadata about social clubs and their members!
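The core move in Healy’s post is a small piece of linear algebra: write down who belongs to which club as a person-by-club table A of 0s and 1s, then multiply it by its transpose, P = A·Aᵀ, to get a person-by-person table whose entries count the clubs two people share. Below is a minimal pure-Python sketch of that projection on made-up data (the people, clubs, and memberships are hypothetical, not Healy’s historical records). The simple "how many people do you share a club with" count at the end is a stand-in I chose for illustration; Healy’s post uses more sophisticated centrality measures.

```python
def person_network(membership):
    """Project a person-by-group 0/1 membership matrix onto people.

    Computes P = A @ A^T: P[i][j] counts the groups persons i and j share,
    and the diagonal P[i][i] is the number of groups person i belongs to.
    """
    n = len(membership)
    return [
        [sum(a * b for a, b in zip(membership[i], membership[j])) for j in range(n)]
        for i in range(n)
    ]


def degree(network):
    """Number of other people each person shares at least one group with."""
    return [
        sum(1 for j, shared in enumerate(row) if j != i and shared > 0)
        for i, row in enumerate(network)
    ]


# Hypothetical data: five people, three clubs (columns: Club 1, Club 2, Club 3).
people = ["Person A", "Person B", "Person C", "Person D", "Person E"]
membership = [
    [1, 0, 0],  # Person A
    [1, 0, 0],  # Person B
    [1, 1, 1],  # Person C belongs to every club
    [0, 1, 0],  # Person D
    [0, 0, 1],  # Person E
]

P = person_network(membership)
for name, d in zip(people, degree(P)):
    print(f"{name}: connected to {d} others")
```

Only membership lists go in, yet Person C stands out immediately as the one link between otherwise separate groups, which is exactly the kind of person a network analyst would flag.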
What’s the takeaway?
Sending email and communicating with others are inescapable facts of modern life, yet the information about these interactions builds a very full profile of us and of the people we associate with, and it could betray secrets we would rather keep to ourselves.
There are small measures you can take to throw people off your metadata scent, so to speak. For example, you can use disposable email addresses and disposable cell phones (paid for in cash, of course). You could also use anonymous search engines like DuckDuckGo and browse with cookies disabled. Unfortunately, doing all of this is not feasible or sustainable for most people. Also, unless all of your contacts take such measures too, these methods can only hide so much metadata.
Thankfully, VPNs can play a role in obscuring your metadata. When you use a VPN, you effectively hide your IP address and location behind the VPN server’s IP address. Don’t forget that IP addresses pack a lot of punch: they can reveal your location (to varying degrees of accuracy) and your Internet service provider to any website or service you interact with.
The most likely scenario is that governments and other third parties will go on indiscriminately collecting our metadata. After all, knowledge is power, and this kind of knowledge, as Binney states in the quote above, can be used against you.
I don’t want to imagine how my metadata might be used against me in the future. For example, what if health insurance companies had access to my past Google searches and could deny me coverage based on old search queries? E-commerce sites already vary their prices depending on a shopper’s ZIP code, which they infer from the shopper’s IP address, as an article in the Wall Street Journal reported. What if I live in a ZIP code with a higher per capita income? Should I be penalized for that?
Ultimately, metadata is an undeniable fact of life, and we should all be aware of its power. We leave a digital trail every time we visit a website or send an email or text message. While it’s fun and interesting to see how tools like Immersion turn our email metadata into interactive visualizations of our lives, in the wrong hands that same metadata could be used against us. And that is something we should not forget.