The recent disclosures of data snooping involving Verizon, and now several Internet companies, in articles published by the Guardian in the United Kingdom show that someone, or some entity, is probably leaking highly classified information about the inner workings of the NSA. What is odd is the timing of the three releases. Think about it: they are happening right at the start of talks with the Chinese about cyber security. Obama is now in a defensive posture going into these high-level meetings and has been put in a difficult position. So one has to wonder about the timing: could these disclosures have been leaked by the Chinese government to soften the US approach? As they say in security circles, true coincidences are rare.
Let’s take a closer look at some of the information that has been released. First, the scope and reach of the program are enormous, especially for the cost of 20 million dollars cited in the NSA presentation. One would be hard pressed to build a data center for that, let alone storage capable of holding all of the information the NSA is alleged to collect. That calls the authenticity of the released information into question. That being said, Verizon’s admission that data is being given to the NSA implies the opposite. In reality, we may never know the truth. Real or fake, the amount of data storage we are talking about is mind-boggling: not terabytes, not even petabytes, but more likely tens or hundreds of exabytes (an exabyte is 10 to the 18th bytes). Cisco had predicted that by the end of 2013 Internet traffic would be about two exabytes per month; Al Gore would be proud… So I don’t believe the articles’ assumption that the NSA is storing the data. It’s not practical, and in reality they do not have to.
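To put the storage argument in numbers, here is a rough back-of-envelope sketch in Python. It takes the two-exabytes-per-month traffic figure quoted above and assumes, purely hypothetically, that every byte is kept on 4 TB drives costing $100 each, with no redundancy, compression, power, or building costs. The drive size and price are illustrative assumptions, not figures from any source.

```python
# Back-of-envelope storage estimate using the traffic figure quoted above.
# ASSUMPTIONS (hypothetical, for illustration only): 4 TB drives at $100 each,
# no redundancy, no compression, and every byte of traffic is kept.

EXABYTE = 10**18   # bytes (decimal SI prefix)
TERABYTE = 10**12  # bytes


def yearly_storage_estimate(eb_per_month: int, drive_tb: int, price_per_drive_usd: int):
    """Return (bytes per year, drives needed, drive cost in USD)."""
    bytes_per_year = eb_per_month * EXABYTE * 12
    drive_bytes = drive_tb * TERABYTE
    # Ceiling division: a partially filled drive still has to be bought.
    drives = -(-bytes_per_year // drive_bytes)
    return bytes_per_year, drives, drives * price_per_drive_usd


total, drives, cost = yearly_storage_estimate(eb_per_month=2, drive_tb=4, price_per_drive_usd=100)
print(f"{total / EXABYTE:.0f} EB per year")                       # 24 EB per year
print(f"{drives:,} drives")                                      # 6,000,000 drives
print(f"${cost:,} in drives alone, vs. the $20 million budget")  # $600,000,000 in drives alone, ...
```

Under these assumptions a single year of traffic works out to 24 exabytes and about six million drives, roughly $600 million in disks alone, which is about thirty times the $20 million figure.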
So, putting on my engineering hat, how would such an unprecedented amount of data be handled? They would index it, just as Google does. Google does not store everything; it looks at each page and grabs metadata, links, and a limited amount of the actual content so that the data can be found at some time in the future. I mention metadata because the heads of the Senate Intelligence Committee said that the data collection was no big deal and was, quote, “… just metadata.” (Gee, what was your first clue, Dr. Watson?) So what the NSA probably has is a search engine, like Google, that indexes private information. How is this possible? Think about the private data collected by cell phone and Internet companies. Those companies may be doing marketing and billing for their customers, so it is in their best interest to organize the data for searching, using metadata or some other means. Providing the data in this form would give the companies cover to claim that there is no direct access, and would make their denials at least partially true. Your conversation with Aunt Sally is probably not recorded by the NSA, but the NSA might know that you called and talked with Aunt Sally for five minutes from your cell phone while driving home from work last Friday night, and they also might know that you called your mistress right after you finished discussing Aunt Sally’s hip replacement. (They don’t know that she is your mistress, but with some data mining of your bank and credit card records it would not be hard to infer. Try using cash the next time you visit Victoria’s Secret; it might help…)
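To make the indexing idea concrete, here is a minimal Python sketch of what a metadata-only index might look like. It is not any agency’s or carrier’s actual system; the `CallRecord` fields, the `MetadataIndex` class, the phone numbers, and the tower names are all hypothetical. The point is that no conversation content is stored, yet the Aunt Sally scenario above falls out of a simple lookup.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)
class CallRecord:
    # Metadata only (hypothetical fields): who, whom, when, how long, which tower.
    # No audio or message content is kept.
    caller: str
    callee: str
    start: datetime
    duration_s: int
    cell_tower: str


class MetadataIndex:
    """Tiny inverted index: phone number -> every record that mentions it."""

    def __init__(self) -> None:
        self._by_number = defaultdict(list)

    def add(self, rec: CallRecord) -> None:
        # File the record under both parties so either number finds it.
        self._by_number[rec.caller].append(rec)
        self._by_number[rec.callee].append(rec)

    def calls_from(self, number: str, day: Optional[date] = None) -> list:
        """Outgoing calls for a number, optionally limited to one day, in time order."""
        hits = [r for r in self._by_number[number] if r.caller == number]
        if day is not None:
            hits = [r for r in hits if r.start.date() == day]
        return sorted(hits, key=lambda r: r.start)


idx = MetadataIndex()
idx.add(CallRecord("555-0100", "555-0199", datetime(2013, 6, 7, 18, 2), 300, "tower-17"))
idx.add(CallRecord("555-0100", "555-0142", datetime(2013, 6, 7, 18, 8), 600, "tower-18"))
idx.add(CallRecord("555-0142", "555-0100", datetime(2013, 6, 8, 9, 30), 120, "tower-03"))

# Friday evening, 7 June 2013: who did 555-0100 call, when, for how long, and from where?
for r in idx.calls_from("555-0100", day=date(2013, 6, 7)):
    print(r.callee, r.start.time(), r.duration_s // 60, "min", r.cell_tower)
# 555-0199 18:02:00 5 min tower-17
# 555-0142 18:08:00 10 min tower-18
```

The design mirrors a search engine’s inverted index: each record is filed under every phone number it mentions, so a query by number only looks at that number’s records instead of scanning the whole collection.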
Another source of data the NSA may be using is the overseas fiber-optic cables as they enter the US. In theory, this would be an easy and logical place to examine data originating offshore, and it would be legal under the Patriot Act. Still, storing this data directly is again unlikely and impractical; it is more likely indexed on the fly and made available by subpoena if the need arises. Besides getting metadata directly from cell phone carriers and Internet companies, or siphoning it off incoming fiber connections, the most sinister method is the use of fake cell towers, so-called Stingray towers. A Stingray is basically a fake cell tower that law enforcement sets up among real cell towers, and you would have no idea that your phone was using it. A recent federal court decision upheld grabbing data this way under a general warrant rather than a more specific one. In theory, you would place such a tower in a high-crime area and let the computer tell you when it finds something of interest. Several large metropolitan areas have been using the technology for years.
One thing I think many of the articles fail to grasp is the depth of the data collection, and I am not talking about recording every phone conversation or email. I am talking about the depth that comes from the diversity of media sources. The data is not just cell phone traffic, Internet traffic, instant messaging, and email; it also includes banking and many other forms of communication and transactions. Indexing all of these diverse private transactions is much more important than any one phone conversation or email. It allows the government, or for that matter a marketer or researcher who can get their hands on the data, to build a profile of just about anyone and connect it to a name with little effort. I don’t think we are there yet, since the Boston Marathon bombing shows that we don’t truly know everything about everybody, yet… I find it amusing when I read stories about “the mark of the beast” (a biblical reference) in which someone leaves a job because their employee number or badge ID contains, or is, the dreaded number. It is amusing because there will be no need for a mark of the devil; the devil can track us all with data mining, no mark required.
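The kind of cross-source linking described above can be sketched as a simple join. In this hypothetical Python example, separate sources (a carrier’s subscriber list, a bank’s card-holder list, and raw call and purchase logs) are tied together through the owner’s name. All records and the `build_profiles` helper are made up for illustration; real record linkage has to cope with misspellings and partial matches, which this sketch ignores.

```python
from collections import defaultdict

# Hypothetical, made-up records from separate "islands" of data.
subscribers = {"555-0100": "J. Doe"}         # carrier: phone number -> name
card_holders = {"4000-XXXX-0001": "J. Doe"}  # bank: card number -> name
calls = [("555-0100", "555-0199"), ("555-0100", "555-0142")]
purchases = [("4000-XXXX-0001", "florist"), ("4000-XXXX-0001", "jeweler")]


def build_profiles(subscribers, card_holders, calls, purchases):
    """Join records from unrelated sources on a shared key (the owner's name)."""
    profiles = defaultdict(lambda: {"called": [], "bought_at": []})
    for number, callee in calls:
        name = subscribers.get(number)
        if name:  # skip calls we cannot tie to a known subscriber
            profiles[name]["called"].append(callee)
    for card, merchant in purchases:
        name = card_holders.get(card)
        if name:  # skip purchases we cannot tie to a known card holder
            profiles[name]["bought_at"].append(merchant)
    return dict(profiles)


profiles = build_profiles(subscribers, card_holders, calls, purchases)
print(profiles["J. Doe"])
# {'called': ['555-0199', '555-0142'], 'bought_at': ['florist', 'jeweler']}
```

Neither the carrier nor the bank holds this combined view on its own; it only appears once the sources are joined.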
What I would say to all those conspiracy theorists and 1984 adherents (George Orwell would be proud) is that your privacy is not being directly breached, but your behavior and actions can be inferred by data mining many seemingly unrelated islands of data. Data mining seeks to relate data that has no direct or apparent relationship; it connects the dots by looking for patterns or actions that fall outside the norm. Our own desire to stay connected, or to reconnect with others, has led us to hand the government our privacy without ever agreeing to it. It is not just governments that can connect these data points; startups and corporations are vying to reach every marketer’s dream of knowing everything about everybody. Big Brother is not really watching you unless you step outside the norm. What worries me is who decides what the norm is…
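One simple, hypothetical way to express “outside the norm” is a z-score test against a person’s own history. The Python sketch below flags a day whose call count sits more than `threshold` standard deviations from that person’s average; the data, the `out_of_norm` helper, and the threshold of 3 are assumptions chosen for illustration, not any real system’s settings.

```python
from statistics import mean, pstdev


def out_of_norm(history: list, today: float, threshold: float = 3.0) -> bool:
    """Flag today's value if it lies more than `threshold` standard deviations
    from the person's own history (a simple z-score test)."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Perfectly regular history: any change at all counts as unusual.
        return today != mu
    return abs(today - mu) / sigma > threshold


daily_calls = [4, 6, 5, 5, 4, 6, 5]  # hypothetical week of daily call counts
print(out_of_norm(daily_calls, 5))   # False
print(out_of_norm(daily_calls, 40))  # True
```

Notice that the `threshold` parameter is exactly the question raised above: someone has to pick it, and that choice decides who counts as normal.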