Gmail, privacy and centralization

There’s been a lot of hubbub over Gmail, Google’s new free (advertising-based) Not An April Fool’s Joke email service with 1 GB of storage. The biggest issue is that Google hasn’t properly communicated where they stand on protecting email privacy, especially given their plan to automatically scan email and display relevant advertisements in a sidebar. In response, twenty-eight privacy and civil liberties organizations have written an open letter demanding that the service be suspended until the privacy issues are addressed. The EFF has also been asking some important questions, and Google says they’re “batting about a number of options”.

On the surface, Gmail isn’t that different from existing online email services. It’s a free email account run on company-owned-and-operated servers, just like MSN Hotmail and Yahoo! Mail. It automatically scans and annotates email, just as spam filters already do. And in spite of criticism, Gmail’s privacy policy isn’t that different from (and is in fact more explicit than) the ones you find at Yahoo! or MSN. But look just a little down the road and Gmail isn’t an email service at all; it’s a personal information archival service. This is the real service Google is looking to provide. As they put it: “Gmail is an experiment in a new kind of webmail, built on the idea that you should never have to delete mail and you should always be able to find the message you want.”

My first reaction is “it’s about damn time someone did this.” Since 1995 I’ve kept every email I’ve received or sent (yes, even spam), for a total of over 1.6 GB and almost 200,000 non-spam messages. I index it all with the Remembrance Agent (my PhD thesis project), so whenever I get email on, say, some hot new technology, I also get links to what friends, colleagues and mailing lists have previously said on the subject. (On a different note, when I write love letters I see what I’ve written to previous girlfriends, which is sometimes quite educational.) I’d love to have this kind of thing hooked up not only to my own email but also to, say, the 1,000 favorite RSS feeds I’d like to read but don’t have time for. That’s clearly the direction Google is heading (Google researchers even cite my work in their paper on query-free news search; I love it when that happens!).
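A minimal sketch of this kind of query-free, just-in-time retrieval: archived messages are ranked by how much distinctive vocabulary they share with the message currently being read. It assumes plain TF-IDF weighting with cosine similarity, which is one common choice and not a description of the Remembrance Agent's actual implementation; `build_index`, `suggest` and the message ids are hypothetical names used only for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(archive):
    """Compute a TF-IDF vector for every archived message.

    `archive` maps a (hypothetical) message id to its body text.
    Returns (vectors, idf), where vectors maps id -> {term: weight}.
    """
    docs = {mid: Counter(tokenize(body)) for mid, body in archive.items()}
    n = len(docs)
    # Document frequency: in how many messages each term appears.
    df = Counter(term for counts in docs.values() for term in counts)
    # Smoothed inverse document frequency: rarer terms get larger weights.
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    vectors = {mid: {t: c * idf[t] for t, c in counts.items()}
               for mid, counts in docs.items()}
    return vectors, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def suggest(current_text, vectors, idf, k=3):
    """Return up to k archived message ids most similar to the current text."""
    counts = Counter(tokenize(current_text))
    # Terms that never occur in the archive carry no weight, so drop them.
    query = {t: c * idf[t] for t, c in counts.items() if t in idf}
    scored = sorted(((cosine(query, vec), mid) for mid, vec in vectors.items()),
                    reverse=True)
    # Only suggest messages that share at least one weighted term.
    return [mid for score, mid in scored[:k] if score > 0]
```

For example, indexing a small archive containing one message about wearable displays and one about dinner plans, then calling `suggest("Has anyone tried the new wearable display?", vectors, idf)`, ranks the wearable-display message first and leaves the dinner plans out. The sketch does no stemming or stop-word removal, so "display" and "displays" count as different words; a real system would likely want both, plus a weighting scheme tuned for short, noisy email text.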

Systems like Gmail face two problems, both of which are also strengths. The first is that my personal and work email archives contain some of the most sensitive information in my life. They include email confirmations of purchases I’ve made, trips I’ve taken and investments I’ve made. They include love letters I’ve sent and later regretted, discussions of medical issues, and drunken emails complaining about people with whom I’ve lived and worked. They include research ideas not yet patented and drafts of papers not yet published. These emails are often sensitive precisely because they are powerful and useful, and information that empowers me can just as easily empower my enemies, competitors and parasites.

The second problem is Google’s centralized architecture, which is easier to maintain and deploy but requires me to trust them with my most sensitive assets. This is a general problem with indexing the Deep Web of proprietary data, and I suspect it was the main failure point for Autonomy’s short-lived Kenjin system and the main reason Autonomy moved to an inside-the-firewall search system. This is not to say a centralized approach is untenable; we already trust doctors, lawyers and financial institutions with sensitive data. But what these three have in common is a combination of legal and institutional guarantees of privacy, security and longevity for the data they keep. By going beyond the usual web-mail model, Google is asking for the same level of trust we give these institutions, but so far they haven’t improved on the old and inadequate web-mail privacy guarantee. It may not even be possible for Google to make the necessary guarantees without Congressional support, an unlikely prospect given the Justice Department’s current lust for total information awareness.

If Google manages to innovate in trust models as well as they do in technology, I suspect Gmail will be a good stop-gap, though it will never be as trustworthy as a combination of my own local data cache, an encrypted backup service, and trusted friends or services who keep backup keys. Call me picky, but I’m still holding out for my personal server. How much longer before I can have the Web in my pocket?
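The “trusted friends who keep backup keys” part can be made concrete with threshold secret sharing: the key to an encrypted backup is split so that any k of n friends can jointly rebuild it, while fewer than k of them learn nothing about it. Below is a minimal sketch of Shamir’s scheme over a prime field, offered as an illustration only; the function names, the choice of prime, and the idea of encoding the key as an integer are assumptions of this example, and the code has not been hardened or audited for real cryptographic use.

```python
import secrets

# Mersenne prime 2**521 - 1, large enough to hold a 256-bit key encoded as
# an integer. All share arithmetic is done modulo this prime.
PRIME = 2**521 - 1

def split_secret(secret, n, k):
    """Split `secret` into n shares (x, y); any k of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # Random polynomial of degree k - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        # Evaluate the polynomial at x using Horner's rule.
        y = 0
        for c in reversed(coeffs):
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover_secret(shares):
    """Rebuild the secret from k or more shares via Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # Division modulo a prime: multiply by the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With `split_secret(key, n=5, k=3)`, each of five friends holds one share; any three of them can call `recover_secret` to restore the backup key, while any two together have no information about it. The design keeps the encrypted data itself with a backup service and spreads only the ability to decrypt it among people you already trust.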

References

About Gmail (Google, 1 April 2004)
Google’s Web mail no joke (Stefanie Olsen, CNET News, 2 April 2004)
Twenty Eight Privacy and Civil Liberties Organizations Urge Google to Suspend Gmail (World Privacy Forum and 27 others, 6 April 2004)
Google’s Gmail and Your Privacy – What’s the Deal? (EFF, EFFector 17(12), 9 April 2004)
Google to consider Gmail changes (Evan Hansen, CNET News, 13 April 2004)
Why I Wouldn’t Consider Google’s E-Mail (Dan Gillmor, eJournal, 3 April 2004)
Gmail Privacy Policy
MSN Privacy Policy
Yahoo! Privacy Policy
Remembrance Agent Page (your faithful correspondent)
Query-Free News Search (Henzinger et al., WWW2003, 20 May 2003)
Autonomy claims the search is over (BBC News, 29 March 2000)
Autonomy Active Knowledge Page
What can you do with a Web in your Pocket? (Brin et al., Data Engineering Bulletin 21(2): 37-47, 1998)