When do I get the web in my pocket?

Some time ago I asked how much longer before I can have the Web in my pocket. Let’s try a quick back-of-the-envelope calculation:

A paper from January 2005 calculates the publicly indexable Web (the part easily accessible to search engine web-crawlers) as being around 11.5 billion pages. Estimates on average webpage size seem to be all over the map, but let’s figure around 100 KB per page, for a total of around a petabyte (one million Gig) for today’s indexed web. (I’m assuming text and images, but ignoring other media.)

Disk these days is going for less than 50 cents per Gig, so enough disk to store your own personal Google (and then some) costs around $500,000. With compression you can probably cut that in half. The price of disk is also falling by a factor of two every 12 months, so assuming no major jumps or snags in the disk-price curve, in a little less than a decade we can expect to hold the equivalent of today’s indexed web for less than $1000.
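
For anyone who wants to play with the numbers, here is the same envelope math as a rough Python sketch. The constants are the estimates quoted above; the function names are made up for illustration, and the petabyte is rounded down to one million Gig just as in the text.

```python
import math

# Figures quoted in the post; all of them are rough estimates.
PAGES = 11.5e9          # publicly indexable pages (January 2005 paper)
KB_PER_PAGE = 100       # guessed average page size, text and images
DOLLARS_PER_GIG = 0.50  # upper bound on today's disk price
HALVING_YEARS = 1.0     # disk price halves every 12 months


def web_size_gigs(pages=PAGES, kb_per_page=KB_PER_PAGE):
    """Size of the indexed web in Gigs, taking 1 Gig = 1,000,000 KB."""
    return pages * kb_per_page / 1e6


def years_until_affordable(cost_today, budget, halving_years=HALVING_YEARS):
    """Years until cost_today falls to budget if the price keeps halving."""
    if cost_today <= budget:
        return 0.0
    return math.log2(cost_today / budget) * halving_years


if __name__ == "__main__":
    print(f"indexed web: {web_size_gigs():,.0f} Gig")
    # Round down to one petabyte (1,000,000 Gig), as the text does.
    cost = 1_000_000 * DOLLARS_PER_GIG
    for label, c in [("uncompressed", cost), ("compressed 2:1", cost / 2)]:
        years = years_until_affordable(c, 1000)
        print(f"{label}: ${c:,.0f} today, under $1000 in {years:.1f} years")
```

Under these assumptions the wait comes out to roughly nine years uncompressed and roughly eight with 2:1 compression. Changing the halving period or the page-size guess moves the answer a lot, which is the usual caveat with envelope math.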

Now of course, in that time the web will continue to grow, so we may no longer be satisfied with our measly petabyte-on-the-desk, but I figure the amount of human-generated Web content has a much slower growth rate than our disk-space curve. The number of web sites actually shrank between 2001 and 2002, and though it now seems to be growing again there’s only so much content that human beings can create in a day. The real question I have is whether in a decade anyone will see having access to the whole web as being all that interesting — I could easily see the majority of people losing interest in the surface web in favor of personal deep-web niches. The only reason I want the whole web in my pocket is because it’s too hard for me to filter out in advance the 99.99% of the web that’ll never be of interest to me — the closer we get to that kind of pruning, the less disk we need and the higher-quality the experience will be.

Update 8/2/05: doing a different back-of-the-envelope estimate leads to being able to store a compressed-HTML cache (no images) on less than $1000 worth of disk within 3 years…

Microsoft giving grants for Personal Lifetime Storage projects

Microsoft Research has announced a Request for Proposals for projects relating to their Digital Memories (Memex) research kit, in the context of “personal lifetime storage.” Microsoft’s inspiration (and probably the inspiration for everyone else working in this area too, at least indirectly) is Vannevar Bush’s 1945 article As We May Think, in which he famously described a kind of personal library-in-a-desk he called the memex:

Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and to coin one at random, “memex” will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.

MSR expects to give 6-9 awards to college and university projects, up to a maximum of $50,000 per award. Recipients will also be given a SenseCam wearable camera and software from the MyLifeBits, VIBE and Phlat research projects at Microsoft Research. Strings are minimal: they expect semiannual progress reports, want the work presented at one or more of their workshops, and expect the project to be either dedicated to the public domain or released under an open license such as the BSD license.

Registration for ISWC is now open

Advance registration for the 9th Annual IEEE International Symposium on Wearable Computers, to be held October 18th-21st in Osaka, Japan, is now open. ISWC always brings together a great mix of industry and academic researchers from fields as diverse as interface design, machine vision, hardware and fashion design, and as program committee co-chair I can guarantee this year will be no exception.

New digital cinema standard to use JPEG 2000 compression

Yesterday a consortium of the major movie studios announced final specs for a new standard digital format for movie theaters. The specification uses JPEG 2000 video compression, which (though it happened before I started working there) I’m proud to say largely came out of work performed at my lab.

The big advantage of JPEG 2000 is that you can “pull out” bits from a code stream to get different resolutions. In this case a 4K distribution (1,302,083 bytes per frame at 24 FPS) and a 2K distribution (651,041 bytes per frame at 48 FPS) can both be generated on-the-fly from the same file, just by discarding segments of the stream.
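
Those odd-looking byte counts appear to fall out of a fixed bit-rate ceiling divided by the frame rate. The sketch below assumes a cap of 250 Mbit/s for the compressed image stream; that figure is inferred from the numbers rather than stated above, and the function name is made up for illustration.

```python
# Assumption: the compressed image stream is capped at 250 Mbit/s.
MAX_BITS_PER_SECOND = 250_000_000


def max_bytes_per_frame(fps, bits_per_second=MAX_BITS_PER_SECOND):
    """Largest whole number of bytes a single frame can use at this frame rate."""
    return bits_per_second // 8 // fps


if __name__ == "__main__":
    for fps in (24, 48):
        print(f"{fps} FPS: {max_bytes_per_frame(fps):,} bytes per frame")
```

With that cap, 24 frames per second leaves 1,302,083 bytes per frame and 48 frames per second leaves 651,041.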

(Thanks to Mike for the link.)

Fujitsu shows bendable e-paper display

I’m a little late on this news, but last week Fujitsu announced a new bendable e-paper technology. EE Times has the most complete technical description I’ve seen on it:

The display is a passive-matrix, reflective type cholesteric liquid crystal display. Two 3.8-inch diagonal QVGA prototypes, a monochrome display and a color version able to display 512 colors, were shown.

Differing from widely used flat displays that have color filters consisting of red, green and blue pixels, the paper display has a three layered structure in total about 0.8 mm thick. One layer consists of two 0.125 mm-thick films sandwiching liquid crystal. Cholesteric crystals in each layer are twisted in a certain pitch to reflect only red, green or blue light respectively.

Images on the screen can be changed with 10-milliwatts to 100-milliwatts depending on scanning speed.

(Thanks to John for the link…)

Newspeak update

From: Ministry of Truth
Subject: Newspeak update

Please be informed that the phrase Global War on Terrorism is obsoleted in favor of the phrase Global Struggle Against Violent Extremists. Changes will be reflected in the upcoming tenth edition of the Newspeak Dictionary.

Wearable video for ethnography

Public Radio’s Marketplace has a nice piece on the company Actionspeak, which hires people to go shopping while wearing small video cameras. The claim is that the cameras are unobtrusive enough that the research subjects quickly find themselves acting as they always do while shopping, and Actionspeak then analyzes the video to learn ways their customer can improve their presentation or marketing. They’ll also do runs where subjects are asked to give a running monologue about what they’re thinking as they shop.

These videos might give some straight marketing info (like which family member actually decides the sale or whether to focus on shelf position, packaging or price), but I bet the real win is in showing designers how their product actually gets used in the wild. The combination of seeing as your customer sees, along with the ability to ask about particular moments afterwards, is really powerful. Not only can you learn things you’d never learn from interviews alone, but the overlay of first-person video with explanatory customer interviews has much more impact on a designer than would a table of survey results containing the same information. (Take a look at the consumer goods video especially for examples.)

Wikipedia on your handheld

The English version of Wikipedia is about 650,000 articles, which comes out to a compressed database of about 1 Gig. That easily fits on a PDA or cellphone these days. I’ve been thinking for a while now that I should look into loading it all onto my Treo 600, but I see now someone has done all the work for me!

Erik Zachte has produced conversion scripts as well as detailed instructions on how to convert the complete Wikipedia Encyclopedia into TomeRaider ebook-reader format for Pocket PC Windows and Palm OS. The text-only version fits in just over half a gig; text + images is 1-2 gig depending on image down-sampling. I also like his “Build, Buy or Borrow” plan: you can use his scripts to build your own latest version for free, buy the latest version on CD or DVD, or download his semi-annually updated version for free direct from the Wikipedia server. That’s exactly the sort of “free as in freedom” software business plan I hope winds up succeeding in the new economy.

PwdHash browser extension

This is cute: PwdHash is a browser extension that will replace text entered into a password field with a hash of the password + domain name of the website. That lets you use a single password for different sites without revealing, say, your PayPal password to your bank and vice versa. As the creators point out, this is also pretty good protection against phishing scams (because the phishers’ domain is different, they’ll collect the wrong password). It’s still vulnerable to pharming and other attacks that poison your DNS or webcache results, but their paper goes into all sorts of clever attacks that they do try to defend against, like Javascript and dictionary attacks.
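
To make the idea concrete, here is a minimal Python sketch of hashing a password together with a domain. It is not PwdHash's actual algorithm: the choice of HMAC-SHA256, the base64 encoding, the 16-character truncation and the function name `site_password` are all assumptions made for illustration.

```python
import base64
import hashlib
import hmac


def site_password(master_password: str, domain: str, length: int = 16) -> str:
    """Derive a per-site password from one master password (illustrative only)."""
    digest = hmac.new(
        master_password.encode("utf-8"),   # the password acts as the secret key
        domain.lower().encode("utf-8"),    # the site's domain is the message
        hashlib.sha256,
    ).digest()
    # Encode the raw bytes as text that can be typed into a password field.
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]


if __name__ == "__main__":
    # The same master password yields unrelated strings for different domains,
    # so a look-alike phishing domain never sees the real site's password.
    print(site_password("correct horse", "paypal.com"))
    print(site_password("correct horse", "paypa1.com"))
```

A real tool also has to cope with per-site password rules (required digits, maximum lengths and so on), which this sketch ignores.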

(by way of the Mercury News)
