Yesterday I said that within a decade disk space should be cheap enough to put the entire visible web on your desk for under $1000. I think that's actually a pretty conservative estimate, since it assumes a 100 KB average page size, up to an order of magnitude higher than some estimates.
Here's another back-of-the-envelope calculation: let's say we wanted the equivalent of Google's webcache on your desktop (that is, all the HTML but no images). The 2003 update to Berkeley's How Much Info? study estimated that in 2002 the web was only 167 terabytes total, with only 30 TB as HTML (69 TB when you include images). Assuming 75% compression, that 30 TB shrinks to about 7.5 TB, or roughly 8 TB. A 2002 OCLC study calculated that the total number of web pages was only increasing by about 5% per year (with the number of sites actually shrinking, but the number of pages per site growing). That rate had been decreasing ever since the explosion in the mid '90s, but let's assume growth became a steady 5% and will stay at that rate for the next few years. (There are a lot of assumptions going on here, but the nice thing about these kinds of curves is that even if my numbers are off by a factor of two somewhere, so long as disk capacity per dollar keeps growing at the same rate, that crossover point only changes by about one year.)
Now we've got two trends, and just need to find the intersection point for the price we want:
| Year | Price of 1 TB disk | Size of public web (compressed HTML only, assumes 5% growth/year) | Cost to store |
|---|---|---|---|
| 2002 | | 8 TB | |
| 2003 | | 8.5 TB | |
| 2004 | | 8.8 TB | |
| 2005 | $500 | 9.25 TB | $4,625 |
| 2006 | $250 | 9.7 TB | $2,425 |
| 2007 | $125 | 10.2 TB | $1,275 |
| 2008 | $62.50 | 10.7 TB | $670 |
| 2009 | $31.25 | 11.25 TB | $350 |
| 2010 | $15.50 | 11.8 TB | $185 |
So given a few assumptions, we'll be able to cache all the raw text on the public web for under $1000 (disk cost) within 3 years!
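For anyone who wants to play with the assumptions, here is a rough Python sketch of the two trends in the table. The function names and parameters are my own invention for illustration; the starting values ($500 per TB and 9.25 TB in 2005, price halving every year, 5% growth) are just the table's assumptions, and the outputs are unrounded, so they differ slightly from the rounded table entries.

```python
def storage_cost_by_year(start_year=2005, end_year=2010,
                         price_per_tb=500.0, web_tb=9.25,
                         web_growth=0.05, price_decay=0.5):
    """Project (year, price per TB, web size in TB, total cost) rows.

    Assumes the price per TB is multiplied by price_decay each year and
    the compressed web grows by web_growth each year (both assumptions).
    """
    rows = []
    for year in range(start_year, end_year + 1):
        rows.append((year, price_per_tb, web_tb, price_per_tb * web_tb))
        price_per_tb *= price_decay
        web_tb *= 1 + web_growth
    return rows


def crossover_year(budget=1000.0, **kwargs):
    """Return the first projected year whose total cost is below budget."""
    for year, _price, _size, cost in storage_cost_by_year(**kwargs):
        if cost < budget:
            return year
    return None  # budget not reached within the projected range


if __name__ == "__main__":
    for year, price, size, cost in storage_cost_by_year():
        print(f"{year}: ${price:,.2f}/TB x {size:.2f} TB = ${cost:,.0f}")
    print("Crossover:", crossover_year())
    # Same projection if the web were really twice as large:
    print("Crossover at 2x size:", crossover_year(web_tb=18.5))
```

Doubling the starting web size in this sketch pushes the crossover back by a single year, which is the "off by a factor of two" point from earlier.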
Posted by bug to Media Technology at August 2, 2005 1:44 PM
I remember that at Hardpack's wedding, you and I were in the airport shuttle together and we did some back-of-the-envelope calculations to figure out when storage space would hit one bit per atom, and that it was sometime between 2020 and 2030.
How we doin' on that?
Posted by: Beemer at August 2, 2005 4:59 PM
Is my logic wrong here? Google has 8 billion pages indexed. If we assume 100K per page, that's 800 terabytes. So using your numbers, HTML only would be 200 terabytes, which with a compression rate of 75% would make for 50 terabytes, meaning we wouldn't drop below $1000 until sometime in early 2011. Which is _still_ ridiculously early.... Unless of course, my math is off base :)
-M
Posted by: Mort at August 2, 2005 8:14 PM
I'm using a different set of assumptions on this envelope than the previous post. In particular I'm starting with the Berkeley study size count, which actually recursively downloaded pages from sites rather than estimating number of pages and then using an average page size. Of course, if my 9.25 TB estimate is correct for 2005 it would imply an average of only 3.2K uncompressed text representation for each of those 11.5 billion pages mentioned in the other study, which is not impossible but sounds a little low, so our mileage may vary.
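The 3.2K figure can be checked with a couple of lines of Python. This is only a sketch: the function name is made up, and it assumes decimal terabytes (10^12 bytes) and that 75% compression leaves a quarter of the original size.

```python
def avg_page_bytes(compressed_tb, pages, compressed_fraction=0.25):
    """Average uncompressed bytes per page implied by a compressed total.

    Assumes 1 TB = 10**12 bytes and that compression leaves
    compressed_fraction of the original size (both assumptions).
    """
    uncompressed_bytes = compressed_tb / compressed_fraction * 1e12
    return uncompressed_bytes / pages


if __name__ == "__main__":
    size = avg_page_bytes(9.25, 11.5e9)
    print(f"{size / 1000:.1f} KB per page")
```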
Let's see, figure 10E79 atoms in the universe, and we're about to hit 500 GB hard drives = 4x10E12 bits. So we still have a factor of 10E67 to make up, and even doubling every year that's about 222 years (really!). I'd guess our old calculation was off, or maybe we're misremembering...
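Here is the doubling arithmetic as a small Python sketch. The function name is hypothetical, and it reads the "10E67" gap as 10^67, which is the reading that reproduces the 222-year figure; a slightly different atom count only shifts the answer by a few years.

```python
import math


def years_to_close_gap(factor, doublings_per_year=1.0):
    """Years needed to grow by `factor` when capacity doubles at a fixed rate."""
    return math.log2(factor) / doublings_per_year


if __name__ == "__main__":
    drive_bits = 500e9 * 8  # a 500 GB drive is 4 x 10^12 bits
    print(f"Drive holds {drive_bits:.0e} bits")
    # Gap read as 10^67 (an assumption about the notation above).
    print(f"About {years_to_close_gap(1e67):.0f} years at one doubling per year")
```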
Posted by: Bug at August 3, 2005 6:26 PM