Searching for Signal

the n01se blog

Regarding Cloud Storage

This post is inspired by a blog post by Larry Stewart about "Connected-only devices" (e.g. Google Chromebook) and the problem with cloud/remote storage. Larry argues that for many types of data there are fundamental reasons why local storage is better than storing data in the cloud. His reasons are:

  • Storage is cheap, communications are not
  • Storage is low power, communications are not
  • Local storage always works, communications do not
  • My use of local storage is private, in the cloud there are watchers
  • Local operations have predictable performance, remote does not

Larry is correct about the current problems with "Connected-only devices" and cloud storage, and he has identified some real costs of cloud storage vs local storage. I'm going to go further and identify some additional costs involved in the local vs cloud storage equation. These other costs are more dominant in determining what choices people will make, and I think considering them reveals an interesting trend:

There is a cross-over point where cloud data becomes cheaper than local data.

The cross-over point varies for different people and different types of data and in many cases it has already been crossed.

Storage/Network costs:

First let's look at some of the costs that Larry identified. It is well known that transistor density is currently doubling every 24 months or so. This principle is known as Moore's Law. Less well known are Butters' Law, which states that the amount of data able to be sent per optical fiber doubles every 9 months, and Nielsen's Law, which states that the maximum bandwidth available to home users will double every 21 months. On the storage side is Kryder's Law, which states that storage density doubles every 12 months.
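To get a feel for how differently these curves compound, here is a rough sketch that turns the doubling periods quoted above into growth factors over a fixed span. The function names and the five-year window are my own choices for illustration, and the periods are the rule-of-thumb figures from the paragraph, not measurements.

```python
# Doubling periods in months, taken from the rules of thumb quoted above.
DOUBLING_MONTHS = {
    "Moore (transistor density)": 24,
    "Butters (data per optical fiber)": 9,
    "Nielsen (home user bandwidth)": 21,
    "Kryder (storage density)": 12,
}


def growth_factor(doubling_months: float, elapsed_months: float) -> float:
    """Return how many times larger a quantity gets after elapsed_months."""
    return 2.0 ** (elapsed_months / doubling_months)


def growth_table(elapsed_months: float) -> dict:
    """Growth factor for every law above over the same time span."""
    return {
        name: growth_factor(period, elapsed_months)
        for name, period in DOUBLING_MONTHS.items()
    }


if __name__ == "__main__":
    # Hypothetical window: five years (60 months).
    for name, factor in growth_table(60).items():
        print(f"{name}: x{factor:.1f} after 5 years")
```

Under these assumed periods, storage density outgrows home bandwidth over any given window, which is Larry's point. The rest of this post argues that this gap is not what decides the outcome.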

Network and storage capacities are following exponential curves just like transistor densities. Even if network capacity and user bandwidth were doubling at a much, much slower rate than storage capacity there would still be a cross-over point where cloud data becomes cheaper (in practice and for most people) than local data. This is because the most important factors to consider are not the raw resource costs, but rather the time cost and the cost of convenience lost.

A few thoughts before moving on:

  1. The doubling of storage density affects the cost of cloud storage too, so it's not a cut-and-dried Moore vs Kryder argument.
  2. The speed/latency of disk storage is not keeping up with capacity. Cloud storage systems often have a better answer for this than is available with local storage. For example, before the advent of Gmail, we used to have to wait for email searches to complete.
  3. The relevant question is not "How much am I paying per bit?" but rather "How much is it costing me to keep my email/pictures/music (insert data/media type) stored locally vs stored in the cloud?"
  4. When I'm talking about cloud storage, I'm not referring to a remote cloud hard drive service like Dropbox. I really mean a cloud service that is designed around a certain type of data, such as Gmail for email or GitHub for source code.

Time Cost:

Both local data and remote/cloud data have a time cost. The time cost for remote data is primarily the time spent waiting for my data to download/cache and become usable. The primary time cost for local data is time spent managing that data: backups, upgrades, organizing, permissions, etc.

The time cost of remote data is decreasing on an exponential curve (the smaller the file/media type, the sooner this becomes essentially a negligible cost). The time cost of managing local data certainly is not decreasing at anywhere near the same rate.
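As a back-of-the-envelope sketch of that argument, suppose the remote time cost halves every so many months while the local management cost stays flat. Both assumptions are simplifications of mine, and the numbers in the usage note are made up. The cross-over then falls out of a single logarithm:

```python
import math


def crossover_months(remote_cost_now: float,
                     local_cost: float,
                     halving_months: float) -> float:
    """Months until an exponentially shrinking remote cost drops to a flat local cost.

    Assumed model (not from any measurement):
        remote_cost(t) = remote_cost_now * 2 ** (-t / halving_months)
        local_cost(t)  = local_cost  (constant)
    """
    if remote_cost_now <= 0 or local_cost <= 0 or halving_months <= 0:
        raise ValueError("all arguments must be positive")
    if remote_cost_now <= local_cost:
        return 0.0  # already crossed over
    return halving_months * math.log2(remote_cost_now / local_cost)


if __name__ == "__main__":
    # Hypothetical: remote access costs 8x more time than local management today.
    print(crossover_months(8, 1, 21))  # halving at the Nielsen's Law pace
    print(crossover_months(8, 1, 60))  # halving almost three times more slowly
```

In this toy model a slower halving period only pushes the cross-over later. It never removes it, as long as the local cost is not shrinking too.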

Cost of Convenience Lost:

Both local data and remote/cloud data also have a lost convenience cost. For remote data, this lost convenience is any time I don't have reasonable Internet connectivity (or the service is down). Finding yourself in a location without Internet access is extremely annoying, but imagine how much worse the situation was just three years ago.

The convenience cost for local data is that few people have access to this local data once they leave their homes. If they do have access, then they have probably either turned their home into a personal remote data service (time cost) or they duplicate their data to all their mobile devices (time cost).

There is also a privacy/security cost, but unfortunately, I think for most people it is very hard to quantify and therefore irrelevant for their day-to-day decisions. Also, it could be argued that the average person's mismanaged Windows PC might be more exposed than the average cloud provider.

The Cross-Over Point(s):

The cross-over point has already happened for most Internet users with email, bank records, contact lists, etc. Anyone remember POP3 email?

For most younger people this cross-over has also already happened with pictures (Facebook, Picasa, etc) and music (Pandora, Last.fm, Apple Cloud, Google Music, etc). For the younger crowd (and other early adopters) the cross-over will happen soon (if it hasn't already) with documents (Google Docs, Office 365), presentations (Scribd, Google Docs, YouTube), videos/movies (YouTube, Hulu, Netflix, etc), etc.

Large connected-only devices like the Google Chromebook will reach the cross-over point much later because they add a significant additional cost to the equation: weight/reduced portability. It's certainly possible that large connected-only devices are so premature that they will die and not be resurrected until well after they would otherwise have reached the cross-over point of cost-effectiveness.

The trend towards remote/cloud storage is already well underway. Nearly everyone who uses the Internet uses cloud storage in one form or another and they will be using much more of it in the near future.

"The future is already here — it's just not very evenly distributed."

---

PostScript:

I knew Larry while I worked at SiCortex. I have met few software engineers who are more brilliant than Larry. Whenever I think about Joel Spolsky's famous post about hiring people that are "1. Smart and 2. Get things done", Larry is one of the people that comes to mind for me. Thank you, Larry, for unknowingly spurring me on to write something I've been wanting to write for a while now.