I really want to love a study that uses the term zettabyte without a trace of sarcasm, but I am having a hard time with it. Perhaps somebody can set me straight, but I doubt it. This is the only thing I am going to hear about at work from my colleagues today, maybe all week, and it’s going to kill me. If you haven’t read Cisco’s VNI Forecast whitepaper or used the interactive tool (it’s cool), you should at least check it out. I can sum up the paper like so:
Cisco predicts an unfathomably large amount of data will be transmitted across the internet over the next four years, but don’t worry; it is not at all self-serving or in Cisco’s interests to drop massive, unsubstantiated predictions on the public that can never be independently verified…
That’s perhaps a tad unfair at first blush, but I think some scrutiny of this research effort is justified. After all, if I am going to be in a room full of experts day after day trotting out their latest pet statistic from the study (“hey, Pete! Did you know tentacle porn is estimated to grow to 97 exabytes in 2016?”), I think they deserve to have their annoying self-importance backed up by good fundamentals; those start with the data and the methodology.
I sought out the study’s methodology to understand what the margin of error for a paper like this might be. Are we talking about a solid zettabyte, or did the authors of the project get to approximately 800 exabytes and phone it in from there? Spit out a crazy number and you get instant press, perhaps? One can never be too sure, so in a study about the growth of a worldwide network, written by a networking company, I expected to see some math behind the methodology and statistics. What I got in the first step was this:
Step 1: Number of Users
The forecast for Internet video begins with estimations of the number of consumer fixed Internet users. Even such a basic measure as consumer fixed Internet users can be difficult to come by, as few analyst firms segment the number of users by both segment (consumer versus business) and network (mobile versus fixed). This year, the number of consumer fixed Internet users was not taken directly from an analyst source but was estimated from analyst forecasts for consumer broadband connections, data on hotspot users from a variety of government sources, and population forecasts by age segment. The number of Internet video users was collected and estimated from a variety of sources, and the numbers were then reconciled with the estimate of overall Internet users.
Somehow the phrase “we guessed” doesn’t sound as academic, but it does have the advantage of economy and is distinctly more accurate. They rely on analyst forecasts whose underlying data is already problematic. Some of their data comes from government sources (whose governments?), and then we have the rather vague statement about reconciling estimates up there. Whatever does that mean? It means they had two figures that disagreed on the number of users and they picked one.
Care to hazard a guess which number Cisco picked? I’ll give you a hint: it’s either the smaller or the larger number and they use the word “Zettabyte” in the paper.
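If you want my cynical reading of “reconciled” as pseudocode, it goes something like this. To be clear: this is my caricature, not anything from Cisco’s paper, and every figure below is invented:

```python
# My caricature of "the numbers were then reconciled with the estimate
# of overall Internet users." All figures invented for illustration.
analyst_users = 1.9e9    # hypothetical analyst forecast
in_house_users = 2.3e9   # hypothetical internal estimate

def reconcile(a, b):
    """'Reconciliation,' as best I can tell from the methodology."""
    return max(a, b)  # take whichever number gets you to a zettabyte faster

print(f"{reconcile(analyst_users, in_house_users):,.0f} users")
```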
The truth is that Cisco is probably close, but I won’t be using these conclusions in any of my recommendations to clients. It’s not too difficult to figure out that network traffic is exploding, and 4x growth over four years is only about a 41% CAGR (58.7% if you count just three compounding periods), so these estimates are not outlandish. I can tell you from my own experience that unstructured data storage growth bounces between 40% and 70% per year, so as long as the number of zeroes is close, nobody will seriously question Cisco’s findings.
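The arithmetic is worth showing, if only because the answer depends on how you count. A quick sketch; the 4x figure is Cisco’s, while the choice of compounding periods is mine:

```python
def cagr(start, end, periods):
    """Compound annual growth rate: the constant yearly rate that
    turns `start` into `end` over `periods` compounding periods."""
    return (end / start) ** (1 / periods) - 1

# 4x growth over four compounding periods (e.g., end of 2012 -> end of 2016)
print(f"{cagr(1, 4, 4):.1%}")  # 41.4%

# Count only three compounding periods and you get the bigger number
print(f"{cagr(1, 4, 3):.1%}")  # 58.7%
```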
And that’s the problem; we should question the findings. The investment here by Cisco should not be understated. Research and analysis take time and money, which is why a meta-analysis like this is often cheaper and faster. The prose is clear, the interactive tool is very slick, and the presentation layer is outstanding. It’s just that I am not certain I can trust the underlying data. Perhaps it would have been prohibitively expensive to contract out a survey from, say, Gallup, Pew, or Harris Interactive. It’s not my money to spend, but with one of those firms I would at least know they are confident, to within a few percentage points, in how many billions of users might be accessing the Internet, streaming content, and contributing to the zettabyte conclusion.
So while I think Cisco is probably close, I’m allergic to drawing conclusions the data does not support. For now, the report is just an excuse for more tech journalists to whip out a pocket calculator and sound smart, just like my colleagues: “Hey, Pete…did you know a zettabyte is the same as 43,980,465,111 Blu-ray discs?!” Yet another reason I don’t read the industry press, as if I needed it.
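For what it’s worth, I checked their pocket-calculator math, and it holds up, assuming a binary zettabyte and single-layer discs:

```python
# Checking the pocket-calculator math, assuming a binary zettabyte
# (2**70 bytes) and a 25 GiB single-layer Blu-ray disc.
ZETTABYTE = 2 ** 70
BLU_RAY_DISC = 25 * 2 ** 30

print(ZETTABYTE // BLU_RAY_DISC)  # 43980465111
```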
If you’re thoroughly impressed with the study methodology and you care to comment, please keep in mind I only poked at the first step in their data collection and analysis approach; the remaining steps are no better. Until then, I need to go. I have to figure out how many copies of all Star Trek-related media I could fit into a zettabyte. Probably just a few; I understand there were a lot of spinoffs.