What’s so big about big data? Let’s start with a few stats…
1. IDC says 1.8 zettabytes (1.8 trillion gigabytes) of data were created and replicated in 2011. This is a growth factor of 9 in the past 5 years.
2. Over the next decade, the number of files will grow by 75x. However, the staff available to manage them will only grow by 1.5x.
3. 75% of the information created is generated by individuals, but enterprises have some liability for 80% of the information at some point in its digital life-cycle.
4. Only 1/2 of the information that should be secured and recoverable is actually secured and recoverable.
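To put the first two stats in perspective, here is a quick back-of-the-envelope sketch. It only uses the 9x-in-5-years, 75x and 1.5x figures quoted above. The function names are mine, and treating the growth as evenly compounded each year is an assumption, not something IDC states.

```python
def implied_annual_growth(total_factor: float, years: int) -> float:
    """Yearly multiplier that compounds to total_factor over `years` years.

    Assumes growth compounds evenly each year (a simplification).
    """
    return total_factor ** (1 / years)


def workload_per_person(files_factor: float, staff_factor: float) -> float:
    """How much the number of files per staff member grows."""
    return files_factor / staff_factor


if __name__ == "__main__":
    yearly = implied_annual_growth(9.0, 5)   # stat 1: 9x in 5 years
    per_person = workload_per_person(75.0, 1.5)  # stat 2: 75x files, 1.5x staff
    print(f"Implied growth: about {yearly:.2f}x per year")
    print(f"Files per staff member: {per_person:.0f}x")
```

In other words, under that assumption the data set grows by roughly half again every year, and each person on staff ends up responsible for about 50 times as many files.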
As a CXO, I’m excited to see an opportunity: an opportunity to mine for value, new streams of revenue, corporate differentiation, etc. I’m excited about the possibilities. Then I wake up and I’m scared [bleep]less. Sure, there is reason for excitement and an opportunity to do great things, but how do I secure, back up and recover all of that today? And even if the growth is only a third of what they predict, how do I protect and recover my data sets next year? Via traditional methodologies? Should I laugh or cry at the thought?
The only chance I have is to take a different approach: an approach that begins with looking at the data differently, capturing my new view at the source and storing it in a way that eliminates the concept of a backup. I need all of these features without compromising my ability to version for recovery.
So how should I look at data differently? Well, data is either structured or unstructured. Some will add semi-structured. Regardless, they are different, so protecting them with the same methodologies is not only inefficient and costly, it also makes it almost impossible to recover within the time required to maintain a competitive edge. Data is also either active or inactive. The definition varies according to the data set, but active is active and inactive is inactive, regardless of how it is determined. What if I could adopt a different approach to protecting that inactive data set? What if the inactive data accounts for 60-80% of the corporate data set, as it does for most organizations, or 30-40%, as it does for others? What if I could store that data in such a way that it could be removed from the traditional backup stream without compromise? What if the data was always online, accessible, stored with versions, and its retention governance was assured?
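To make the active/inactive split concrete, here is a minimal sketch of how you might measure it. Everything in it is an assumption for illustration: the `FileRecord` type, the function names, and especially the 90-day cutoff, since the definition of "inactive" varies by data set. It treats a file as inactive if it has not been modified within the cutoff, and reports how much of the total capacity could leave the traditional backup stream.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    """Hypothetical inventory entry; a real one would come from a file scan."""
    path: str
    size_bytes: int
    last_modified: datetime


def split_by_activity(records, now, inactive_after=timedelta(days=90)):
    """Return (active, inactive) lists based on time since last modification.

    The 90-day default is an arbitrary illustrative cutoff.
    """
    active, inactive = [], []
    for record in records:
        if now - record.last_modified > inactive_after:
            inactive.append(record)
        else:
            active.append(record)
    return active, inactive


def inactive_share(records, now, inactive_after=timedelta(days=90)):
    """Fraction of total bytes held by inactive files (0.0 if there is no data)."""
    _, inactive = split_by_activity(records, now, inactive_after)
    total = sum(r.size_bytes for r in records)
    if total == 0:
        return 0.0
    return sum(r.size_bytes for r in inactive) / total


if __name__ == "__main__":
    now = datetime(2012, 6, 1)
    sample = [  # made-up sample data
        FileRecord("plan.doc", 200, now - timedelta(days=10)),
        FileRecord("archive.log", 800, now - timedelta(days=400)),
    ]
    print(f"Inactive share: {inactive_share(sample, now):.0%}")
```

Measuring by bytes rather than by file count is a deliberate choice here, because it is capacity, not file count, that drives the size of the backup window.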
The very real possibility of this method sets the stage for a different approach, one that lets us secure the trillions of gigabytes we will create and recover them without the concept of a traditional restore.
Stay tuned for Part 2. If the analysts are half right and 30-50% of our data is unstructured, what if we could capture the data at its source? How would we do things differently and make them better?