A History of Big Data: Management and Systems
The term "Big Data" has become a fashionable buzzword across many industries in the last few years. Although concern about the growing volume of data was being voiced as early as 1941, Big Data moved from specialist technology circles into the mainstream only around 2012, in part thanks to a World Economic Forum report titled “Big Data, Big Impact”.
The key point made in the report was that, with increasingly vast amounts of data being captured by the many devices we interact with every day, it’s important to “ensure that this data helps the individuals and communities who create it”. That is a straightforward call to action, but acting on it is hard: Big Data describes data sets so large that traditional computing systems are unable to process them.
To put this into context, it’s estimated that we produce around 2.5 exabytes of data every day, which is 2.5 billion gigabytes. Another striking figure is the 44 zettabytes of data we are expected to have amassed by the year 2020; 1 zettabyte is equivalent to 1 trillion gigabytes.
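As a quick check on the conversions above, here is a minimal Python sketch. It assumes decimal (SI) units, where 1 gigabyte is 10^9 bytes; the constant and function names are purely illustrative and do not come from any library.

```python
# Decimal (SI) byte units, expressed as powers of ten (an assumption;
# binary units such as gibibytes would give slightly different figures).
GIGABYTE = 10**9
EXABYTE = 10**18
ZETTABYTE = 10**21


def to_gigabytes(amount: float, unit_bytes: int) -> float:
    """Convert an amount expressed in the given unit into gigabytes."""
    return amount * unit_bytes / GIGABYTE


# The two figures quoted in the text.
daily_output_gb = to_gigabytes(2.5, EXABYTE)    # 2.5 exabytes per day
one_zettabyte_gb = to_gigabytes(1, ZETTABYTE)   # 1 zettabyte

print(f"2.5 exabytes = {daily_output_gb:,.0f} gigabytes")
print(f"1 zettabyte  = {one_zettabyte_gb:,.0f} gigabytes")
```

Under these assumptions the first value comes out at 2.5 billion gigabytes and the second at 1 trillion gigabytes, matching the figures quoted.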
It’s clear that consolidating and analysing such vast quantities of data is a huge challenge. The advantages could be just as substantial, not just for businesses and their consumers but also for government organisations, charities and everyone in between.
The History of Big Data
The handling of large data sets can be traced back to the late 19th century. Compiling and tabulating the 1880 U.S. census was a manual task estimated to take around 10 years. In response, the Hollerith tabulating machine, a basic punch card device, was invented in 1881, reportedly cutting that process to just one year.
As the U.S. population continued to grow throughout the first third of the 20th century, record keeping was formalised with the introduction of social security numbers. By 1940 the same organisational techniques were being used in libraries around the world to keep up with the rapid growth of the publishing industry.
In 1941, the term “information explosion” was used to describe the great volume of data being created. By the early 1960s the so-called “scientific revolution” had brought an exponential increase in scientific research data, to the point where the “critical magnitude” of data that could be stored was said to have been reached. The business world saw the same growth, which led directly to the adoption of centralised computing systems to handle the wealth of data automatically.
As technology advanced through the 1980s and coped better with the amount of data being produced, businesses began to base decisions on their own analysis. As business systems became more sophisticated, the largest companies of the early 1990s were able to implement business-wide solutions covering everything from accounting and finance to service and maintenance.
Connected businesses were now able to reduce costs, manage their employees’ time better and streamline their output. In 1989 businessman Howard Dresner defined “Business Intelligence” as “concepts and methods to improve business decision making by using fact-based support systems”, and specialist companies soon began to spring up, offering large businesses the opportunity to outsource their reporting and analysis.
The 1990s saw further development in the business intelligence industry, and as the internet boom took shape, the “problem of big data” once again stretched existing computing systems to full capacity. By the late 1990s business management software had become advanced enough to make “predictive analysis” possible, allowing many industries to use forecast reporting to plan their strategy.
In 2001 industry analyst Doug Laney produced a research paper outlining the now widely accepted “three Vs” of Big Data: volume, velocity and variety. Until then the key concern of those dealing with big data had been volume, but Laney expanded the definition to cover the ever-increasing variety of sources from which data was being collected, and the ability of business systems to interact with data in real time, or as close to it as possible (velocity).
Big Data today has many applications beyond the business world. In science and research it has improved the speed and quality of large projects. CERN’s Large Hadron Collider is a good example: if all of its sensor data were recorded, it would produce almost 500 exabytes of data per day.
Without the technology pioneered for Big Data this information would be useless, and picking out the very small percentage of interesting collisions would be impossible. Decoding the human genome is another example: a process that originally took 10 years can now be done in just one day, at a fraction of the original cost.
The future of Big Data is also bright, and it is hoped that data analytics will be put to use in many different areas. From healthcare to crime prevention, and from consumer technology to farming, the scope for improvements thanks to Big Data is vast.
There are also risks that will need to be addressed, such as the threat to our privacy that already exists as a consequence of the data we share online. While much of this data currently goes unused, Big Data could mean it is harnessed far more efficiently, for both good and bad ends.