If data is the new oil, how do you make the most of your reserves?

Drilling Down

The five largest listed companies on the planet are Apple, Amazon, Google’s parent company Alphabet, Microsoft and Facebook.

All are businesses focussed on digital products and services. And all of them are specialists at handling what many regard as the world’s most valuable commodity: data.

The ever-increasing volume of data, and its commercial value, have led data to be dubbed the new oil – but with one crucial difference: it is not a commodity held by just a few companies, and every business on the planet can tap into the vast reserves.

But how do businesses make sense of Big Data while also protecting their customers’ privacy and keeping up with the latest regulations? And how can they extract insights that help them perform better and improve products and services for their customers?

Data Deluge

The statistics around Big Data are mind-boggling. More data has been created in the past two years than in all of prior human history, and the volume of data is doubling every two years.

It is predicted that there will be 50 billion smart devices globally by 2020, with everything from smartphone gaming apps to smart thermostats generating data to be stored and analysed.

And all of this data is valuable. As part of developing a national strategy for data, the UK government recently estimated that data-driven technologies will contribute over £60 billion ($77bn) per year to the UK economy by 2020.

Globally, cross-border flows of data grew 45 times from 2005 to 2014, and accounted for $2.8 trillion (approximately 3.3%) of global GDP in 2014, according to the McKinsey Global Institute.

Operational and customer data is nothing new for businesses – they’ve always had to keep track of who their customers are and how their business is performing. The difference today is the volume, variety and velocity of data – known as “the three Vs”.

For example, energy companies are handling a much greater volume of data – driven by the rollout of smart energy meters and the growing adoption of smart home products. Smart sensors, used by business customers to monitor the energy performance of individual pieces of equipment, are also generating large amounts of new data. All these sources add a greater level of variety to the data that energy companies hold.

On top of this, there is also the wide variety of data being generated by the digitalisation of the customer journey, from website and call centre interactions to social media engagement.

And all of this data is being generated at a much higher speed or velocity than ever before.

Beyond the Human Mind

The three Vs together result in companies holding vast quantities of data. For example, Centrica now holds more than 3 petabytes (3,000 terabytes) of data – the equivalent of around 40 years’ worth of HD video – and is generating more than 10 terabytes of new data every day.
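
As a rough sanity check on those figures – and assuming HD video at around 8.5 gigabytes per hour, a ballpark rate rather than anything from Centrica – a few lines of Python show how 3 petabytes translates into roughly 40 years of footage:

```python
# Back-of-envelope check: how much HD video fits into 3 petabytes?
PETABYTE_GB = 1_000_000        # 1 PB = 1,000,000 GB (decimal units)
GB_PER_HOUR_HD = 8.5           # assumed HD video data rate (ballpark)

total_gb = 3 * PETABYTE_GB     # roughly Centrica's stored data
hours = total_gb / GB_PER_HOUR_HD
years = hours / (24 * 365)

print(f"{hours:,.0f} hours, or about {years:.0f} years, of HD video")
# -> 352,941 hours, or about 40 years, of HD video
```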

British Gas Head of Data Science Peter Sueref says the three Vs make understanding Big Data an impossible task for humans.

“Understanding data is the big problem,” he says.

“There is so much data out there that trying to understand it all – where it comes from, what it means, what it does, who uses it, and who can use it – is an impossible task for a human. You have to use Artificial Intelligence and cutting-edge data techniques to really understand what your data is and how you can get the most from it.”

Filling a Lake

Getting the most out of your data depends both on how you store and organise it and on the tools you use to analyse it.

“Where the volume of data turns absolutely massive, and where the variety and velocity of that data turns into a scale that hasn’t been seen before, you need something different,” says Sueref.

“You need a different model for storing that data and for retrieving data.”

That solution is a data lake: a way of storing data in the cloud in a far less structured form than in the past, which cuts the up-front work of organising data before it is stored. It also makes it easier to keep unstructured data, such as call centre agents’ notes and social media posts, in the same system as structured information, such as customer names and addresses.
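
As a minimal sketch of the idea – with a local folder standing in for cloud object storage, and invented records rather than Centrica’s actual data – structured and unstructured records can sit side by side, with structure applied only when the data is read back:

```python
import json
from pathlib import Path

# A local folder stands in for a cloud object store (illustrative only).
lake = Path("data_lake/raw")
lake.mkdir(parents=True, exist_ok=True)

# Structured record: a customer row with a known shape.
(lake / "customer_0001.json").write_text(json.dumps(
    {"type": "customer", "name": "A. Example", "postcode": "AB1 2CD"}
))

# Unstructured record: free-text call centre notes, no fixed schema.
(lake / "call_note_0001.json").write_text(json.dumps(
    {"type": "call_note", "text": "Customer asked about smart meter install."}
))

# Schema-on-read: structure is applied when the data is used, not stored.
for path in sorted(lake.glob("*.json")):
    record = json.loads(path.read_text())
    if record["type"] == "customer":
        print("Structured:", record["name"], record["postcode"])
    else:
        print("Unstructured:", record["text"])
```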

Sueref draws on a well-loved analogy to explain the difference between traditional data warehouses and data lakes: data stored in a warehouse is like clearly labelled bottles of water, while a data lake is a large body of water in a more natural state.

Data lakes also provide much greater capacity to handle data, at a fraction of the cost of older systems.

“In the past, when you wanted to do things like build your data storage in a data warehouse, that was built on dedicated, very expensive computers. And every time they run out of capacity you have to buy bigger versions,” explains Centrica Global Digital and Data Services Director Daljit Rehal.

“Whereas in this world, you can actually store data on hundreds of smaller computers, and those are much cheaper.”
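
That scale-out model can be sketched as a toy in a few lines: records are spread across many small “nodes” by hashing a key, so capacity grows by adding nodes rather than buying a bigger machine. This illustrates the general idea only, not Centrica’s actual platform:

```python
from hashlib import sha256

# Toy scale-out storage: N cheap "nodes" instead of one big machine.
NODES = 4
nodes = [dict() for _ in range(NODES)]  # each dict stands in for a server

def node_for(key: str) -> int:
    """Pick a node by hashing the key, spreading data evenly."""
    return int(sha256(key.encode()).hexdigest(), 16) % NODES

def put(key: str, value: str) -> None:
    nodes[node_for(key)][key] = value

def get(key: str) -> str:
    return nodes[node_for(key)][key]

put("meter:12345", "reading=8.2kWh")
put("meter:67890", "reading=3.7kWh")
print(get("meter:12345"))          # -> reading=8.2kWh
print([len(n) for n in nodes])     # data is spread across the nodes
```

Real systems add replication and rebalancing on top of this, but the principle – adding cheap machines rather than upgrading one expensive one – is the same.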

Fishing for Data

Despite the clear benefits, pulling all of a business’s different data sets together into a data lake isn’t without its risks.

As Mikko Hietanen, Chairman of data discovery company Io-Tahoe, explains, data can very easily become meaningless and unusable when stored in an unstructured way.

“Storing data in a data lake may initially present as a cheap and efficient option,” says Hietanen.

“However, if the business is unable to access the data or does not fully understand what data is available and how it could actually be utilised, it will be of limited value to the organisation. For example, they may have multiple copies of the same data sets but are unaware of such copies across the enterprise.”

To make sense of all the data, Io-Tahoe’s technology uses machine learning and crowdsourcing to establish links and relationships between data sets – an approach Sueref likens to “a Wikipedia with AI built in”.
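
Io-Tahoe’s own technology is proprietary, but one basic ingredient of automated data discovery can be sketched: measuring how far the values in two columns overlap, to flag a likely relationship or a duplicate copy. The data sets and figures below are invented for illustration:

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two sets of column values (0 = none, 1 = identical)."""
    return len(a & b) / len(a | b)

# Two data sets that were never formally linked (invented values).
billing_ids = {"C001", "C002", "C003", "C004"}
crm_customer_refs = {"C002", "C003", "C004", "C005"}
call_durations = {"312", "45", "1203"}

# High overlap suggests the columns describe the same entities...
print(jaccard(billing_ids, crm_customer_refs))   # -> 0.6: likely related
# ...while low overlap suggests unrelated data.
print(jaccard(billing_ids, call_durations))      # -> 0.0: unrelated
```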

An Ocean of Possibilities

Big data that is well organised and makes sense is big data that can be put to work.

Every day, and in every industry, new products and services are coming onto the market that make the most of the volume and variety of data now being generated.

In healthcare, for example, data sets are being used to speed up diagnoses. Machines can now learn to recognise pneumonia earlier than radiologists by analysing hundreds of thousands of chest x-rays.
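
The clinical systems behind such results are far more sophisticated, but the basic recipe – a convolutional neural network trained on labelled images – can be sketched briefly. Random tensors stand in for real x-rays here; this is an illustrative toy, not any published diagnostic model:

```python
import torch
import torch.nn as nn

# Tiny convolutional classifier: x-ray image in, pneumonia score out.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # 1 channel: greyscale x-ray
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(8 * 32 * 32, 1),                  # for 64x64 input images
)
loss_fn = nn.BCEWithLogitsLoss()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)

# Random tensors stand in for a batch of labelled training images.
images = torch.randn(16, 1, 64, 64)             # 16 greyscale 64x64 "x-rays"
labels = torch.randint(0, 2, (16, 1)).float()   # 1 = pneumonia, 0 = clear

for step in range(5):                           # a few training steps
    optimiser.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimiser.step()
    print(f"step {step}: loss {loss.item():.3f}")
```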

In postal services, large data sets can show how letter and parcel delays typically occur, and algorithms can be developed to spot signs of a potential delay in real-time data before the delay itself occurs.
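
One simple form such an algorithm could take is a rolling baseline that flags readings drifting well above the recent average. The sketch below uses invented transit times and is not any postal operator’s actual system:

```python
from collections import deque

def watch_for_delays(transit_minutes, window=5, threshold=1.5):
    """Flag readings more than `threshold` x the rolling average."""
    recent = deque(maxlen=window)
    for minutes in transit_minutes:
        if len(recent) == window and minutes > threshold * (sum(recent) / window):
            yield minutes  # early sign of a building delay
        recent.append(minutes)

# Invented transit times between two sorting centres (minutes).
stream = [42, 40, 44, 41, 43, 45, 78, 90, 41]
print(list(watch_for_delays(stream)))  # -> [78, 90]
```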

And in oil & gas, geological data from Centrica’s gas storage site at Easington in Yorkshire is being used to build an augmented reality 3D model that Centrica’s engineers can explore using Microsoft’s HoloLens.

HoloLens is a mixed reality headset that projects images onto the wearer’s real-world surroundings, rather than immersing them in a virtual reality.

Centrica Group Director of Technology, Engineering & Innovation, Charles Cameron, says HoloLens is helping engineers at Easington visualise whether a new project at the site is feasible, and how it can be achieved as safely as possible.

“The facility is high risk, because it’s high-pressure gas,” says Cameron.

“Being able to actually see it and visualise it, and plan contingencies before you actually do the operation, will make it a lot smoother and safer.”

Cameron adds that high-quality data is critical to building the model that engineers work from.

“Everything comes down to good data,” he says.

This is something that doesn’t just apply to safety-critical applications like gas storage. In a world where data is the new oil, good data can be the difference between success and failure for any business.