
Historic enterprise data is not an asset for AI adoption

  • Ingenie
  • Dec 1, 2025
  • 5 min read

Enterprise data archives are not AI-ready by default. An estimated 85% of enterprise data is never used beyond the transaction that generated it, and most of it is ungoverned and architecturally inaccessible. The ITV case study shows why governance discipline, not data volume, drives AI value.


The training data constraint in AI investments


There is broad consensus that the availability of quality training data has become the primary constraint on AI progress. Frontier models have largely exhausted structured, high-quality public data. The next wave of capability improvements depends on proprietary, domain-specific data held within large enterprises: financial institutions, healthcare systems, broadcasters, and industrial operators.


Organisations sitting on decades of accumulated data are treating their archives as strategic assets. The investment case assumes that the data exists, that the competitive advantage is proprietary, and that activation is an execution problem rather than a value question. This framing contains a material error.


The case for treating accumulated data as a strategic asset is well established. Technologists have long argued that data carries perpetual value: unlike physical assets, it does not depreciate with use, cannot easily be deleted, and replicates at near-zero marginal cost. Historic data can surface patterns that recent data cannot reveal. Proprietary data estates, if genuinely unique, represent competitive assets that are difficult for others to replicate.


Generative AI has reinforced this position by creating institutional demand for training data at a scale that did not previously exist. Large language models ingest and embed data for fine-tuning and inference at scale, and organisations that shifted their data estates to the cloud through lift-and-shift migrations are now discovering that their commercial terms were structured around pre-AI access patterns. The assumption that archived data can serve as AI training input is where the investment case begins to diverge from commercial reality.


Accumulated data is not the same as deployable data


Gartner estimates that 85% of all enterprise data is collected and stored but never actively used for any purpose beyond the transaction that generated it. It accumulates not because it is valuable and awaiting activation, but because no decision was ever made to delete it, structure it, or establish what it is worth.


The constraints are architectural as much as commercial. Enterprise data infrastructure is built for static Business Intelligence reporting and batch data processing: periodic, scheduled, and tolerant of latency. AI model training and inference require streaming data pipelines, high-capacity compute, and low-latency data access. Cold data in deep archive, which represents the majority of most enterprise data estates, cannot meet those requirements without re-architecture. These constraints rarely surface at the planning stage. They become visible after capital is committed.
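
To make the access-pattern mismatch concrete, the sketch below contrasts a scheduled BI workload with a single training run over the same estate. It is an illustration only: the dataset size, report cadence, scan fraction, and epoch count are assumed figures, not benchmarks.

# Illustrative contrast of monthly data-access volume: scheduled BI
# reporting vs. one model training run over the same 50 TB dataset.
# All figures are assumptions for illustration.

DATASET_TB = 50

# BI pattern: a nightly batch job scans a small aggregate slice.
BI_RUNS_PER_MONTH = 30
BI_SCAN_FRACTION = 0.02              # each report touches ~2% of the data (assumed)
bi_tb_read = BI_RUNS_PER_MONTH * DATASET_TB * BI_SCAN_FRACTION

# Training pattern: every epoch re-reads the full dataset.
EPOCHS = 10                          # assumed number of training epochs
training_tb_read = EPOCHS * DATASET_TB

print(f"BI reads per month:     {bi_tb_read:,.0f} TB")        # 30 TB
print(f"One training run reads: {training_tb_read:,.0f} TB")  # 500 TB

In practice the training data would be staged to fast storage rather than read from archive each epoch, but that staging layer is precisely the re-architecture the paragraph describes, and it does not exist in estates built for batch reporting.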


The financial exposure compounds the architectural problem. Cloud commercial contracts are priced around predictable, recurring access patterns. Archived data retrieval at AI training scale sits outside those parameters and triggers a materially higher cost tier. Usage costs of three times or more above the expected monthly baseline are not uncommon, and they are rarely modelled correctly before commitment. Beyond cost, archived enterprise data that pre-dates current AI governance frameworks frequently lacks the consent documentation, provenance records, and bias controls that deployment reviews now require. Data that passes a security audit may still fail an AI deployment review. The compliance exposure materialises at the point of use, not the point of storage.
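
As a rough illustration of how that exposure scales, the back-of-envelope sketch below compares a deep-archive storage baseline with a one-off bulk retrieval at training scale. The archive size and per-GB rates are illustrative assumptions, not vendor pricing.

# Back-of-envelope sketch: archive activation cost vs. the monthly
# storage baseline. All rates are illustrative assumptions, not
# actual vendor pricing.

ARCHIVE_TB = 500                 # cold data targeted for AI training (assumed)
STORAGE_PER_GB_MONTH = 0.001     # deep-archive storage, $/GB-month (assumed)
RETRIEVAL_PER_GB = 0.02          # bulk retrieval, $/GB (assumed)
EGRESS_PER_GB = 0.05             # egress to training environment, $/GB (assumed)

gb = ARCHIVE_TB * 1024

monthly_storage = gb * STORAGE_PER_GB_MONTH            # the expected baseline
activation = gb * (RETRIEVAL_PER_GB + EGRESS_PER_GB)   # one-off retrieval + egress

print(f"Monthly storage baseline: ${monthly_storage:,.0f}")   # ~$512
print(f"One-off activation cost:  ${activation:,.0f}")        # ~$35,840
print(f"Multiple of baseline:     {activation / monthly_storage:.0f}x")

Even under these deliberately modest assumed rates, a single activation pass costs tens of monthly baselines, and iterative model development repeats it.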


ITV case study: enterprise data audit


ITV holds one of the most valuable content libraries in British broadcasting: nearly 100,000 hours of programming across more than 1,000 formats. When Apple’s entry into streaming signalled a competitive shift, ITV’s executives made a logical decision: audit the archive and capitalise on what the business already owned. The audit returned a result that surprised the room. There was no significant new revenue to unlock. The commercially viable content was already known and in use. The rest had remained in archive not because it was awaiting activation, but because no governance framework had ever required a decision about it.


What the process did yield was more commercially significant. The audit forced ITV to confront the cost of maintaining over 1,000 content formats. The response was a governance framework that cut permissible formats from more than 1,000 to 50, imposed rigorous tagging disciplines on programme producers, and introduced content minimisation KPIs that changed how content was created at source.


Over the following decade, ITV doubled its revenues. Not by unlocking its archive. By establishing the governance discipline that determined what constituted a usable asset. The pattern recurs consistently across enterprise AI programmes: the most valuable data is already in production, archived data persists because it was never worth activating, and the real value creation mechanism is the governance discipline the process demands, not the data itself.


What this means for enterprise AI programmes


The training data scarcity argument is valid. The enterprise response to it, specifically the instinct to mine historical data deposits, is reasonable but requires rigorous qualification before capital is committed.


Five principles hold consistently across data governance programmes:


1. The most valuable data asset is already in production. Clean, current, governed data at the core of the business model is the asset that AI models can most immediately and cost-effectively use. Capital directed at improving its quality, pipeline architecture, and accessibility consistently outperforms capital directed at activating archived data at the periphery.


2. Archived data is archived for a reason. The commercial decision not to activate it has been made repeatedly, and not by accident. There is a reason it remained unused: the cost of establishing provenance, cleaning, tagging, and making it architecturally accessible has consistently exceeded the return it generates. Before committing capital to reversing that decision at AI scale, the more productive question is whether the data is worth activating at all, or whether those resources are better directed at the governed data already in production.


3. Scale is not signal. A large data estate is not evidence of AI readiness. Accumulating data across multiple formats, storage tiers, and architectures compounds cost without improving deployment viability. The organisations where AI programmes perform are those that applied data minimisation discipline at the point of creation: defining what constitutes a governed, usable asset in the architectural design, before the data exists, not after it has accumulated. Less data, correctly governed from source, consistently outperforms larger estates built without that discipline.


4. Access costs are systematically underestimated. The financial exposure in enterprise AI data programmes materialises in retrieval and processing costs, not storage costs. Unless streaming-accessible, governed data pipelines are already in production, the cost of making archived data usable for AI deployment should be treated as unquantified until technically assessed.


5. Governance established before scaling is the investment. ITV’s outcome was produced by governance discipline, not data activation. Organisations that define what constitutes a usable, governed data asset at the point of architectural design, before the AI programme is built and before it is scaled, are the ones where the AI business case holds over the investment horizon.
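
What defining a usable, governed asset at the point of design can look like in practice is a record schema that makes governance metadata mandatory at creation time. The sketch below is hypothetical, not ITV's actual framework: the field names and the permitted-format list are illustrative assumptions.

# Hypothetical sketch of a governed-asset record enforced at creation.
# Field names and the format whitelist are illustrative assumptions,
# not any organisation's actual schema.

from dataclasses import dataclass
from datetime import date

PERMITTED_FORMATS = {"mxf-op1a", "prores-422", "wav-bwf"}  # assumed whitelist

@dataclass(frozen=True)
class GovernedAsset:
    asset_id: str
    media_format: str         # must come from the permitted list
    provenance: str           # who created it, from what source
    consent_reference: str    # link to consent / rights documentation
    created: date
    retention_review: date    # forces a future keep-or-delete decision

    def __post_init__(self):
        # Reject assets that would otherwise accumulate ungoverned.
        if self.media_format not in PERMITTED_FORMATS:
            raise ValueError(f"Format {self.media_format!r} is not permitted")
        if not self.provenance.strip() or not self.consent_reference.strip():
            raise ValueError("Provenance and consent must be recorded at creation")

The specific fields matter less than the mechanism: the keep-or-delete decision and the evidence a deployment review needs (provenance, consent) are captured when the asset is created, not reconstructed decades later.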


The gap between investment decisions and implementation reality


For investment committees, the data estate question belongs in the investment thesis, not in a footnote to the technology programme. A proprietary data deposit that is ungoverned, architecturally inaccessible, or lacking consent lineage is a remediation issue. The investment case that depends on it as a source of competitive advantage needs to price that remediation before capital is committed, not discover it after close.


The same constraint applies to the teams tasked with delivering AI value creation programmes once capital is committed. The governance deficit that went unidentified at the investment stage surfaces once the programme team is committed to the delivery plan, the timeline is set, and data remediation has become the defining operational problem.


