Experts warn that the energy needs of artificial intelligence could soon exceed current and near-future electricity grid capacity, but the tech sector also faces another existential crisis.
Large language models such as ChatGPT, Gemini, Claude and Grok all work in the same way: to deliver effective responses to user prompts and requests, they need to ‘consume’ vast quantities of information.
This process – known as ‘scraping’ – essentially involves ingesting online data from anywhere and everywhere, which teaches the platforms everything from how to paint like Monet to the rules of French grammar. It is the focus of countless copyright lawsuits, because the original information was first created by humans, most of whom are not being compensated by Big Tech for the use of their work. And, crucially, it means the supply is limited.
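To make the mechanism concrete, here is a minimal sketch of the text-extraction step at the heart of scraping, using only Python’s standard library. The page content below is a stand-in: a real scraper would fetch live HTML over HTTP, but the principle – strip the markup, keep the human-written text for the training corpus – is the same.

```python
from html.parser import HTMLParser

class TextScraper(HTMLParser):
    """Collects the visible text from an HTML document,
    skipping script and style blocks."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside <script>/<style> tags

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside script/style blocks
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

# A stand-in for a downloaded web page; in practice this would come
# from an HTTP client crawling the open web.
page = ("<html><body><h1>Water Lilies</h1>"
        "<p>Monet painted over 250 of them.</p>"
        "<script>var x=1;</script></body></html>")

scraper = TextScraper()
scraper.feed(page)
corpus = " ".join(scraper.chunks)
print(corpus)  # the visible text only, with the script stripped out
```

Repeated across billions of pages, this is how human-written text ends up in a model’s training pool – whether or not the author agreed to it.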
According to the Business and Technology University [BTUAI], less than 5% of the world’s ‘textual knowledge’ – every piece of information that has ever been written down – is searchable in electronic form. The web has grown exponentially since 2010, when Google Books estimated the world held 130 million distinct titles in publication (just 15% of which had been digitised). Yet, according to the dead internet theory, most online activity since 2016 has not been human; it primarily consists of automatically generated content.
‘We’ve already run out of data,’ Neema Raphael, Data Chief at Goldman Sachs, explained on the bank’s podcast, Exchanges. ‘I think what might be interesting is people might think there might be a creative plateau… If all of the data is synthetically generated, then how much human data could then be incorporated? I think that’ll be an interesting thing to watch from a philosophical perspective.’
This signals a worrying emerging reality: if the current trajectory holds, artificial intelligence will increasingly learn from non-human sources. The impact on its accuracy and effectiveness – not to mention on the behaviour and responses of human users who base judgements on AI output – could be problematic for everyone involved.
There have already been widespread reports of large language models repeating and spreading mis- and disinformation created en masse by ‘bad actors’. This happens because, if enough of any content is produced, that data will eventually make its way into the knowledge pool these platforms rely on. So what happens when AI is primarily fed artificially generated content?
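The feedback loop described above can be illustrated with a toy simulation – this is a hedged sketch, not a claim about any real model. Treat the numbered items below as distinct pieces of human-written content; each ‘generation’, the training pool is rebuilt by sampling from the previous generation’s output, standing in for synthetic text that echoes earlier output. Watch how quickly the diversity of the pool collapses:

```python
import random

random.seed(0)  # fixed seed so the toy run is reproducible

# 1,000 distinct items stand in for original human-written content.
human_corpus = list(range(1000))
pool = human_corpus

for generation in range(10):
    # Each generation's "training data" is resampled (with replacement)
    # from the previous generation's output, so rarer human material
    # gradually disappears from the pool.
    pool = [random.choice(pool) for _ in range(len(pool))]
    print(f"generation {generation}: {len(set(pool))} distinct items remain")
```

After ten rounds, only a fraction of the original distinct items survive – a crude analogue of what researchers call ‘model collapse’ when models are trained predominantly on their own output.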
Of course, the information well has not completely run dry yet. Proprietary datasets – for example, those produced and held by businesses – represent a significant and often untapped resource that could train AI. Turning to them could mean a step change in how tech companies operate, mainly because these information mines are often highly secure, given the competitive nature of corporations.
They are not freely accessible via a Google search, which could make training artificial intelligence this way far more expensive than what has, until now, been a free-for-all raid on whatever can be grabbed – often because the original owners lack the power or resources to mount a legal defence. An investment bank, by contrast, does not store its information on the ‘free internet’, and such business-critical resources will never simply be handed over to train artificial intelligence.
Whatever the months ahead hold, they will be a testing ground for the sustainability of many AI models in their current form – particularly those contending in the race to achieve artificial general intelligence, considered by the West to be something of a holy grail. This week has already seen a potential $160 billion wiped off the value of Meta after the company revealed it could be on course to spend $100 billion on AI this year alone. Investors are increasingly wary of claims that the sector can continue to develop and improve at the rate it has in recent years, which is crucial to maintaining profitability.
Image: Solen Feyissa / Unsplash