Ensuring & Balancing
Securing the future of AI requires more than just regulation; it needs policies that support high-quality data as a public good.
Effective AI governance depends on balancing regulatory frameworks with initiatives that promote equitable and high-quality data access.
Large volumes of diverse data are critical for improving the performance of Large Language Models (LLMs) and driving AI progress.
Concerns
The rush for data can lead to ethical issues, such as using pirated content, which raises concerns about quality and fairness.
There is a risk of feedback loops in which LLM-generated text contaminates training data, amplifying biases and reducing data diversity.
The potential for 'peak data' by 2030 could limit the availability of pristine text necessary for AI development.
Cultural policies
Current LLMs are often trained on a narrow range of content, reflecting biases and gaps in data diversity, particularly in language and cultural representation.
AI training data lacks primary sources and diverse linguistic material, missing out on valuable cultural and historical texts.
Global cultural and archival data represent a significant untapped resource that could enrich AI’s understanding and capabilities.
Harnessing cultural heritage data could revolutionize our understanding of history and safeguard cultural heritage.
Publicly available cultural data could benefit smaller companies and startups, promoting innovation and leveling the playing field.
Italy & Canada
Italy’s €500 million Digital Library project aimed to make cultural heritage accessible but faced challenges and restructuring.
Canada’s Official Languages Act demonstrated the value of bilingual datasets for training translation software and AI.
Importance of Regional and Low-Resource Languages
Promoting digitization of low-resource and regional languages is crucial for complementing high-resource languages in AI development.
Digitizing cultural heritage is essential for preserving history, democratizing access to knowledge, and enabling inclusive AI innovation.