What is Machine Unlearning (MUL)?
MUL refers to the process of removing the influence of specific training data points on an already trained machine learning model
First mooted by Cao and Yang in ‘Towards Making Systems Forget with Machine Unlearning’
It is the antithesis of machine learning (ML): the model is made to forget data rather than learn from it
An algorithm is added to the AI model to identify and delete false, incorrect, discriminatory, outdated, and sensitive information
Solution or not
The concept responds to the difficulty of removing information from Large Language Models (LLMs), which constantly churn through data
But it is difficult to keep track of the data, as it can be utilised for multiple objectives, creating a complex web of algorithms; tracing how data flows through this web is known as data lineage
This adversely affects the quality of the data, leading to manipulation, adversarial outputs, and difficulty in locating and removing sensitive information
Moreover, as there is no sandbox approach for choosing and processing data in these models,
there is also a proven possibility of hackers inserting manipulated data to produce biased results (data poisoning).
One might argue for simply deleting the entire data set, i.e. data pruning, and re-training the entire AI model.
However, this would lead to inflated computational costs and undue delays for the data fiduciaries,
while also carrying the risk of losing substantial accuracy
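The trade-off between retraining and targeted removal can be illustrated with a minimal sketch. It assumes a deliberately simple model, a one-dimensional least-squares line stored as running sums, where the contribution of a single data point can be subtracted exactly. The class name LineFit and the helper fit are hypothetical, chosen only for illustration. Large models such as LLMs do not decompose this neatly, which is why practical MUL methods are usually approximate.

```python
from dataclasses import dataclass


@dataclass
class LineFit:
    """Hypothetical toy model: least-squares line y = slope*x + intercept,
    stored as running sums so one point's influence can be removed exactly."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def _update(self, x: float, y: float, sign: int) -> None:
        # sign=+1 adds the point's contribution, sign=-1 removes it
        self.n += sign
        self.sx += sign * x
        self.sy += sign * y
        self.sxx += sign * x * x
        self.sxy += sign * x * y

    def learn(self, x: float, y: float) -> None:
        self._update(x, y, +1)

    def unlearn(self, x: float, y: float) -> None:
        self._update(x, y, -1)

    def params(self) -> tuple[float, float]:
        # Closed-form least-squares solution from the running sums
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept


def fit(points):
    """Train from scratch on a list of (x, y) pairs."""
    model = LineFit()
    for x, y in points:
        model.learn(x, y)
    return model


clean = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
bad = (10.0, -50.0)              # the record to be forgotten

model = fit(clean + [bad])       # trained on everything, including the bad record
model.unlearn(*bad)              # targeted removal: one constant-time update
retrained = fit(clean)           # baseline: retrain from scratch without it

print(model.params(), retrained.params())  # (2.0, 0.0) (2.0, 0.0)
```

Here the targeted removal touches only the forgotten record, while the retraining baseline must pass over every remaining record again; the gap grows with the size of the data set.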
How the MUL model can be implemented effectively
3 approaches: private, public, and international
In the private approach, data fiduciaries will be primarily responsible for testing MUL algorithms,
which can then be applied across their training models for efficient deletion based on specific requirements
In the public approach, the government has the responsibility to prepare the statutory blueprint, either through soft-law or hard-law approaches,
to require data fiduciaries to fulfil their legal obligations
The international approach emphasises the role of nation states in coming together and preparing a framework to be adopted uniformly at a domestic level
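The testing step in the private approach could, for instance, compare an unlearned model against a reference model retrained from scratch without the forgotten records. The sketch below is one possible way to do this, not a prescribed procedure. Every name in it (train_mean, predict, unlearn_exact, unlearn_noop, passes_unlearning_check) is hypothetical, and the "model" is a toy that simply predicts the mean label.

```python
def train_mean(rows):
    """Toy model: predicts the mean label; stored as (count, total)."""
    return (len(rows), sum(y for _, y in rows))


def predict(model, x):
    count, total = model
    return total / count  # the toy model ignores x


def unlearn_exact(model, forget):
    """Candidate MUL algorithm: subtract the forgotten rows' contribution."""
    count, total = model
    return (count - len(forget), total - sum(y for _, y in forget))


def unlearn_noop(model, forget):
    """Deliberately broken candidate: forgets nothing."""
    return model


def passes_unlearning_check(train, unlearn, data, forget, probes, tol=1e-9):
    """True if the unlearned model predicts like one retrained without `forget`.

    Assumes rows are unique, so filtering by membership removes exactly
    the forgotten rows.
    """
    retained = [row for row in data if row not in forget]
    reference = train(retained)               # gold standard: full retraining
    candidate = unlearn(train(data), forget)  # algorithm under test
    return all(
        abs(predict(reference, x) - predict(candidate, x)) <= tol
        for x in probes
    )


data = [(1.0, 10.0), (2.0, 12.0), (3.0, 14.0), (4.0, 100.0)]
forget = [(4.0, 100.0)]
probes = [0.0, 2.5, 5.0]

ok_exact = passes_unlearning_check(train_mean, unlearn_exact, data, forget, probes)
ok_noop = passes_unlearning_check(train_mean, unlearn_noop, data, forget, probes)
print(ok_exact, ok_noop)  # True False
```

Comparing against a retrained reference is expensive, so a data fiduciary would likely run such a check on small test models before applying an algorithm across its production training pipelines.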