Does training of AI models rely too much on input from other AI models?

Large Language Models (LLMs, such as ChatGPT) are trained on large bodies of existing text so that they can answer questions by producing text that looks like the work of humans. The texts used to train LLMs are supposed to be generated by humans, but are they? Researchers from the Swiss Federal Institute of Technology Lausanne (EPFL) tried to find out.

In developing and training AI models, many companies pay gig workers on platforms such as Amazon's Mechanical Turk to complete tasks that are typically hard to automate, for example solving CAPTCHAs, labelling data and annotating text. This human-created data is then fed into AI models to train them. But if that data was in fact itself created by other AI models and contains errors, models trained on it may absorb and amplify those errors, making it more and more difficult to find mistakes and work out where they came from.

What was the research?

The EPFL researchers hired 44 people on Mechanical Turk to summarise extracts from 16 medical research papers. Each extract was about 400 words long, and the participants produced summaries of around 100 words for each paper.

What was the result?

The researchers estimated that between 33% and 46% of the gig workers had taken a shortcut: rather than writing the summaries themselves, they had asked AI models to generate the text they submitted.

Implications of the findings

The authors say their results show that platforms, researchers, and crowd workers need to find new ways to ensure that inputs meant to be created by humans do indeed come from humans.

To read more

The research is not yet complete and has not yet undergone peer review, but the researchers have published a draft paper reporting preliminary results. "Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks", by Veniamin Veselovsky, Manoel Horta Ribeiro and Robert West, is available at https://arxiv.org/abs/2306.07899.

A short commentary, "The people paid to train AI are outsourcing their work… to AI", appears in MIT Technology Review.
