ChatGPT Outperforms Human Crowd Workers For Text-Annotation Tasks

If GPT-4 has shown the potential to take away high-paying white-collar jobs, such as those of programmers and consultants, it is coming for the less glamorous ones too.

A research paper has found that ChatGPT outperforms crowdsourced humans for text-annotation tasks. Platforms like Amazon's Mechanical Turk allow humans to remotely perform repetitive jobs for small amounts of money, such as labeling images and classifying text. The paper also found that using ChatGPT was cheaper than using crowdsourced humans.

“Using a sample of 2,382 tweets, we demonstrate that ChatGPT outperforms crowd-workers for several annotation tasks, including relevance, stance, topics, and frames detection,” says the paper, published on 28th March 2023 by Gilardi et al. “Specifically, the zero-shot accuracy of ChatGPT exceeds that of crowd-workers for four out of five tasks, while ChatGPT’s intercoder agreement exceeds that of both crowd-workers and trained annotators for all tasks. Moreover, the per-annotation cost of ChatGPT is less than $0.003—about twenty times cheaper than MTurk. These results show the potential of large language models to drastically increase the efficiency of text classification,” it adds.

“We find that for four out of five tasks, ChatGPT’s zero-shot accuracy is higher than that of MTurk. For all tasks, ChatGPT’s intercoder agreement exceeds that of both MTurk and trained annotators. Moreover, ChatGPT is significantly cheaper than MTurk: the five classification tasks cost about $68 on ChatGPT (25,264 annotations) and $657 on MTurk (12,632 annotations) (see Section 4 for details). ChatGPT’s per-annotation cost is therefore about $0.003, or a third of a cent—about twenty times cheaper than MTurk, with higher quality. At this cost, it might potentially be possible to annotate entire samples, or to create large training sets for supervised learning. Based on our tests, 100,000 annotations would cost about $300,” the paper said.
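The cost comparison above can be checked directly from the figures the paper quotes. A minimal sketch (using only the totals and annotation counts reported in the quote):

```python
# Reproducing the paper's cost arithmetic from the quoted figures:
# $68 for 25,264 ChatGPT annotations vs. $657 for 12,632 MTurk annotations.
chatgpt_total, chatgpt_annotations = 68.0, 25_264
mturk_total, mturk_annotations = 657.0, 12_632

chatgpt_per_annotation = chatgpt_total / chatgpt_annotations  # roughly $0.0027
mturk_per_annotation = mturk_total / mturk_annotations        # roughly $0.052

# Ratio works out to about 19x, which the paper rounds to "about twenty times".
ratio = mturk_per_annotation / chatgpt_per_annotation

# 100,000 annotations at this rate: about $270; the paper's "$300" estimate
# comes from rounding the per-annotation cost up to $0.003.
cost_100k = 100_000 * chatgpt_per_annotation
```

The numbers line up with the paper's rounding: just under a third of a cent per annotation, and roughly a twentyfold saving over MTurk.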

Now this is pretty incredible by itself: platforms like Mechanical Turk are often used by low-skill workers in developing countries to make some side money, and having ChatGPT perform their jobs even more cheaply could mean that they lose this source of income. But more interestingly, the tasks that these human workers perform are often used to train machine learning models. Workers, for instance, could classify a sample of tweets, which could then be used to train a model that performs a similar classification automatically. It now turns out that a machine, ChatGPT, can perform the classification better than humans can. This opens up the tantalizing possibility that machines could, in theory, train other machines to learn new things and get better. And if this flywheel starts whirring, AGI could hit us much sooner than anyone had expected.
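The zero-shot setup described above amounts to sending each tweet to the model with a classification instruction and taking the reply as the label. A minimal sketch of the prompt-building step; the task framing and label set here are hypothetical illustrations, not the paper's actual prompts:

```python
# Hypothetical label set for a relevance task; the paper's real tasks also
# included stance, topic, and frame detection.
LABELS = ["relevant", "irrelevant"]

def build_annotation_prompt(tweet: str) -> str:
    """Build a zero-shot classification prompt for one tweet."""
    return (
        "Classify the following tweet as one of: "
        + ", ".join(LABELS)
        + ". Answer with the label only.\n\nTweet: "
        + tweet
    )

prompt = build_annotation_prompt("Example tweet text goes here.")
# The prompt would then be sent to a chat model (e.g. via the OpenAI chat API),
# and the returned label stored as the annotation for that tweet.
```

Labels collected this way could then feed a supervised classifier, which is exactly the machine-trains-machine loop the paragraph above describes.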
