La Grada

Study reveals that humans continue to outperform AI in content moderation to protect brands, albeit at a cost 40 times higher

by Estefanía H.
August 22, 2025
in Technology

In a completely digitalized world where social media content blends with corporate advertising campaigns, brand safety is one of the main concerns for companies. The news portal The Register reports on a study conducted by the company Zefr on who should perform content moderation on the internet: humans or AI. The study evaluates the performance and costs of various multimodal large language models (MLLMs) in comparison with human performance. Among the models examined is Llama-3.2-11B-Vision.

The metrics used include precision, the accuracy with which the AI flags content as inappropriate; recall, which measures the ability to avoid false negatives; and the F1 score, the harmonic mean of the two. The results revealed that, although using AI as a moderator is much cheaper, it does not reach the scores obtained by humans. AI cannot fully grasp context and nuance, which prevents it from being entirely accurate.

Companies must ensure that their marketing campaigns do not appear alongside harmful content, whether it clashes with the brand’s values or touches on controversial topics. That is why the study’s final recommendation is a hybrid model in which AI handles the bulk of the content, while the more ambiguous and complex cases are reviewed by humans.

Humans vs. AI

Companies face multiple challenges, and as technology advances, that list keeps growing, with problems that are not easy to resolve. Social media and the internet have become an ideal platform for companies’ marketing campaigns, but they must be cautious. Content shared on social networks can touch on the most delicate and controversial topics, which is why companies need to choose carefully where to advertise.

This brings up another dilemma: is it better to use human moderators or AI-driven multimodal large language models (MLLMs)? One of the main aspects to consider is cost, as hiring human moderators can be 40 times more expensive than using a language model. But the question is: do they deliver the same results?

What do the results say?

Zefr, a technology company that offers brand safety and suitability solutions, gathered a group of researchers to conduct a study comparing the effectiveness of human moderators and AI models. To this end, specific metrics were used:

  • Precision: the probability that content flagged as inappropriate by the AI or the human moderator actually is inappropriate.
  • Recall: the ability to detect all of the inappropriate content, that is, to avoid false negatives.
  • F1 score: the harmonic mean of the two previous metrics; it captures the balance between both.
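The F1 score follows directly from precision and recall; a minimal sketch of the calculation (the function name is ours, and the study’s published figures may differ in the last digit because they are computed from the underlying counts before rounding):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Gemini-2.0-Flash from the study's table: precision 0.84, recall 0.98
print(round(f1_score(0.84, 0.98), 2))  # ~0.90; the study reports 0.91
```

A model with perfect recall but poor precision (or vice versa) is penalized, which is why the study reports F1 rather than either metric alone.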

With those concepts clear, we can move on to the analysis of the results. The multimodal large language models (MLLMs) studied are GPT-4o, GPT-4o-mini, Gemini-1.5-Flash, Gemini-2.0-Flash, Gemini-2.0-Flash-Lite, and Llama-3.2-11B-Vision. A sample of 1,500 videos with explicit content related to drugs, alcohol, tobacco, death, injuries, military conflicts, and children’s content was analyzed. The results were the following:

Table obtained from The Register.

Model Precision Recall F1
GPT-4o 0.94 0.83 0.87
GPT-4o-mini 0.92 0.85 0.88
Gemini-1.5-Flash 0.86 0.96 0.90
Gemini-2.0-Flash 0.84 0.98 0.91
Gemini-2.0-Flash-Lite 0.87 0.95 0.91
Llama-3.2-11B-Vision 0.87 0.86 0.86
Human 0.98 0.97 0.98

Regarding the cost comparison between humans and AI, the results were:

Table obtained from The Register.

Model F1 Cost
GPT-4o 0.87 $419
GPT-4o-mini 0.88 $25
Gemini-1.5-Flash 0.90 $28
Gemini-2.0-Flash 0.91 $56
Gemini-2.0-Flash-Lite 0.91 $28
Llama-3.2-11B-Vision 0.86 $459
Human 0.98 $974

Conclusions

Regarding accuracy, the data shows better results for humans, with an F1 score of 0.98 compared to between 0.86 and 0.91 for the AI models. Although those are good figures, the models are unable to match human capability. Why does this happen? AI struggles to identify context and nuance. One example was a Japanese video about coffee addiction that was flagged as inappropriate simply because the word ‘addiction’ was associated with the drugs category.

In terms of cost, the difference is more than evident: the human cost of conducting the study was $974, while the AI models needed between $25 and $459, with the cheapest (GPT-4o-mini and the Gemini Flash variants) coming in between $25 and $56. The price gap is staggering compared to the accuracy gap, but the reality is that accuracy carries more weight. The research concludes by proposing a hybrid approach, in which AI would handle the bulk of the content, while human moderation would be reserved for the more complex cases that require context.
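The study does not publish an implementation of this hybrid pipeline, but the routing idea can be sketched with a confidence threshold: the model’s verdict stands on clear-cut content, while ambiguous cases go to a human queue. The threshold value, field names, and labels below are assumptions for illustration only:

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    label: str         # "safe" or "unsafe", as judged by the model
    confidence: float  # model's confidence in its label, 0.0 to 1.0

# Assumed cutoff: below this, the case is too ambiguous for AI alone.
HUMAN_REVIEW_THRESHOLD = 0.85

def route(verdict: Verdict) -> str:
    """Escalate low-confidence (ambiguous) cases to human moderators;
    accept the model's decision automatically for clear-cut ones."""
    if verdict.confidence < HUMAN_REVIEW_THRESHOLD:
        return "human_review"
    return "auto_" + verdict.label

print(route(Verdict("unsafe", 0.97)))  # clear case -> auto_unsafe
print(route(Verdict("safe", 0.60)))    # ambiguous -> human_review
```

Under this kind of split, the bulk of the 1,500 videos would be resolved at AI prices, and only the escalated fraction would incur the roughly 40-times-higher human cost.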


© 2025 La Grada
