AI accuracy at risk? Contractors forced to evaluate Gemini responses outside their expertise

Updated on 19-Dec-2024

Generative AI may seem magical, answering everything from math problems to medical queries in seconds. But behind these systems, companies like Google employ teams of “prompt engineers” and analysts who fine-tune the technology by evaluating chatbot responses for accuracy. Now, a new policy affecting Google’s Gemini AI raises concerns that its outputs on critical topics might become less reliable.

Gemini, Google’s AI chatbot, relies on contractors from GlobalLogic, an outsourcing company owned by Hitachi, to assess responses on factors like “truthfulness.” Until recently, these contractors had the option to skip prompts that were too technical or outside their knowledge areas. For example, if asked to evaluate an answer about a rare heart condition, a contractor without a medical background could simply choose not to rate it. However, new internal guidelines from Google have removed this flexibility.

According to TechCrunch, the updated rules now require contractors to evaluate all prompts, even those requiring specialised expertise they don’t have. Contractors must “rate the parts of the prompt [they] understand” and leave a note if they lack domain knowledge. Skipping is now allowed in only two cases: if the prompt or response is incomplete, or if it contains harmful content that requires special approval to evaluate.

Previously, the guidelines explicitly stated, “If you do not have critical expertise (e.g., coding, math) to rate this prompt, please skip this task.” But the updated version reads, “You should not skip prompts that require specialized domain knowledge.”

This change has sparked worries about Gemini’s accuracy, especially on sensitive topics like healthcare or technical subjects. One contractor expressed frustration in an internal chat, saying, “I thought the point of skipping was to increase accuracy by giving it to someone better?”

The risk is straightforward: contractors who must now rate topics beyond their expertise might unintentionally approve responses that contain errors. Those errors could then carry into Gemini’s outputs and mislead people who rely on the AI for trustworthy information.

Ayushi Jain
