HeadlinesBriefing.com

Impolite Prompts Outperform Polite Ones in ChatGPT 4o Accuracy Study

Hacker News

A recent study on prompt engineering reports that impolite prompts produced more accurate responses than polite ones when querying a large language model. Testing ChatGPT 4o on multiple-choice questions across mathematics, science, and history, researchers found an accuracy gap of 4 percentage points between the politeness extremes. The counterintuitive results suggest newer LLMs process tonal variations differently than previously assumed.

The research team constructed 250 unique prompts by rewriting 50 base questions into five tone variants: Very Polite, Polite, Neutral, Rude, and Very Rude. Using paired-sample t-tests for statistical validation, they measured accuracy ranging from 80.8% for Very Polite prompts to 84.8% for Very Rude prompts. This systematic approach provides empirical data on pragmatic prompting effects that earlier investigations lacked.
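
The design described above (50 base questions crossed with five tones, then compared with a paired test) can be sketched roughly as follows. This is a minimal illustration, not the study's code: the tone prefixes, the placeholder questions, and the helper names `build_prompts` and `paired_t_statistic` are all assumptions made for the example, since the summary does not give the actual prompt wording or analysis scripts.

```python
from itertools import product
from math import sqrt
from statistics import mean, stdev

TONES = ["Very Polite", "Polite", "Neutral", "Rude", "Very Rude"]

# Hypothetical tone prefixes; the study's real wording is not given here.
PREFIXES = {
    "Very Polite": "Would you be so kind as to answer the following question? ",
    "Polite": "Please answer the following question. ",
    "Neutral": "",
    "Rude": "Figure this out if you can. ",
    "Very Rude": "Answer this, and try not to get it wrong. ",
}


def build_prompts(base_questions):
    """Cross every base question with every tone variant."""
    return [
        {"question_id": i, "tone": tone, "prompt": PREFIXES[tone] + question}
        for (i, question), tone in product(enumerate(base_questions), TONES)
    ]


def paired_t_statistic(a, b):
    """Paired-sample t statistic for two equal-length lists of scores."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return mean(diffs) / (sd / sqrt(len(diffs)))


# Placeholder questions stand in for the 50 base questions.
base_questions = [f"Placeholder question {i}" for i in range(50)]
prompts = build_prompts(base_questions)  # 50 questions x 5 tones = 250 prompts
```

In a setup like this, `paired_t_statistic` would be fed matched accuracy scores for two tones (for example, one score per repeated run), and the resulting statistic compared against a t distribution with `len(a) - 1` degrees of freedom.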

These findings contradict previous research associating rudeness with degraded performance, suggesting that LLM behavior may have changed as models advanced. The study underscores the importance of studying pragmatic aspects of prompting and raises questions about social dynamics in human-AI interaction. Prompt engineers should reconsider assumptions about optimal tone strategies when designing queries for modern language models.