A recent experiment suggests that the output format Large Language Models (LLMs) are asked to use can significantly affect their behavior. The study, which looked at models from providers such as Nvidia, Ring, and OpenAI, found that the number format used to express confidence scores alters a model's output, with decimal formats producing more conservative and consistent results.
The experiment showed that when LLMs report confidence scores in decimal format (0.0 to 1.0), they tend to be more cautious and consistent in their predictions. In contrast, when asked for a percentage score from 0 to 100, some models behave erratically and may even produce malformed output. The choice of output format can therefore have a real impact on the reliability of an LLM's self-reported confidence, with decimal formats being the more robust choice. A sketch of the two prompt styles follows below.
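For concreteness, here is a minimal sketch of the two prompt styles being compared. Everything here is illustrative: `call_llm` is a hypothetical stand-in for whatever completion call your provider's client exposes, and the exact prompt wording is an assumption, not the wording used in the study.

```python
# Two prompt variants that differ only in the requested confidence format.
# Hypothetical templates; the study's actual prompts may differ.
DECIMAL_PROMPT = (
    "Answer the question, then state your confidence as a decimal "
    "between 0.0 and 1.0 on a final line formatted as 'Confidence: <x>'.\n\n"
    "Question: {question}"
)

PERCENT_PROMPT = (
    "Answer the question, then state your confidence as a whole number "
    "between 0 and 100 on a final line formatted as 'Confidence: <x>'.\n\n"
    "Question: {question}"
)


def call_llm(prompt: str) -> str:
    """Placeholder: swap in your provider's chat/completion call here."""
    raise NotImplementedError


def elicit_confidence(question: str, as_decimal: bool = True) -> str:
    """Ask the same question with either the decimal or percentage format."""
    template = DECIMAL_PROMPT if as_decimal else PERCENT_PROMPT
    return call_llm(template.format(question=question))
```

Running both variants over the same set of questions and comparing the spread of the returned scores is one straightforward way to probe the effect the experiment describes.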
The implications are significant: output format deserves careful consideration when deploying LLMs in real-world applications. As researchers and developers continue to refine LLMs, they will need to account for the effect of output format on model behavior to keep their systems reliable, consistent, and accurate. In the meantime, a practical mitigation is to defensively parse and normalize whatever confidence value a model returns, as sketched below. The findings are likely to inform future work in the field, leading to more robust and trustworthy LLMs.
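One defensive pattern, whichever format the model was asked for, is to normalize whatever comes back before trusting it. This is a minimal sketch, assuming the model was instructed to end its answer with a `Confidence: <x>` line as in the earlier example; the heuristic of treating values above 1 as percentages is our assumption, not something prescribed by the study.

```python
import re

# Matches a trailing "Confidence: 0.85" or "Confidence: 85" line.
CONF_RE = re.compile(r"Confidence:\s*([0-9]+(?:\.[0-9]+)?)", re.IGNORECASE)


def parse_confidence(raw: str) -> float | None:
    """Extract a confidence score and normalize it to [0, 1].

    Returns None when no score can be recovered, so callers can treat
    a malformed response as a failure instead of silently guessing.
    """
    match = CONF_RE.search(raw)
    if match is None:
        return None
    value = float(match.group(1))
    # Assumed heuristic: values above 1 were given as percentages.
    if value > 1.0:
        value /= 100.0
    return value if 0.0 <= value <= 1.0 else None
```

Returning `None` for malformed scores lets the caller log, retry, or fall back rather than feed a broken value into downstream logic, which matters most for the percentage format given the erratic behavior the experiment reports.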