BullshitBench tests whether AI models can detect nonsensical questions—or if they'll confidently answer them anyway. The results are dire.