Ask HN: Is Prompt Engineering Just Overfitting?
Whenever I see people doing prompt engineering, they start with some kind of evaluation dataset and then refine their prompt until it performs well on that dataset. But isn't this just like training on the test set, i.e., overfitting?
The procedure you are describing could absolutely lead to overfitting, but you wouldn't know for sure until you test the prompt on an independent dataset.
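Here is a minimal Python sketch of that check. It tunes on a dev split and scores the chosen prompt once on a held-out split that tuning never touched. Everything in it is hypothetical and for illustration only: `toy_model(prompt, text)` stands in for a real LLM call, and `accuracy`, `split_dev_holdout`, and `select_prompt` are made-up helpers, not any library's API. The toy model can either follow an "uppercase" instruction or copy answers pasted into the prompt, so a prompt that simply memorizes the dev examples looks perfect on dev.

```python
import random
from typing import Callable, Sequence

# Hypothetical eval example: (input text, expected output).
Example = tuple[str, str]


def toy_model(prompt: str, text: str) -> str:
    """Hypothetical stand-in for an LLM: copies memorized 'x=>y' answers, else follows an 'uppercase' instruction."""
    for line in prompt.splitlines():
        if "=>" in line:
            x, y = line.split("=>", 1)
            if x == text:
                return y
    return text.upper() if "uppercase" in prompt.lower() else text


def accuracy(prompt: str, examples: Sequence[Example],
             run_model: Callable[[str, str], str]) -> float:
    """Fraction of examples where the model output exactly matches the expected output."""
    if not examples:
        raise ValueError("need at least one example")
    hits = sum(run_model(prompt, x) == y for x, y in examples)
    return hits / len(examples)


def split_dev_holdout(examples: Sequence[Example], holdout_frac: float = 0.3,
                      seed: int = 0) -> tuple[list[Example], list[Example]]:
    """Shuffle a copy and split off a held-out set that prompt tuning never sees."""
    if len(examples) < 2 or not 0 < holdout_frac < 1:
        raise ValueError("need >= 2 examples and 0 < holdout_frac < 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_holdout = min(len(shuffled) - 1, max(1, round(len(shuffled) * holdout_frac)))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def select_prompt(candidates: Sequence[str], dev: Sequence[Example],
                  holdout: Sequence[Example], run_model: Callable[[str, str], str]):
    """Pick the best prompt using dev only, then score that single choice on holdout."""
    best = max(candidates, key=lambda p: accuracy(p, dev, run_model))
    return best, accuracy(best, dev, run_model), accuracy(best, holdout, run_model)


if __name__ == "__main__":
    examples = [(f"word{i}", f"WORD{i}") for i in range(10)]
    dev, holdout = split_dev_holdout(examples, holdout_frac=0.3, seed=0)
    # A prompt "refined" until it fits dev perfectly by pasting in the dev answers.
    memorized = "\n".join(f"{x}=>{y}" for x, y in dev)
    instruction = "Convert the input to uppercase."
    best, dev_acc, holdout_acc = select_prompt([memorized, instruction], dev, holdout, toy_model)
    print(f"dev={dev_acc:.2f} holdout={holdout_acc:.2f}")  # prints "dev=1.00 holdout=0.00"
```

Both candidates score 1.0 on dev. The tie goes to the memorizing prompt because Python's `max` returns the first maximal item. Only the held-out score shows that it learned nothing general. The instruction prompt would score 1.0 on the held-out split too. Real prompt overfitting is usually subtler than literal memorization, but the held-out split catches it the same way. Once you start tweaking the prompt based on held-out results, that split stops being independent.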