How data is used for LLM programming

Software 1.0 -- the non-AI, non-ML sort -- leans heavily on testing to validate that things work. These tests are essentially hand-written rules and assertions. For example, a regular expression is easily tested with strings that should and should not match.
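
To make this concrete, here is a minimal sketch of what such tests look like. The date pattern is a hypothetical example, not from the original text:

```python
import re

# Hypothetical pattern for illustration: matches simple ISO dates (YYYY-MM-DD).
DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

# Hand-written assertions: strings that should and should not match.
assert DATE.match("2024-01-31")
assert DATE.match("1999-12-01")
assert not DATE.match("31-01-2024")  # wrong field order
assert not DATE.match("2024-1-31")   # month must be two digits
print("all regex tests passed")
```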

In software 2.0, and specifically in supervised learning, the program is learned automatically from a dataset. The dataset plays a role similar to unit tests: input-output pairs define the system's expected behaviour. But it is much larger; thousands to millions of data points are typically needed to learn the program effectively.
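
A toy sketch of this idea, using scikit-learn on a made-up task (real datasets are orders of magnitude larger):

```python
from sklearn.linear_model import LogisticRegression

# The dataset as specification: each (input, label) pair states an
# expected behaviour, like a unit test case -- just many of them.
# Hypothetical task: label a point by whether x + y > 1.
X = [[0.1, 0.2], [0.9, 0.8], [0.4, 0.4], [0.7, 0.9], [0.2, 0.1], [0.8, 0.6]]
y = [0, 1, 0, 1, 0, 1]

# "Learning the program": the model's weights are fit to the data
# instead of being written by hand.
model = LogisticRegression().fit(X, y)
print(model.predict([[0.05, 0.1], [0.95, 0.9]]))  # expect [0, 1]
```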

Funnily enough, LLM programming looks a bit more like software 1.0 than software 2.0. You can do tasks zero-shot ("give me 10 dog names") or few-shot ("give me 10 dog names like Luna or Chippy"), but in both cases this looks more like writing code (the prompt) or unit tests (the examples) than like training on a large dataset. Of course, training on lots of examples is still possible via fine-tuning, but it is optional. The model's capability comes from it having been pre-trained on large datasets in an unsupervised manner.
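
Here is what the two prompting styles look like as code, sketched with the OpenAI Python client (v1-style API); the model name is an assumption, and any chat-completions-style API would do:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Zero-shot: the prompt alone specifies the task, like writing code.
zero_shot = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name for illustration
    messages=[{"role": "user", "content": "Give me 10 dog names."}],
)

# Few-shot: a couple of examples steer the output, like tiny unit tests.
few_shot = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": "Give me 10 dog names like Luna or Chippy."}],
)

print(zero_shot.choices[0].message.content)
print(few_shot.choices[0].message.content)
```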

From what I have heard, the combination of all the above is the most effective. That is, to make LLMs work well for you, you want to both craft an effective prompt and fine-tune the base LLM you are using.
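
For the fine-tuning half, a rough sketch of what the training data looks like, assuming OpenAI-style chat fine-tuning where the training file is JSONL of example conversations (the examples themselves are hypothetical):

```python
import json

# Each line of the JSONL training file is one example conversation:
# the same input-output pairing as supervised learning, in chat form.
examples = [
    {"messages": [
        {"role": "user", "content": "Give me 3 dog names."},
        {"role": "assistant", "content": "Luna, Chippy, Biscuit"},
    ]},
    {"messages": [
        {"role": "user", "content": "Give me 3 cat names."},
        {"role": "assistant", "content": "Miso, Clementine, Pixel"},
    ]},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```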