Creating AI is Curating Examples


A few years ago at Starship, I contributed to the Data is the Specification manifesto.

The core idea is that it's better to solve problems directly against a collection of examples, as opposed to trying to generalise the problem first and then solving the general problem.

Typically you would run an employee engagement survey, distil the top 5 problems, and brainstorm solutions for those. Alternatively, you could use data as the problem specification. You could make a list of the specific complaints of each employee, and evaluate every proposed solution against this list: what percentage of the complaints would it solve?
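To make the "data as specification" scoring concrete, here is a minimal sketch in Python. The employees, complaints and proposed solutions are all invented for illustration, and the metric (the share of individual complaints a solution addresses) is just one reasonable reading of "what percentage of complaints would it solve".

```python
# Hypothetical data, invented for illustration: each employee's specific complaints.
complaints = {
    "alice": ["slow laptop", "unclear goals"],
    "bob": ["noisy office"],
    "carol": ["unclear goals", "too many meetings"],
}

# Hypothetical proposed solutions, each mapped to the complaints it would address.
solutions = {
    "quarterly goal-setting": {"unclear goals"},
    "laptop refresh": {"slow laptop"},
    "no-meeting Wednesdays": {"too many meetings"},
}


def coverage(solves, complaints):
    """Return the percentage of all individual complaints that a solution addresses."""
    all_items = [c for items in complaints.values() for c in items]
    solved = sum(1 for c in all_items if c in solves)
    return 100.0 * solved / len(all_items)


# Rank every proposed solution against the full list of complaints.
for name, solves in sorted(solutions.items(), key=lambda kv: -coverage(kv[1], complaints)):
    print(f"{name}: {coverage(solves, complaints):.0f}% of complaints")
```

Because "unclear goals" appears twice, goal-setting scores highest here. That is the point of working against the raw list rather than a distilled top 5: the frequency of each specific complaint is preserved.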

It feels unnatural to approach employee engagement this way. When building AI, in contrast, this is the core job: building a large enough list of relevant examples. For a typical image classification task (e.g. "did the user upload an image of the right ID document?"), improving the neural network architecture, the preprocessing and the model-training code (in short, the machine learning part) matters much less than what data you feed into the model.

Since building a basic AI only requires curating the right set of examples, you should not need to learn Python or TensorFlow to do it. Given a good enough interface to teach through examples, anyone should be able to teach a task to AI.
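As a toy illustration of "teaching through examples", here is a sketch in Python where the code is completely generic and the behaviour is defined only by the labelled examples passed in. I've used 1-nearest-neighbour purely because it fits in a few lines; it is not a claim about how any real system (at Starship or elsewhere) works. The `teach` function and the two numeric features (an aspect ratio and a face-detector score) are hypothetical stand-ins for what a real image pipeline would produce.

```python
import math


def teach(examples):
    """Build a classifier purely from labelled examples (1-nearest-neighbour).

    `examples` is a non-empty list of (features, label) pairs, where features
    are tuples of numbers. No task-specific code is written: swapping the
    examples is the only way to change what the classifier does.
    """
    def predict(features):
        # Return the label of the closest stored example by Euclidean distance.
        _, label = min(examples, key=lambda ex: math.dist(ex[0], features))
        return label

    return predict


# Hypothetical features: (image aspect ratio, face-detector score).
id_check = teach([
    ((1.58, 0.9), "id_card"),
    ((1.42, 0.8), "passport"),
    ((1.33, 0.0), "other"),
])

print(id_check((1.57, 0.85)))
```

Improving this "model" means adding or correcting examples rather than editing code, which is exactly the kind of work a good curation interface could hand to non-programmers.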

However, the barrier to entry for building AI remains high today. It reminds me of stories about the early days of the internet. Few people could build websites and hosting them cost a lot up front, so it seemed silly to suggest that every person could have a website. Yet Wix, WordPress, Facebook Pages, Instagram profiles and many other platforms show that a 1,000x reduction in the effort of building a website has produced countless new use cases.

With the right tools and platforms for curating data, building AI will become 1,000x easier as well. With that, I can see massive benefits coming from the widespread use of narrow AI, way before a breakthrough in human-level general AI.