24.05.22
DeepMind have built a new ML model, called Gato, that can do lots of tasks.
Machine learning is a branch of artificial intelligence that aims to leverage historical input data to improve the accuracy of predicted outcomes – much like how humans learn. However, machine learning models have traditionally been restricted to working in a single domain.
Gato is different. The London-based AI firm DeepMind, a subsidiary of Google, have published a paper describing a single model that can perform a wide variety of tasks. They call it Gato.
The multi-modal AI system is capable of performing more than 600 different tasks: it can use language, play games and control a robot arm, amongst other things. Impressively, Gato performed above the 50% expert-score threshold on around 75% of the tested tasks.
But how does it work? Gato utilises a transformer architecture similar to models like GPT-3. The transformer architecture is fast becoming the most successful approach to large-scale neural machine learning, and the same approach can be used for any data that can be conceived as a sequence. One reason it is so successful is that there is no need to collect large amounts of labelled data. Instead, the structure of the data itself is used to predict the next entity in the sequence.
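To make that concrete, here is a minimal PyTorch sketch of that next-token objective. This is my own illustration, not DeepMind's code: TinySequenceModel and all the sizes here are hypothetical toy values, but the pattern is the one described above – serialise the data into tokens, then train a transformer to predict each token from the ones before it.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Hypothetical sizes, for illustration only
    VOCAB_SIZE = 1024   # assumed: text, image patches and actions share one token vocabulary
    SEQ_LEN = 32
    EMBED_DIM = 64

    class TinySequenceModel(nn.Module):
        """A toy stand-in for a large transformer like Gato (not the real model)."""
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
            layer = nn.TransformerEncoderLayer(d_model=EMBED_DIM, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(EMBED_DIM, VOCAB_SIZE)

        def forward(self, tokens):
            n = tokens.size(1)
            # Causal mask: each position may only attend to earlier positions
            mask = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
            x = self.encoder(self.embed(tokens), mask=mask)
            return self.head(x)

    model = TinySequenceModel()
    # Stand-in for any data serialised as tokens: words, image patches, joystick actions...
    tokens = torch.randint(0, VOCAB_SIZE, (8, SEQ_LEN))

    # Self-supervised objective: the sequence supplies its own targets,
    # so no manually labelled data is needed.
    logits = model(tokens[:, :-1])   # predict from all but the last token
    targets = tokens[:, 1:]          # each token's target is its successor
    loss = F.cross_entropy(logits.reshape(-1, VOCAB_SIZE), targets.reshape(-1))
    loss.backward()
    print(f"next-token prediction loss: {loss.item():.3f}")

The targets come for free from the sequence itself, which is what lets the same objective be reused across text, images and robot actions without hand-labelling anything.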
The approach allows models to be trained on huge amounts of data, and therefore to learn a wide variety of tasks. This is actually similar to how the brain works, according to philosopher Andy Clark's predictive processing theory. In his work, Clark stressed the importance of having agents embodied in the world and capable of action. With this work by DeepMind we are moving closer to that ideal. I think this is a pretty important piece of research!