9 months since the GPT-3 revolution

On May 28, 2020, OpenAI published the GPT-3 paper, "Language Models are Few-Shot Learners".

This was the "AlexNet moment of the Transformer Revolution", and the qualitative jump was even more significant than the AlexNet 2012 jump.

One extremely strange and remarkable property of GPT-3 is that purely linguistic knowledge in this model is often sufficient to guess a piece of correct computer code from a natural language description of a problem (even though we don't think this model "truly understands programming").

There was already quite a boom in these novel models (invented as recently as 2017) after BERT and GPT-2, but now the field has simply exploded: "efficient transformers", "vision transformers", "multimodal transformers", etc.

And tons of interesting work has been done on hybrid models which combine transformers and other attention-based models with all kinds of other techniques. Hybrids of all kinds of methods with transformers and other attention-based models are probably the future. For example, the famous AlphaFold 2 by DeepMind, which "solved" protein folding in November 2020, was a hybrid model with an attention-based component at its center.

All this means two things: 1) "True AI" can emerge at any moment; we are seeing a flood of breakthroughs now, and one of them (or a short sequence of them) might result in a much more radical shift than anything we've seen so far. I don't know if it will happen this year, but it's a possibility (from the invention of convolutional neural nets in 1989 it took more than 20 years to get to AlexNet; from the invention of Transformers in 2017 it took only 3 years to get to GPT-3; things can really start happening very fast).

2) If you are a practitioner in any field (especially if this field is machine learning of some kind), it makes sense to ponder hybrids between your favorite methods and "attention" (which is just a linear combination of high-dimensional vectors [sometimes with all coefficients being non-negative and summing up to 1]), hybrids between your favorite methods and matrix multiplication (which is just a way to compute a lot of linear combinations of high-dimensional vectors rapidly), hybrids between your favorite methods and Transformers (a certain way of arranging those matrix multiplications and interleaving them with modest neural connectors). This is likely to be a very fruitful thing, and this is how you can supercharge your favorite methods and produce novel results.
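To make the "attention is just a linear combination" remark concrete, here is a minimal sketch in plain Python. It assumes the common case where the coefficients come from a softmax over dot products (so they are non-negative and sum to 1). The names `softmax`, `dot`, `attend`, and `attend_many` are made up for this illustration and are not taken from any library, and the usual scaling by the square root of the dimension is left out for brevity.

```python
import math

def softmax(scores):
    """Turn arbitrary real scores into non-negative weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def attend(query, keys, values):
    """One attention read: a weighted linear combination of the value vectors.

    The weight of each value is the softmax of the query-key dot products
    (an assumption of this sketch; other scoring rules are possible).
    """
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def attend_many(queries, keys, values):
    """Many reads at once; written with matrices this is softmax(Q K^T) V."""
    return [attend(q, keys, values) for q in queries]
```

A query that matches no key in particular gets an even blend of the values, while a query that strongly matches one key gets (almost exactly) that key's value. In a real system the loop in `attend_many` is replaced by a couple of matrix multiplications, which is where the speed comes from.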


