This was the "AlexNet moment" of the Transformer revolution, and the qualitative jump was even more significant than the 2012 AlexNet jump.
One extremely strange and remarkable property of GPT-3 is that the purely linguistic knowledge in this model is often sufficient to guess a correct piece of computer code from a natural-language description of a problem (even though we don't think this model "truly understands programming").
There was already quite a boom in these novel models (invented as recently as 2017) after BERT and GPT-2, but now the field has simply exploded: "efficient transformers", "vision transformers", "multimodal transformers", and so on.
And tons of interesting work was done on hybrid models which combine transformers and other attention-based models with all kinds of other techniques; hybrids of this kind are probably the future. For example, the famous AlphaFold 2 by DeepMind, which "solved" protein folding in November, was a hybrid model with an attention-based component at its center: https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology
All this means two things: 1) "True AI" could emerge at any moment; we are seeing a flood of breakthroughs now, and one of them (or a short sequence of them) might result in a much more radical shift than anything we've seen so far. I don't know if it will happen this year, but it's a possibility (from the invention of convolutional neural nets in 1989 it took more than 20 years to get to AlexNet; from the invention of Transformers in 2017 it took only 3 years to get to GPT-3; things can really start happening very fast).
2) If you are a practitioner in any field (especially if that field is machine learning of some kind), it makes sense to ponder hybrids between your favorite methods and "attention" (which is just a linear combination of high-dimensional vectors [sometimes with all coefficients being non-negative and summing up to 1]), hybrids between your favorite methods and matrix multiplication (which is just a way to compute many linear combinations of high-dimensional vectors rapidly), and hybrids between your favorite methods and Transformers (a certain way of arranging those matrix multiplications and interleaving them with modest neural connectors); see the small sketch below. This is likely to be a very fruitful direction, and this is how you can supercharge your favorite methods and produce novel results.
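To make the "attention is just a linear combination" point concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention (my own toy illustration, not code from any particular library or paper; the names, shapes, and random inputs are just for the example). The softmax weights are exactly those non-negative coefficients summing to 1, and everything else is matrix multiplication:

    import numpy as np

    def softmax(x, axis=-1):
        # subtract the max for numerical stability
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def scaled_dot_product_attention(Q, K, V):
        # Each output row is a convex combination (non-negative weights
        # summing to 1) of the rows of V.
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)     # one matrix multiplication: all query-key dot products
        weights = softmax(scores, axis=-1)  # each row is non-negative and sums to 1
        return weights @ V                  # linear combinations of the value vectors

    rng = np.random.default_rng(0)
    Q = rng.normal(size=(4, 8))   # 4 query vectors of dimension 8
    K = rng.normal(size=(6, 8))   # 6 key vectors
    V = rng.normal(size=(6, 8))   # 6 value vectors
    out = scaled_dot_product_attention(Q, K, V)
    print(out.shape)              # (4, 8): one combined vector per query

A full Transformer layer is essentially this operation (repeated over several "heads", with Q, K, and V themselves produced from the input by matrix multiplications) interleaved with small position-wise neural connectors and residual connections.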
Cross-post: https://anhinga-anhinga.dreamwidth.org/84201.html