Anhinga, snakebird

9 months since the GPT-3 revolution

On May 28, 2020, OpenAI published the GPT-3 paper, "Language Models are Few-Shot Learners".

This was the "AlexNet moment of the Transformer Revolution", and the qualitative jump was even more significant than the AlexNet 2012 jump.

One extremely strange and remarkable property of GPT-3 is that purely linguistic knowledge in this model is often sufficient to guess a piece of correct computer code from a natural language description of a problem (even though we don't think this model "truly understands programming").

There was already quite a boom in these novel models (invented as recently as 2017) after BERT and GPT-2, but now the field has simply exploded: "efficient transformers", "vision transformers", "multimodal transformers", etc.

And tons of interesting work has been done on hybrid models that combine transformers and other attention-based models with all kinds of other techniques. Such hybrids are probably the future. For example, the famous AlphaFold 2 by DeepMind, which "solved" protein folding in November, was a hybrid model with an attention-based component at its center.

All this means two things: 1) "True AI" can emerge at any moment; we are seeing a flood of breakthroughs now, and one of them (or a short sequence of them) might result in a much more radical shift than anything we've seen so far. I don't know if it will happen this year, but it's a possibility (from the invention of convolutional neural nets in 1989 it took more than 20 years to get to AlexNet; from the invention of Transformers in 2017 it took only 3 years to get to GPT-3; things can really start happening very fast).

2) If you are a practitioner in any field (especially if this field is machine learning of some kind), it makes sense to ponder hybrids between your favorite methods and "attention" (which is just a linear combination of high-dimensional vectors [sometimes with all coefficients being non-negative and summing up to 1]), hybrids between your favorite methods and matrix multiplication (which is just a way to compute a lot of linear combinations of high-dimensional vectors rapidly), and hybrids between your favorite methods and Transformers (a certain way of arranging those matrix multiplications and interleaving them with modest neural connectors). This is likely to be very fruitful: it is how you can supercharge your favorite methods and produce novel results.
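To make the "attention is just a linear combination" remark concrete, here is a minimal sketch in plain Python. It assumes the common softmax-weighted, scaled dot-product variant; the function names (`softmax`, `dot`, `attend`) and the toy vectors are made up for illustration and are not taken from any particular library or paper.

```python
import math

def softmax(scores):
    """Turn arbitrary real scores into non-negative weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def attend(query, keys, values):
    """Illustrative single-query attention (assumed softmax variant).

    Returns (weights, output), where output is the linear combination
    sum_i weights[i] * values[i].
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Toy data: three key/value pairs in a 2-dimensional space.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attend([1.0, 0.0], keys, values)
```

Here `weights` are the coefficients (non-negative, summing to 1) and `output` is the resulting mix of the value vectors. Running the same computation for many queries at once amounts to a couple of matrix multiplications, which is why the two ideas go together so naturally.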

Julia programming language

Julia is an unusual language. It is based around the idea of "eating your cake and having it too, again and again": it is flexible and very fast at the same time, with friendly, readable syntax, Lisp-strength macros, multiple dispatch, etc.

Julia Flux is trying to become the next-generation machine learning framework, and is also characterized by this approach of "eating your cake and having it too". If TensorFlow 1.0 is the past, and PyTorch is the leading state-of-the-art framework of the present, Julia Flux is quite likely to become the machine learning framework of the future; see the first comment in this blog post for details.

Does anyone here use Julia, or does anyone here know someone who uses Julia?

2019: shaders, dreamwidth, and more

I hope for us all to have a creative and safe New Year.

Computer art news: I started to play with OpenGL shaders and with

Other news: books and stories, my own texts, open source activity and software experiments, employment change, etc:

I generally shifted quite a bit towards dreamwidth and this blog during this year, and away from livejournal; this was not planned, but just happened "organically". Most of my activity this year was at

I created a couple of feeds from that blog to LJ (usually, RSS and Atom feeds are slightly different, but in this case they seem to be the same): dmm_dream_rss (broken since late January, fixed on April 9, broken again since April 14, fixed on May 27 capturing posts since May 15) and dmm_dream_atom (broken since April 9, fixed on May 26 capturing posts since May 15).

Self-referential neural nets in 2018

We ran two series of experiments with self-referential neural nets with vector flows ("dataflow matrix machines") in 2018.

The ability of a neural net to modify itself on the fly was used to edit it interactively while it is running ("livecoding"). This also opens the way to have populations of neural nets editing each other.

Emergent "sleep-wake" behavior and other emergent bistability patterns were observed in randomly initialized neural nets (May 2019 update: a couple of video recordings of those behaviors have been posted). There is no theoretical understanding of these emergent dynamics yet.


Dataflow matrix machines as generalized recurrent neural networks

A year ago I posted about dataflow programming and linear models of computation:

It turns out that those dataflow matrix machines are a fairly powerful generalization of recurrent neural networks.

(no subject)

I am using the friends list here mostly to create a useful Friends page.

And this is my Friends of Friends page.

I have been filtering my default Friends page somewhat since December 2005. This is my unfiltered Friends page. This is an "express selection" (about 20% of the volume).

Russian Virtual Keyboard by Paul Gorodyansky. Online translation by All tags of this journal.

LiveJournal notifications about new friends or new comments work only in some cases.

If I did not notice something, please leave a comment in the post below this one.

A robot with impressive language capabilities

This is a prototype of an elderly home care robot developed by a very small group at IBM (with benign indifference from their employer, which does not want to deal with robots or with the headaches and liabilities associated with them). Its ability to communicate verbally with a human and to learn from a human is very impressive (basically, one can program this robot to a large extent simply by talking to it). Here is a demo video from the talk at the AGI-12 conference:

The paper itself, "An Extensible Language Interface for Robot Manipulation", which explains to some extent how this works, can be found here:

and the free online version of AGI-12 proceedings is here (scroll down to AGI-12 Contributed Paper Sessions):