Big picture:
- Most state-of-the-art NLP models today are based on attention mechanisms, specifically multi-layer self-attention, also known as “Transformer” architectures.
- Landmark models include
- The original Transformer model (Google Brain, original paper)
- For the first time, a complex natural language model handles multiple tasks (machine translation, English constituency parsing) without using convolutions or recurrence (the previous state of the art). Instead, a multi-layer self-attention mechanism generates increasingly powerful contextual embedding representations for every token in the input, which can be used for any task (see the sketch below).
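As a rough illustration of the core operation, here is a minimal sketch of single-head scaled dot-product self-attention in NumPy. The shapes, names, and random weights are purely illustrative (a real Transformer uses many heads, multiple layers, and learned parameters):

```python
# Minimal sketch of single-head scaled dot-product self-attention.
# Names and shapes are illustrative only, not taken from any specific implementation.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """X: (seq_len, d_model) token embeddings; W_q/W_k/W_v: learned projection matrices."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v        # queries, keys, values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # pairwise token similarities
    weights = softmax(scores, axis=-1)         # attention distribution per token
    return weights @ V                         # contextual embedding per token

# Toy usage: 4 tokens with 8-dimensional embeddings and random "learned" weights
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
context = self_attention(X, W_q, W_k, W_v)     # shape (4, 8)
```

Each output row is a weighted mixture of all value vectors, so every token's representation is informed by the whole input sequence; stacking such layers is what yields the increasingly powerful contextual embeddings.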
- BERT (developed by Google, original paper, blog post). Key components include:
- General natural language encoding model; any decoding layer can be added on top and fine-tuned for a specific downstream task (e.g. question answering, sentence classification)
- Multi-layer, bidirectional self-attention mechanism
- Pre-training of the encoder on extremely large corpora of natural language, using masked word prediction and next-sentence prediction as objectives -> general model (see the usage sketch below)
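As a concrete example of what the pre-trained encoder can do out of the box, here is a small sketch of masked word prediction with a public BERT checkpoint. It assumes the Hugging Face `transformers` library and the `bert-base-uncased` model, neither of which is named in the notes above:

```python
# Sketch: masked word prediction with a pre-trained BERT via the Hugging Face
# transformers library (assumed installed; not specified in the notes above).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT fills in the [MASK] token using bidirectional context from the whole sentence.
for prediction in fill_mask("The transformer architecture relies on [MASK] mechanisms."):
    print(prediction["token_str"], round(prediction["score"], 3))
```

Fine-tuning for a downstream task works the same way in spirit: the pre-trained encoder stays, and a small task-specific head is trained on top of it.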
- Generative Pre-Trained Transformers (GPT, developed by OpenAI)
- Standard language modeling objective (next-token prediction) as pre-training for a powerful transformer-based language model
- Task conditioning as auxiliary input in natural-language form to the model (e.g. naming the task and providing a few examples)
- This allows few-shot or zero-shot learning, i.e. the model can essentially be applied directly to any new task, without fine-tuning or changing parameters, by providing a description of the task as part of the input (see the prompt sketch after this list)
- GPT-3 performs well on seemingly unrelated tasks such as writing code from natural language descriptions, or generating subject-specific text that looks like it was written by a human
- GPT-2 and GPT-3 build on this by employing larger text corpora and larger models with longer training
- Parameters: GPT: ~117M; GPT-2 (paper, blog): 1.5B; GPT-3 (blog): 175B (!!!)
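To make the task-conditioning idea concrete, here is a sketch of how a few-shot prompt can be assembled. The translation task and examples are chosen for illustration; the resulting string is simply fed to the model as ordinary input text, and the model's continuation is the answer:

```python
# Sketch of few-shot task conditioning: the task description and examples live
# inside the input text itself; no model parameters are changed.
# Task and examples below are illustrative, not taken from the notes above.
examples = [
    ("sea otter", "loutre de mer"),
    ("cheese", "fromage"),
]

prompt = "Translate English to French.\n\n"
for english, french in examples:
    prompt += f"English: {english}\nFrench: {french}\n\n"
prompt += "English: plush giraffe\nFrench:"   # the model is expected to continue from here

print(prompt)  # this string would be sent to the model as-is
```

With zero-shot prompting, the examples are simply omitted and only the task description remains.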
- Limitations:
- Handling long contexts and summarizing long documents remain difficult
- Unidirectional (left-to-right) training creates limitations compared to bidirectional encoders such as BERT
- Models inherit the biases of the corpora they were trained on
- Inference is expensive because of the sheer model size
- GPT-2 and GPT-3 are extremely large models that are difficult to train without the computational resources of a cash-flooded company