Paradigm Shifting Papers in AI

By an AI and human team

Attention Is All You Need

“Attention Is All You Need” laid the foundation for the brains that would eventually give rise to GPT.

The original paper proposed a new type of neural network architecture called the Transformer. Unlike previous architectures built on “recurrent” or “convolutional” layers, the Transformer relies solely on a mechanism called “attention”.

By letting the model work out for itself which parts of the input matter most, the Transformer achieved state-of-the-art results in machine translation and other natural language processing tasks.
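To make the idea concrete, here is a minimal NumPy sketch of the scaled dot-product attention at the heart of the Transformer. It follows the paper's formula, softmax(QK^T / sqrt(d_k))V; the toy input and everything beyond the Q, K, V notation is illustrative, not taken from the paper.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Score every query against every key, scaled by sqrt of the key dimension.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Softmax turns each row of scores into weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted average of the value vectors.
    return weights @ V

# Toy self-attention: 3 tokens, each represented by a 4-dimensional vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(x, x, x).shape)  # (3, 4)

Each row of the output is a blend of all the token vectors, weighted by how relevant each one is to the token in question; this is the model deciding for itself what matters.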

The resulting models were simpler, faster to train, and performed better on tasks such as machine translation.

In their experiments, the authors tested the Transformer on two translation tasks: English-to-German and English-to-French. The Transformer outperformed existing models, achieving better translation quality (measured by a metric called BLEU) while requiring less training time.
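For readers unfamiliar with BLEU: it scores a system translation by how much its word n-grams overlap with one or more reference translations. A quick way to compute it today is the sacrebleu library (a common implementation, not the exact scoring setup the authors used); the sentences below are made up for illustration.

import sacrebleu

hypotheses = ["the cat sat on the mat"]           # system output
references = [["the cat is sitting on the mat"]]  # one reference stream
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(round(bleu.score, 1))  # corpus-level BLEU on a 0-100 scale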

The authors also demonstrated that the Transformer could be applied to other tasks, like English constituency parsing, which is a way to analyze the grammatical structure of sentences.

Today the T in GPT stands for Transformer, and the Transformer is still the architecture behind large language models such as GPT and Bard.

We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
— Authors

Paper published 12 June 2017 (arXiv); presented at NeurIPS in December 2017


GPT-4

Published 14 March 2023

The colossal and seminal technical report that presented the fourth generation of the Generative Pre-trained Transformer (GPT): a large multimodal model (OpenAI did not disclose its parameter count) that can generate coherent and diverse text on almost any topic, exhibiting human-level performance on a range of professional and academic benchmarks, including a simulated bar exam.
