A Few of My Favorite Transformer Tutorials
[machinelearning, ml]
So, you want to learn about transformer models? Here are some of my favorite learning resources to get started:
- Peter Bloem, Transformers from Scratch: I like how he builds up attention from simple vector dot products, without invoking the query/key/value notation until late in the game, after he has given you a firm mathematical understanding of what is going on and why it makes sense. Code examples are in PyTorch, but the post is easy to follow without them.
- Brandon Rohrer, Transformers from Scratch: I like how he motivates the idea of Markov models for text generation, and then motivates attention as a way of getting variable lookback without too many parameters. Also a lucid explanation of positional-encoding tricks.
- Jay Alammar, The Illustrated Transformer: This seems to be perennially popular on Hacker News… I guess if you are into pictures and YouTube videos it would be your thing. Included here for completeness, but I prefer the two above.
- My favorite My Favorite Things is Coltrane's, naturally, but that's a story for another post.
- Addendum, 15 Mar 2023: FastGPT, a GPT-2 implementation in 300 lines of Fortran. 'Nuff said.
- Addendum, 15 Jan 2024: The Random Transformer, a nicely worked simple example; multiply the matrices yourself for concreteness.
- Addendum, 24 Feb 2024: Implement GPT-2 in an Excel Spreadsheet. This is just the thing you need to do to build your skills for the Excel World Championships.
- 14 Mar 2024: A simple GPT-2-style model built in Mathematica on the Shakespeare corpus, with step-by-step instructions and custom tokenization; it can be trained on a laptop in about an hour.
- 20 Apr 2024: 3Blue1Brown has a nice video series on transformers as part of his series on deep learning.
- 16 Jun 2024: A recent review, Application of Transformers in Cheminformatics, JCIM, June 2024.
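If you want the flavor of Bloem's dot-product framing before clicking through: the starting point is a self-attention layer with no learned parameters at all, where each output is just a softmax-weighted average of the inputs, with weights given by dot products. Here is a minimal plain-Python sketch of that idea (my own toy version, not code from his post):

```python
import math

def simple_self_attention(xs):
    """Parameter-free self-attention: each output vector is a weighted
    average of all input vectors, with weights from a softmax over
    dot products. No queries, keys, or values yet."""
    out = []
    for xi in xs:
        # raw attention scores: dot product of xi with every input vector
        scores = [sum(a * b for a, b in zip(xi, xj)) for xj in xs]
        # softmax (shifted by the max for numerical stability)
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # output: weighted average of the input vectors
        out.append([sum(w * xj[k] for w, xj in zip(weights, xs))
                    for k in range(len(xi))])
    return out

seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = simple_self_attention(seq)  # each row is a blend of all three inputs
```

The full transformer version adds three learned linear maps (query, key, value) on top of exactly this mixing operation, which is the step Bloem deliberately saves for later.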
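And for Rohrer's starting point, the Markov-model baseline is easy to try yourself. A bigram model predicts the next token from the current one by counting; the catch, which motivates attention, is that the table of counts grows exponentially as you lengthen the context. A toy sketch (my own illustration, not his code):

```python
import random
from collections import defaultdict

def build_bigram_model(tokens):
    """Count successors: for each token, how often each next token follows it.
    This table has (vocab x vocab) entries; an n-gram model needs vocab**n,
    which is why fixed long lookback gets expensive fast."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(tokens, tokens[1:]):
        counts[a][b] += 1
    return counts

def generate(model, start, n, seed=0):
    """Sample a short continuation by repeatedly drawing the next token
    in proportion to its bigram count."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n):
        successors = model.get(out[-1])
        if not successors:
            break  # dead end: the last token was never followed by anything
        words, weights = zip(*successors.items())
        out.append(rng.choices(words, weights=weights)[0])
    return out

model = build_bigram_model("the cat sat on the mat".split())
sample = generate(model, "the", 3)
```

Attention, in Rohrer's telling, buys you variable lookback over the whole context without paying that exponential parameter cost.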