GPT in PyTorch

31.01.2022
In this video, we implement the GPT-2 model from scratch. We focus only on inference, not on the training logic. We will cover concepts like masked self-attention, decoder blocks, and generating new tokens.

Paper: https://openai.com/blog/better-language-models/
Code minGPT: https://github.com/karpathy/minGPT
Code transformers: https://github.com/huggingface/transformers/blob/0f69b924fbda6a442d721b10ece38ccfc6b67275/src/transformers/models/gpt2/modeling_gpt2.py#L946
Code from the video: https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/gpt

00:00 Intro
01:32 Overview: Main goal [slides]
02:06 Overview: Forward pass [slides]
03:39 Overview: GPT module (part 1) [slides]
04:28 Overview: GPT module (part 2) [slides]
05:25 Overview: Decoder block [slides]
06:10 Overview: Masked self attention [slides]
07:52 Decoder module [code]
13:40 GPT module [code]
18:19 Copying a tensor [code]
19:26 Copying a Decoder module [code]
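To give a flavour of the masked self-attention covered in the video, here is a minimal single-head sketch in PyTorch. It is an illustrative simplification, not the exact code from the video or from minGPT (which use multi-head attention inside an `nn.Module`): each position is only allowed to attend to itself and to earlier positions, which is what makes the block a decoder.

```python
import math

import torch


def causal_self_attention(q, k, v):
    """Single-head masked (causal) self-attention.

    q, k, v: tensors of shape (seq_len, head_dim).
    Each position attends only to itself and earlier positions.
    """
    t, d = q.shape
    # Scaled dot-product scores: (seq_len, seq_len)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)
    # Lower-triangular mask: True where attention is allowed
    mask = torch.tril(torch.ones(t, t, dtype=torch.bool))
    # Disallow attending to future positions
    scores = scores.masked_fill(~mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v


q = torch.randn(4, 8)
k = torch.randn(4, 8)
v = torch.randn(4, 8)
out = causal_self_attention(q, k, v)
```

Because of the mask, the first position can only attend to itself, so its output row is exactly `v[0]`; later rows are convex combinations of the value rows up to that position.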
