ML Architecture and Performance Reading Group
September 1, 2025 · 2 min read

I spend a lot of time thinking about machine learning research these days, particularly efficient architecture and pre-training research. This also means that I read a lot about ML, and I love discussing ideas with others. Here is a draft of a blurb:
If you’re interested in understanding the science of how large language models become intelligent and gain emergent capabilities, consider signing up for the ML Performance Reading Group! We will meet roughly once a week (Mondays at 5 PM EST). One of the organizers will send out 1-2 papers to read asynchronously before each meeting, and we will discuss thoughts and ideas about the papers when we meet.
The papers we read will primarily focus on novel, principled designs for the entire model training stack; this includes architectures built on empirical observations (MesaNet), new optimizers (Muon), training and inference systems (How to Scale Your Model, Inside vLLM), hardware-aware design (DeltaNet, FlashAttention), and more.
Prerequisite knowledge: we assume familiarity with 6.390 material, and you should be able to write out the architecture of a transformer from scratch. Experience with GPU hardware, with important deep learning papers (e.g. those covered in 6.7960), and with reading papers is helpful, but certainly not necessary. We welcome questions and curious learners of all kinds. In some sense, my goal is for this reading group to be the resource I wish I had when I first started working on ML research.
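As a rough calibration for the "transformer from scratch" bar: you should be comfortable writing something like the sketch below. PyTorch and the pre-norm block structure here are purely illustrative assumptions; any framework, and any of the standard architectural variants, would be just as good.

```python
# A minimal sketch of the expected fluency level, in PyTorch (an
# assumption for illustration). One pre-norm transformer block; not a
# reference implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransformerBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)   # fused Q, K, V projection
        self.proj = nn.Linear(d_model, d_model)      # output projection
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        q, k, v = self.qkv(self.ln1(x)).chunk(3, dim=-1)
        # reshape to (B, n_heads, T, head_dim) for multi-head attention
        q, k, v = (t.view(B, T, self.n_heads, D // self.n_heads).transpose(1, 2)
                   for t in (q, k, v))
        # causal self-attention: softmax(QK^T / sqrt(d)) V with a causal mask
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.proj(out.transpose(1, 2).reshape(B, T, D))  # residual
        x = x + self.mlp(self.ln2(x))                            # residual MLP
        return x
```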
Note: This reading group is not focused on the applications of AI, although we believe intelligent architectures should be useful. We are also not the place to go in search of referrals. Outside of the cultural zeitgeist surrounding the field, we believe ML architecture research poses some of the most interesting and challenging scientific problems out there, and if you agree, we’d love for you to join us by filling out https://forms.gle/z91xaLNrpTYUGq226.
I worked on efficient pre-training and novel architecture research at Tilde (tilderesearch.com), and I'm co-hosting with Kristine, who has done research at Together AI and in Kaiming He's vision group.
Example Readings
https://www.aleksagordic.com/blog/vllm
(Sections of) https://jax-ml.github.io/scaling-book/
https://jeremybernste.in/writing/deriving-muon