Soumith Chintala

PyTorch's design origins

its connection to Lua, its intertwined deep connection to JAX, its symbiotic connection to Chainer

Going down the history of PyTorch

December 17, 2023
 

The groundwork for PyTorch originally started in early 2016, online, among a band of Torch7’s contributors.

Torch7 (~2010-2017)

These days, we also commonly refer to Torch7 as LuaTorch, as it was used via Lua. Torch7 was written by Ronan Collobert, Clement Farabet and Koray Kavukcuoglu in ~2010. I had been deeply involved in Torch7 since 2012, with official “maintainer” status, joining these three original authors in April 2014.

Refactoring LuaTorch to be language agnostic (late 2015 to mid 2016)

LuaTorch’s C backends, with all the CPU and CUDA code for linear algebra and neural networks, were deeply intertwined with Lua. So a bunch of us – Luca Antiga, Andreas Köpf, Sergey Zagoruyko, me, Adam Paszke and Francisco Massa – refactored these backends to be agnostic of Lua, and usable independently. We did this after discussing online that we should move LuaTorch to a new, modern design, but we hadn’t quite framed what that design should be.

Writing a new Python based Torch (mid 2016)

Adam Paszke reached out to me in early 2016 looking for internships. At that time, the entire LuaTorch team at FAIR was ~3 people (Gregory Chanan, Trevor Killeen and me). I asked Adam to come do an internship to build the next version of LuaTorch, with a modern design. Sam Gross was in-between projects, so he joined in full-time as well.

We started from forks of the LuaTorch and LuaTorch-nn codebases, specifically for two things:

  1. the TH/THC and THNN/THCUNN C backends
  2. building compatibility with LuaTorch’s checkpoints, so that LuaTorch users could smoothly continue into PyTorch. We did this by transpiling LuaTorch’s nn code to Python; in PyTorch, we called this package torch.legacy.nn.

Then, coming to the design itself, we debated a lot of options. The strongest inspirations were:

  1. torch-autograd (written by Alex Wiltschko and Clement Farabet)
  2. Chainer (written by the team at Preferred Networks).
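
What both of these popularized is define-by-run automatic differentiation: the computation graph is recorded as ordinary code executes, rather than declared up front. Here is a minimal, purely illustrative sketch of that idea (this is not the actual torch-autograd or Chainer implementation, just the core mechanism in a few lines):

```python
# Minimal define-by-run autograd sketch (illustrative only).
# Each Var remembers which Vars produced it and the local gradients,
# so the "tape" is built simply by running ordinary Python code.
class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # list of (parent_var, local_gradient)
        self.grad = 0.0

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def backward(self, seed=1.0):
        # Accumulate gradient, then apply the chain rule to parents.
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

x = Var(3.0)
y = Var(4.0)
z = x * y + x   # the graph is recorded as this line runs
z.backward()
print(x.grad)   # dz/dx = y + 1 = 5.0
print(y.grad)   # dz/dy = x = 3.0
```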

Zeming Lin, who loved Chainer, would obsessively tell us it’s the best thing ever, so he came on board to build this together with us. Quite a few others, such as Natalia Gimelshein and Adam Lerer, got involved part-time in various ways.

We wrote the code for the new design of PyTorch from scratch.

The connection to JAX: inspiration of HIPS/autograd

Alex Wiltschko’s torch-autograd (which was a big inspiration for PyTorch’s design) was directly inspired by the HIPS/autograd library by Matt Johnson, Dougal Maclaurin, David Duvenaud and Ryan Adams, so in that indirect sense, we had strong inspiration from Ryan’s library. In fact, we were so oblivious to certain origins that we named our autodiff engine torch.autograd because we thought it was the norm within the autodiff community to call things “autograd”. We later had to apologize to Matt Johnson and team about the name of our subpackage conflicting with their autograd package.

Later, Matt Johnson, Dougal Maclaurin and others went on to create JAX, continuing down their design exploration of HIPS/autograd.

The inspiration from Chainer -> PyTorch and the inspiration for PyTorch -> Chainer v2

Chainer was a strong inspiration; we really liked the concept of Chains. The Chainer devs were friends of ours, and we interacted with them a lot as well. I visited them in Japan in 2017.

Chainer’s design is, in my opinion, a revolutionary design – very original for its time and pretty awesome. We are proud to have been inspired by it.

However, contrary to what people commonly misunderstand and misattribute, we didn’t simply replicate Chainer’s design as-is. People have posted online that PyTorch’s design looks exactly like Chainer’s and hence that its origins are just copy-paste – and that’s because they don’t understand the co-evolution. After PyTorch’s release, Chainer evolved to include some of PyTorch’s good ideas, and eventually the two converged to look the same. For example, Chainer v1’s Chains required you to pass all the modules to the constructor (or use add_link). The concept of self-assignment (i.e., self.conv = nn.Conv2d(...)) and the concept of Parameter were evolved upgrades we introduced over Chainer v1. We also changed the way the autodiff engine was implemented – introducing things like “variable versioning” to detect correctness issues with in-place operations, and a few other new ideas – ideas that eventually went back into Chainer in their v2.
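
To make those two ideas concrete, here is a small sketch in today’s PyTorch API (the modern spelling of these ideas; the 2016-era internals differed):

```python
import torch
from torch import nn

# Self-assignment: modules and Parameters assigned as attributes are
# registered automatically, unlike Chainer v1's style of passing
# everything to the constructor (or calling add_link).
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)  # registered on assignment
        self.scale = nn.Parameter(torch.ones(8))    # so is a bare Parameter

net = Net()
param_names = sorted(name for name, _ in net.named_parameters())
print(param_names)  # ['conv.bias', 'conv.weight', 'scale']

# Variable versioning: every in-place op bumps a version counter, and
# backward() refuses to use a saved tensor whose version has changed.
a = torch.ones(3, requires_grad=True)
b = a * 2
c = b.sin()   # sin() saves its input b for the backward pass
b.add_(1)     # in-place modification bumps b's version counter
caught = False
try:
    c.sum().backward()
except RuntimeError:
    caught = True  # "...has been modified by an inplace operation"
print(caught)  # True
```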

When the Chainer community decided to wind down development, Preferred Networks amicably and proactively joined the PyTorch community (link in references).

Post-launch evolution (2017 to present)

This post doesn’t have the space to cover PyTorch’s post-launch evolution, or the many other parts of PyTorch that I didn’t include – it has become somewhat of a monolith at this point.

Attributing ideas is healthy, awesome and should be done more often

Since PyTorch launched, several new libraries have used the designs and ideas from PyTorch – the particular new ideas that we introduced eventually propagated to many other libraries – and this is awesome. We are proud to have been inspired by work before us, and we are proud to have inspired work after us. We also take pride in always attributing our inspirations clearly – torch-autograd, Chainer and many other projects that have inspired us in lesser ways.

I think people don’t do this enough – attribute their origins clearly. Either ego or corporate control comes into play to erase history, and people should do more here. In that sense, I’m really proud of my JAX friends, who see framework design as a scientific endeavor, openly discussing ideas and evolutions, and proudly attributing their origins and inspirations.

References:

  1. My reply in March’17 on the origins of PyTorch
  2. Chainer’s v1 design
  3. PyTorch adds new tools and libraries, welcomes Preferred Networks to its community
  4. PyTorch’s autodiff innovations in a short paper
  5. The PyTorch paper
  6. Alex Wiltschko’s torch-autograd
  7. HIPS/Autograd
  8. THNN refactors: online chat where the THNN organization happened

Other posts

Previous: Decisions and Pivots