The groundwork for PyTorch originally started in early 2016, online, among a band of Torch7’s contributors.
These days, we also commonly refer to Torch7 as LuaTorch, as it was used via Lua. Torch7 was written by Ronan Collobert, Clement Farabet, and Koray Kavukcuoglu around 2010. I had been deeply involved in Torch7 since 2012, with official “maintainer” status, joining these three original authors in April 2014.
Refactoring LuaTorch to be language-agnostic (late 2015 to mid 2016)
LuaTorch’s C backend – all the CPU and CUDA code for linear algebra and neural networks – was deeply intertwined with Lua. So a group of us, led by Luca Antiga, Andreas Köpf, Sergey Zagoruyko, me, Adam Paszke, and Francisco Massa, refactored these backends to be agnostic of Lua and usable independently. We did this after discussing online that we should move LuaTorch to a new, modern design, though we hadn’t quite framed what that design should be.
Writing a new Python-based Torch (mid 2016)
Adam Paszke reached out to me in early 2016 looking for an internship. At that time, the entire LuaTorch team at FAIR was ~3 people (Gregory Chanan, Trevor Killeen, and me). I asked Adam to come do an internship to build the next version of LuaTorch with a modern design. Sam Gross was in between projects, so he joined full-time as well.
We started from a fork of the LuaTorch and LuaTorch-nn codebases, specifically for two things:
- the TH/THC and THNN/THCUNN C backends
- building compatibility with LuaTorch’s checkpoints, so that LuaTorch users could smoothly continue into PyTorch. We did this by transpiling LuaTorch’s nn code to Python; we called this package torch.legacy.nn in PyTorch.
Then, coming to the design itself, we debated many options. The strong inspirations were:
- torch-autograd (written by Alex Wiltschko and Clement Farabet)
- Chainer (written by the team at Preferred Networks).
Zeming Lin, who loved Chainer, would obsessively tell us it was the best thing ever, so he came on board to build this together with us. Quite a few others, such as Natalia Gimelshein and Adam Lerer, got involved part-time in various ways.
We wrote the code for the new design of PyTorch from scratch.
The connection to JAX: inspiration of HIPS/autograd
Alex Wiltschko’s torch-autograd (a big inspiration for PyTorch’s design) was directly inspired by Matt Johnson, Dougal Maclaurin, David Duvenaud, and Ryan Adams’s HIPS/autograd library, so in that indirect sense we drew strong inspiration from Ryan’s library as well. In fact, we were so oblivious to these origins that we named our autodiff engine torch.autograd because we thought it was the norm within the autodiff community to call things “autograd”. We later had to apologize to Matt Johnson and team for our subpackage’s name conflicting with their library’s.
The inspiration from Chainer -> PyTorch and the inspiration for PyTorch -> Chainer v2
Chainer was a strong inspiration; we really liked the concept of Chains. The Chainer devs were friends of ours, and we interacted with them a lot; I visited them in Japan in 2017.
Chainer’s design was, in my opinion, revolutionary – very original for its time and pretty awesome. We are proud to have been inspired by it.
However, contrary to a common misunderstanding and misattribution, we didn’t simply replicate Chainer’s design as-is. People have posted online that PyTorch’s design looks exactly like Chainer’s and that its origins are therefore just copy-paste – but that misses the co-evolution. After PyTorch’s release, Chainer evolved to include some of PyTorch’s good ideas, and eventually the two converged to look the same. For example, Chainer v1’s Chains required you to pass all the modules to the constructor (or use an add_link). The concept of self-assignment (i.e. self.conv = nn.Conv2d(...)) and the concept of Parameter were things we introduced as an evolved upgrade from Chainer v1. We also changed how the autodiff engine was implemented – things like “variable versioning” to detect correctness issues with in-place operations, and a few other new ideas, ideas that eventually went back into Chainer in their v2.
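Both of those ideas are easy to see in today’s PyTorch API. A minimal sketch (the layer shapes here are illustrative, not from the original design docs): self-assignment on a Module auto-registers submodules and Parameters, and the version counter refuses to run backward through a tensor that was mutated in place after being saved.

```python
import torch
import torch.nn as nn

# Self-assignment: attributes assigned on an nn.Module are auto-registered
# via __setattr__, so you never pass submodules into a base-class constructor.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)  # registered as a submodule
        self.scale = nn.Parameter(torch.ones(8))    # registered as a parameter

net = Net()
# conv.weight, conv.bias, and scale are all discovered automatically
print([name for name, _ in net.named_parameters()])

# Variable versioning: every tensor carries a version counter that is bumped
# on each in-place op. If a tensor saved for the backward pass was mutated,
# backward raises instead of silently computing wrong gradients.
x = torch.randn(4, requires_grad=True)
y = x.exp()        # exp() saves its output y for the backward pass
y.add_(1)          # in-place op bumps y's version counter
versioning_error = False
try:
    y.sum().backward()
except RuntimeError:
    versioning_error = True
print("in-place misuse detected:", versioning_error)
```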
When the Chainer community decided to wind down development, Preferred Networks amicably and proactively joined the PyTorch community (link in references).
Post-launch evolution (2017 to present)
This post doesn’t have the space to cover PyTorch’s:
- its evolution to add in ideas from Caffe2 (Yangqing Jia, Dmytro Dzhulgakov, et al.)
- its five compiler designs before we landed on what seems great (Zach DeVito, Edward Yang, Adam Paszke, James Reed, Jason Ansel, Christian Sarofeen, et al.)
- our inspirations from JAX and the design of functorch (Richard Zou, Horace He, Victor Fomin, Animesh Jain)
- our entire distributed design and its evolution
- the origins of the sparse package (Martin Raison) and its evolution (Christian Puhrsch, et al.)
- PyTorch’s domain libraries
- data loading (Sam Gross, Tongzhou Wang)
- community design, community growth, and innovation in the design of incentives (Piotr Bialecki, Alban Desmaison, me)
- several innovations in GPU code (several key folks from NVIDIA and Meta)
There are many other parts of PyTorch that I didn’t include – it’s become somewhat of a monolith at this point.
Attributing ideas is healthy, awesome and should be done more often
Since PyTorch launched, several new libraries have used its designs and ideas – the particular new ideas we introduced eventually propagated to many other libraries – and this is awesome. We are proud to have been inspired by work before us, and we are proud to have inspired work after us. We also take pride in always attributing our inspirations clearly – torch-autograd, Chainer, and the many other projects that inspired us in smaller ways.
I think people don’t do this enough – clearly attributing their origins. Either ego or corporate control comes into play to erase history, and people should do better here. In that sense, I’m really proud of my JAX friends, who see framework design as a scientific endeavor, openly discussing ideas and evolutions, and proudly attributing their origins and inspirations.
References
- My reply in March’17 on the origins of PyTorch
- Chainer’s v1 design
- PyTorch adds new tools and libraries, welcomes Preferred Networks to its community
- PyTorch’s autodiff innovations in a short paper
- The PyTorch paper
- Alex Wiltschko’s torch-autograd
- THNN refactors: