Today someone asked on Google+:

"Hello, when computing the gradients in a CNN, the weights need to be rotated. Why?"
I had the same question when I was poring through code back in the day, so I wanted to clear it up for people once and for all.
Simple answer:
This is just an efficient and clean way of computing the gradient of a valid 2D convolution w.r.t. the inputs. There is no magic here.
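In symbols, for a single channel (my notation, using the common CNN convention that the forward "convolution" is really a cross-correlation):

$$
Y_{i,j} \;=\; \sum_{m,n} X_{i+m,\,j+n}\, W_{m,n}
\qquad\Longrightarrow\qquad
\frac{\partial L}{\partial X_{p,q}} \;=\; \sum_{i,j} \frac{\partial L}{\partial Y_{i,j}}\, W_{p-i,\,q-j},
$$

with the sum over all valid $(i, j)$. Each input pixel collects a contribution from every output pixel it touched, and the kernel indices $W_{p-i,\,q-j}$ run backwards, which is exactly a full convolution of $\partial L / \partial Y$ with the 180°-rotated kernel.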
Here’s a detailed explanation with a visualization!
[Figure: a three-section visualization with panels labeled Input, Kernel, and Output.]
Section 1: valid convolution (input, kernel)
Section 2: gradient of the valid convolution (input, kernel) w.r.t. the input, i.e. the weighted contribution of each input location to the gradient of the output
Section 3: full convolution (180-degree-rotated filter, output)
As you can see, the calculation for the first three elements in Section 2 matches the first three figures in Section 3.
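If you want to convince yourself numerically, here is a minimal sketch with NumPy/SciPy. The shapes, the random inputs, and the finite-difference check are my own choices, not from the figures above; `scipy.signal.correlate2d` plays the role of the CNN-style (no-flip) convolution.

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5))   # input
W = rng.standard_normal((3, 3))   # kernel

# Forward: "valid" convolution as CNN libraries implement it
# (a cross-correlation, no kernel flip).
Y = correlate2d(X, W, mode="valid")               # shape (3, 3)

# Arbitrary upstream gradient dL/dY, i.e. the loss is L = sum(Y * dY).
dY = rng.standard_normal(Y.shape)

# Backward w.r.t. the input: full convolution of dL/dY with the
# 180-degree-rotated kernel (rot180 = flip both axes).
dX = correlate2d(dY, np.rot90(W, 2), mode="full")  # shape (5, 5)

# Finite-difference check of dL/dX.
eps = 1e-6
dX_numeric = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        Xp, Xm = X.copy(), X.copy()
        Xp[i, j] += eps
        Xm[i, j] -= eps
        Lp = np.sum(correlate2d(Xp, W, mode="valid") * dY)
        Lm = np.sum(correlate2d(Xm, W, mode="valid") * dY)
        dX_numeric[i, j] = (Lp - Lm) / (2 * eps)

print(np.allclose(dX, dX_numeric, atol=1e-5))      # True
```

Note that because SciPy's `convolve2d` already flips the kernel, `correlate2d(dY, np.rot90(W, 2), mode="full")` is the same as `convolve2d(dY, W, mode="full")`; the rotation is just what you write when your library's "convolution" doesn't flip.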
Hope this helps!