What binds together the loss function, the optimizer and the model in PyTorch?

−0

Your suspicion is right that PyTorch relies heavily on (hidden) global states.

In your specific example, loss.backward() computes (all) gradients and accumulates them directly in the grad attribute of each parameter. You can verify this by printing the grad attributes of the parameters before and after the backward call using the following list comprehension [par.grad for par in model.parameters()] (note that these can be big matrices and it might be useful to select only a few elements to reduce the amount of generated output).

Once the gradients have been stored in the grad attributes, the optimiser, which has access to (a subset of) the parameters by means of the references you pass upon construction, can use these gradients to perform the desired updates.

Also, note my emphasis on accumulate: the computed gradients are added to whatever was stored in the grad attribute previously. This is why it is so important to call optimizer.zero_grad() (or model.zero_grad()) before starting gradient computations.

I hope this gives you the desired insights in how this works.

PS: there is also an official PyTorch tutorial on the autograd system.

PPS: there is more global state in how the autograd works, but I think that would be something for a different question.

posted 7 days ago

CC BY-SA 4.0

mr Tsjolder‭

554 reputation 6 18 63 6

Copy Link

Raw

Markdown

History

Communities

What binds together the loss function, the optimizer and the model in PyTorch?

1 comment thread

1 answer

0 comment threads