
RNNs redesign #2500

Open · wants to merge 9 commits into master
Conversation

@CarloLucibello (Member) commented Oct 14, 2024

A complete rework of our recurrent layers, making them more similar to their PyTorch counterparts.
This is in line with the proposal in #1365 and should allow us to hook into the cuDNN machinery (future PR).
Hopefully, this puts an end to the endless trouble the recurrent layers have been.

  • Recur is no more. Mutating its internal state was a source of problems for AD (explicit differentiation for RNN gives wrong results #2185).
  • RNNCell is now exported and takes care of the minimal recursion step, i.e. a single time step (see the sketch after this list):
    • forward pass: cell(x, h)
    • x can be of size in or in x batch_size
    • h can be of size out or out x batch_size
    • returns hnew of size out or out x batch_size
  • RNN instead takes a (batched) sequence and a (batched) hidden state and returns the hidden states for the whole sequence:
    • forward pass: rnn(x, h)
    • x can be of size in x len or in x len x batch_size
    • h can be of size out or out x batch_size
    • returns hnew of size out x len or out x len x batch_size
  • LSTM and GRU are changed in the same way.
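
A minimal sketch of the interface described above, assuming the redesigned cells and layers keep Flux's usual `in => out` constructor convention; the dimension names and sizes are only illustrative:

```julia
using Flux

in_dim, out_dim, len, batch = 3, 5, 7, 16     # illustrative sizes

# Single step: the cell maps (x, h) to the new hidden state.
cell = RNNCell(in_dim => out_dim)
x = rand(Float32, in_dim, batch)              # in × batch_size
h = zeros(Float32, out_dim, batch)            # out × batch_size
hnew = cell(x, h)                             # out × batch_size

# Whole sequence: RNN maps (x, h) to the hidden states of every time step.
rnn = RNN(in_dim => out_dim)
xs = rand(Float32, in_dim, len, batch)        # in × len × batch_size
hs = rnn(xs, h)                               # out × len × batch_size
```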

Closes #2185, closes #2341, closes #2258, closes #1547, closes #807, closes #1329

Related to #1678

PR Checklist

  • cpu tests
  • gpu tests
  • if the hidden state is not given as input, assume it to be zero
  • port LSTM and GRU
  • Entry in NEWS.md
  • Remove reset!
  • Docstrings
  • add an option in constructors to have trainable initial state
  • Benchmarks
  • use cuDNN (future PR)
  • implement the num_layers argument for stacked RNNs (future PR)
  • revisit whole documentation (future PR)
  • add dropout (future PR)

@darsnack (Member) commented Oct 18, 2024

Fully agree with updating the design to be non-mutating. There are two options we've discussed in the past (both sketched below):

  1. y, h = cell(x, h) like here (I guess this PR removes y as a return value, which is fine)
  2. y, cell = cell(x) / y, cell = Flux.apply(cell, x)

Option 1 is outlined in this PR, so I won't say anything about it.

Option 2 is a more drastic redesign to make all layers (not just recurrent ones) non-mutating. Why?

  • It covers stateful layers in general (e.g. norm layers), not just recurrent cells.
  • It keeps a nice feature of Flux's current design, namely that the model contains all the info: parameters, state, flags, etc.
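
To make the contrast concrete, here is a rough sketch of the two call patterns. Option 1 follows the interface in this PR; the Flux.apply in Option 2 is only a proposal and does not exist in this form, and all names are illustrative:

```julia
# Option 1: the caller threads the hidden state explicitly through the loop.
function run_option1(cell, xs, h)              # xs is in × len × batch_size
    for t in 1:size(xs, 2)
        h = cell(xs[:, t, :], h)               # state goes in, new state comes out
    end
    return h
end

# Option 2 (hypothetical Flux.apply): the state lives inside the layer;
# each call returns the output together with a layer carrying the new state.
function run_option2(cell, xs)
    local y
    for t in 1:size(xs, 2)
        y, cell = Flux.apply(cell, xs[:, t, :])
    end
    return y, cell
end
```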

@CarloLucibello (Member, Author)

I thought about Option 2. On the upside, it seems a nice intermediate spot between current Flux and Lux. The downside is that the interface would seem a bit exotic to Flux and PyTorch users. Moreover, it would be problematic for normalization layers.

Also, we need to distinguish between normalization layers and recurrent layers.

  • Normalization layers at training time update some internal buffers within a stopgrad barrier. The buffer update has no influence on the output of the layer or on the final loss. You typically apply the layer only once during the forward pass, and normalization layers are typically part of larger models (chains or custom structs). Therefore, for normalization layers: 1) we haven't had the gradient-computation problems we had with recurrent layers; 2) you want the layer with the updated buffer to be inserted back into your model, but this would require either a mutating operation or returning a new model.

  • For recurrent layers, Option 2 would be sensible, but is it worth it? Once you adopt the perspective that a cell takes two inputs, x and h, and returns a new state, hnew, all the problems disappear (see the sketch below). I think trying to keep the state internal adds complexity for no gain.
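
With that perspective, the sequence-level layer reduces to a plain non-mutating loop over the cell. A rough sketch of what RNN could do internally (not the actual implementation; requires Julia ≥ 1.9 for Base.stack):

```julia
# Unroll a cell over a sequence without mutating anything.
function unroll(cell, xs::AbstractArray{<:Any,3}, h)    # xs is in × len × batch_size
    hs = map(axes(xs, 2)) do t
        h = cell(view(xs, :, t, :), h)                  # feed each step the previous state
    end
    return stack(hs; dims = 2)                          # out × len × batch_size
end
```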

@ToucheSir (Member)

The main benefit of keeping the state "internal", or of having it be part of a unified interface like apply, would be that Chain works with RNNs again. Whether that's worth the extra complexity is the question. Given our priorities, I think it's best left as future work.
