Unrolled recurrent layers (RNN, LSTM) #2033

Closed (wanted to merge 11 commits)


jeffdonahue
Contributor

(Replaces #1873)

Based on #2032 (which adds EmbedLayer -- not required for RNNs, but often used with them in practice, and needed for my examples), which in turn is based on #1977.

This adds an abstract class RecurrentLayer intended to support recurrent architectures (RNNs, LSTMs, etc.) using an internal network unrolled in time. RecurrentLayer implementations (here, just RNNLayer and LSTMLayer) specify the recurrent architecture by filling in a NetParameter with appropriate layers.
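To make "filling in a NetParameter" concrete, here is a hypothetical pure-Python sketch that emits one group of layer specs per timestep for a vanilla RNN. The specs are plain dicts standing in for LayerParameter messages, and the layer types and blob names (h_t, cont_t, W_xh_x_t) are assumptions for illustration; this is not the PR's RNNLayer. A real unrolled net would also need to share weights across timesteps (for example, by giving the per-timestep parameters the same name).

```python
def unroll_rnn(T):
    """Build per-timestep layer specs (plain dicts standing in for
    LayerParameter messages) for a vanilla RNN unrolled over T steps.
    All layer types and blob names here are illustrative only."""
    layers = []
    for t in range(1, T + 1):
        ts, tm1s = str(t), str(t - 1)  # "t" and "t minus 1" as strings
        # Multiply h_{t-1} by the continuation indicators for step t.
        layers.append({"name": "h_conted_" + tm1s, "type": "Scale",
                       "bottom": ["h_" + tm1s, "cont_" + ts],
                       "top": ["h_conted_" + tm1s]})
        # Recurrent projection of the gated previous state.
        layers.append({"name": "W_hh_h_" + tm1s, "type": "InnerProduct",
                       "bottom": ["h_conted_" + tm1s],
                       "top": ["W_hh_h_" + tm1s]})
        # Sum with the (precomputed) input projection for step t.
        layers.append({"name": "h_neuron_input_" + ts, "type": "Eltwise",
                       "bottom": ["W_hh_h_" + tm1s, "W_xh_x_" + ts],
                       "top": ["h_neuron_input_" + ts]})
        # Nonlinearity produces the new hidden state h_t.
        layers.append({"name": "h_" + ts, "type": "TanH",
                       "bottom": ["h_neuron_input_" + ts],
                       "top": ["h_" + ts]})
    return layers


net = unroll_rnn(3)
print(len(net))          # 12 (4 layers per timestep)
print(net[0]["bottom"])  # ['h_0', 'cont_1']
print(net[-1]["top"])    # ['h_3']
```

Building the net programmatically like this is what lets a single layer definition cover any T: the same loop simply runs for more timesteps.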

RecurrentLayer requires 2 input (bottom) Blobs. The first -- the input data itself -- has shape T x N x ..., and the second -- the "sequence continuation indicators" delta -- has shape T x N; each holds T timesteps of N independent "streams". delta_{t,n} should be a binary indicator (i.e., a value in {0, 1}): a value of 0 means that timestep t of stream n is the beginning of a new sequence, and a value of 1 means that timestep t of stream n continues the sequence from timestep t-1 of stream n. Under the hood, the previous timestep's hidden state is multiplied by these delta values. Because the indicators are specified per timestep and per stream, streams can have arbitrarily different lengths without any padding or truncation.

At the beginning of the forward pass, the final hidden state from the previous forward pass (h_T) is copied into the initial hidden state for the new forward pass (h_0), allowing for exact inference across arbitrarily long sequences, even if T == 1. However, if any sequences cross batch boundaries, backpropagation through time is approximate: it is truncated at the batch boundaries.
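The gating and the h_T-to-h_0 carry-over can be sketched in pure Python as below. This assumes a scalar hidden state per stream and a vanilla tanh RNN with made-up weights (w_hh and w_xh are hypothetical); it is an illustration, not the PR's C++ code. The usage part checks the claim that feeding the data one timestep per forward pass (T == 1) gives exactly the same outputs as one long pass.

```python
import math


def rnn_forward(x, cont, h0, w_hh=0.5, w_xh=1.0):
    """Vanilla tanh RNN with a scalar hidden state per stream.

    x, cont: time-major lists of T rows, each holding N values.
    h0: N initial hidden states (e.g. h_T from the previous forward pass).
    Returns (per-timestep hidden states, final hidden state h_T)."""
    h_prev = list(h0)
    outputs = []
    for x_t, cont_t in zip(x, cont):
        # cont == 0 zeroes the carried state, so a new sequence starts fresh.
        h_t = [math.tanh(w_hh * (c * h) + w_xh * xi)
               for xi, c, h in zip(x_t, cont_t, h_prev)]
        outputs.append(h_t)
        h_prev = h_t
    return outputs, h_prev


x = [[1.0, 0.5], [0.2, -0.3], [0.7, 0.1]]   # T = 3, N = 2
cont = [[0, 0], [1, 1], [0, 1]]             # stream 0 restarts at t = 2
full, _ = rnn_forward(x, cont, h0=[0.0, 0.0])

# Feed the same data one timestep per forward pass (T == 1), copying h_T -> h_0.
h = [0.0, 0.0]
stepwise = []
for x_t, cont_t in zip(x, cont):
    out, h = rnn_forward([x_t], [cont_t], h0=h)
    stepwise.append(out[0])

print(full == stepwise)  # True: chunked inference matches the full pass
```

Only the forward pass is exact across chunks; gradients would still stop at each chunk boundary, which is the truncation described above.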

Note that the T x N arrangement in memory, used for computational efficiency, is somewhat counterintuitive, as it requires one to "interleave" the data streams.
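The interleaving can be sketched as follows, assuming each stream is a Python list of T per-timestep items (the function name is hypothetical). In the flat T x N buffer, timestep t of stream n lives at index t * N + n, so the N streams alternate within each timestep.

```python
def to_time_major(streams):
    """Interleave N equal-length streams (each a list of T items) into the
    T x N layout, returned as a flat list where index t * N + n holds
    timestep t of stream n."""
    N, T = len(streams), len(streams[0])
    flat = [None] * (T * N)
    for n, seq in enumerate(streams):
        assert len(seq) == T, "streams must share the same length T"
        for t, item in enumerate(seq):
            flat[t * N + n] = item
    return flat


streams = [["a0", "a1", "a2"], ["b0", "b1", "b2"]]  # N = 2, T = 3
print(to_time_major(streams))  # ['a0', 'b0', 'a1', 'b1', 'a2', 'b2']
```

The benefit of this layout is that all N streams for a given timestep are contiguous, so each timestep's slice can be processed as one N-row batch.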

There is also an optional third input whose dimensions are simply N x ... (i.e. the first axis must have dimension N and the others can be anything) which is a "static" input to the LSTM. It's equivalent to (but more efficient than) copying the input across the T timesteps and concatenating it with the "dynamic" first input (I was using my TileLayer -- #2083 -- for this purpose at some point before adding the static input). It's used in my captioning experiments to input the image features as they don't change over time. For most problems there will be no such "static" input and you should simply ignore it and just specify the first two input blobs.
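The equivalence follows from the first operation on the input being linear: projecting a concatenation [x_t, s] gives the same result as projecting x_t and s separately and summing. The pure-Python sketch below (hypothetical names, integer weights so the comparison is exact) shows that the static term only needs to be computed once per stream rather than once per timestep, which is where the savings can come from.

```python
def linear(w, v):
    """Dot product of weight vector w with input vector v."""
    return sum(wi * vi for wi, vi in zip(w, v))


def tiled_input(x, static, w):
    """Tile the static features across all T steps, concatenate, project."""
    return [[linear(w, x[t][n] + static[n]) for n in range(len(static))]
            for t in range(len(x))]


def static_input(x, static, w):
    """Project the static features once per stream, add at every step."""
    d = len(x[0][0])                 # size of the dynamic input
    w_x, w_s = w[:d], w[d:]
    s_proj = [linear(w_s, s) for s in static]  # N projections, not T * N
    return [[linear(w_x, x[t][n]) + s_proj[n] for n in range(len(static))]
            for t in range(len(x))]


x = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # T = 2, N = 2, dynamic dim 2
static = [[1, 0], [0, 1]]                 # N x 2 static features
w = [1, -1, 2, 3]                         # weights over [dynamic, static]
print(tiled_input(x, static, w))   # [[1, 2], [1, 2]]
print(static_input(x, static, w))  # [[1, 2], [1, 2]]
```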

I've added scripts to download COCO2014 (and splits), and prototxts for training a language model and LRCN captioning model on the data. From the Caffe root directory, you should be able to download and parse the data by doing:

cd data/coco
./get_coco_aux.sh # download train/val/test splits
./download_tools.sh # download official COCO tool
cd tools
python setup.py install # follow instructions to install tools and download COCO data if needed
cd ../../.. # back to caffe root
./examples/coco_caption/coco_to_hdf5_data.py

Then, you can train a language model using ./examples/coco_caption/train_language_model.sh, or train LRCN for captioning using ./examples/coco_caption/train_lrcn.sh (assuming you have downloaded models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel).

Still on the TODO list: upload a pretrained model to the zoo; add a tool to preview generated image captions and compute retrieval & generation scores.

@cvondrick

Firstly, thanks for the fantastic code. I had been playing with my own LSTM, and found this PR, and it is above and beyond any of my own attempts. Really nice job.

There seems to be a bug in the ReshapeLayer of this PR. In some cases, the ReshapeLayer will produce all zeros instead of actually copying the data. I've created a minimal test case that shows this failure for this PR:

# Load a random dataset 
layer {
  name: "ToyData_1"
  type: "DummyData"
  top: "ToyData_1"
  dummy_data_param {
    shape {
      dim: 101 
      dim: 7
      dim: 3
    }
    data_filler {
      type: "gaussian"
      std: 1
    }
  }
}

layer {
  name: "ZeroData"
  type: "DummyData"
  top: "ZeroData"
  dummy_data_param {
    shape {
      dim: 101 
      dim: 7
      dim: 3
    }
    data_filler {
      type: "constant"
      value: 0
    }
  }
}


# Reshape ToyData_1 to be the same size
layer {
  name: "Reshape"
  type: "Reshape"
  bottom: "ToyData_1"
  top: "ToyData_2"
  reshape_param {
    shape {
      dim: 101
      dim: 7
      dim: 3
    }
  }
}





# Since ToyData_1 should be ToyData_2, we expect that this loss be zero. But, it is not.
layer {
  name: "EuclideanLoss"
  bottom: "ToyData_1"
  bottom: "ToyData_2"
  top: "ToyData_1_vs_2_Difference"
  type: "EuclideanLoss"
}

# We expect this loss to be non-zero, and it is non-zero. 
layer {
  name: "EuclideanLoss"
  bottom: "ToyData_1"
  bottom: "ZeroData"
  top: "ToyData_1_vs_Zero_Difference"
  type: "EuclideanLoss"
}

# Since ToyData_2 should be ToyData_1 and ToyData_1 is non-zero, we expect this loss to be non-zero. But, it is zero.
layer {
  name: "EuclideanLoss"
  bottom: "ToyData_2"
  bottom: "ZeroData"
  top: "ToyData_2_vs_Zero_Difference"
  type: "EuclideanLoss"
}

Above, it loads a random dataset, ToyData_1. It then reshapes it to the exact same size (identity) to create ToyData_2. We would expect that || ToyData_1 - ToyData_2 ||_2 == 0

However, if you train with the above model on this branch, you will see that the Euclidean loss between ToyData_1 and ToyData_2 is non-zero. Moreover, the loss between ToyData_2 and a blob of all zeros is zero. Note that, as expected, the loss between ToyData_1 and all zeros is non-zero.
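For reference, the expected loss values can be checked with a short pure-Python sketch. It assumes Caffe's EuclideanLoss definition (sum of squared differences divided by 2 * num, where num is the first dimension, 101 here) and models the reported symptom as a zero-filled top blob; it does not reproduce the actual ReshapeLayer code.

```python
import random


def euclidean_loss(a, b, num):
    """Caffe-style Euclidean loss: sum of squared differences / (2 * num)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / (2.0 * num)


random.seed(0)
count = 101 * 7 * 3
toy1 = [random.gauss(0.0, 1.0) for _ in range(count)]
zeros = [0.0] * count

correct_toy2 = toy1          # a correct identity reshape shares/copies the data
buggy_toy2 = [0.0] * count   # the reported symptom: top left zero-filled

print(euclidean_loss(toy1, correct_toy2, 101))   # 0.0 (expected)
print(euclidean_loss(toy1, buggy_toy2, 101) > 0) # True, as in the report
print(euclidean_loss(buggy_toy2, zeros, 101))    # 0.0, as in the report
print(euclidean_loss(toy1, zeros, 101) > 0)      # True
```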

It seems there is a bug with reshape. I've fixed it here by copying an older version of Reshape into this branch: https://github.com/cvondrick/caffe/commit/3e1a0ff73fef23b8cb8adc8223e0bb2c8900e56b

Unfortunately, I didn't have time to write a real unit test for this, but I hope this bug report helps.

The same issue occurs in #2088

Carl

@jeffdonahue
Contributor Author

Well that's disturbing... I don't have time to look into it now but thanks
a lot for reporting Carl! Will follow up when I've figured something out.
(In reply to Carl Vondrick's comment of Mar 20, 2015, 10:41 AM.)

@jeffdonahue
Contributor Author

Oops, failed to read to the end and see that you already had a fix. Thanks
for posting the fix! (I think the current version of my reshapelayer PR
may do what your fix does, in which case I'll just rebase this onto that PR
as I should anyway.)

@cvondrick

Thanks Jeff -- yeah, we fixed it by copying a ReshapeLayer from somewhere else. Unfortunately, we have lost track of exactly where that layer came from, but I'm sure somebody here (maybe even you) wrote it at some point.

@hf

hf commented Mar 24, 2015

When is this feature going to be ready? Is there something to be done?

@thuyen

thuyen commented Mar 24, 2015

For the captioning model, can anyone show me how to generate captions after training is done? The current LSTM layers process the whole input sequence (20 words in the COCO example) across time, but generation has to produce one word at a time (the output at the current timestep is the input to the next).

@vadimkantorov

I've just tried to run train_lrcn.sh (after running coco_to_hdf5_data.py and the other scripts) and I get a "dimensions don't match" error:

F0324 16:37:24.435840 20612 eltwise_layer.cpp:51] Check failed: bottom[i]->shape() == bottom[0]->shape()

The stack-trace and log are here: http://pastebin.com/fWUxsSmv

I've uncommented line 471 in net.cpp to find the faulty layer (my only modification). It seems to happen in lstm2, which blends input from the language model and from the image CNN.

train_language_model.sh runs fine without errors.

Ideas?

@ih4cku
Contributor

ih4cku commented Mar 24, 2015

By the way, does Caffe's recurrent layer support bi-directional RNN?

@vadimkantorov

Both the factored and unfactored setups are affected. There seem to be dimension problems when blending the CNN input with the embedded text input.

@aksarben09

I have the same question as @thuyen. My understanding is that the current unrolled architecture slices an input sentence and feeds the resulting words to all the timesteps at once. So, for both the train and test nets, the ground-truth sentences are fed to the unrolled net. However, when captioning an image there is no sentence to give to the net, and I don't think it is correct to give the start symbol to each unrolled timestep. Did I miss anything?

@aksarben09

The dimension check fails for the static input (the image feature), with size 100*4000 vs 1*100*4000. It seems to be caused by the Reshape layer; @cvondrick's fix seems to solve this.

@jeffdonahue
Contributor Author

Yes, as noted by @cvondrick, this works with the older version of the ReshapeLayer which puts everything in Reshape (as opposed to the newer one that uses LayerSetUp -- see discussion with @longjon in #2088). I don't yet have any idea why the Reshape version would work but the LayerSetUp version wouldn't, but I've just force pushed a new version of this branch that uses the previous ReshapeLayer version, and confirmed that both example scripts (train_lrcn.sh & train_language_model.sh) run. Sorry for breaking the LRCN one.

@jeffdonahue
Contributor Author

> By the way, does Caffe's recurrent layer support bi-directional RNN?

You can create a bi-directional RNN using 2 RNN layers and feeding one the input in forward order and the other the input in backward order, and fusing their per-timestep outputs however you like (e.g. eltwise sum or concatenation).
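A pure-Python sketch of that construction, assuming scalar tanh RNNs with made-up, untied weights and a single sequence with no batch boundaries (all names are hypothetical). The key detail is that the backward RNN's outputs must be reversed back into forward time order before they are fused.

```python
import math


def rnn(seq, w_hh, w_xh):
    """Scalar tanh RNN over one sequence; returns h_t for every timestep."""
    h, out = 0.0, []
    for x in seq:
        h = math.tanh(w_hh * h + w_xh * x)
        out.append(h)
    return out


def birnn(seq, fuse="concat"):
    """Fuse a forward RNN with a second RNN run over the reversed sequence."""
    fwd = rnn(seq, w_hh=0.5, w_xh=1.0)
    # The backward RNN sees the input reversed; reverse its outputs again so
    # that index t of both lists refers to the same timestep before fusing.
    bwd = rnn(seq[::-1], w_hh=-0.3, w_xh=0.8)[::-1]
    if fuse == "sum":
        return [f + b for f, b in zip(fwd, bwd)]  # eltwise sum
    return [(f, b) for f, b in zip(fwd, bwd)]     # per-timestep concatenation


seq = [0.3, -1.0, 0.8]
print(birnn(seq))         # 3 (forward, backward) pairs
print(birnn(seq, "sum"))  # 3 summed values
```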

@vadimkantorov

Thanks @jeffdonahue, training LRCN now works! Same question as @thuyen and @ritsu1228: does anyone have an idea how to hook in when the first word after the start symbol is produced, and put that word into the input_sentence tensor in memory before the next run of the unrolled net?

@jeffdonahue force-pushed the recurrent branch 2 times, most recently from d3ebf3e to 80e9c41 (March 26, 2015 21:13)
@read-mind

As @jeffdonahue mentioned, a bidirectional RNN can be built with two RNNs. It's easy to prepare the reversed input sequence, but how do you reverse the output of one RNN when fusing the two RNN outputs in Caffe? It seems no layer does the reversal.

@jeffdonahue
Contributor Author

> As @jeffdonahue mentioned, bidirectional RNN can be built with two RNNs, it's easy to prepare reversed input sequence, but how to reverse the output of one RNN when fusing two RNN outputs in Caffe? It seems no layer does the reverse.

True; one would need to implement an additional layer to do the reversal. You'd also need to be careful to ensure that your instances do not cross batch boundaries (which my implementation allows, as it works fine for unidirectional architectures), since inference at each timestep depends on all other timesteps in a bidirectional RNN.
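A hypothetical reversal layer's forward pass might look like the pure-Python sketch below: it reverses each stream's valid timesteps within the T x N layout and leaves any padding in place, and the assertion encodes the requirement that an instance must not cross a batch boundary. Passing explicit per-stream lengths is an assumption made here for simplicity; a real layer could instead derive sequence boundaries from the continuation indicators. Because the operation is a permutation, its backward pass would apply the same reversal to the top diff.

```python
def reverse_streams(data, lengths):
    """Reverse each stream's valid prefix in a time-major T x N array.

    data: list of T rows with N entries; lengths[n]: valid steps of stream n.
    Entries beyond lengths[n] (padding) are left where they are."""
    T = len(data)
    out = [list(row) for row in data]
    for n, length in enumerate(lengths):
        assert length <= T, "a sequence must fit inside one batch"
        for t in range(length):
            out[t][n] = data[length - 1 - t][n]
    return out


data = [[1, 10], [2, 20], [3, 0]]     # stream 1 has length 2; 0 is padding
print(reverse_streams(data, [3, 2]))  # [[3, 20], [2, 10], [1, 0]]
```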

> Same question as @thuyen @ritsu1228. Does anyone have an idea how to hook up to when the first word after the start symbol gets produced and put the next symbol on the input_sentence tensor in memory before the next round of unrolled net will get to run?

In the not-too-distant future I'll add code for evaluation, including using the model's own predictions as input in future timesteps as you mention.
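The usual approach can be sketched as a greedy loop that runs the recurrent net one timestep at a time (T == 1), feeds each predicted word back in as the next input, and carries the recurrent state forward, which the h_T-to-h_0 copy described in the original post makes possible. `step` is a hypothetical callback standing in for one forward pass of a deploy net, and the toy model is made up; this is not the evaluation code mentioned above. Note that only the previous word is fed as input at each step, while the recurrent state carries information from all earlier words.

```python
def greedy_decode(step, init_state, start_token, end_token, max_len=20):
    """Greedy caption generation with a T == 1 recurrent model.

    step(token, state) -> (scores, new_state) is a hypothetical wrapper
    around one forward pass; scores[k] is the score of vocabulary id k."""
    state, token, caption = init_state, start_token, []
    for _ in range(max_len):
        scores, state = step(token, state)
        token = max(range(len(scores)), key=scores.__getitem__)  # argmax
        if token == end_token:
            break
        caption.append(token)
    return caption


def toy_step(token, state):
    """Toy 'model' over ids 0..3: prefers token + 1, and id 0 (end) after 3."""
    nxt = 0 if token == 3 else token + 1
    scores = [0.0] * 4
    scores[nxt] = 1.0
    return scores, state + [token]  # the state records every input so far


print(greedy_decode(toy_step, [], start_token=0, end_token=0))  # [1, 2, 3]
```

Beam search is a common alternative to the argmax here; it keeps several partial captions per step instead of one.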

@jeffdonahue
Contributor Author

I've also gotten a number of questions on the optional third input to RecurrentLayer, so I've added a clarification to the original post: the paragraph describing the "static" N x ... input.

@liqing-ustc

Thanks for the fantastic code. But the code of the Reshape function in RecurrentLayer confuses me. When passing data from "output_blobs_" to the "top" blobs, why is it

    output_blobs_[i]->ShareData(*top[i]);
    output_blobs_[i]->ShareDiff(*top[i]);

rather than

    top[i]->ShareData(*output_blobs_[i]);
    top[i]->ShareDiff(*output_blobs_[i]);

It seems that the top blobs are just reshaped and left empty.

The original code is here:

template <typename Dtype>
void RecurrentLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  CHECK_EQ(top.size(), output_blobs_.size());
  for (int i = 0; i < top.size(); ++i) {
    top[i]->ReshapeLike(*output_blobs_[i]);
    output_blobs_[i]->ShareData(*top[i]);
    output_blobs_[i]->ShareDiff(*top[i]);
  }
  x_input_blob_->ShareData(*bottom[0]);
  x_input_blob_->ShareDiff(*bottom[0]);
  cont_input_blob_->ShareData(*bottom[1]);
  if (static_input_) {
    x_static_input_blob_->ShareData(*bottom[2]);
    x_static_input_blob_->ShareDiff(*bottom[2]);
  }
}
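One way to see what this does: Caffe's Blob::ShareData(other) makes the calling blob point at other's data memory, so after output_blobs_[i]->ShareData(*top[i]) both blobs refer to a single buffer. The toy Python model below (the Blob class is a hypothetical stand-in, not Caffe's API) illustrates why top is not left empty: it is empty at Reshape time, but whatever the unrolled net writes into its output blob during Forward lands in the shared buffer and is visible through top.

```python
class Blob:
    """Toy stand-in for caffe::Blob: data lives in a separately owned buffer."""

    def __init__(self, size=0):
        self.data = [0.0] * size  # plays the role of the SyncedMemory pointer

    def reshape(self, size):
        self.data = [0.0] * size  # toy version always allocates a new buffer

    def share_data(self, other):
        # Like Blob::ShareData(other): point *this* blob at other's buffer.
        self.data = other.data


top = Blob()
output = Blob(4)               # output blob inside the unrolled net
top.reshape(len(output.data))  # ReshapeLike: allocate top's buffer
output.share_data(top)         # the internal output now aliases top's memory

# During Forward, the unrolled net writes into its own output blob...
for i in range(4):
    output.data[i] = float(i)
print(top.data)  # [0.0, 1.0, 2.0, 3.0]: the values are visible through top
```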

@jeffdonahue
Contributor Author

Ah, I didn't know the HDF5OutputLayer worked that way, I see... sounds a little scary, but might work... good luck!

@fl2o

fl2o commented Mar 15, 2016

@shaibagon Thanks for the highlight, but I struggle to see how to handle signals with different lengths (i.e., different numbers of timesteps) in the training process using NetSpec. I can't change my unrolled net architecture during training...
Should I use a very long LSTM and stop the forward pass after I have reached the end of the signal being processed then start the backward pass?

@shaibagon
Member

@fl2o AFAIK, if you want exact backprop for recurrent nets in caffe, there's no way around explicitly unrolling the net across ALL time steps.
However, if you use @jeffdonahue 's recurrent branch, you will be able to achieve exact forward estimation, and backprop exact to the limit of the temporal batch size. This can alleviate the need to explicitly unroll very long temporal nets.

Regarding working with very long sequences:

  1. You may define maxT and explicitly unroll your net to span maxT time steps, padding shorter sequences with some "null" data/label.
  2. Since caffe uses SGD for training, it is better to have more than one sequence participating in a forward-backward pass (i.e., mini-batch of size > 1). Otherwise gradient estimation will be very noisy.

Can you afford all these blobs in memory at once?
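Option 1 could be prepared as in the pure-Python sketch below, which pads N sequences to maxT in the T x N (time-major) layout, marks padded labels with a hypothetical ignore value of -1, and builds the sequence continuation indicators described in the original post. The function name, pad value, and ignore value are assumptions.

```python
IGNORE = -1  # hypothetical ignore_label value


def pad_batch(seqs, labels, max_t, pad_value=0.0):
    """Pad N variable-length sequences to max_t steps in T x N layout.

    Returns (data, label, cont), each a list of max_t rows of N entries.
    Padded steps get pad_value as data, IGNORE as label, and cont = 0."""
    n_streams = len(seqs)
    data = [[pad_value] * n_streams for _ in range(max_t)]
    label = [[IGNORE] * n_streams for _ in range(max_t)]
    cont = [[0] * n_streams for _ in range(max_t)]
    for n, (seq, lab) in enumerate(zip(seqs, labels)):
        assert len(seq) == len(lab) <= max_t
        for t in range(len(seq)):
            data[t][n] = seq[t]
            label[t][n] = lab[t]
            cont[t][n] = 0 if t == 0 else 1  # 0 marks a sequence start
    return data, label, cont


data, label, cont = pad_batch([[0.1, 0.2, 0.3], [0.5]], [[1, 2, 3], [4]],
                              max_t=3)
print(label)  # [[1, 4], [2, -1], [3, -1]]
print(cont)   # [[0, 0], [1, 0], [1, 0]]
```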

@shaibagon
Member

@jeffdonahue BTW, is there a reason why this PR is not merged into master?

@fl2o

fl2o commented Mar 15, 2016

@shaibagon I am gonna try padding shorter sequences with some "null" data/label (should I use a special term or just 0?) in order to avoid the gradient estimation problem, but I am not sure yet about the memory issue! (maxT will be around 400, while minT is ~50.)

@shaibagon
Member

@fl2o I'm not certain just using 0 is enough. You want no gradients to be computed from these padded time steps. You might need an "ignore_label", and to implement your loss layer to support it.
Make sure no gradients from the padded time steps are propagated into the "real" time steps.

@fl2o

fl2o commented Mar 15, 2016

That's what I was wondering ....
Wonder if it's not "easier" to use this PR directly ^^
Gonna figure it out! Thank you @shaibagon

@shaibagon
Member

@fl2o In the future, I think it would be best to keep this GitHub issue thread for PR-related comments only. For more general inquiries and questions about LSTM in Caffe, it might be better to ask a question on Stack Overflow.

@chriss2401

@shaibagon Cheers for all the helpful comments.

@jeffdonahue mentioned this pull request Apr 5, 2016
@lood339

lood339 commented Apr 11, 2016

Hi, I used the LRCN code to generate captions from an image, replacing AlexNet with GoogLeNet. The result looks like this:
"A brown cat sitting top top top top ...."
The sentence repeats the word "top" many times. Is there any reason for this?
I also tried other modifications. It seems the LSTM is very sensitive to the learning parameters. Is this conclusion right in general?
Thanks.

print ('Exhausted all data; cutting off batch at timestep %d ' +
       'with %d streams completed') % (t, num_completed_streams)
for name in self.substream_names:
  batch[name] = batch[name][:t, :]

Shouldn't the words at timestep t be kept? That is:
batch[name] = batch[name][:(t+1), :]
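Whether [:t] or [:(t+1)] is right depends on whether timestep t was already filled when the data ran out. Python slices exclude their end index, as this tiny sketch with plain lists (standing in for the script's arrays; the values are hypothetical) shows.

```python
batch = {"words": [[t] for t in range(5)]}  # 5 timesteps, 1 stream
t = 3  # suppose timestep 3 had been filled before the data ran out

print(len(batch["words"][:t]))      # 3: keeps timesteps 0..2, drops 3
print(len(batch["words"][:t + 1]))  # 4: keeps timesteps 0..3
```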

@liminchen

Could anyone tell me what's the difference between C_diff and C_term_diff in the backward_cpu function? I'm trying to understand the code and write a GRU version. Thanks in advance!
template <typename Dtype>
void GRUUnitLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  CHECK(!propagate_down[2]) << "Cannot backpropagate to sequence indicators.";
  if (!propagate_down[0] && !propagate_down[1]) { return; }

  const int num = bottom[0]->shape(1);
  const int x_dim = hidden_dim_ * 4;
  const Dtype* C_prev = bottom[0]->cpu_data();
  const Dtype* X = bottom[1]->cpu_data();
  const Dtype* flush = bottom[2]->cpu_data();
  const Dtype* C = top[0]->cpu_data();
  const Dtype* H = top[1]->cpu_data();
  const Dtype* C_diff = top[0]->cpu_diff();
  const Dtype* H_diff = top[1]->cpu_diff();
  Dtype* C_prev_diff = bottom[0]->mutable_cpu_diff();
  Dtype* X_diff = bottom[1]->mutable_cpu_diff();
  for (int n = 0; n < num; ++n) {
    for (int d = 0; d < hidden_dim_; ++d) {
      const Dtype i = sigmoid(X[d]);
      const Dtype f = (*flush == 0) ? 0 :
          (*flush * sigmoid(X[1 * hidden_dim_ + d]));
      const Dtype o = sigmoid(X[2 * hidden_dim_ + d]);
      const Dtype g = tanh(X[3 * hidden_dim_ + d]);
      const Dtype c_prev = C_prev[d];
      const Dtype c = C[d];
      const Dtype tanh_c = tanh(c);
      Dtype* c_prev_diff = C_prev_diff + d;
      Dtype* i_diff = X_diff + d;
      Dtype* f_diff = X_diff + 1 * hidden_dim_ + d;
      Dtype* o_diff = X_diff + 2 * hidden_dim_ + d;
      Dtype* g_diff = X_diff + 3 * hidden_dim_ + d;
      const Dtype c_term_diff =
          C_diff[d] + H_diff[d] * o * (1 - tanh_c * tanh_c);
      *c_prev_diff = c_term_diff * f;
      *i_diff = c_term_diff * g * i * (1 - i);
      *f_diff = c_term_diff * c_prev * f * (1 - f);
      *o_diff = H_diff[d] * tanh_c * o * (1 - o);
      *g_diff = c_term_diff * i * (1 - g * g);
    }
    C_prev += hidden_dim_;
    X += x_dim;
    C += hidden_dim_;
    H += hidden_dim_;
    C_diff += hidden_dim_;
    H_diff += hidden_dim_;
    X_diff += x_dim;
    C_prev_diff += hidden_dim_;
    ++flush;
  }
}
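In that code, C_diff and H_diff are the gradients arriving at the two top blobs: C_diff from the cell state c_t being consumed downstream (e.g. by the next timestep's unit), and H_diff from the hidden output h_t. Since h_t = o * tanh(c_t) also depends on c_t, the total gradient with respect to c_t is c_term_diff = C_diff + H_diff * o * (1 - tanh(c_t)^2), and the c_prev, i, f and g diffs are all computed from that total. The scalar finite-difference check below (pure Python, hypothetical values, toy loss L = a * c + b * h so that C_diff = a and H_diff = b) confirms the c_prev_diff line.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_unit(c_prev, x, cont=1.0):
    """Scalar LSTM unit using the gate order above: x = [i, f, o, g]."""
    i, f = sigmoid(x[0]), cont * sigmoid(x[1])
    o, g = sigmoid(x[2]), math.tanh(x[3])
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return c, h


a, b = 0.7, -1.3                 # C_diff = a, H_diff = b for L = a*c + b*h
c_prev, x = 0.4, [0.2, -0.5, 1.1, 0.3]


def loss(cp):
    c, h = lstm_unit(cp, x)
    return a * c + b * h


c, h = lstm_unit(c_prev, x)
o, f = sigmoid(x[2]), sigmoid(x[1])
# dL/dc: the direct path (C_diff) plus the path through h = o * tanh(c).
c_term_diff = a + b * o * (1 - math.tanh(c) ** 2)
analytic = c_term_diff * f       # mirrors *c_prev_diff = c_term_diff * f

eps = 1e-6
numeric = (loss(c_prev + eps) - loss(c_prev - eps)) / (2 * eps)
print(abs(analytic - numeric) < 1e-8)  # True
```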

@maydaygmail

@jeffdonahue When captioner.py generates a sentence, does it use only the previous word to generate the current word, rather than all the previous words?

@anguyen8

anguyen8 commented May 3, 2016

Does anyone know if there is a pre-trained image captioning LRCN model out there? I'd greatly appreciate it if one were included in the Model Zoo.

@jeffdonahue : would you be able to release the model from your CVPR'15 paper?

@anteagle

Has this branch been merged into master? The layers are in master, but the examples seem to be missing. Could anyone point me to the right way to get this branch? I did git pull #2033, but it just showed "Already up-to-date."

@shaibagon
Member

@anteagle it seems like the PR only contained the LSTM RNN layers and not the examples (too much to review). You'll have to go to Jeff Donahue's "recurrent" branch.

@anteagle

@shaibagon thanks, I got from Jeff's repo, though it has not been updated for a while.

@jeffdonahue
Contributor Author

Closing with the merge of #3948 -- though this PR still contains examples that PR lacked, and I should eventually restore and rebase those on the now merged version. In the meantime I'll keep my recurrent branch (and other mentioned branches) open and in their current form for reference.

@yangzhikai

Hello, I have a question. When I read the file 'lstm_layer.cpp', I found a lot of 'add_top', 'add_bottom' and 'add_dim' calls, but I can't find their definitions in the caffe folder. Could you tell me where I can find them, and what code such as 'add_bottom("c_" + tm1s);' means?

@jeffdonahue
Contributor Author

jeffdonahue commented May 10, 2017

The methods you refer to are all automatically generated by protobuf. See caffe.proto for the declarations of top, bottom, etc., which result in the protobuf compiler automatically generating the add_top, add_bottom methods. (The resulting C++ definitions are in the protobuf-generated header file caffe.pb.h.)

@yangzhikai

Oh, thank you very much. I hadn't found this file (caffe.pb.h) because I hadn't compiled Caffe before!

@soulslicer

Hi, is there any working example of the layer in caffe?

@cuixing158

The same question, is there any working example of the layer in caffe?

@jianjieluo

@cuixing158 @soulslicer See jeffdonahue's example for the COCO image captioning task:

#2033 (comment)

Check out his Caffe branch and you will find the example .prototxt files and other files that may be helpful.
