index out of range in THTensorMath.c #8

nyaong7 opened this issue Feb 4, 2016 · 4 comments


nyaong7 commented Feb 4, 2016

Hello, I am testing word-rnn with the word_level = 1 parameter.

The learning process starts successfully, but after a while I encounter the following error.
Other threads report the same issue and say the fix is to use an ASCII-encoded input file (karpathy/char-rnn#51).
I checked and converted my input file to ASCII, but I still get the error.


./util/OneHot.lua:18: index out of range at
.../torch/pkg/torch/lib/TH/generic/THTensorMath.c:141
stack traceback:
[C]: in function 'index'
./util/OneHot.lua:18: in function 'func'
.../torch/install/share/lua/5.1/nngraph/gmodule.lua:275: in function 'neteval'
.../torch/install/share/lua/5.1/nngraph/gmodule.lua:310: in function 'forward'
train.lua:260: in function 'opfunc'
.../torch/install/share/lua/5.1/optim/rmsprop.lua:32: in function 'optimizer'
train.lua:318: in main chunk
[C]: in function 'dofile'

...

How can I avoid this error? Can anybody help?
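
For context, the failing 'index' call is inside OneHot's forward pass. A minimal sketch of what that module plausibly looks like, modeled on char-rnn's util/OneHot.lua (the exact source may differ):

```lua
require 'nn'

-- Sketch of a OneHot module, modeled on char-rnn's util/OneHot.lua.
local OneHot, parent = torch.class('OneHot', 'nn.Module')

function OneHot:__init(outputSize)
  parent.__init(self)
  self.outputSize = outputSize
  self._eye = torch.eye(outputSize)  -- one row per vocabulary entry
end

function OneHot:updateOutput(input)
  self.output:resize(input:size(1), self.outputSize):zero()
  local longInput = input:long()
  -- This is the call that fails: index() raises "index out of range"
  -- whenever a token id in longInput falls outside [1, outputSize].
  self.output:copy(self._eye:index(1, longInput))
  return self.output
end
```

If the module looks like this, the error means some token id handed to OneHot is larger than the vocabulary size the module was constructed with.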


nyaong7 commented Feb 4, 2016

I changed the number of nodes in the hidden layer from 500 to 300, and now it seems to work.
That seems strange, since the error appears to be caused by the vocabulary being too large.

Is there another way to solve this while keeping the number of nodes?

larspars (Owner) commented

Apologies for not getting back to you sooner (I somehow stopped getting emails for issues).

Not sure what's happening here, but you could try adding "-glove 1" when using word_level, so it uses GloVe embeddings instead of a OneHot encoding. The non-GloVe word-level mode has hardly had any testing, unfortunately.
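
For example, the invocation might look like this (only the flags mentioned in this thread; everything else left at its defaults):

```
th train.lua -word_level 1 -glove 1
```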

larspars reopened this Feb 28, 2016
noisyneuron commented

Same issue here. I tried subsets of the dataset (~3-4 MB) with the default network size and layers, and it worked. But with a larger subset of ~15 MB, I consistently get this error, even when trying different network sizes. Any suggestions?
Thanks :)


marcociccone commented Aug 11, 2016

I think this is a memory error due to the huge size of the embedding lookup table or the one-hot vector you're using. If you apply a threshold on word frequency, it will reduce the size of the vocabulary, and you can use a bigger corpus as well.
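
A minimal sketch of that kind of frequency thresholding at vocabulary-build time (a hypothetical helper, not code from this repo):

```lua
-- Keep only words seen at least min_count times; map the rest to UNK.
-- Hypothetical helper, not word-rnn's actual preprocessing code.
local function build_vocab(words, min_count)
  local counts = {}
  for _, w in ipairs(words) do
    counts[w] = (counts[w] or 0) + 1
  end
  local vocab = { UNK = 1 }  -- reserve index 1 for rare/unknown words
  local next_idx = 2
  for w, c in pairs(counts) do
    if c >= min_count then
      vocab[w] = next_idx
      next_idx = next_idx + 1
    end
  end
  return vocab  -- rare words fall back to vocab.UNK at lookup time
end
```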

Addendum:
I correct myself. I think the memory error is due to the size of the final Linear layer when the vocabulary is too big.

Addendum 2:
I investigated this problem more deeply and found that it is caused by a wrong index being fed to the lookup table. This is still related to the vocabulary size, which can overflow since the indices are declared as a ShortTensor or ByteTensor. I suggest declaring the variable as Long at this line.
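
To see why the index tensor type matters: a ByteTensor holds at most 255 and a ShortTensor at most 32767, so vocabulary indices beyond those limits silently wrap and produce garbage lookups. A quick illustration of that wrapping (assumed standard Torch integer-tensor semantics):

```lua
require 'torch'

-- Byte and Short tensors silently truncate out-of-range values.
local b = torch.ByteTensor(1)
b[1] = 300
print(b[1])   -- 44 (300 mod 256): the index is corrupted

local s = torch.ShortTensor(1)
s[1] = 40000
print(s[1])   -- wraps past 32767 into a negative value

-- LongTensor preserves the full range, hence the suggested fix
-- of declaring the index variable as Long.
local l = torch.LongTensor(1)
l[1] = 40000
print(l[1])   -- 40000
```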
