index out of range in THTensorMath.c #8

nyaong7 opened this issue Feb 4, 2016 · 4 comments


nyaong7 commented Feb 4, 2016

Hello, I am testing word-rnn with the word_level = 1 parameter.

The learning process starts successfully, but after a while I encounter the following error.
Other threads report the same issue and say the fix is to use an ASCII-encoded input file (karpathy/char-rnn#51).
I checked and converted my input file to ASCII, but I still get the error.


./util/OneHot.lua:18: index out of range at
.../torch/pkg/torch/lib/TH/generic/THTensorMath.c:141
stack traceback:
[C]: in function 'index'
./util/OneHot.lua:18: in function 'func'
.../torch/install/share/lua/5.1/nngraph/gmodule.lua:275: in function 'neteval'
.../torch/install/share/lua/5.1/nngraph/gmodule.lua:310: in function 'forward'
train.lua:260: in function 'opfunc'
.../torch/install/share/lua/5.1/optim/rmsprop.lua:32: in function 'optimizer'
train.lua:318: in main chunk
[C]: in function 'dofile'

...

How can I avoid this error? Can anybody help?
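
For context, the failing 'index' call is inside OneHot's forward pass. A minimal sketch of what that module plausibly looks like, modeled on char-rnn's util/OneHot.lua (the exact source may differ):

```lua
require 'nn'

-- Sketch of a OneHot module, modeled on char-rnn's util/OneHot.lua.
local OneHot, parent = torch.class('OneHot', 'nn.Module')

function OneHot:__init(outputSize)
  parent.__init(self)
  self.outputSize = outputSize
  self._eye = torch.eye(outputSize)  -- one row per vocabulary entry
end

function OneHot:updateOutput(input)
  self.output:resize(input:size(1), self.outputSize):zero()
  local longInput = input:long()
  -- This is the call that fails: index() raises "index out of range"
  -- whenever a token id in longInput falls outside [1, outputSize].
  self.output:copy(self._eye:index(1, longInput))
  return self.output
end
```

If the module looks like this, the error means some token id handed to OneHot is larger than the vocabulary size the module was constructed with.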


nyaong7 commented Feb 4, 2016

I changed the number of nodes in the hidden layer from 500 to 300, and now it seems to work.
That seems strange, since the error appears to be caused by the vocabulary being too large.

Is there another way to solve this while keeping the number of nodes?

larspars (Owner) commented

Apologies for not getting back to you sooner (I somehow stopped getting emails for issues).

Not sure what's happening here, but you could try adding "-glove 1" when using word_level, so it uses GloVe embeddings instead of a OneHot encoding. The non-GloVe word-level mode has hardly had any testing, unfortunately.
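
For example, the invocation might look like this (only the flags mentioned in this thread; everything else left at its defaults):

```
th train.lua -word_level 1 -glove 1
```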

larspars reopened this Feb 28, 2016
noisyneuron commented

Same issue here. I tried subsets of the dataset (~3-4 MB) with the default network size and layers, and it worked. But with a larger subset of ~15 MB, I consistently get this error, even when trying different network sizes. Any suggestions?
Thanks :)


marcociccone commented Aug 11, 2016

I think this is a memory error due to the huge size of the embedding lookup table or the one-hot vector you're using. If you apply a threshold on word frequency, it will reduce the size of the vocabulary, and you can use a bigger corpus as well.
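
A minimal sketch of that kind of frequency thresholding at vocabulary-build time (a hypothetical helper, not code from this repo):

```lua
-- Keep only words seen at least min_count times; map the rest to UNK.
-- Hypothetical helper, not word-rnn's actual preprocessing code.
local function build_vocab(words, min_count)
  local counts = {}
  for _, w in ipairs(words) do
    counts[w] = (counts[w] or 0) + 1
  end
  local vocab = { UNK = 1 }  -- reserve index 1 for rare/unknown words
  local next_idx = 2
  for w, c in pairs(counts) do
    if c >= min_count then
      vocab[w] = next_idx
      next_idx = next_idx + 1
    end
  end
  return vocab  -- rare words fall back to vocab.UNK at lookup time
end
```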

Addendum:
I correct myself. I think the memory error is due to the size of the final Linear layer when the vocabulary is too big.

Addendum 2:
I investigated this problem more deeply and found that it is caused by a wrong index being fed to the lookup table. This is still related to the vocabulary size, which can overflow since the indices are declared as a ShortTensor or ByteTensor. I suggest declaring the variable as Long at this line.
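
To see why the index tensor type matters: a ByteTensor holds at most 255 and a ShortTensor at most 32767, so vocabulary indices beyond those limits silently wrap and produce garbage lookups. A quick illustration of that wrapping (assumed standard Torch integer-tensor semantics):

```lua
require 'torch'

-- Byte and Short tensors silently truncate out-of-range values.
local b = torch.ByteTensor(1)
b[1] = 300
print(b[1])   -- 44 (300 mod 256): the index is corrupted

local s = torch.ShortTensor(1)
s[1] = 40000
print(s[1])   -- wraps past 32767 into a negative value

-- LongTensor preserves the full range, hence the suggested fix
-- of declaring the index variable as Long.
local l = torch.LongTensor(1)
l[1] = 40000
print(l[1])   -- 40000
```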
