Chess SAE trainer - tensor dim mismatch #2

Open
Ivan-Z opened this issue Aug 14, 2024 · 1 comment

Ivan-Z commented Aug 14, 2024

Trying to repro the chess SAE training:

python circuits/sae_training/chess_sae_trainer.py --save_dir=/tmp/sae_debug

After modifying this line to pass the meta.pkl from circuits/resources/meta.pkl
https://github.com/adamkarvonen/SAE_BoardGameEval/blob/master/circuits/sae_training/chess_sae_trainer.py#L65

I get:

/home/tmp/SAE_BoardGameEval/circuits/dictionary_learning/buffer.py", line 404, in refresh
    self.activations = t.cat([self.activations, hidden_states.to(self.device)], dim=0)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 2048 but got size 512 for tensor number 1 in the list.
Owner

adamkarvonen commented Aug 14, 2024

Apologies for this! I fixed chess_sae_trainer.py. There was a mismatch between submodule_type (mlp, dim 2048) and submodule (resid_post, dim 512).
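
For anyone hitting the same traceback, here is a minimal sketch of why those two widths collide. It is plain Python working on shape tuples, not the repository's buffer code: `WIDTHS`, `cat_dim0_shape`, and the batch size of 4096 are hypothetical names and values chosen for illustration, and only the 2048 and 512 widths come from this thread. It assumes the buffer was allocated with the `mlp` width while the hooked activations came from `resid_post`.

```python
# Widths quoted in this thread; the dictionary itself is a hypothetical stand-in.
WIDTHS = {"mlp": 2048, "resid_post": 512}


def cat_dim0_shape(shapes):
    """Return the shape produced by concatenating along dim 0.

    Raises ValueError when any trailing dimension differs, which is the
    rule that torch.cat(..., dim=0) enforces.
    """
    first = tuple(shapes[0])
    rows = 0
    for index, shape in enumerate(shapes):
        shape = tuple(shape)
        if shape[1:] != first[1:]:
            raise ValueError(
                f"tensor {index} has trailing dims {shape[1:]}, "
                f"expected {first[1:]}"
            )
        rows += shape[0]
    return (rows, *first[1:])


# Buffer sized for the MLP, activations read from the residual stream.
# The batch size of 4096 is arbitrary.
buffer_shape = (0, WIDTHS["mlp"])
batch_shape = (4096, WIDTHS["resid_post"])

try:
    cat_dim0_shape([buffer_shape, batch_shape])
except ValueError as err:
    print("mismatch:", err)

# Sizing the buffer from the same submodule that is hooked avoids the error.
print(cat_dim0_shape([(0, WIDTHS["resid_post"]), batch_shape]))
```

The point of the sketch is that the buffer width and the hooked submodule have to be derived from the same setting. If they are configured separately, a check like this before the first concatenation would surface the mismatch earlier.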

However, after further thought I believe train_saes_parallel.py is the better default training script. I have archived chess_sae_trainer.py and othello_sae_trainer.py, and updated the training README with instructions for using train_saes_parallel.py.

I have tested this script on both ChessGPT and OthelloGPT using a variety of SAE training types (TopK, P_Anneal, Gated, and Standard).

Please let me know if you have any other issues.
