Add loss/cost on validation data https://github.com/uma-pi1/kge/issues/2 #99
base: master
Conversation
Why is this needed? We already have the capability to run a training job on arbitrary splits and to select the splits for filtering (in case of negative sampling). E.g., one may simply run a training job using "train.split=valid". What may be missing is (1) disable the optimizer, (2) just run one epoch, and (3) disable checkpoints. Is this what this PR is doing?
Yes, there are so many things that have to be disabled (disable optimizer, disable backward, no valid job instantiation, everything in .run() basically, ...) just to run it as a new instance of TrainingJob. I found it more sensible to call run_epoch from the valid_job, which has access to its parent_job anyhow. It works very nicely and, of course, I tested that it gives the same results as before for all TrainingJobs.
I see & sounds good. Can you very briefly outline the approach/changes? That simplifies the code review.
I do think that we should create a new instance of the training job from the valid job though (so that it can have a different config). That's cleaner and more future-proof than changing code in the training jobs.
This would also allow its use from an arbitrary eval job (not necessarily valid).
I agree. However, I started with an approach where the TrainingJob was acting depending on a flag, but that made it so ugly and hard to read, with "if"s everywhere. So I ended up with the current solution. I am thinking about whether there is another way, because all we care about is run_epoch(). In the current approach it has the signature …
That's fine, but you can still create a new job during valid. Then just call …
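For illustration, a minimal sketch of what this could look like inside the eval job (attribute and factory names follow the discussion above and kge's usual create() pattern; treat the details as assumptions, not the final implementation):

```python
# Hypothetical helper inside the evaluation job: build a TrainingJob on the
# eval split and only run a single epoch, without ever calling run().
# Assumes: from kge.job import TrainingJob
def _evaluate_training_loss(self):
    train_config = self.config.clone()
    train_config.set("train.split", self.eval_split)  # compute loss on valid
    train_config.set(
        "negative_sampling.filtering.split", self.config.get("train.split")
    )  # filter against the original training split

    train_job = TrainingJob.create(
        train_config, dataset=self.dataset, parent_job=self, model=self.model
    )
    trace_entry = train_job.run_epoch()  # assumes run_epoch() returns the epoch trace
    return trace_entry["avg_loss"]
```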
Should the training job just produce a separate trace entry and echo it as well? I think it's nicer when we retrieve the trace result and do …
WDYT?
You mean the evaluation job? I'd only copy the avg_loss; the other entries depend on config settings and do not actually evaluate the model. They can still be found in the trace entry of the subtraining job if needed.
Ok. Should the subtraining job echo its trace as well?
My current plan is: create … which are branched to in …, which are branched into from … WDYT?
Is this all needed? Why not just stick with …? To ease things up, we could additionally create an EvalTrainingLossJob: it creates the TrainingJob, calls run_epoch, and does everything else that's needed.
Ah, did you propose to create the TrainJob but not call run() on it, but only run_epoch? Violates the Job-API a bit, but sure that works.
What would be the advantage of that?
Advantage: (i) that additional job has our Job API and gathers all the "wrapper code" for reuse in other places. E.g., (ii) it could also be run directly from the config (if we wanted that) and (iii) it can be easily used as part of other jobs (again, if we wanted that). For now, the key advantage would be (i); not sure if (ii)+(iii) will ever matter. I am also OK with just using run_epoch for now. It's just that the wrapper job is cleaner, easier to maintain, and probably little additional effort to create.
Alright, makes sense and should be easy to do.
I wonder if eval.type should not take a list of eval types, run all of them, and collect their metrics into one trace.
But this runs multiple eval jobs, right? This could be implemented by having an "aggregate eval" job, which does this. Let's keep this for the future.
Can you review again?
I addressed your reviews. I ended up doing a small revision of EvaluationJob: EvaluationJob.run() is now implemented in EvaluationJob, which does the standard stuff that has to be done for every EvaluationJob and then calls self._run() (suggestions for a better name for self._run()?). I shifted EntityRanking-specific stuff from EvaluationJob to there. I added self.verbose to EvaluationJob and use it. I made EvalTrainingLossJob an instance variable of EvaluationJob and run it as a post_epoch_trace_hook. This can be made nicer when we create an AggregatingEvalJob that runs and aggregates a list of EvaluationJobs. Then the post_epoch_trace_hook can be removed from EvaluationJob and be replaced by the entry "eval.type: entity_ranking, training_loss" in config-default.yaml. See #102
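A rough sketch of the structure described here (hook and attribute names are taken from the comment above; the body is an illustrative assumption, not the actual code):

```python
# EvaluationJob.run(): shared bookkeeping for every evaluation job, then the
# subclass-specific part via self._run(); the training-loss sub-job would be
# attached as a post-epoch trace hook and only contribute its avg_loss.
def run(self) -> dict:
    self._prepare()
    self.model.eval()
    trace_entry = self._run()                  # e.g., entity-ranking metrics
    for hook in self.post_epoch_trace_hooks:   # e.g., adds the training loss
        hook(self, trace_entry)
    self.trace(**trace_entry, echo=self.verbose)
    return trace_entry
```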
Looks good to me, only 2-3 minor suggestions left.
kge/job/train.py (outdated)

self.is_prepared = False
self.model.train()

if config.get("job.type") == "train":
I suggest disabling validation with an optional constructor argument instead of this more hacky approach.
Only as a constructor argument or as a config option?
In the config, we have it already. That being said, forget the constructor argument and simply set "valid.every" to 0 in the TrainingLossEvaluationJob.
Likewise, set train.checkpoint.every to 0.
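For example, building on the cloned config already used in the diff above (a sketch; option names as discussed):

```python
# Disable validation and checkpointing for the sub-training job that is only
# used to compute the loss on the eval split.
train_job_on_eval_split_config = config.clone()
train_job_on_eval_split_config.set("train.split", self.eval_split)
train_job_on_eval_split_config.set("valid.every", 0)
train_job_on_eval_split_config.set("train.checkpoint.every", 0)
```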
Hm, I just realized that I was still initializing optimizer (and loss). Optimizer creates new parameters based on model.parameters(). In this context querying job.type makes more sense. See new commit.
You are assuming that the outside eval job does not have job.type=train (it could have). Either set this outside or add a constructor argument "initialize_for_forward_only=False" (preferred since clearer, I think).
You are assuming that the outside eval job does not have job.type=train (it could have).
Huh? Isn't it one config == one job? "job.type" is either eval, train, search, ... When I pass through the eval's config, then its job.type is "eval". I must misunderstand what you are saying here?
Either set this outside or add a constructor argument "initialize_for_forward_only=False" (preferred since clearer, I think).
Ok can do.
Not if the job is created via EvaluationJob.create(...). An explicit argument makes this clearer.
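A sketch of the constructor-argument variant (the flag name comes from the suggestion above and self.is_forward_only from the later commits; the import paths and how the optimizer and loss are created are assumptions):

```python
from kge.job import Job                      # assumed import path
from kge.util import KgeLoss, KgeOptimizer   # assumed import path


class TrainingJob(Job):
    def __init__(
        self, config, dataset, parent_job=None, model=None,
        initialize_for_forward_only=False,
    ):
        super().__init__(config, dataset, parent_job)
        self.model = model
        self.is_forward_only = initialize_for_forward_only
        if not self.is_forward_only:
            # optimizer and loss are only needed when the job actually trains
            self.optimizer = KgeOptimizer.create(config, self.model)
            self.loss = KgeLoss.create(config)
```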
Please also add a CHANGELOG entry before you merge this PR.
kge/job/eval.py (outdated)

@@ -175,13 +175,14 @@ def __init__(self, config: Config, dataset: Dataset, parent_job, model):

train_job_on_eval_split_config = config.clone()
train_job_on_eval_split_config.set("train.split", self.eval_split)
train_job_on_eval_split_config.set("negative_sampling.filtering.split", self.config.get("train.split"))
Only do this if unset.
positives = (
    index.get(pair).numpy()
    if pair in index
    else torch.IntTensor([]).numpy()
This tensor should contain the positives and not be empty.
Also, why not create a numpy array right away?
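For instance, a sketch of that suggestion (numpy import assumed):

```python
import numpy as np

# Build the fallback as a numpy array directly instead of converting an
# empty torch tensor.
positives = index.get(pair).numpy() if pair in index else np.empty(0, dtype=np.int32)
```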
Still open.
positives_index[pair] = (
    index.get(pair).numpy()
    if pair in index
    else torch.IntTensor([]).numpy()
likewise here
Your last changes raise the question of whether all training jobs handle indexes correctly when used in an "eval only" setting. E.g., for KvsAll, what are the labels being used? Training data? Training + eval data?
True, for eval only, KvsAll should use training + eval data.
For the other jobs it's clearer, I guess, but I feel that the loss can turn out to be misleading. I think we should not make the training_loss a default for the evaluation job (i.e., do not add it as a hook). It may be too confusing for a default since it depends on the training setup so much.
Actually, also for KvsAll it's clear; it's the same as for all losses. It should only be the labels from eval data, i.e., for unseen labels. The only particular thing with NS is that the current implementation does the filtering of positive triples based on what it is given as …
Why is it clear for KvsAll? If done like you state, the losses are not comparable between train and valid (the valid loss will be pretty much like 1vsAll because the data probably looks like this). Nevertheless this may be the best option. Another sensible choice may be the prefixes/suffixes from eval, but the labels from train+eval.
Yes, but at the moment it is automagically the split defined in lines 33 to 34 in f2bcbdb.
This is why one has to set it explicitly, and then the current implementation of NS with filtering didn't cover this case.
So which one to pick here? The most natural choice would be train+valid, but that does not seem to be supported.
When we were talking about train+eval, this is what I meant: prefixes from valid, but labels from both. I understand what you mean that the loss on train and the loss on valid are not comparable in that case. With "it is clear" I meant that the only-on-valid KvsAll loss would still work to measure overfitting. It would be nicer if it were a comparable loss, though.
So should we?
If we allow filtering with train+valid, we'd need to change the interface. Not sure if we want to go there; it significantly complicates the code if we allow for multiple splits during training. An alternative may be to create new splits by merging existing splits. That's wasting memory, but at least it's less intrusive. The easiest way to do this is (1) extend the dataset.yaml with support for multiple files per split, (2) add a train_valid and a train_valid_test split with the appropriate files. That should be a separate PR. For now, why not simply use the valid split and issue a warning or an error when KvsAll is used like that?
Also, using KvsAll with a different "query split" and "label split" is highly confusing. Option 1: TrainingLossEvaluationJob solely uses the validation split and ignores all other splits (also for negative sampling etc.). This way, it's clear what it does and we can issue a warning saying this. How useful such a loss is, is another question. Option 2: we do not add this method of evaluation. For comparability with training data, we can proceed the other way around: run the standard eval job on a sample of the training data (this sampling can be done externally to create a new train_sample split).
I tend towards option 2.
I'm for option 3: do it and do it correctly. Now we're almost there and understand the implications. It would be a waste, now that we have scrutinized it. I also like that this is tightly coupled to the filtering of entity_ranking. This would mean that EvaluationJob and TrainingJob both have a query_split and a label_splits attribute. label_splits would replace filter_splits in EvaluationJob. It stays an automatically set option and is not a configurable option. We can also call it "known answers". For the TrainingJob when not in eval mode -> nothing changes: query split == label split. In eval mode: …
That would also be a suitable option. The main drawback is that it's more work. I suggest a slightly revised approach (a sketch of the KvsAll part follows below):
For 1vsAll: do nothing.
For NS: rename negative_sampling.filtering.split to negative_sampling.filtering.splits. Automatically add train.split, but additionally use the splits mentioned here (along the lines we already do it for eval.filter_splits).
For KvsAll: You are right in that the label_splits do not need to be merged a priori (which would mean that we need to copy all data). This can be done in the respective collate() function during data loading. If label_splits is unset, train.split should be used. If set, these splits + train.split should be used (again, along the lines we already do it for eval.filter_splits).
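A sketch of how labels from several label splits could be collected per query during collation (the helper name and the dict-of-tensors index layout are hypothetical, mirroring the KvsAll indexes discussed above):

```python
import torch

def collect_labels(pair, label_indexes):
    """Concatenate label ids for a query pair from one KvsAll index per label
    split; no merged index has to be materialized up front."""
    parts = [index[pair] for index in label_indexes if pair in index]
    return torch.cat(parts) if parts else torch.empty(0, dtype=torch.int)
```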
Can you review 66a9bc1 or is there anything else?
I cannot review this PR as such. Instead of rebasing it on master, it contains tons of master commits which make it hard to review. Can you rebase on master so that just the changes are here?
Force-pushed from c2ab672 to 083b27c.
Add EvalTrainingLossJob
…onJob with standard stuff that has to be done for every EvaluationJob and then calls self._run(). Shift EntityRanking-specific stuff to there. Make EvalTrainingLossJob an instance variable of EvaluationJob and run it as post_epoch_trace_hook. Add self.verbose to EvaluationJob and use it. Address code review.
eval.py: set negative sampling filtering split to train.split; rename to TrainingLossEvaluationJob. train.py: initialize only training-specific things if the TrainingJob is used for training; more ifs.
Now?
Even worse, you destroyed it. Do not touch, I force push in a minute.
eval.py: set negative sampling filtering split to train.split; rename to TrainingLossEvaluationJob. train.py: initialize only training-specific things if the TrainingJob is used for training; more ifs.
eval.py: set splits. train.py: use self.is_forward_only; use dataset.index to load label_index for KvsAll. sampler.py: rename filtering splits; use dataset.index to load label_index for filtering splits. config-default.yaml: add label_splits config for KvsAll. indexing.py + misc.py: add index and helper funcs to merge KvsAll dicts.
Force-pushed from 083b27c to cd3bd49.
I rebased (hopefully correctly) and added some review comments.
@@ -327,7 +331,8 @@ eval:
  # mean_reciprocal_rank_filtered_with_test.
  filter_with_test: True

- # Type of evaluation (entity_ranking only at the moment)
+ # Type of evaluation (entity_ranking or training_loss). Currently,
+ # entity_ranking runs training_loss as well.
I vote for disabling the automatic running of training_loss. Makes evaluation too costly.
Validation and test data are a lot smaller (1/10 is the worst valid/train ratio in our current datasets).
index_name = f"{split_combi_str}_{key}_to_{value}"
indexes = [dataset.index(f"{_split}_{key}_to_{value}") for _split in split]
dataset._indexes[index_name] = merge_dicts_of_1dim_torch_tensors(indexes)
return dataset._indexes[index_name]
Instead of merging indexes (very expensive memory-wise), the code should support using multiple indexes.
For NS I see that it works, for KvsAll I have to merge them. But granted, not necessarily as an index.
What I meant here: the corresponding training jobs should support this. Alternatively, one may write a wrapper index that takes multiple indexes and merges their results on demand given a key.
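A sketch of such a wrapper (all names hypothetical): it holds the per-split indexes and merges lookup results lazily, so nothing is copied up front.

```python
import torch

class MergedIndex:
    """Read-only view over several KvsAll-style indexes (dicts of 1-dim tensors)."""

    def __init__(self, *indexes):
        self.indexes = indexes

    def __contains__(self, key):
        return any(key in index for index in self.indexes)

    def get(self, key):
        parts = [index[key] for index in self.indexes if key in index]
        return torch.cat(parts) if parts else torch.empty(0, dtype=torch.int)
```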
For KvsAll, you can also just add the labels of each label split sequentially in the collate function.
We should not allow using label_splits other than for forward only. Otherwise they leak valid/test data. Likewise for negative sampling, I guess.
index_name = f"{split_combi_str}_{key}_to_{value}"
dataset.index_functions[index_name] = IndexWrapper(
    merge_KvsAll_indexes, split=split_combi, key=key
)
This makes it even worse.
Why? They are not invoked.
As a side note: this was necessary because there is currently no way to determine the cross-validation semantics of a dataset file (train, valid, test). We should (in another PR) add an xvaltype attribute to the dataset files.
They are also not needed. Let's keep the code clean and not add "orphans".
Adding more information to the dataset configuration is a good idea (if useful for something).
@@ -15,6 +18,33 @@ def is_number(s, number_type):
    return False


def merge_dicts_of_1dim_torch_tensors(keys_from_dols, values_from_dols=None):
These methods can be dropped with the changes suggested above.
I still need this method for KvsAll, right?
self.filtering_split = config.get("train.split")
self.filtering_splits: List[str] = config.get("negative_sampling.filtering.splits")
if len(self.filtering_splits) == 0:
    self.filtering_splits.append(config.get("train.split"))
Always append when not there. Also, add handling of deprecated configs to config.py.
for filtering_split in self.filtering_splits:
    dataset.index(f"{filtering_split}_{pair}_to_{slot_str}")
filtering_splits_str = '_'.join(sorted(self.filtering_splits))
dataset.index(f"{filtering_splits_str}_{pair}_to_{slot_str}")
drop
positives = (
    index.get(pair).numpy()
    if pair in index
    else torch.IntTensor([]).numpy()
Still open.
positives_index[pair] = index.get(pair).numpy()
positives_index[pair] = (
    index.get(pair).numpy()
    if pair in index
Instead, check here whether it's in one of the indexes, etc.
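Roughly like this (a sketch; `filtering_indexes` is a hypothetical list holding one index per filtering split):

```python
import numpy as np

# Look the pair up in each filtering split's index and concatenate the hits,
# instead of consulting a single pre-merged index.
parts = [index.get(pair).numpy() for index in filtering_indexes if pair in index]
positives_index[pair] = np.concatenate(parts) if parts else np.empty(0, dtype=np.int32)
```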
No, it's a bit broken, but it's fixable.
Yes, I did not know your new code well enough in places. I think all the echo_true and verbose things need to go out, but you'll be much quicker with this.
As the title says, this PR adds loss/cost on validation data to address #2. It reuses the relevant parts of TrainingJob to run the train job on validation data. This also brings the additional feature that TrainingJobs can now run on arbitrary splits; all dependencies on the train split are removed, and the split is determined only by the necessary arguments that are propagated down.