#99: Add loss/cost on validation data (addresses https://github.com/uma-pi1/kge/issues/2)

Open · wants to merge 10 commits into master from add_training_loss_as_metric_to_validation

Conversation

@samuelbroscheit (Member)

As the title says, this PR adds loss/cost on validation data to address #2. It reuses the relevant parts of TrainingJob to run the training job on validation data. This also brings the additional feature that training jobs can now run on arbitrary splits; all hard-coded dependencies on the train split are removed, and the split is determined only by the arguments that are propagated down.

@samuelbroscheit requested a review from rgemulla, May 16, 2020 21:35
@rgemulla (Member)

Why is this needed?

We already have the capability to run a training job on arbitrary splits and to select the splits for filtering (in case of negative sampling). E.g., one may simply run a training job using "train.split=valid".

What may be missing is (1) disable the optimizer, (2) just run one epoch, and (3) disable checkpoints. Is this what this PR is doing?
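
(For illustration only, not part of the PR: a minimal sketch of running a training job on the valid split as described above, assuming the Config.clone()/set() calls quoted later in this thread and a TrainingJob.create factory; the import path is an assumption.)

    from kge.job import TrainingJob   # import path assumed

    # sketch: run an ordinary training job over the validation split
    valid_config = config.clone()               # `config` and `dataset` assumed given
    valid_config.set("train.split", "valid")
    job = TrainingJob.create(valid_config, dataset)
    job.run()   # still optimizes and checkpoints unless disabled, as discussed below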

@samuelbroscheit (Member Author)

We already have the capability to run a training job on arbitrary splits and to select the splits for filtering (in case of negative sampling). E.g., one may simply run a training job using "train.split=valid".

What may be missing is (1) disable the optimizer, (2) just run one epoch, and (3) disable checkpoints. Is this what this PR is doing?

Yes, there are so many things that have to be disabled (disable the optimizer, disable backward, no valid job instantiation, basically everything in .run(), ...) just to run it as a new instance of TrainingJob. I found it more sensible to call run_epoch from the valid job, which has access to its parent_job anyhow. It works very nicely and, of course, I tested that it gives the same results as before for all TrainingJobs.

@rgemulla (Member)

I see & sounds good. Can you very briefly outline the approach/changes? That simplifies the code review.

@rgemulla (Member)

I do think that we should create a new instance of the training job from the valid job though (so that it can have a different config). That's cleaner and more future-proof than changing code in the training jobs.

@rgemulla (Member)

This would also allow its use from an arbitrary eval job (not necessarily valid)

@samuelbroscheit (Member Author)

This would also allow its use from an arbitrary eval job (not necessarily valid)

I agree. However, I started with an approach where the TrainingJob changed its behavior depending on a flag, but that made it ugly and hard to read, with "if"s everywhere. So I ended up with the current solution. I am thinking about whether there is another way, because all we care about is run_epoch(). In the current approach it has the signature

trace_entry = self.run_epoch(
    split=self.train_split, echo_trace=True, do_backward=True
)

@rgemulla (Member)

That's fine, but you can still create a new job during valid. Then just call run_epoch. The first two arguments are not needed (1st = part of the job, 2nd = result of run_epoch). I'd rename the third argument to forward_only.
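
(A minimal sketch of that suggestion; forward_only is the rename proposed here, and the exact TrainingJob.create/run_epoch signatures in the PR may differ.)

    # sketch: inside the eval/valid job, build a sub training job on the same model
    # and run a single forward-only epoch over its split
    sub_job = TrainingJob.create(sub_config, dataset, parent_job=self, model=self.model)
    train_trace_entry = sub_job.run_epoch(forward_only=True)   # no backward/optimizer step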

@samuelbroscheit (Member Author)

Should the training job just produce a separate trace entry and echo it as well? I think it's nicer when we retrieve the trace result and do

metrics.update(
    avg_loss=train_trace_entry["avg_loss"],
    avg_penalty=train_trace_entry["avg_penalty"],
    avg_penalties=train_trace_entry["avg_penalties"],
    avg_cost=train_trace_entry["avg_cost"],
)

WDYT?

@rgemulla (Member)

You mean the evaluation job? I'd only copy the avg_loss; the other entries depend on config settings and do not actually evaluate the model. They can still be found in the trace entry of the subtraining job if needed.

@samuelbroscheit (Member Author)

You mean the evaluation job? I'd only copy the avg_loss; the other entries depend on config settings and do not actually evaluate the model. They can still be found in the trace entry of the subtraining job if needed.

Ok. Should the subtraining job echo its trace as well?

@samuelbroscheit (Member Author)

My current plan is:

  • create __init_for_train__ and __init_for_eval__, which are branched to in __init__, and

  • create run_for_train and run_for_eval, which are branched into from run().

The run_epoch() trace can be retrieved with a hook.

WDYT?

@rgemulla (Member)

Is this all needed? Why not just stick with run_epoch with flag forward_only?

To ease things up, we could additionally create an EvalTrainingLossJob: it creates the TrainingJob, calls run_epoch, and does everything else that's needed.
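
(An illustrative sketch of such a wrapper job, under the assumptions above; the class layout, the _evaluate entry point, and the forward_only flag are hypothetical and not the final code of this PR.)

    class EvalTrainingLossJob(EvaluationJob):
        def __init__(self, config, dataset, parent_job, model):
            super().__init__(config, dataset, parent_job, model)
            sub_config = config.clone()
            sub_config.set("train.split", self.eval_split)   # compute loss on the eval split
            self._sub_job = TrainingJob.create(
                sub_config, dataset, parent_job=self, model=model
            )

        def _evaluate(self):   # hypothetical entry point; the actual Job API may differ
            trace_entry = self._sub_job.run_epoch(forward_only=True)
            return {"avg_loss": trace_entry["avg_loss"]}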

@samuelbroscheit (Member Author)

Is this all needed? Why not just stick with run_epoch with flag forward_only?

Ah, did you propose to create the TrainingJob but not call run() on it, only run_epoch? That violates the Job API a bit, but sure, that works.

To ease things up, we could additionally create an EvalTrainingLossJob: it creates the TrainingJob, calls run_epoch, and does everything else that's needed.

What would be the advantage of that?

@rgemulla (Member)

Advantage: (i) that additional job has our Job API and gathers all the "wrapper code" for reuse in other places. E.g., (ii) it could also be run directly from the config (if we wanted that) and (iii) it can be easily used as parts of other jobs (again, if we wanted that).

For now, key advantage would be (i), not sure if (ii)+(iii) will ever matter.

I am also OK with just using run_epoch for now. It's just that the wrapper job is cleaner, easier to maintain, and probably little additional effort to create.

@samuelbroscheit (Member Author) commented May 20, 2020 via email

@samuelbroscheit (Member Author)

I wonder whether eval.type should take a list of eval types, run all of them, and collect their metrics into one trace.

@rgemulla (Member)

But this runs multiple eval jobs, right? This could be implemented by having an "aggregate eval" job, which does this. Let's keep this for the future.

@samuelbroscheit (Member Author)

Can you review again?

@samuelbroscheit (Member Author)

I addressed your reviews.

I ended up doing a small revision of EvaluationJob: EvaluationJob.run() is now implemented in EvaluationJob itself; it does the standard stuff that has to be done for every EvaluationJob and then calls self._run() (suggestions for a better name for self._run()?).

I shifted EntityRanking-specific stuff from EvaluationJob to there.

I added self.verbose to EvaluationJob and use it.

I made EvalTrainingLossJob an instance variable of EvaluationJob and run it as post_epoch_trace_hook. This can be made nicer when we create an AggregatingEvalJob that runs and aggregates a list of EvaluationJobs. Then the post_epoch_trace_hook can be removed from EvaluationJob and replaced by the entry "eval.type: entity_ranking, training_loss" in config-default.yaml. See #102.

@rgemulla (Member) left a comment

Looks good to me, only 2-3 minor suggestions left.

kge/job/train.py Outdated
self.is_prepared = False
self.model.train()

if config.get("job.type") == "train":
Member:

I suggest disabling validation with an optional constructor argument instead of this more hacky approach.

Member Author:

Only as a constructor argument or as a config option?

Member:

In the config, we have it already. That being said, forget the constructor argument and simply set "valid.every" to 0 in the TrainingLossEvaluationJob.

Member:

Likewise, set train.checkpoint.every to 0.
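
(Sketch of these two overrides on the cloned config of the sub training job, as suggested here; the placement is illustrative.)

    # disable validation and checkpointing inside the sub training job
    sub_config = config.clone()
    sub_config.set("valid.every", 0)
    sub_config.set("train.checkpoint.every", 0)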

Member Author:

Hm, I just realized that I was still initializing the optimizer (and the loss). The optimizer creates new parameters based on model.parameters(). In this context, querying job.type makes more sense. See the new commit.

Member:

You are assuming that the outside eval job does not have job.type=train (it could have). Either set this outside or add a constructor argument "initialize_for_forward_only=False" (preferred since clearer, I think).
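
(Sketch of the suggested constructor argument; the helper name below is hypothetical and only illustrates skipping optimizer/loss setup, while is_forward_only is the attribute name used in a later commit of this PR.)

    # sketch: explicit constructor flag instead of inspecting config.get("job.type")
    def __init__(self, config, dataset, parent_job=None, model=None,
                 initialize_for_forward_only=False):
        ...
        self.is_forward_only = initialize_for_forward_only
        if not self.is_forward_only:
            # only build optimizer and loss when the job will actually train
            self._init_optimizer_and_loss()   # hypothetical helper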

Member Author:

You are assuming that the outside eval job does not have job.type=train (it could have).

Huh? Isn't it one config == one job? "job.type" is either eval, train, search, etc. When I pass through the eval job's config, its job.type is "eval". I must be misunderstanding what you are saying here?

Either set this outside or add a constructor argument "initialize_for_forward_only=False" (preferred since clearer, I think).

Ok can do.

Member:

Not if the job is created via EvaluationJob.create(...). An explicit argument makes this clearer.

@rgemulla (Member)

Please also add a CHANGELOG entry before you merge this PR.

kge/job/eval.py Outdated
@@ -175,13 +175,14 @@ def __init__(self, config: Config, dataset: Dataset, parent_job, model):

train_job_on_eval_split_config = config.clone()
train_job_on_eval_split_config.set("train.split", self.eval_split)
train_job_on_eval_split_config.set("negative_sampling.filtering.split", self.config.get("train.split"))
Member:

Only do this if unset.
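
(Sketch of that guard; the empty string is the "unset" default for this key, as the sampler.py excerpt quoted later in this thread shows.)

    # only override the filtering split if the user has not set it explicitly
    if train_job_on_eval_split_config.get("negative_sampling.filtering.split") == "":
        train_job_on_eval_split_config.set(
            "negative_sampling.filtering.split", self.config.get("train.split")
        )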

kge/job/train.py Outdated
self.is_prepared = False
self.model.train()

if config.get("job.type") == "train":
Member:

You are assuming that the outside eval job does not have job.type=train (it could have). Either set this outside or add a constructor argument "initialize_for_forward_only=False" (preferred since clearer, I think).

positives = (
    index.get(pair).numpy()
    if pair in index
    else torch.IntTensor([]).numpy()
Member:

This tensor should contain the positives and not be empty.

Member:

Also, why not create a numpy array right away?

Member:

Still open.

positives_index[pair] = (
    index.get(pair).numpy()
    if pair in index
    else torch.IntTensor([]).numpy()
Member:

likewise here

@rgemulla (Member)

Your last changes raise the question of whether all training jobs handle indexes correctly when used in an "eval only" setting. E.g., for KvsAll, what are the labels being used? Training data? Training + eval data?

@samuelbroscheit (Member Author)

Your last changes raise the question of whether all training jobs handle indexes correctly when used in an "eval only" setting. E.g., for KvsAll, what are the labels being used? Training data? Training + eval data?

True, for eval-only, KvsAll should use training + eval data.

@rgemulla (Member)

For the other jobs it's clearer, I guess, but I feel that the loss can turn out to be misleading.

I think we should not make the training_loss a default for the evaluation job (i.e., do not add it as a hook). It may be too confusing for a default since it depends on training/setup so much.

@samuelbroscheit (Member Author)

For the other jobs it's clearer, I guess, but I feel that the loss can turn out to be misleading.

I think we should not make the training_loss a default for the evaluation job (i.e., do not add it as a hook). It may be too confusing for a default since it depends on training/setup so much.

Actually, for KvsAll it's also clear; it's the same as for all losses. It should only be the labels from the eval data, i.e., the unseen labels.

The only particular thing with NS is that the current implementation filters positive triples based on what it is given as train.split. But actually that is always the known KG, i.e., the train split.

@rgemulla (Member)

Why is it clear for KvsAll? If done like you state, the losses are not comparable between train and valid (valid loss will be pretty much like 1vsAll because the data probably looks like this). Nevertheless this may be the best option. Another sensible choice may be the prefixes/suffixes from eval, but the labels from train+eval.

@samuelbroscheit (Member Author) commented May 25, 2020

For negative sampling, the filtering split (or splits?) can already be specified, I think.

Yes, but at the moment it automagically defaults to the split defined as train.split if not set otherwise. However, train.split is set to "valid" when the TrainingJob is run as evaluation.

kge/kge/util/sampler.py, lines 33 to 34 in f2bcbdb:

if self.filtering_split == "":
    self.filtering_split = config.get("train.split")

This is why one has to set it explicitly, and the current implementation of NS with filtering didn't cover this case.

@rgemulla (Member) commented May 26, 2020

For negative sampling, the filtering split (or splits?) can already be specified, I think.

Yes but at the moment it is automagically the split defined as train.split if not set otherwise.

So which one to pick here? The most natural choice would be train+valid, but that does not seem to be supported.

@samuelbroscheit (Member Author)

Why is it clear for KvsAll? If done like you state, the losses are not comparable between train and valid (valid loss will be pretty much like 1vsAll because the data probably looks like this). Nevertheless this may be the best option. Another sensible choice may be the prefixes/suffixes from eval, but the labels from train+eval.

When we were talking about train+eval, this is what I meant: prefixes from valid, but labels from both. I understand your point that the loss on train and the loss on valid are not comparable in that case. With "it is clear" I meant that the only-on-valid KvsAll loss would still work to measure overfitting. It would be nicer if it were a comparable loss, though.

@samuelbroscheit (Member Author)

For negative sampling, the filtering split (or splits?) can already be specified, I think.

Yes but at the moment it is automagically the split defined as train.split if not set otherwise.

So which one to pick here? The most natural choice would be train+valid, but that does not seem to be supported.

So should we?

@rgemulla (Member)

If we allow filtering with train+valid, we'd need to change the interface. Not sure if we want to go there; it significantly complicates the code if we allow for multiple splits during training.

An alternative may be to create new splits by merging existing splits. That wastes memory, but at least it's less intrusive. The easiest way to do this is to (1) extend dataset.yaml with support for multiple files per split, and (2) add a train_valid and a train_valid_test split with the appropriate files. That should be a separate PR. For now, why not simply use the valid split and issue a warning or an error when KvsAll is used like that?

@rgemulla (Member)

Also, using KvsAll with a different "query split" and "label split" is highly confusing.

Option 1: TrainingLossEvaluationJob solely uses the validation split and ignores all other splits (also for negative sampling etc). This way, it's clear what it does and we can issue a warning saying this. How useful such a loss is, is another question.

Option 2: we do not add this method of evaluation. For comparability with training data, we can proceed the other way around: run the standard eval job on a sample of the training data (this sampling can be done externally to create a new train_sample split).

@rgemulla (Member)

I tend toward option 2.

@samuelbroscheit (Member Author)

I'm for option 3: do it and do it correctly.

Now we're almost there and understand the implications; it would be a waste to drop it now that we have scrutinized it. I also like that this is tightly coupled to the filtering of entity_ranking. This would mean that EvaluationJob and TrainingJob both have a query_split and a label_splits attribute. label_splits would replace filter_splits in EvaluationJob. It stays an automatically set option and is not a configurable option. We could also call it "known answers".

For the TrainingJob when not in eval mode -> nothing changes. query split == label split.

In eval mode:

  • For 1vsAll in eval mode: label_split == query_split.

  • For NS in eval mode: label_split only comes into play when positives are filtered -> label_split = train split + eval split(s)

  • KvsAll in eval mode: label_split are the labels -> label_split = train split + eval split(s)
    For KvsAll we do not need extra dataset files. It's easy to merge the dictionaries and remove the keys that are not in train_split.

@rgemulla (Member)

That would also be a suitable option. The main drawback is that it's more work.

I suggest a slightly revised approach:

For 1vsAll: do nothing
For NS: add support for multiple filter splits (not just one, as we have now)
For KvsAll: add a label_splits option (under key KvsAll) along the lines you discussed

For NS: rename negative_sampling.filtering.split to negative_sampling.filtering.splits. Automatically add train.split, but additionally use the splits mentioned here (along the lines we already do it for eval.filter_splits).

For KvsAll: You are right in that the label_splits do not need to be merged a priori (which would mean that we need to copy all data). This can be done in the respective collate() function during data loading (see the sketch below). If label_splits is unset, train.split should be used. If set, these splits + train.split should be used (again, along the lines we already do it for eval.filter_splits).
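
(A minimal sketch of that collate-time merging for KvsAll; the index-key format and the .get()/`in` access follow the excerpts quoted in this thread, while the helper name and its parameters are illustrative.)

    import torch

    # per-query label lookup that concatenates the KvsAll indexes of all label splits
    def collect_labels(pair, label_splits, dataset, key="sp", value="o"):
        indexes = [dataset.index(f"{split}_{key}_to_{value}") for split in label_splits]
        parts = [idx.get(pair) for idx in indexes if pair in idx]
        return torch.cat(parts) if parts else torch.empty(0, dtype=torch.int)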

@samuelbroscheit (Member Author) commented May 27, 2020

083b27c contains all the relevant changes against master, and 66a9bc1 contains all relevant changes since our last discussion. I messed up the commit history with the rebase.

@samuelbroscheit (Member Author)

Can you review 66a9bc1 or is there anything else?

@rgemulla (Member)

I cannot review this PR as such. Instead of rebasing it on master, it contains tons of master commits which make it hard to review. Can you rebase on master so that just the changes are here?

@samuelbroscheit force-pushed the add_training_loss_as_metric_to_validation branch from c2ab672 to 083b27c, May 28, 2020 14:08
samuelbroscheit and others added 6 commits May 28, 2020 16:09
Add EvalTrainingLossJob
…onJob with standard stuff that has to be done for every EvaluationJob and then calls self._run()

Shift EntityRanking specific stuff to there
Make EvalTrainingLossJob an instance variable of EvaluationJob and run it as post_epoch_trace_hook
Add self.verbose to EvaluationJob and use it
Adress code review
eval.py:
set negativ sampling filtering split to train.split
rename to TrainingLossEvaluationJob

train.py:
initialize only things for train if TrainingJob is used for training
more ifs

-
@samuelbroscheit (Member Author)

Now?

@rgemulla (Member)

Even worse, you destroyed it. Do not touch it; I'll force-push in a minute.

eval.py:
set negativ sampling filtering split to train.split
rename to TrainingLossEvaluationJob

train.py:
initialize only things for train if TrainingJob is used for training
more ifs
eval.py:
set splits

train.py:
use self.is_forward_only
use dataset.index to load label_index for KvsAll

sampler.py:
rename filtering splits
use dataset.index to load label_index for filtering splits

config-default.yaml:
add label_splits config for KvsAll

indexing.py + misc.py:
add index and helper funcs to merge KvsAll dicts
@rgemulla force-pushed the add_training_loss_as_metric_to_validation branch from 083b27c to cd3bd49, May 28, 2020 14:23
@rgemulla (Member) left a comment

I rebased (hopefully correctly) and added some review comments.

@@ -327,7 +331,8 @@ eval:
   # mean_reciprocal_rank_filtered_with_test.
   filter_with_test: True

-  # Type of evaluation (entity_ranking only at the moment)
+  # Type of evaluation (entity_ranking or training_loss). Currently,
+  # entity_ranking runs training_loss as well.
Member:

I vote for disabling the automatic running of training_loss. Makes evaluation too costly.

Member Author:

Validation and test data are a lot smaller (1/10 is the worst valid/train ratio in our current datasets).

index_name = f"{split_combi_str}_{key}_to_{value}"
indexes = [dataset.index(f"{_split}_{key}_to_{value}") for _split in split]
dataset._indexes[index_name] = merge_dicts_of_1dim_torch_tensors(indexes)
return dataset._indexes[index_name]
Member:

Instead of merging indexes (very expensive memory-wise), the code should support using multiple indexes.

Member Author:

For NS I see that it works; for KvsAll I have to merge them. But granted, not necessarily as an index.

@rgemulla (Member), May 28, 2020:

What I meant here: the corresponding training jobs should support this. Alternatively, one may write a wrapper index that takes multiple indexes and merges their results on demand given a key.

Member:

For KvsAll, you can also just add the labels of each label split sequentially in the collate function.

Member:

We should not allow using label_splits other than for forward only. Otherwise they leak valid/test data. Likewise for negative sampling, I guess.

index_name = f"{split_combi_str}_{key}_to_{value}"
dataset.index_functions[index_name] = IndexWrapper(
    merge_KvsAll_indexes, split=split_combi, key=key
)
Member:

This makes it even worse.

Member Author:

Why? They are not invoked.

As a side note: this was necessary because there is currently no way to determine the cross-validation semantics of a dataset file (train, valid, test). We should (in another PR) add an xvaltype attribute to the dataset files.

Member:

They are also not needed. Let's keep the code clean and not add "orphans".

Member:

Adding more information to the dataset configuration is a good idea (if useful for something).

@@ -15,6 +18,33 @@ def is_number(s, number_type):
return False


def merge_dicts_of_1dim_torch_tensors(keys_from_dols, values_from_dols=None):
Member:

these methods can be dropped with the changes suggested above

Member Author:

I still need this method for KvsAll, right?

self.filtering_split = config.get("train.split")
self.filtering_splits: List[str] = config.get("negative_sampling.filtering.splits")
if len(self.filtering_splits) == 0:
    self.filtering_splits.append(config.get("train.split"))
Member:

Always append when not there. Also, add handling of deprecated configs to config.py.
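
(Sketch of the "always append when not there" handling, building directly on the excerpt above; purely illustrative.)

    # append train.split only when it is not already among the filtering splits
    self.filtering_splits: List[str] = config.get("negative_sampling.filtering.splits")
    train_split = config.get("train.split")
    if train_split not in self.filtering_splits:
        self.filtering_splits.append(train_split)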

for filtering_split in self.filtering_splits:
    dataset.index(f"{filtering_split}_{pair}_to_{slot_str}")
filtering_splits_str = '_'.join(sorted(self.filtering_splits))
dataset.index(f"{filtering_splits_str}_{pair}_to_{slot_str}")
Member:

drop

positives = (
    index.get(pair).numpy()
    if pair in index
    else torch.IntTensor([]).numpy()
Member:

Still open.

positives_index[pair] = index.get(pair).numpy()
positives_index[pair] = (
    index.get(pair).numpy()
    if pair in index
Member:

Instead, check here whether it's in one of the indexes, etc.

@samuelbroscheit (Member Author)

I rebased (hopefully correctly) and added some review comments.

No, it's a bit broken, but it's fixable.

@rgemulla (Member) commented May 28, 2020 via email
