
how to generate pseudo labels from baseline detection model #7

Open
liyunsheng13 opened this issue Jul 4, 2019 · 28 comments

@liyunsheng13

Hi,
I think bdd_peds+DETS18k means using bounding boxes from the source dataset (bdd_peds) together with the pseudo-labels generated by the baseline detection model on the target dataset. Could you let me know how you generate the pseudo-labels for this part? Do you run the baseline model on all the training samples in the target dataset and filter the ~100,000 images?

@AruniRC
Owner

AruniRC commented Jul 5, 2019

Hi,

The pseudo-labels are generated by first running the detector on every frame of a video, then keeping only the high-confidence detections (a threshold of 0.8 on the detector confidence score). And yes, as you said, we then filter out the subset of images.
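To make the thresholding step concrete, here is a rough sketch (an illustration only, not our actual pipeline; the input format and function name are hypothetical) of keeping detections above the 0.8 threshold and writing them out in the MS-COCO annotation format used for training:

```python
# Sketch only: filter detector outputs at a 0.8 confidence threshold and dump
# them as an MS-COCO-style detection-annotation JSON for pseudo-label training.
import json

CONF_THRESH = 0.8  # the threshold mentioned above


def write_pseudo_label_json(images, detections, out_path):
    """images: list of {'id', 'file_name', 'width', 'height'} dicts.
    detections: list of {'image_id', 'bbox' ([x, y, w, h]), 'score'} dicts
    produced by the baseline detector (hypothetical input format)."""
    annotations = []
    kept = (d for d in detections if d['score'] >= CONF_THRESH)
    for ann_id, det in enumerate(kept):
        x, y, w, h = det['bbox']
        annotations.append({
            'id': ann_id,
            'image_id': det['image_id'],
            'category_id': 1,        # single class: pedestrian
            'bbox': [x, y, w, h],
            'area': w * h,
            'iscrowd': 0,
            'score': det['score'],   # kept so it can act as a soft label later
        })
    coco = {'images': images,
            'annotations': annotations,
            'categories': [{'id': 1, 'name': 'person'}]}
    with open(out_path, 'w') as f:
        json.dump(coco, f)
```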

However, because this process is quite involved and time-consuming, we have made the video frames we used, as well as the annotations, available for download: https://github.com/AruniRC/detectron-self-train#bdd-hard-examples

Given the engineering/implementation effort involved, I think it will be easier to simply download that data instead of re-generating the pseudo-labels on BDD yourself (it took us about a week on a GPU cluster).

Thanks for your interest in this project, and I hope this helps.

@liyunsheng13
Author

Thanks for your reply. I think generating the pseudo-labels is the key point of this paper. I understand the high-level idea, but I need to run some code to verify my understanding, and I'm still fairly new to the Detectron codebase. In particular, I have no idea how to generate a JSON file with your code. Could you provide some sample code for that? I understand your code is set up for your own machines, but would it be possible to share some of it with me so I can better understand the whole project?

I have another question, about the file 'bdd_distill100_track100.yaml'. In this file, you use the following hyperparameters:
DISTILL_LAMBDA: 1.0
DISTILL_TEMPERATURE: 1.0
TRACKER_SCORE: 1.0
These are different from the parameters used in your paper. Could you let me know which ones are correct?

@AruniRC
Owner

AruniRC commented Jul 5, 2019

Hi,

  1. For training annotations:
    We assume that the detections are written in a simple text-file format; the expected format is explained in this code: https://github.com/AruniRC/detectron-self-train/blob/master/lib/utils/face_utils.py#L9 . The training JSON comes from this script: https://github.com/AruniRC/detectron-self-train#bdd-hard-examples . It is simply the format in which MS-COCO detection training annotations are written. If you load any of the sample JSON files in Python and take a look, it should become clearer.

  2. That setting in the YAML is for the "HP-constrained" setting in the paper. It puts a weight of 1 on the hard, tracker-only samples (TRACKER_SCORE: 1.0). From the paper, under I. Constrained hard examples: "We can achieve this by setting θ = 1 in Eq. 3 and λ = 1 in Eq. 4, which would create a label of 1 for tracker-only “hard” examples, and a label equal to the baseline detector score for the high-confidence detections, i.e. “easy” examples." That is exactly what is being done in 'bdd_distill100_track100.yaml'.
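As a small illustration of that quoted rule (a sketch of the sentence above only, not the actual loss code in this repo):

```python
def hp_constrained_label(det_score, is_tracker_only):
    """HP-constrained label rule as quoted above: tracker-only "hard" examples
    get a label of 1, high-confidence detections keep the detector score."""
    return 1.0 if is_tracker_only else det_score

# e.g. a tracker-only box -> 1.0; a detection with score 0.93 -> 0.93
```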

I realize this can be confusing -- based on your feedback, I will write these details into the README. Apologies for the confusion.

@liyunsheng13
Author

Thanks for your detailed reply. I have one more question. In the README, the mIoU for HP-constrained is more than 29, but in the paper it is 28.43 (Table 4). When you change the value of lambda, you get different results; it seems that lambda = 0.3 gives the best performance. Is that always true, or does changing the value of lambda not make a big difference?

@AruniRC
Owner

AruniRC commented Jul 5, 2019

  1. The difference in mAP (not mIoU, I think) is because of different initializations, randomness in SGD, etc. But the overall trends hold, and the gap is within the variation over 5 rounds of train+test that we report in the paper.

  2. Changing lambda to its optimal value does give better performance. However, to find this optimal value, we would need to manually run evaluation on a validation set. We show that even without manual tuning, we can get almost the same performance using HP-constrained.

You're welcome -- hopefully the next user will find this easier to use in their research!

@liyunsheng13
Author

liyunsheng13 commented Jul 6, 2019

I found a bug when trying to evaluate the model with multiple GPUs. In the file "detectron-self-train/lib/nn/parallel/data_parallel.py", line 82 (mini_kwargs = dict([(k, v[i]) for k, v in kwargs.items()])), I get "IndexError: list index out of range". It seems that i can only be equal to 0.

Another weird thing: when I run test_net.py with --multi_gpu_testing, I always hit the following assertion error:
assert (torch.cuda.device_count() == 1) ^ bool(args.multi_gpu_testing)
I tried:
print(torch.cuda.device_count())
print(args.multi_gpu_testing)
and I get:
2
True
which I think should pass the assertion, but it always fails here. I'm not very familiar with how the code is implemented for data-parallel execution. Could you take a look at this issue?
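To double-check my reasoning, here is a standalone check with the printed values (plain Python, independent of the repo):

```python
# (2 == 1) is False, and False ^ True is True, so this assertion should pass.
device_count = 2          # what torch.cuda.device_count() printed
multi_gpu_testing = True  # what args.multi_gpu_testing printed

assert (device_count == 1) ^ bool(multi_gpu_testing)
print("assertion passes with these values")
```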

@AruniRC
Owner

AruniRC commented Jul 6, 2019

Could you try using a single GPU and confirm whether it works? (@PCJohn for comments too)

@liyunsheng13
Author

A single GPU works fine.

@PCJohn
Collaborator

PCJohn commented Jul 9, 2019

@liyunsheng13 can you share the command you're using for the evaluation?
Samples of the commands we used are in the eval/bdd_distill folder: see this example.

Also, make sure that multiple GPUs are visible (you may have restricted CUDA_VISIBLE_DEVICES to a single GPU for training the model).
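A quick generic check (nothing repo-specific) of what PyTorch actually sees in the shell you launch evaluation from:

```python
import os
import torch

# Both values should reflect the environment used for evaluation.
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("visible GPUs:", torch.cuda.device_count())
```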

An additional note about generating the pseudo-labels:
You can look at this script to see exactly how we generated the labels.

@liyunsheng13
Author

Thanks for your response. I can use a single GPU to evaluate the results, but I still have trouble generating the pseudo-labels. The file threshold_search.py seems to need the ground truth, so it is not the one that generates the pseudo-labels. I tried to generate them myself by running test_net.py to get detection results and setting the threshold to 0.8 to save the bounding boxes from my detection model, but the result is far from the results reported in the paper. I also took a look at the JSON file "bdd_dets18k.json" and found that the annotation ids are not continuous. I guess you might use other techniques to filter the false-positive detections. Could you take a look at my question?

@AruniRC
Owner

AruniRC commented Jul 10, 2019

Hi @liyunsheng13, can you provide more details on what you are doing?

Are the following the steps you are taking, after which you cannot match our reported numbers?

  1. Run the baseline BDD-Source detector on the images in BDD-Dets JSON
  2. Set a threshold of 0.8 on these detections and create a training JSON (pseudo-labels)
  3. Re-train using BDD-Source and pseudo-labeled data
  4. Accuracy does not match what we report?

@PCJohn
Collaborator

PCJohn commented Jul 11, 2019

@liyunsheng13 you're right, I was a little confused about what exactly the question was.

The pseudo-labels are created during training; we don't explicitly create separate JSONs with the pseudo-labels as scores. The answer above by @AruniRC is pretty comprehensive.
If you'd like to track these parameters (DISTILL_LAMBDA and TRACKER_SCORE) in the codebase, this is the function that applies the distillation loss, which is called from here in model_builder.py.

About the attempt to generate the pseudo-labels for bdd_dets18k:
If these are the steps you followed (the comment by @AruniRC just above this), point 3 is really important: make sure you jointly train with BDD-Source (training on only the pseudo-label JSON will make the results much worse). A sketch of what this means at the data level follows below.
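This is a rough sketch of "joint training" at the annotation level (illustrative only; in the repo the joint training is configured through the training dataset list, and the helper below is hypothetical):

```python
# Sketch: combine the source annotations and the pseudo-label annotations so a
# detector is never trained on the pseudo-label JSON alone.
import json


def merge_coco_jsons(src_path, pseudo_path, out_path):
    with open(src_path) as f:
        src = json.load(f)
    with open(pseudo_path) as f:
        pseudo = json.load(f)

    # Offset the pseudo-label ids so they don't collide with the source ids.
    img_offset = max(im['id'] for im in src['images']) + 1
    ann_offset = max(an['id'] for an in src['annotations']) + 1
    for im in pseudo['images']:
        im['id'] += img_offset
    for an in pseudo['annotations']:
        an['id'] += ann_offset
        an['image_id'] += img_offset

    merged = {
        'images': src['images'] + pseudo['images'],
        'annotations': src['annotations'] + pseudo['annotations'],
        'categories': src['categories'],  # assumes both share the same single class
    }
    with open(out_path, 'w') as f:
        json.dump(merged, f)
```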

@liyunsheng13
Author

I found a problem I might have when testing the model. Could you let me know which test set you use to get the results in the paper? In the eval file, the JSON seems to be bdd_peds_val, but it only contains ~1700 images, and I think it is from the same domain as bdd_peds. In the paper, you mention that BDD-Target-Test has 8236 images. I suspect I might be using the wrong val set. Could you give me some advice?

@PCJohn
Collaborator

PCJohn commented Jul 15, 2019

The correct dataset for BDD-Target-Test is "bdd_peds_not_clear_any_daytime_val" (see this script).

The JSON file corresponding to this dataset is bdd_peds_not_clear_any_daytime_train.json (see here in the dataset catalog).
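If you want to double-check which JSON a dataset name resolves to, something along these lines should work (this assumes the catalog keeps a Detectron.pytorch-style DATASETS dict with ANN_FN/IM_DIR keys; the import path is an assumption, so adjust it to the repo layout):

```python
# Look up a dataset name in the catalog to see its annotation JSON and images.
# Module path and key names are assumptions based on Detectron.pytorch.
from datasets.dataset_catalog import ANN_FN, DATASETS, IM_DIR

entry = DATASETS['bdd_peds_not_clear_any_daytime_val']
print('annotation JSON:', entry[ANN_FN])
print('image directory:', entry[IM_DIR])
```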

@liyunsheng13
Author

So I think the result you show in the README is based on BDD-Target-Test, but the linked eval file shows that the val dataset is bdd_peds_val. Is that wrong? I thought it might be the reason I keep getting different results from yours.

@PCJohn
Collaborator

PCJohn commented Jul 15, 2019

There are multiple eval scripts for each setting (see the folder with all of them). For example, consider distillation with lambda = 0.3: there are 3 scripts evaluating the same setting on bdd_peds_val, bdd_peds_full, and bdd_target_peds.

@liyunsheng13
Author

But I think different settings correspond to different domains. In Table 4 of your paper, which validation set do you use?

@liyunsheng13
Author

I think that for a domain adaptation problem, the validation set should be from the same domain as the target training set. Correct?

@PCJohn
Collaborator

PCJohn commented Jul 15, 2019

The results in Table 4 use "bdd_peds_not_clear_any_daytime_val" (the complement of the source domain in the BDD dataset). This is the BDD(Rest) dataset described in Section 4.1 of the paper.
Yes, the validation set should be from the same domain as the target.

@liyunsheng13
Author

I see. I think the models you release on GitHub are evaluated on the bdd_peds_not_clear_any_daytime_val test set, not the bdd_peds_val set shown in the eval file. You might need to change that.

@PCJohn
Collaborator

PCJohn commented Jul 15, 2019

That's correct. Those models were tested on bdd_peds_not_clear_any_daytime_val.
You're right, the eval folder can be cleaned up so it's less confusing and it is clear which dataset and JSON to test on. We will also change the links in the README to point to the correct eval scripts.

@liyunsheng13
Author

Haha, thanks for your confirmation. I will test my model again with the correct test set.

@liyunsheng13
Author

I have one more question, about the baseline result of around 15. I think you use the bdd_peds_train.json file to train the model and bdd_peds_val.json to test. However, both of these are from the source domain, so if I'm correct the result should not be around 15; it should be around 30. Am I missing something?

@AruniRC
Owner

AruniRC commented Jul 16, 2019 via email

@PCJohn
Collaborator

PCJohn commented Jul 17, 2019

@liyunsheng13 you're right about the JSON: bdd_peds_val.json is the val set on the source domain. We did test the model on it, which is why the script is in the eval folder (this is not the reported number; the reported number is on the target domain). When you eval the baseline on bdd_peds_val.json, are you getting 30 mAP?

Re-iterating some points: the 15 mAP result is on the target domain, i.e. bdd_peds_not_clear_any_daytime_val.json (as @AruniRC mentioned). The README is misleading, as it links to the eval script on the source domain, but we will update that (again, mentioned in @AruniRC's answer above).

@hixiaye

hixiaye commented Nov 12, 2019

@liyunsheng13 @AruniRC @PCJohn Hi, I think the gzipped tarball available at http://maxwell.cs.umass.edu/self-train/dataset/bdd_HP18k.tar.gz includes all the pseudo-labels from detections and tracking, so all I need to do is download it and run a shell script such as bdd_source_and_HP18k_distill100_track100.sh.
Is that right? Thanks.

@PCJohn
Collaborator

PCJohn commented Nov 15, 2019

Yes, that is correct.

@hixiaye

hixiaye commented Nov 15, 2019

@PCJohn thanks for your reply.
There is a problem bothering me: I use bdd_HP18k.json and bdd_source_and_HP18k_distill100_track100.sh.

However, I get the error: "00000f77c-62c2a288_00000067.jpg" not found.

I checked the images in bdd_peds_HP18k and found that the first image is '_00000096.jpg', while the first one listed in 'bdd_HP18k.json' is "_00000067.jpg".

Did I do something wrong?
Thanks
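In case it helps to narrow this down, a small check along these lines (the paths are placeholders for wherever the tarball was unpacked) would list which images referenced in the JSON are missing from the image folder:

```python
# List the images named in bdd_HP18k.json that are not present on disk.
import json
import os

json_path = 'bdd_HP18k.json'   # placeholder path to the annotation file
img_dir = 'bdd_peds_HP18k'     # placeholder path to the unpacked frames

with open(json_path) as f:
    coco = json.load(f)

missing = [im['file_name'] for im in coco['images']
           if not os.path.exists(os.path.join(img_dir, im['file_name']))]
print(f"{len(missing)} of {len(coco['images'])} images are missing")
print(missing[:10])
```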
