
how to generate pseudo labels from baseline detection model #7

Open
liyunsheng13 opened this issue Jul 4, 2019 · 28 comments

@liyunsheng13

Hi,
I think bdd_peds+DETS18k means using bounding boxes from the source dataset (bdd_peds) together with the pseudo-labels generated by the baseline detection model on the target dataset. Could you let me know how you generate the pseudo-labels for this part? Do you run the baseline model on all the training samples in the target dataset and filter the ~100,000 images?

@AruniRC
Owner

AruniRC commented Jul 5, 2019

Hi,

The pseudo-labels are generated by first running the detector on every frame of a video, then keeping only the high-confidence detections (a threshold of 0.8 on the detector confidence score). And yes, as you said, we then filter out the subset of images.
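To make the thresholding step concrete, here is a rough sketch (an illustration only, not our actual pipeline; the input format and function name are hypothetical) of keeping detections above the 0.8 threshold and writing them out in the MS-COCO annotation format used for training:

```python
# Sketch only: filter detector outputs at a 0.8 confidence threshold and dump
# them as an MS-COCO-style detection-annotation JSON for pseudo-label training.
import json

CONF_THRESH = 0.8  # the threshold mentioned above


def write_pseudo_label_json(images, detections, out_path):
    """images: list of {'id', 'file_name', 'width', 'height'} dicts.
    detections: list of {'image_id', 'bbox' ([x, y, w, h]), 'score'} dicts
    produced by the baseline detector (hypothetical input format)."""
    annotations = []
    kept = (d for d in detections if d['score'] >= CONF_THRESH)
    for ann_id, det in enumerate(kept):
        x, y, w, h = det['bbox']
        annotations.append({
            'id': ann_id,
            'image_id': det['image_id'],
            'category_id': 1,        # single class: pedestrian
            'bbox': [x, y, w, h],
            'area': w * h,
            'iscrowd': 0,
            'score': det['score'],   # kept so it can act as a soft label later
        })
    coco = {'images': images,
            'annotations': annotations,
            'categories': [{'id': 1, 'name': 'person'}]}
    with open(out_path, 'w') as f:
        json.dump(coco, f)
```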

However, because this process is quite involved and time-consuming, we have made the video frames we used, as well as the annotations, available for download: https://github.com/AruniRC/detectron-self-train#bdd-hard-examples

Given the engineering/implementation effort involved, I think it will be easier to simply download that data instead of re-generating the pseudo-labels on BDD yourself (it took us about a week on a GPU cluster).

Thanks for your interest in this project, and I hope this helps.

@liyunsheng13
Author

Thanks for your reply. I think generating the pseudo-labels is the key point of this paper. I understand the high-level idea, but I need to run some code to verify my understanding, and I'm still fairly new to the Detectron codebase. In particular, I have no idea how to generate a JSON file with your code. Could you provide some sample code for that? I understand your code is set up for your own machines, but would it be possible to share some of it with me so I can better understand the whole project?

I have another question, about the file 'bdd_distill100_track100.yaml'. In this file, you use the following hyperparameters:
DISTILL_LAMBDA: 1.0
DISTILL_TEMPERATURE: 1.0
TRACKER_SCORE: 1.0
These are different from the parameters used in your paper. Could you let me know which ones are correct?

@AruniRC
Owner

AruniRC commented Jul 5, 2019

Hi,

  1. For training annotations:
    We assume that the detections are written in a simple text-file format; the expected format is explained in this code: https://github.com/AruniRC/detectron-self-train/blob/master/lib/utils/face_utils.py#L9 . The training JSON comes from this script: https://github.com/AruniRC/detectron-self-train#bdd-hard-examples . It is simply the format in which MS-COCO detection training annotations are written. If you load any of the sample JSON files in Python and take a look, it should become clearer.

  2. That setting in the YAML is for the "HP-constrained" setting in the paper. It puts a weight of 1 on the hard, tracker-only samples (TRACKER_SCORE: 1.0). From the paper, under I. Constrained hard examples: "We can achieve this by setting θ = 1 in Eq. 3 and λ = 1 in Eq. 4, which would create a label of 1 for tracker-only “hard” examples, and a label equal to the baseline detector score for the high-confidence detections, i.e. “easy” examples." That is exactly what is being done in 'bdd_distill100_track100.yaml'.
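As a small illustration of that quoted rule (a sketch of the sentence above only, not the actual loss code in this repo):

```python
def hp_constrained_label(det_score, is_tracker_only):
    """HP-constrained label rule as quoted above: tracker-only "hard" examples
    get a label of 1, high-confidence detections keep the detector score."""
    return 1.0 if is_tracker_only else det_score

# e.g. a tracker-only box -> 1.0; a detection with score 0.93 -> 0.93
```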

I realize this can be confusing -- based on your feedback, I will write these details into the README. Apologies for the confusion.

@liyunsheng13
Author

Thanks for your detailed reply. I have one more question. In the README, the mIoU for HP-constrained is more than 29, but in the paper it is 28.43 (Table 4). When you change the value of lambda, you get different results; it seems that lambda = 0.3 gives the best performance. Is that always true, or does changing the value of lambda not make a big difference?

@AruniRC
Owner

AruniRC commented Jul 5, 2019

  1. The difference in mAP (not mIoU, I think) is because of different initializations, randomness in SGD, etc. But the overall trends hold, and the gap is within the variation over 5 rounds of train+test that we report in the paper.

  2. Changing lambda to its optimal value does give better performance. However, to find this optimal value, we would need to manually run evaluation on a validation set. We show that even without manual tuning, we can get almost the same performance using HP-constrained.

You're welcome -- hopefully the next user will find this easier to use in their research!

@liyunsheng13
Author

liyunsheng13 commented Jul 6, 2019

I found a bug when trying to evaluate the model with multiple GPUs. In the file "detectron-self-train/lib/nn/parallel/data_parallel.py", line 82 (mini_kwargs = dict([(k, v[i]) for k, v in kwargs.items()])), I get "IndexError: list index out of range". It seems that i can only be equal to 0.

Another weird thing: when I run test_net.py with --multi_gpu_testing, I always hit the following assertion error:
assert (torch.cuda.device_count() == 1) ^ bool(args.multi_gpu_testing)
I tried:
print(torch.cuda.device_count())
print(args.multi_gpu_testing)
and I get:
2
True
which I think should pass the assertion, but it always fails here. I'm not very familiar with how the code is implemented for data-parallel execution. Could you take a look at this issue?
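To double-check my reasoning, here is a standalone check with the printed values (plain Python, independent of the repo):

```python
# (2 == 1) is False, and False ^ True is True, so this assertion should pass.
device_count = 2          # what torch.cuda.device_count() printed
multi_gpu_testing = True  # what args.multi_gpu_testing printed

assert (device_count == 1) ^ bool(multi_gpu_testing)
print("assertion passes with these values")
```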

@AruniRC
Owner

AruniRC commented Jul 6, 2019

Could you try using a single GPU and confirm whether it works? (@PCJohn for comments too)

@liyunsheng13
Author

A single GPU works fine.

@PCJohn
Collaborator

PCJohn commented Jul 9, 2019

@liyunsheng13 can you share the command you're using for the evaluation?
Samples of the commands we used are in the eval/bdd_distill folder: see this example.

Also, make sure that multiple GPUs are visible (you may have restricted CUDA_VISIBLE_DEVICES to a single GPU for training the model).
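A quick generic check (nothing repo-specific) of what PyTorch actually sees in the shell you launch evaluation from:

```python
import os
import torch

# Both values should reflect the environment used for evaluation.
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("visible GPUs:", torch.cuda.device_count())
```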

An additional note about generating the pseudo-labels:
You can look at this script to see exactly how we generated the labels.

@liyunsheng13
Author

Thanks for your response. I can use a single GPU to evaluate the results, but I still have trouble generating the pseudo-labels. The file threshold_search.py seems to need the ground truth, so it is not the one that generates the pseudo-labels. I tried to generate them myself by running test_net.py to get detection results and setting the threshold to 0.8 to save the bounding boxes from my detection model, but the result is far from the results reported in the paper. I also took a look at the JSON file "bdd_dets18k.json" and found that the annotation ids are not continuous. I guess you might use other techniques to filter the false-positive detections. Could you take a look at my question?

@AruniRC
Owner

AruniRC commented Jul 10, 2019

Hi @liyunsheng13, can you provide more details on what you are doing?

Are the following the steps you are taking, after which you cannot match our reported numbers?

  1. Run the baseline BDD-Source detector on the images in BDD-Dets JSON
  2. Set a threshold of 0.8 on these detections and create a training JSON (pseudo-labels)
  3. Re-train using BDD-Source and pseudo-labeled data
  4. Accuracy does not match what we report?

@PCJohn
Collaborator

PCJohn commented Jul 11, 2019

@liyunsheng13 you're right, I was a little confused about what exactly the question was.

The pseudo-labels are created during training; we don't explicitly create separate JSONs with the pseudo-labels as scores. The answer above by @AruniRC is pretty comprehensive.
If you'd like to track these parameters (DISTILL_LAMBDA and TRACKER_SCORE) in the codebase, this is the function that applies the distillation loss, which is called from here in model_builder.py.

About the attempt to generate the pseudo-labels for bdd_dets18k:
If these are the steps you followed (the comment by @AruniRC just above this), point 3 is really important: make sure you jointly train with BDD-Source (training on only the pseudo-label JSON will make the results much worse). A sketch of what this means at the data level follows below.
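This is a rough sketch of "joint training" at the annotation level (illustrative only; in the repo the joint training is configured through the training dataset list, and the helper below is hypothetical):

```python
# Sketch: combine the source annotations and the pseudo-label annotations so a
# detector is never trained on the pseudo-label JSON alone.
import json


def merge_coco_jsons(src_path, pseudo_path, out_path):
    with open(src_path) as f:
        src = json.load(f)
    with open(pseudo_path) as f:
        pseudo = json.load(f)

    # Offset the pseudo-label ids so they don't collide with the source ids.
    img_offset = max(im['id'] for im in src['images']) + 1
    ann_offset = max(an['id'] for an in src['annotations']) + 1
    for im in pseudo['images']:
        im['id'] += img_offset
    for an in pseudo['annotations']:
        an['id'] += ann_offset
        an['image_id'] += img_offset

    merged = {
        'images': src['images'] + pseudo['images'],
        'annotations': src['annotations'] + pseudo['annotations'],
        'categories': src['categories'],  # assumes both share the same single class
    }
    with open(out_path, 'w') as f:
        json.dump(merged, f)
```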

@liyunsheng13
Author

I found a problem I might have when testing the model. Could you let me know which test set you use to get the results in the paper? In the eval file, the JSON seems to be bdd_peds_val, but it only contains ~1700 images, and I think it is from the same domain as bdd_peds. In the paper, you mention that BDD-Target-Test has 8236 images. I suspect I might be using the wrong val set. Could you give me some advice?

@PCJohn
Collaborator

PCJohn commented Jul 15, 2019

The correct dataset for BDD-Target-Test is "bdd_peds_not_clear_any_daytime_val" (see this script).

The JSON file corresponding to this dataset is bdd_peds_not_clear_any_daytime_train.json (see here in the dataset catalog).
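If you want to double-check which JSON a dataset name resolves to, something along these lines should work (this assumes the catalog keeps a Detectron.pytorch-style DATASETS dict with ANN_FN/IM_DIR keys; the import path is an assumption, so adjust it to the repo layout):

```python
# Look up a dataset name in the catalog to see its annotation JSON and images.
# Module path and key names are assumptions based on Detectron.pytorch.
from datasets.dataset_catalog import ANN_FN, DATASETS, IM_DIR

entry = DATASETS['bdd_peds_not_clear_any_daytime_val']
print('annotation JSON:', entry[ANN_FN])
print('image directory:', entry[IM_DIR])
```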

@liyunsheng13
Author

So I think the result you show in the README is based on BDD-Target-Test, but the linked eval file shows that the val dataset is bdd_peds_val. Is that wrong? I thought it might be the reason I keep getting different results from yours.

@PCJohn
Collaborator

PCJohn commented Jul 15, 2019

There are multiple eval scripts for each setting (see the folder with all of them). For example, consider distillation with lambda = 0.3: there are 3 scripts evaluating the same setting on bdd_peds_val, bdd_peds_full, and bdd_target_peds.

@liyunsheng13
Author

But I think different settings correspond to different domains. In Table 4 of your paper, which validation set do you use?

@liyunsheng13
Author

I think that for a domain adaptation problem, the validation set should be from the same domain as the target training set. Correct?

@PCJohn
Collaborator

PCJohn commented Jul 15, 2019

The results in Table 4 use "bdd_peds_not_clear_any_daytime_val" (the complement of the source domain in the BDD dataset). This is the BDD(Rest) dataset described in Section 4.1 of the paper.
Yes, the validation set should be from the same domain as the target.

@liyunsheng13
Author

I see. I think the models you release on GitHub are evaluated on the bdd_peds_not_clear_any_daytime_val test set, not the bdd_peds_val set shown in the eval file. You might need to change that.

@PCJohn
Collaborator

PCJohn commented Jul 15, 2019

That's correct. Those models were tested on bdd_peds_not_clear_any_daytime_val.
You're right, the eval folder can be cleaned up so it's less confusing and it is clear which dataset and JSON to test on. We will also change the links in the README to point to the correct eval scripts.

@liyunsheng13
Author

Haha, thanks for your confirmation. I will test my model again with the correct test set.

@liyunsheng13
Author

I have one more question, about the baseline result of around 15. I think you use the bdd_peds_train.json file to train the model and bdd_peds_val.json to test. However, both of these are from the source domain, so if I'm correct the result should not be around 15; it should be around 30. Am I missing something?

@AruniRC
Owner

AruniRC commented Jul 16, 2019 via email

@PCJohn
Collaborator

PCJohn commented Jul 17, 2019

@liyunsheng13 you're right about the JSON: bdd_peds_val.json is the val set on the source domain. We did test the model on it, which is why the script is in the eval folder (this is not the reported number; the reported number is on the target domain). When you eval the baseline on bdd_peds_val.json, are you getting 30 mAP?

Re-iterating some points: the 15 mAP result is on the target domain, i.e. bdd_peds_not_clear_any_daytime_val.json (as @AruniRC mentioned). The README is misleading, as it links to the eval script on the source domain, but we will update that (again, mentioned in @AruniRC's answer above).

@hixiaye

hixiaye commented Nov 12, 2019

@liyunsheng13 @AruniRC @PCJohn Hi, I think the gzipped tarball available at http://maxwell.cs.umass.edu/self-train/dataset/bdd_HP18k.tar.gz includes all the pseudo-labels from detections and tracking, so all I need to do is download it and run a shell script such as bdd_source_and_HP18k_distill100_track100.sh.
Is that right? Thanks.

@PCJohn
Collaborator

PCJohn commented Nov 15, 2019

Yes, that is correct.

@hixiaye

hixiaye commented Nov 15, 2019

@PCJohn thanks for your reply.
There is a problem bothering me: I use bdd_HP18k.json and bdd_source_and_HP18k_distill100_track100.sh.

However, I get the error: "00000f77c-62c2a288_00000067.jpg" not found.

I checked the images in bdd_peds_HP18k and found that the first image is '_00000096.jpg', while the first one listed in 'bdd_HP18k.json' is "_00000067.jpg".

Did I do something wrong?
Thanks
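In case it helps to narrow this down, a small check along these lines (the paths are placeholders for wherever the tarball was unpacked) would list which images referenced in the JSON are missing from the image folder:

```python
# List the images named in bdd_HP18k.json that are not present on disk.
import json
import os

json_path = 'bdd_HP18k.json'   # placeholder path to the annotation file
img_dir = 'bdd_peds_HP18k'     # placeholder path to the unpacked frames

with open(json_path) as f:
    coco = json.load(f)

missing = [im['file_name'] for im in coco['images']
           if not os.path.exists(os.path.join(img_dir, im['file_name']))]
print(f"{len(missing)} of {len(coco['images'])} images are missing")
print(missing[:10])
```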
