Meetings
Attendees: @butchland @tyoc213
Agenda:
- brainstorm the roadmap, approaches, and what we need to study
- status
Discussion:
Status
- Things running (a sketch of this setup appears after this section):
  - `dataloaders(..., device=tpu)` will move batch inputs and targets to the tpu device
  - wrapping `opt_func` with `XLAOptFuncWrapper` will call `xm.optimizer_step(self.opt)` correctly during `opt.step()`
  - calling `cnn_learner(dls, ...)` will create a learner whose `model.parameters()` are on the tpu, due to code that sets the model device to the dataloader device (which has been previously set to tpu)
  - calling `learn.recorder.plot_loss()` is working
- Things that are funky or not running:
  - using `batch_tfms=aug_transforms()` in dataloaders seems to slow training down
  - using `learner.fine_tune(1)` causes the train and valid loss to go up after the unfreeze step
  - `lr_find()` shows a funky graph
  - using `ClassificationInterpretation` and running `most_confused()` throws an index error
  - tyoc213: adding a tensorboard callback and training also throws an index error
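To make the working setup above concrete, here is a minimal sketch of that configuration, assuming fastai v2 and torch_xla. `XLAOptFuncWrapper` is this project's wrapper as noted above; its import path and exact call signature are assumptions here, not confirmed from the code, and the dataset and architecture are only placeholders.

```python
# Minimal sketch of the setup reported as running (assumptions noted inline).
from fastai.vision.all import *
import torch_xla.core.xla_model as xm
# from <this project> import XLAOptFuncWrapper   # import path depends on the fork

tpu = xm.xla_device()                       # single TPU core device

path = untar_data(URLs.MNIST_TINY)          # placeholder small dataset
dls = ImageDataLoaders.from_folder(path, device=tpu)   # batches land on the TPU

# Wrapping opt_func so that opt.step() calls xm.optimizer_step(self.opt);
# the wrapper's call signature shown here is an assumption.
opt_func = XLAOptFuncWrapper(Adam)

# cnn_learner puts the model on dls.device (already set to the TPU above),
# so model.parameters() end up on the TPU.
learn = cnn_learner(dls, resnet18, metrics=accuracy, opt_func=opt_func)

learn.fit(1)
learn.recorder.plot_loss()                  # reported as working
```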
Next steps
- Approach:
- Don't go for a big-bang (+ lots of debugging) approach
- Start with the simplest model and learner that is running on a TPU
- then start adding more fastai stuff incrementally and at each step, retest
- if a bug is found, fix it quickly before moving on.
- Goal: start from the simplest stuff that is running, then iterate in small steps, but quickly.
- Alongside iterating in rapid cycles, keep track of GPU baseline vs TPU performance (a rough timing helper is sketched at the end of this section).
- slow TPU performance may indicate something funky in our implementation or in fastai code vis-a-vis TPU.
- Put off multi-core TPU support for later, once we get a single TPU core running well.
- Plans:
- Next meeting schedule: 2020-07-09, 6:00-8:30 PM PDT
- In the meantime:
- @tyoc213 to fork project
- build a simple baseline model
- run on GPU and measure loading/training/inference time for baseline GPU performance
- simplest learner possible that is running on TPU (a possible shape is sketched after this plan)
- no batch transforms
- minimum training callbacks
- simple and small dataset
- simple model architecture
- measure loading/training/inference time for baseline TPU performance
- start looking at bugs
- @butchland:
  - look at slow running of batch transforms
  - look at increasing train and valid loss on unfreeze (use `fit`, not `fit_one_cycle`)
- @tyoc213: fork the project and start studying it
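For the "simplest learner possible" item in the plan above, one possible shape is sketched below, assuming fastai v2 and treating MNIST_TINY and resnet18 as stand-ins for the small dataset and simple architecture; none of these choices were fixed in the meeting.

```python
# Illustrative minimal baseline: small dataset, no batch transforms,
# no extra callbacks, simple architecture. Run once on GPU (default device)
# for the baseline, then again with device=xm.xla_device() for the TPU numbers.
from fastai.vision.all import *

def make_baseline_learner(device=None):
    path = untar_data(URLs.MNIST_TINY)                   # simple and small dataset
    dls = ImageDataLoaders.from_folder(path, device=device,
                                       item_tfms=Resize(28), batch_tfms=None)
    return cnn_learner(dls, resnet18, metrics=accuracy)  # simple architecture

learn = make_baseline_learner()
learn.fit(1)                                             # plain fit, not fit_one_cycle
```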
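And for the loading/training/inference timing that the GPU-vs-TPU comparison calls for, a rough helper could look like the sketch below; the helper name and what it treats as "loading" and "inference" are assumptions, not project code.

```python
# Rough timing helper for the GPU baseline vs TPU comparison.
import time

def time_learner(learn, epochs=1):
    timings = {}

    t0 = time.perf_counter()
    learn.dls.one_batch()                   # crude proxy for data-loading cost
    timings['load'] = time.perf_counter() - t0

    t0 = time.perf_counter()
    learn.fit(epochs)                       # plain fit for a stable baseline
    timings['train'] = time.perf_counter() - t0

    t0 = time.perf_counter()
    learn.get_preds()                       # inference over the validation set
    timings['infer'] = time.perf_counter() - t0

    return timings
```

Running the same helper on the GPU baseline learner and on the TPU learner gives directly comparable load/train/inference numbers.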