
Meetings

Butch Landingin edited this page Jul 9, 2020 · 1 revision


Meeting: 2020-07-06 9-11 PM PDT

Attendees: @butchland @tyoc213

Agenda:

  • brainstorm the roadmap and approaches; decide what we need to study
  • status

Discussion:

  • Status

    • Things running:
      • dataloaders(..., device=tpu) - moves the batch inputs and targets to the TPU device
      • wrapping opt_func with XLAOptFuncWrapper correctly calls xm.optimizer_step(self.opt) during opt.step()
      • calling cnn_learner(dls, ...) creates the model.parameters() on the TPU, because the code sets the model device to the dataloader device (which was previously set to the TPU)
      • calling learn.recorder.plot_loss() works.
    • Things that are funky or not running:
      • using batch_tfms=aug_transforms() in the dataloaders seems to slow training down.
      • using learner.fine_tune(1) causes the train and valid loss to go up after the unfreeze step
      • lr_find() shows a funky graph.
        • see this section
      • using ClassificationInterpretation and running most_confused() throws an index error.
      • @tyoc213: adding the TensorBoard callback and training also throws an index error
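The optimizer wrapping mentioned in the status above can be sketched generically. The names XLAOptFuncWrapper and xm.optimizer_step come from the notes; the code below is a hypothetical, device-free illustration of the delegation pattern (the XLA call is stubbed out so it runs anywhere), not the project's actual implementation:

```python
# Hypothetical sketch of the optimizer-wrapping pattern: intercept step()
# and route it through an XLA-aware call, delegating everything else.

class DummyOpt:
    """Stand-in for a torch optimizer (illustration only)."""
    def __init__(self):
        self.steps = 0
    def step(self):
        self.steps += 1

def xla_optimizer_step(opt):
    # Placeholder for torch_xla's xm.optimizer_step(opt), which marks the
    # XLA graph step and then performs the usual optimizer update.
    opt.step()

class XLAOptWrapper:
    """Routes step() through the XLA-aware call so the update happens
    on the TPU device; all other attributes fall through to the
    wrapped optimizer (zero_grad, param_groups, etc.)."""
    def __init__(self, opt):
        self.opt = opt
    def step(self):
        xla_optimizer_step(self.opt)
    def __getattr__(self, name):
        return getattr(self.opt, name)

opt = XLAOptWrapper(DummyOpt())
opt.step()
opt.step()
print(opt.steps)  # -> 2
```

In a real version, xla_optimizer_step would be replaced by torch_xla's xm.optimizer_step, and the wrapper would be what cnn_learner receives as opt_func.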
  • Next steps

    • Approach:
      • Don't go for a big-bang (plus lots of debugging) approach
      • Start with the simplest model and learner that is running on a TPU
        • then start adding more fastai stuff incrementally and at each step, retest
        • if a bug is found, fix it quickly before moving on.
      • Goal: start from the simplest stuff that runs, then iterate in small steps, but quickly.
      • Alongside iterating in rapid cycles, keep track of GPU baseline vs TPU performance.
        • slow TPU performance may indicate something funky in our implementation or in the fastai code vis-a-vis TPU.
      • Put off multi-core TPU support until later, once we get a single TPU core running well.
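The incremental approach above (start simple, add one feature at a time, retest at each step) can be sketched as a tiny harness. All names here are illustrative assumptions; run_smoke_test stands in for building a Learner with the trial configuration and running one quick fit/predict cycle:

```python
# Minimal sketch of "add fastai stuff incrementally and retest at each step":
# keep a known-good baseline configuration, try one new feature at a time,
# and stop at the first feature that breaks the smoke test.

def run_smoke_test(config):
    # Placeholder check: a real version would build a Learner from `config`
    # and run a short training cycle on the TPU.
    return "bad_feature" not in config

def add_incrementally(baseline, features):
    config = list(baseline)
    for feature in features:
        trial = config + [feature]
        if run_smoke_test(trial):
            config = trial  # feature is safe, keep it and continue
        else:
            print(f"regression introduced by: {feature}")
            break           # fix this before moving on
    return config

good = add_incrementally(["simple_model"], ["plot_loss", "bad_feature", "lr_find"])
print(good)  # -> ['simple_model', 'plot_loss']
```

The point of the loop is that when something breaks, exactly one change is under suspicion.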
    • Plans:
      • Next meeting schedule: 2020-07-09 6:00-8:30 PM PDT
      • In the meantime:
        • @tyoc213 to fork project
        • build a simple baseline model
          • run on GPU and measure loading/training/inference time for baseline GPU performance
          • simplest learner possible that runs on a TPU
            • no batch transforms
            • minimum training callbacks
            • simple and small dataset
            • simple model architecture
          • measure loading/training/inference time for baseline TPU performance
        • start looking at bugs
          • @butchland - look at the slow running of batch transforms; look at the increasing train and valid loss on unfreeze (use fit, not fit_one_cycle)
          • @tyoc213 - fork the project and start studying it
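For the baseline loading/training/inference measurements above, a simple device-agnostic timing helper is enough to compare GPU and TPU runs. This is a pure-Python sketch; the phase names are just labels, and a real benchmark would wrap the actual fastai loading, fit, and predict calls:

```python
# Sketch of a timing helper for recording per-phase wall-clock times
# (loading / training / inference), usable identically on GPU and TPU runs.

import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(phase):
    """Record the wall-clock duration of the enclosed block under `phase`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase] = time.perf_counter() - start

# Illustrative stand-ins for the real loading and training steps:
with timed("loading"):
    data = list(range(100_000))
with timed("training"):
    total = sum(data)

for phase, secs in timings.items():
    print(f"{phase}: {secs:.4f}s")
```

Collecting the same dictionary from a GPU run and a TPU run gives a direct side-by-side comparison, which is the signal the notes suggest for spotting funky TPU behavior.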