Incorporating cost #2

Open
Malnammi opened this issue Jan 9, 2019 · 3 comments

Comments

@Malnammi
Collaborator

Malnammi commented Jan 9, 2019

See strategy at #1.

Currently, the code enforces the budget only through batch_size, with values of [96, 384, 1536] that correspond to microplate sizes used in practice. The problem is that the strategy does not consider molecule costs when selecting clusters and instances; it simply fills the batch until batch_size is exhausted.

An alternative would be to use a combination of budget and batch_size, where we want to exhaust the batch_size but not go over the budget.

I see two methods of doing this:

  1. Incorporate cost into the exploitation and exploration weight equations via the average cluster cost. The strategy would then also favor clusters with a low average cost.
  2. Apply the cost when selecting instances from a cluster. If the cost of a sampled instance exceeds a certain percentage of the overall budget, that instance is dropped.
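
As a rough sketch of method 1: the actual weight equations are in #1 and are not reproduced here, so the function name, the `cost_penalty` parameter, and the linear penalty are all assumptions made for illustration, not the current implementation.

```python
def cost_adjusted_weights(base_weights, avg_cluster_costs, cost_penalty=0.5):
    """Down-weight clusters that have a high average molecule cost.

    base_weights: exploitation or exploration weights, one per cluster,
        computed elsewhere (their form is not assumed here).
    avg_cluster_costs: average molecule cost of each cluster.
    cost_penalty: hypothetical knob in [0, 1]; 0 ignores cost entirely.
    """
    lo, hi = min(avg_cluster_costs), max(avg_cluster_costs)
    span = hi - lo
    adjusted = []
    for weight, cost in zip(base_weights, avg_cluster_costs):
        # Scale cost to [0, 1]; if every cluster costs the same, no penalty.
        norm_cost = (cost - lo) / span if span > 0 else 0.0
        adjusted.append(weight * (1.0 - cost_penalty * norm_cost))
    return adjusted
```

With this form, the cheapest cluster keeps its weight and the most expensive one loses a `cost_penalty` fraction of it.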

I am leaning towards method 2.
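
A minimal sketch of method 2 combined with the budget-plus-batch_size idea, under stated assumptions: `select_within_budget`, `max_cost_fraction`, and the `(instance_id, cost)` tuples are hypothetical names, and the candidates are assumed to arrive in whatever order the strategy sampled them.

```python
def select_within_budget(candidates, batch_size, budget, max_cost_fraction=0.05):
    """Fill the batch without going over the budget.

    candidates: list of (instance_id, cost) pairs, in sampled order.
    max_cost_fraction: hypothetical threshold; an instance costing more
        than this fraction of the overall budget is dropped.
    Returns (selected_ids, total_spent).
    """
    selected, spent = [], 0.0
    cost_cap = max_cost_fraction * budget
    for instance_id, cost in candidates:
        if len(selected) >= batch_size:
            break  # batch is full
        if cost > cost_cap:
            continue  # too expensive relative to the overall budget
        if spent + cost > budget:
            continue  # would overshoot; a cheaper instance may still fit
        selected.append(instance_id)
        spent += cost
    return selected, spent
```

Note that the batch may end up smaller than batch_size if the budget runs out first; whether to leave wells empty in that case is an open design choice.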

Please discuss or propose any other solutions.

@agitter
Member

agitter commented Jan 16, 2019

One important consideration is whether compounds all have uniform cost. Scott is exploring this by obtaining quotes from different vendors.

@Malnammi
Collaborator Author

Additional feedback from the group: we now have multiple costs to consider when comparing an iterative screening effort against one big screen.

  1. Compound Cost: This cost is associated with purchasing molecules. A molecule might already be procured and thus have a cost of 0.
  2. Labor Cost: This generally includes the time-cost of procuring the molecule, setting up and running the physical experiment, and getting back the digital results. We will have a ballpark number for these.

For simulation purposes, we record various evaluation metrics at each iteration of the active learning pipeline. We should also record these cost metrics for later analysis.
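
One possible shape for the per-iteration cost record, as a sketch only: `IterationCosts`, `iteration_costs`, and the flat `labor_cost_per_iteration` figure are hypothetical, and the prices would come from whatever quotes we end up with.

```python
from dataclasses import dataclass


@dataclass
class IterationCosts:
    iteration: int
    compound_cost: float  # purchase cost of the molecules in this batch
    labor_cost: float     # ballpark procurement + experiment + readout cost

    @property
    def total(self):
        return self.compound_cost + self.labor_cost


def iteration_costs(iteration, batch, unit_costs, already_procured,
                    labor_cost_per_iteration):
    """Compute the cost record for one active learning iteration.

    batch: molecule ids selected this iteration.
    unit_costs: dict mapping molecule id to purchase price.
    already_procured: set of ids we already own (compound cost of 0).
    labor_cost_per_iteration: assumed flat ballpark number.
    """
    compound = sum(0.0 if mol in already_procured else unit_costs[mol]
                   for mol in batch)
    return IterationCosts(iteration, compound, labor_cost_per_iteration)
```

These records can be appended to a list alongside the existing evaluation metrics and summed afterwards to compare against the cost of a single large screen.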

@agitter
Member

agitter commented Jan 30, 2019

There are at least three modes for iterative screening and cherry picking.

Mode 1: cherry pick from compounds at SMSF (LC and MLPCN libraries). The compound cost is low, but the labor cost is high because it may involve selecting a different plate for each compound in the batch.

Mode 2: purchase compounds from a vendor like ChemDiv. The vendor would have a library of more than 1 million compounds and would likely prepare fixed-size plates for us, so the labor of cherry picking would be incorporated into the compound cost. At least for some vendors, we can get a quote for a constant cost per compound.

Mode 3: use a virtual library that spans multiple vendors, like ZINC. The labor cost would be high if it takes a lot of time to assess which prioritized compounds can even be purchased, and the compound cost would be variable.
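
The three modes could be simulated with separate cost functions along these lines. This is a sketch under simplifying assumptions: the function names and parameters are hypothetical, Mode 1 treats the compound cost as zero rather than merely low, and no real quotes are encoded.

```python
def mode1_cost(batch_plates, labor_per_plate):
    """Mode 1: in-house cherry picking.

    batch_plates: source plate id for each compound in the batch.
    Compound cost is simplified to zero; labor scales with the number
    of distinct plates that have to be selected.
    """
    return len(set(batch_plates)) * labor_per_plate


def mode2_cost(n_compounds, price_per_compound):
    """Mode 2: vendor-prepared plates at a constant quoted price per
    compound, with cherry-picking labor already included in the price."""
    return n_compounds * price_per_compound


def mode3_cost(prices, labor_per_lookup):
    """Mode 3: virtual library with variable prices.

    prices: quoted price per prioritized compound, or None if it turns
        out not to be purchasable. Every lookup costs labor either way.
    """
    purchasable = [p for p in prices if p is not None]
    return sum(purchasable) + len(prices) * labor_per_lookup
```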
