Determining optimal C for AEH #4

Open
dpcook opened this issue Feb 28, 2018 · 3 comments

dpcook commented Feb 28, 2018

Just wondering if there's a good way to determine an optimal C for AEH, or perhaps a metric to assess the quality (maybe a bad word for it) of a given pattern.

Puriney commented Mar 30, 2018

+1

I guess it is like choosing the number of dimensions for tSNE / MDS, or even PCA (though PCA has % explained variance as a reference). So far my quick solution is to choose K=5-7 and then run GO enrichment to see what biological meaning lies behind each "pattern".
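For comparison, the PCA "reference" mentioned above is usually applied as a cumulative explained-variance cutoff: keep the smallest number of components that together explain some chosen fraction of the variance. A minimal NumPy sketch, assuming a dense samples × features matrix; the function name `n_components_for_variance` and the 0.9 default are illustrative choices, not a standard or part of AEH. The point of the comparison is that PCA hands you a number to threshold on, and the open question in this thread is what the equivalent would be for C.

```python
import numpy as np


def n_components_for_variance(X, threshold=0.9):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `threshold` (illustrative helper).

    X: array of shape (n_samples, n_features).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # PCA operates on centred data
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2                             # proportional to variance per component
    cum_ratio = np.cumsum(var) / var.sum()   # non-decreasing, ends at ~1.0
    # First index where the cumulative ratio >= threshold, turned into a count;
    # clipped so float round-off near 1.0 cannot exceed the number of components.
    return min(int(np.searchsorted(cum_ratio, threshold) + 1), len(s))
```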

vals (Member) commented Mar 30, 2018

It is a very hard problem, and an important (but challenging) one for the scRNA-seq field in general. The GO enrichment strategy seems pretty reasonable, but you might miss novel things. And there's something weird about how findings are typically only reported when they have significant GO categories: those findings then feed into newer GO annotations (since the annotations are based on the literature), which creates a kind of feedback loop.

I used to be excited about Dirichlet processes for this general problem: we investigated them in Lönnberg et al. for the number of pseudotime trajectories, for example. It was also something we considered for a while with AEH. It didn't work well, and I've seen some papers describing how, even when data are simulated from Dirichlet process models, the same model cannot infer the correct number of clusters (unfortunately I can't find a reference right now).

This is why I made C an explicit parameter in this implementation so that it is clear it's a choice made by the researcher.

I'll keep this issue open so people can discuss and suggest things. Maybe eventually we can come up with a great idea.

jon-xu commented Dec 10, 2019

I guess if we can find a metric describing the diversity of the C groups, we could try different values of C and take the elbow point as the optimal C. Not sure, but just a thought.
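The elbow idea can be sketched roughly as follows, assuming you have already run AEH once per candidate C and collected the resulting patterns as an (n_locations × C) array for each run. Both helpers are hypothetical and not part of SpatialDE/AEH. `pattern_diversity` uses one possible definition of "diversity" (1 minus the mean absolute pairwise Pearson correlation between pattern profiles), which is an assumption on my part. `elbow_point` uses a simple kneedle-style rule: rescale both axes to [0, 1] and pick the C farthest from the straight line joining the first and last points of the curve.

```python
import numpy as np


def pattern_diversity(patterns):
    """Hypothetical diversity score for one AEH run (an assumption, not an AEH API).

    patterns: array of shape (n_locations, C); column k is spatial pattern k.
    Returns 1 - mean |Pearson r| over all pairs of distinct patterns:
    0 means perfectly (anti-)correlated patterns, 1 means pairwise uncorrelated.
    Assumes no pattern column is constant (its correlation would be undefined).
    """
    patterns = np.asarray(patterns, dtype=float)
    C = patterns.shape[1]
    if C < 2:
        return 0.0  # diversity is not meaningful for a single pattern
    corr = np.corrcoef(patterns, rowvar=False)  # C x C correlation matrix
    off_diag = corr[~np.eye(C, dtype=bool)]
    return float(1.0 - np.mean(np.abs(off_diag)))


def elbow_point(cs, scores):
    """Return the C at the 'elbow' of a score-versus-C curve.

    cs must be sorted ascending and scores must not all be equal.
    The elbow is the point with the largest perpendicular distance to the
    line joining the first and last points, after rescaling both axes to [0, 1].
    """
    x = np.asarray(cs, dtype=float)
    y = np.asarray(scores, dtype=float)
    x_n = (x - x.min()) / (x.max() - x.min())
    y_n = (y - y.min()) / (y.max() - y.min())
    start = np.array([x_n[0], y_n[0]])
    end = np.array([x_n[-1], y_n[-1]])
    direction = (end - start) / np.linalg.norm(end - start)
    rel = np.column_stack([x_n, y_n]) - start
    # |2D cross product| with the unit direction = distance to the line
    dist = np.abs(rel[:, 0] * direction[1] - rel[:, 1] * direction[0])
    return list(cs)[int(np.argmax(dist))]


# Toy usage with made-up scores (not real AEH output):
cs = [1, 2, 3, 4, 5, 6]
toy_scores = [10.0, 5.0, 3.0, 2.5, 2.2, 2.0]
print(elbow_point(cs, toy_scores))  # 3
```

The same elbow rule works whether the curve rises or falls with C. It only gives a sensible answer if the chosen metric changes roughly monotonically, which is exactly the part that would need testing on real data.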
