create general NA functions for treedata #25
@uyedaj, thoughts?
Actually, this approach won't work with the lapply approach used by aceArbor and others. Ergh.
If we want to feed in an entire data frame and run a function on each column, then the functions need to handle each column individually (as aceArbor does), but the other cases can already be filtered using treeplyFilter. We could automate this with a friendlier wrapper that lets you filter selected columns for NAs and choose a Boolean operator ("or" or "and"). That way we could cover the possibilities you list.
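A minimal base-R sketch of the wrapper proposed above. The name `na_keep` and its `cols`/`op` arguments are hypothetical (this is not an existing aRbor or treeplyr function). It only builds the row filter; pruning the tree to the surviving taxa (e.g. with `ape::drop.tip`) is left out.

```r
# Hypothetical helper (not part of aRbor or treeplyr): build a logical row
# filter from NA checks on selected columns, combined with "and" or "or".
na_keep <- function(dat, cols, op = c("and", "or")) {
  op <- match.arg(op)
  # One logical vector per column: TRUE where the value is present
  present <- lapply(dat[cols], function(x) !is.na(x))
  # "and": keep rows that are complete in every selected column
  # "or":  keep rows that have a value in at least one selected column
  combine <- if (op == "and") `&` else `|`
  Reduce(combine, present)
}

# Illustrative data (made-up values)
dat <- data.frame(
  taxon = c("A", "B", "C", "D"),
  SVL   = c(50, NA, 62, NA),
  mass  = c(3.1, 2.8, NA, NA),
  stringsAsFactors = FALSE
)
dat[na_keep(dat, c("SVL", "mass"), "and"), "taxon"]  # "A"
dat[na_keep(dat, c("SVL", "mass"), "or"), "taxon"]   # "A" "B" "C"
```

Returning a logical vector rather than a filtered data frame keeps the helper usable as the condition inside an existing filtering function such as treeplyFilter, whose exact interface is not assumed here.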
Yeah, that's great. The filtering isn't used heavily in aceArbor right now, which may be behind persistent issues like #15.
Would it be best to coordinate this character and column management between the Romanesco and aRbor layers? I understand if you want to standardize it all at the R level so that aRbor stays functional outside of Arbor proper; just wondering. (Reply to Josef Uyeda's message of Sep 18, 2014, 9:34 AM.)
Yes, so here is my thought: I think we need both. It's great to have common operations on data frames and trees (eliminating NAs, filtering by category, selecting rows by condition, etc.) available at the aRbor level. These are currently duplicated in my treeplyr functions, and I don't think treeplyr should replace them most of the time. Where the treeplyr functions are really useful is that they can take any R expression, or combination of R expressions, to filter, select, mutate, or apply a function to a data frame, a tree, or a tree plus data frame. This lets aRbor users quickly apply an operation to their data that we wouldn't want to implement as a stand-alone function because it would be too idiosyncratic to their particular purpose (e.g. `if(island=='Cuba') {SVL * 10}`, because your collaborator who measured the Cuban anoles recorded SVL in centimeters rather than millimeters). Having a specific function for every imaginable operation isn't feasible.
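As a sketch of the kind of one-off edit described above, here is the Cuban SVL correction in base R. The data values are made up for illustration, and `within()`/`ifelse()` stand in for whatever expression interface treeplyr provides. Because `if()` tests only a single value, the row-wise version of the expression uses `ifelse()`.

```r
# Illustrative data (hypothetical values): Cuban SVLs recorded in cm,
# all others in mm.
dat <- data.frame(
  taxon  = c("A_sagrei", "A_cybotes", "A_equestris"),
  island = c("Cuba", "Hispaniola", "Cuba"),
  SVL    = c(5.2, 61, 17.5),
  stringsAsFactors = FALSE
)
# Vectorized form of if(island=='Cuba') {SVL * 10}:
# multiply by 10 (cm -> mm) only in rows where island == "Cuba".
dat <- within(dat, SVL <- ifelse(island == "Cuba", SVL * 10, SVL))
dat$SVL  # 52 61 175
```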
Agreed. I like the flexibility of having that power at both levels. To me, it seems like Arbor will gradually evolve into having different “collections” of operations. Some will be simple wrappers over the treeplyr/aRbor/rotl layer, and others might be more involved, at the work step algorithm level. This way there would be both simple block collections and “power user” block collections. One takeaway for your stand-up talks today could be how to create these separate “collections” of operations. (Reply to Josef Uyeda's message of Sep 18, 2014, 10:23 AM.)
I think we need three cases:

- single column: check for NAs and remove the affected taxa from the data and the tree as needed;
- pairwise: remove any taxa not present in both columns (for things like PGLS);
- multivariate: remove any incomplete taxa (for things like phyloPCA).
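The three cases above differ only in which columns must be complete, so one helper can cover all of them. This is a minimal sketch under stated assumptions: `match_complete` is a hypothetical name, the data frame is assumed to have a `taxon` column, and the tree is reduced to a character vector of tip labels so the example needs no packages (with an ape `phylo` object, the `dropped` tips would be pruned with `ape::drop.tip`).

```r
# Hypothetical helper (not an aRbor/treeplyr function): keep taxa that are
# in the tree and have no NAs in `cols`; report which tips to drop.
# The "tree" is just a character vector of tip labels in this sketch.
match_complete <- function(dat, tips, cols) {
  ok <- stats::complete.cases(dat[cols]) & dat$taxon %in% tips
  keep <- dat$taxon[ok]
  list(
    data    = dat[ok, c("taxon", cols), drop = FALSE],
    tips    = tips[tips %in% keep],
    dropped = setdiff(tips, keep)  # prune these, e.g. ape::drop.tip(phy, dropped)
  )
}

# Illustrative data (made-up values)
tips <- c("A", "B", "C", "D")
dat <- data.frame(
  taxon = c("A", "B", "C", "D"),
  SVL   = c(50, NA, 62, 48),
  mass  = c(3.1, 2.8, NA, 2.5),
  tail  = c(NA, 70, 80, 65),
  stringsAsFactors = FALSE
)
traits <- c("SVL", "mass", "tail")

# Single column, one call per trait (the lapply pattern used by aceArbor)
per_column <- lapply(traits, function(col) match_complete(dat, tips, col))
per_column[[1]]$tips                              # "A" "C" "D"
# Pairwise (e.g. PGLS of mass on SVL): taxa must be complete in both
match_complete(dat, tips, c("SVL", "mass"))$tips  # "A" "D"
# Multivariate (e.g. phyloPCA): taxa must be complete in every trait
match_complete(dat, tips, traits)$tips            # "D"
```

Running the single-column case inside `lapply()` means each trait keeps every taxon that has a value for it, instead of losing taxa that are only missing some other trait.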