GROOVY-6992: Added support for collecting over sets #498
base: master
An example showing the ramifications of this breaking change:
So for |
Another example:
Obviously you can get the alternatives:
I found the behaviour of mapping over sets surprising. I thought mapping over a functor should return the same type, and I initially assumed a set would be a functor. For reference, the Haskell signature is: fmap :: (a -> b) -> f a -> f b. The bigger issue is whether all collections should be transformed into lists by default. I had a look at the JDK to see what else this could apply to, and the main interfaces I saw were List, Set and Queue. Would this also apply to the Iterable interface? On reflection, Iterable is probably less relevant, because it accesses a structure sequentially, like a list. Now, perhaps a set is not a functor: you can't guarantee that mapping with two functions one after the other gives the same result as mapping once with their composition, i.e. set.map(f).map(g) may differ from set.map(f.andThen(g)). For more on the design issue of set and functor in Scalaz:
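To make the composition point concrete, here is a small Java sketch (an illustration, not code from the thread). It assumes a set whose notion of equality is coarser than what the second function observes: the intermediate set is a `TreeSet` with `String.CASE_INSENSITIVE_ORDER`, and the functions `F` and `G` are invented for the example. For ordinary hash sets and functions that respect `equals`, the two sides agree; the law only breaks when such a mismatch exists.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

class SetFunctorLaw {
    // Illustrative functions: F yields strings differing only in case,
    // G tells them apart again via the first character's code.
    static final Function<Integer, String> F = x -> x == 1 ? "a" : "A";
    static final Function<String, Integer> G = s -> (int) s.charAt(0);

    // Maps into a TreeSet that ignores case, so "a" and "A" count as equal.
    static Set<String> mapCaseInsensitive(Collection<Integer> source, Function<Integer, String> f) {
        Set<String> out = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        for (Integer x : source) {
            out.add(f.apply(x)); // "A" is rejected when "a" is already present
        }
        return out;
    }

    // Maps into an ordinary HashSet (equals/hashCode semantics).
    static <A, B> Set<B> mapHash(Collection<A> source, Function<A, B> g) {
        Set<B> out = new HashSet<>();
        for (A a : source) {
            out.add(g.apply(a));
        }
        return out;
    }

    // set.map(f).map(g): the intermediate set collapses "a" and "A".
    static Set<Integer> mapThenMap(Set<Integer> source) {
        return mapHash(mapCaseInsensitive(source, F), G);
    }

    // set.map(f.andThen(g)): no intermediate set, so nothing is collapsed.
    static Set<Integer> mapComposed(Set<Integer> source) {
        return mapHash(source, F.andThen(G));
    }

    public static void main(String[] args) {
        Set<Integer> source = new LinkedHashSet<>(Arrays.asList(1, 2));
        System.out.println(mapThenMap(source).equals(mapComposed(source))); // false
    }
}
```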
Paul, I had not seen '*' used to convert to sets, only the spread operator for collect. How does it convert to a set, and what is the difference between a and b below (where a is a set and b is a list)?
Does the same thing apply to collectMany? |
The list despread operator has nothing to do with Set assignment - it was just a way of removing some duplication in the above example (you could wonder why there is no set despread operator, but that's perhaps for another day). Special target typing is performed for list literals: when a list literal is assigned to a typed declaration with a Collection type, it takes on that declared type. Think of the "diamond" operator in Java or method parameter target typing in Java 8. So basically
Yes, the type is retained for methods which return a filter, subset or expansion of some original collection, e.g. things like: One thing being considered for 2.4+ is improved target type inferencing for DGM methods. So for the methods above (but not
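In plain Java, "retaining the type" statically can be approximated by letting the caller supply the result collection through a factory, so the compiler knows the exact result type. This is only a sketch under that assumption: `filterInto` is a hypothetical helper, not a DGM method and not the proposed @RetainType design.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Predicate;
import java.util.function.Supplier;

class RetainTypeSketch {
    // Hypothetical helper: copies the elements of `source` that satisfy `keep`
    // into a fresh collection from `factory`. Because C is a type parameter,
    // the compiler knows the exact result type chosen by the caller.
    static <T, C extends Collection<T>> C filterInto(Iterable<T> source,
                                                     Predicate<T> keep,
                                                     Supplier<C> factory) {
        C out = factory.get();
        for (T item : source) {
            if (keep.test(item)) {
                out.add(item);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<Integer> numbers = new LinkedHashSet<>(Arrays.asList(3, 1, 4, 1, 5, 9, 2, 6));
        // Statically typed as LinkedHashSet<Integer>: no cast, no List.
        LinkedHashSet<Integer> evens = filterInto(numbers, n -> n % 2 == 0, LinkedHashSet::new);
        System.out.println(evens); // [4, 2, 6]
    }
}
```

The cost of this design is that every call site has to name the result type; @RetainType as described in the thread would instead infer it from the receiver.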
We are chasing our tails because we don't have higher-order types. I want to use the functor typeclass or, alternatively in OO terms, have a covariant return type that matches the structure we are performing the method on (which means duplicated code compared to typeclasses). My argument is that we should be allowed to override the method with a covariant return type, rather than keep it invariant as Collection. Why should List be treated as the special type that everything returns? Why should collect, collectMany and List be treated specially in the 2.4 type inference proposal? Would the target type inferencing for RetainType work statically? I can see this would be OK dynamically; I guess it could work statically by generating methods for each subclass at compile time, and the compiler could then use that information to do type inference for each call to the method. Treating List specially, and converting everything back to it, makes it harder to write more general libraries.
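Java already permits the covariant-override approach described above. The sketch below uses hypothetical types (`MappableCollection`, `MappableList`, `MappableSet`; none of these exist in Groovy or the JDK): the base signature promises only `Collection<R>`, and each subclass narrows it. Note that the loop body has to be repeated in every subclass, which is the duplicated code mentioned in the comment above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical interface: the base signature can only promise a Collection.
interface MappableCollection<T> {
    <R> Collection<R> mapAll(Function<? super T, ? extends R> f);
}

class MappableList<T> extends ArrayList<T> implements MappableCollection<T> {
    MappableList(Collection<? extends T> items) { super(items); }

    // Covariant override: the return type narrows from Collection<R> to List<R>.
    @Override
    public <R> List<R> mapAll(Function<? super T, ? extends R> f) {
        List<R> out = new ArrayList<>();
        for (T item : this) out.add(f.apply(item));
        return out;
    }
}

class MappableSet<T> extends LinkedHashSet<T> implements MappableCollection<T> {
    MappableSet(Collection<? extends T> items) { super(items); }

    // Same loop written again, but the override can return Set<R>.
    @Override
    public <R> Set<R> mapAll(Function<? super T, ? extends R> f) {
        Set<R> out = new LinkedHashSet<>();
        for (T item : this) out.add(f.apply(item));
        return out;
    }
}

class CovariantDemo {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "cc");
        List<Integer> listLengths = new MappableList<>(words).mapAll(String::length);
        Set<Integer> setLengths = new MappableSet<>(words).mapAll(String::length);
        System.out.println(listLengths); // [1, 2, 2]
        System.out.println(setLengths);  // [1, 2]
    }
}
```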
One idea for 2.4 that I have not thought through is to be able to attach a trait to an existing class, rather than have to write extension methods to add those methods. So if trait T is attached to class C, an object of type C could be passed around as a T. |
+1 for higher-order kinds. But how to implement them nicely? So many questions! :-) There is no implementation of @RetainType at present, just design discussions and prototyping of bits of it. The current early design idea is that @RetainType would be used only for type inferencing during static compilation. It wouldn't generate a whole family of methods, so no Set variant, for instance, would be visible from Java; it would just know about the right type in use when checking for errors, and could do appropriate casts at the bytecode level where needed. I'd have to think a bit more about your trait idea, but I am sure the Groovy team is open to useful suggestions. Do you have some concrete examples of how the idea would work?
I think the better way to implement this is to change the return type of But I think one of the biggest problems is backward compatibility, as Paul said.
Hi guys. Note that everything that changes the method signature is unlikely to be accepted anytime soon as it is a binary breaking change. |
This discussion shows one fundamental problem in the collection-based operations we offer: trying to keep the characteristics of the origin while not really being able to do that. For example, even if we produce a LinkedHashSet for a LinkedHashSet, as soon as a custom Set class is used, all bets are off. Maybe Groovy started to develop a bit in the wrong direction here. We started off with a List type (I think ArrayList) as the basic result of operations. If this is known, then all users have to do is convert. Looking at Java 8, the result type there is Stream: if you feed in a Set, you later have to convert the Stream back to a Set. Currently this looks to me like the better approach. Then about functors... I am a bit confused, and surely you know much more about functional programming terms than I do... but I thought a functor is a function, not a type. So in my understanding you want collect to be a functor, not Set. Going to So for example:
This assert is already true, and here it does not matter whether the result of Set#collect is a List or a Set. Sure, it is not a homomorphism then; I am aware of that. For me the practical problem is more that without some kind of lazily evaluated structure you won't get any advantage from this: the left side of the assert will perform worse than the right side.
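The Java 8 round trip described above (Set to Stream and back) and the laziness point can both be seen in a short sketch. The helper name `mapSet` and the logging functions are assumptions for illustration only. Mapping twice through `mapSet` builds a full intermediate Set, while a single stream pipeline pushes each element through `f` and then `g` without one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

class StreamRoundTrip {
    // Set -> Stream -> Set: the user-side conversion back to a Set.
    static <T, R> Set<R> mapSet(Set<T> source, Function<T, R> f) {
        return source.stream()
                .map(f)
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // Runs f then g over {1, 2, 3} and records the order in which they are called.
    static List<String> trace(boolean singlePipeline) {
        List<String> log = new ArrayList<>();
        Function<Integer, Integer> f = x -> { log.add("f" + x); return x * 10; };
        Function<Integer, Integer> g = x -> { log.add("g" + x); return x + 1; };
        Set<Integer> source = new LinkedHashSet<>(Arrays.asList(1, 2, 3));
        if (singlePipeline) {
            // One lazy pipeline: no intermediate Set, f and g interleave per element.
            source.stream().map(f).map(g).collect(Collectors.toCollection(LinkedHashSet::new));
        } else {
            // Two eager steps: a full intermediate Set is built before g runs.
            mapSet(mapSet(source, f), g);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(trace(false)); // [f1, f2, f3, g10, g20, g30]
        System.out.println(trace(true));  // [f1, g10, f2, g20, f3, g30]
    }
}
```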
One option for the backward compatibility problem is to use a new method name
And write "We recommend you to use
In Groovy, if you don't use |
You are probably familiar with these references already @paulk-asert, but perhaps others aren't. The story of how higher-order types got into Scala is pretty interesting. Vincent Cremet wrote a compiler for Featherweight Generic Java (FGJ) and generalised parameters representing types to type constructors. The paper from his work is from 2008 (http://www.jot.fm/issues/issue_2008_06/article2.pdf). From the implementation section on p 40: Odersky, I think, was at or around the Swiss university at the time and integrated this into Scala. His paper on the topic is "Generics of a Higher Kind", http://adriaanm.github.io/files/higher.pdf. Perhaps in another dimension Java would have had higher-order types before Scala! The site where this work was done was down for a long time, but I just followed the link tonight and it looks like it is up again (http://lampwww.epfl.ch/~cremet/FGJ-omega/). I have not had a chance to look around the site yet, but the compiler and the Coq proofs should be interesting. I will think more about fleshing out the trait idea. At the moment we can add methods to classes, but why not add whole traits/interfaces? I think this would give Groovy functionality similar to Haskell typeclasses and Scala implicits. @yukoba commented that for the return type we should use createSimilarCollection instead of defaultSet. Note that none of this is ideal, as it will only create a similar collection for the few classes that we code for; it is not a general solution. A general solution requires a function that creates the structure, i.e. (Unit -> C), where C is a collection; this is most often known as unit on a monad, with signature (A -> M<A>). I have been exploring these ideas in a series of blog posts (http://mperry.github.io/). I am part of the way through writing up a post about implementing monads, but you can already see the results here (https://github.com/mperry/functionalgroovy/tree/master/typeclass/src/main/groovy/com/github/mperry/fg/typeclass).
The concrete folder under the previous URL shows concrete functor, applicative and monad instances for lists and options. @blackdrag, you might find it useful to start here on functors. A functor is just any class with a single generic type T that implements the map/collect/fmap method with signature I have then been adding these methods to existing classes through Groovy extension modules. Here is an example of adding the monadic methods to List in ListMonadExtension (https://github.com/mperry/functionalgroovy/tree/master/main/src/main/groovy/com/github/mperry/fg). In essence, we create a concrete method for each monad method whose implementation calls the appropriate monadic method. So we add flatMap/liftM2/other monad methods to SetExtension2, but the implementation calls the monad implementation for Set, which just defines unit and flatMap. This is where attaching the trait to an existing class would be useful. If we attached ListMonad to List, then List would get all the monad methods: unit, flatMap, sequence, traverse, liftM2, etc. This would give us incredible flexibility, because the original class writer does not need to foresee every possible abstraction for their class. For example, it would allow Lists and other monads to be used in monad comprehensions that rely on the presence of flatMap and map. Groovy is moving towards more commitment to type safety. The advantages of all this are better abstraction, more reuse, better reasoning about code, fewer defects and faster development time. This is a pretty long-winded comment, so I hope others find it useful.
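The "define only unit and flatMap, get the rest for free" idea can be sketched in Java by passing the monad instance around as a value (dictionary-passing style). This is not functionalgroovy's API; `CollectionMonad`, `SET` and `LIST` are hypothetical names. Because Java lacks higher-kinded types, the static result type can only be `Collection`, even though the runtime container is preserved, which is the same limitation discussed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical "typeclass" for collection monads, passed around as a value.
interface CollectionMonad {
    <A> Collection<A> unit(A a);

    <A, B> Collection<B> flatMap(Collection<A> ma, Function<A, Collection<B>> f);

    // Derived once for every instance: map is flatMap followed by unit.
    default <A, B> Collection<B> map(Collection<A> ma, Function<A, B> f) {
        return flatMap(ma, a -> unit(f.apply(a)));
    }

    // Derived once for every instance: combine every a with every b.
    default <A, B, C> Collection<C> liftM2(Collection<A> ma, Collection<B> mb, BiFunction<A, B, C> f) {
        return flatMap(ma, a -> map(mb, b -> f.apply(a, b)));
    }
}

class MonadInstances {
    // Set instance: only unit and flatMap are written by hand.
    static final CollectionMonad SET = new CollectionMonad() {
        public <A> Collection<A> unit(A a) {
            Collection<A> s = new LinkedHashSet<>();
            s.add(a);
            return s;
        }
        public <A, B> Collection<B> flatMap(Collection<A> ma, Function<A, Collection<B>> f) {
            Collection<B> out = new LinkedHashSet<>();
            for (A a : ma) out.addAll(f.apply(a));
            return out;
        }
    };

    // List instance: same shape, different container.
    static final CollectionMonad LIST = new CollectionMonad() {
        public <A> Collection<A> unit(A a) {
            Collection<A> l = new ArrayList<>();
            l.add(a);
            return l;
        }
        public <A, B> Collection<B> flatMap(Collection<A> ma, Function<A, Collection<B>> f) {
            Collection<B> out = new ArrayList<>();
            for (A a : ma) out.addAll(f.apply(a));
            return out;
        }
    };

    public static void main(String[] args) {
        Collection<Integer> xs = Arrays.asList(1, 2);
        Collection<Integer> ys = Arrays.asList(2, 4);
        System.out.println(LIST.liftM2(xs, ys, (a, b) -> a * b)); // [2, 4, 4, 8]
        System.out.println(SET.liftM2(xs, ys, (a, b) -> a * b));  // [2, 4, 8]
    }
}
```

The same derived `liftM2` gives different results per instance, because the duplicate product 4 is kept by the list instance and collapsed by the set instance.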
Ok, so I wrote a longer comment here and decided to delete it all again, because there is one point I don't really have a clue about yet. You define a functor as any class that defines Of course you cannot define a functor trait like this. And I think that is really what you are after in the end.
I think so too. But wouldn't changing that system be quite a hard task, and break compatibility?
I will try to explain the pseudo Java for the This is not that important for Functor because it has just a single method. Applicative (http://mperry.github.io/2014/07/02/groovy-applicatives.html) extends Functor and has two abstract methods (pure and apply). From these two methods, many other useful concrete methods can be derived which all use the type parameter of the class. The examples I gave for Applicative were liftA3 and sequenceA which both use the type parameter for Applicative. A good introduction to the topic might be http://learnyouahaskell.com/functors-applicative-functors-and-monoids. What may be confusing is that my proposal to add a trait to the class has a mismatch on the usual 'this' parameter. I have not thought carefully about this yet and so it is unresolved. |
@yukoba The problem you are having is that the map method of Functor can't be implemented by itself in a vacuum because it needs to know how to create the new structure. Any monad (with the unit and flatMap methods) can implement Functor's map method like this (https://github.com/mperry/functionalgroovy/blob/master/typeclass/src/main/groovy/com/github/mperry/fg/typeclass/Monad.groovy). Groovy cheats by knowing about how to implement the On my todo list is to have a close look at what is available in PCollections and compare it to the immutable data structures available in FunctionalJava. |
Mark, introductions are often fine and all, but they do not, for example, really explain why a functor should be a homomorphism, and what implications it has if it is not... in other words, I am missing the bigger picture.
I will summarize https://en.wikibooks.org/wiki/Haskell/Understanding_monads/List more simply. If a collection library implements these two methods, its collection classes can all be handled in the same way.
1. "Haskell return" == "Functional Groovy unit()"
2. "Haskell binding operator" == "Functional Groovy flatMap()"
However, a characteristic feature of Groovy is that it uses the normal Java collection classes.
One possible solution might be this. First create a marker interface.
Because Also in
I made a test implementation of the above. See yukoba@45402ab. Normal,
Sorry, |
I think what Mark wants is something like this:
```groovy
trait CollectionFactory<T> {
    abstract T createSimilar()
}
trait ArrayListFactory<T> implements CollectionFactory<ArrayList<T>> {
    public ArrayList<T> createSimilar() { return new ArrayList<T>() }
}
ArrayList<Integer> foo = ...
def sim = foo.createSimilar() // trait mixed into ArrayList
```
This cannot be done using an extension module. It also fails to compile (which is a bug) using:
```groovy
def list = [1,2].withTraits(ArrayListFactory)
list.createSimilar()
```
Yet one doesn't need traits to do this. With extension modules, you can already declare those:
```groovy
class CreateSimilarCollections {
    static <T> ArrayList<T> createSimilar(ArrayList<T> self) { ... }
}
```
The problem with this approach is that even if the method is available at runtime, there is no marker interface added to the class to let you know that the method is available. This problem cannot be fixed with the traits approach either: unless Groovy uses class rewriting at load time, the JVM will not let you add methods or interfaces to existing classes. @yukoba: This approach requires implementing the marker interface, which is probably not what users want. Also calling
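The marker-interface idea discussed above can be sketched in Java as follows. All names are hypothetical (this is not yukoba's code at yukoba@45402ab, nor a Groovy API). A collection that opts in by implementing `SimilarFactory` gets an exact "empty like me"; plain JDK collections do not implement it, so they can only fall back to a fixed default, which is why the result is merely similar (a `LinkedList` comes back as an `ArrayList`).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.Set;

// Hypothetical marker-style interface: a collection that can create an empty
// collection of its own kind. JDK classes do not implement it.
interface SimilarFactory<C extends Collection<?>> {
    C createSimilar();
}

// A user-defined collection has to opt in explicitly.
class TaggedList<T> extends ArrayList<T> implements SimilarFactory<TaggedList<T>> {
    @Override
    public TaggedList<T> createSimilar() {
        return new TaggedList<>();
    }
}

class SimilarCollections {
    // Uses the marker when present; otherwise falls back to a fixed default,
    // which is all that is possible for collections that do not opt in.
    @SuppressWarnings("unchecked")
    static <T> Collection<T> emptyLike(Collection<T> source) {
        if (source instanceof SimilarFactory) {
            Object similar = ((SimilarFactory<?>) source).createSimilar();
            return (Collection<T>) similar;
        }
        if (source instanceof Set) {
            return new LinkedHashSet<>();
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        System.out.println(emptyLike(new TaggedList<String>()).getClass().getSimpleName()); // TaggedList
        System.out.println(emptyLike(new HashSet<String>()).getClass().getSimpleName());    // LinkedHashSet
        System.out.println(emptyLike(new LinkedList<String>()).getClass().getSimpleName()); // ArrayList
    }
}
```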
So this problem boils down to the inability to specify generic enough signatures on the typing side, and a missing unit morphism for Collections on the practical side. createSimilarCollection cannot be a replacement for unit, because it does not return the exact same container, only a similar one. The Java 8 solution for this is to convert to the monad Stream and then convert back to whatever is needed. Of course, at least the last step is done by the user, and it cannot be done by us, since unit is missing or incomplete. For pcollections I can solve the problem of the missing unit by adding a fitting method to the interface everyone is supposed to override.
So,
Therefore, |
@melix, is it possible to check the existence of |
It is possible to check the existence of the method at runtime. Performance-wise, however, this would be terrible.
The first argument of If you create two caches
and if you check it before searching |
Using cache classes is not really an option. It would be very difficult not to leak classes, especially in a language like Groovy where some classes can have a very short lifecycle. In any case checking against a Set does have a cost. The cascade of |
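One JDK facility sometimes used for per-class caches is `java.lang.ClassValue`; it was not proposed in the thread, so treat this as an illustration only. The sketch caches, per class, whether it exposes a public instance method named `createSimilar` (a hypothetical name). It stores only a `Boolean`, so the cached value does not itself reference the class. It avoids repeating the reflective lookup, but it does not remove the per-call cost of consulting the cache that melix points out.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class CreateSimilarLookup {
    // Per-class cache: computeValue normally runs once per class; if two threads
    // race on the first lookup, only one computed value is kept.
    private static final ClassValue<Boolean> HAS_CREATE_SIMILAR = new ClassValue<Boolean>() {
        @Override
        protected Boolean computeValue(Class<?> type) {
            try {
                // getMethod finds public methods, including inherited ones.
                Method m = type.getMethod("createSimilar");
                return !Modifier.isStatic(m.getModifiers());
            } catch (NoSuchMethodException e) {
                return false;
            }
        }
    };

    static boolean hasCreateSimilar(Class<?> type) {
        return HAS_CREATE_SIMILAR.get(type);
    }

    // Hypothetical type that opts in by declaring createSimilar().
    static class WithFactory {
        public WithFactory createSimilar() { return new WithFactory(); }
    }

    public static void main(String[] args) {
        System.out.println(hasCreateSimilar(WithFactory.class));         // true
        System.out.println(hasCreateSimilar(java.util.ArrayList.class)); // false
    }
}
```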
I want to add that we don't have to use Stream. We could make a monad list and a monad set that are returned by our operations, plus conversion methods to other types. Just the standard Java lists and sets would not be monads, maybe not even functors. Wouldn't that be a possible way to go?
If @melix thinks we can't change the interfaces to existing classes, there is another way. Scala and Haskell use implicits/typeclasses to add methods to existing types. We could implement something like this for Groovy that would have implications for the extensions modules. Groovy has the range operator which you can use like this: This could be done in a new, but similar, way by having compiler supported conversion between two types. This change would support something like the code below:
Then when compiling a call A compiler supporting this would, in effect, allow you to be adding interfaces to existing classes without changing the JVM interfaces of existing classes. This could be done in a type safe way. A potential problem is how to handle this using dynamic typing. http://docs.scala-lang.org/overviews/core/implicit-classes.html This would allow us to add abstractions to existing classes. Of course, this could be existing Java classes, classes from other Groovy libraries or separate abstractions and couple them in a structural way, rather than in the nominative Java way. What would be worth exploring is the different ways this could be combined with existing features, like implicits implementing traits, or using the delegatesTo annotation. Combined with some traits like Eq, Ord, Enum could make things interesting (https://en.wikibooks.org/wiki/Haskell/Classes_and_types). It might even simplify some Groovy conversions that I suspect are currently done in an ad-hoc way. |
…ide a default undefined value for better Java compatibility (closes groovy#498)
As of Groovy 4, the
```groovy
@groovy.transform.TypeChecked
void test() {
    def one_two_three = [1,2,3]
    @groovy.transform.ASTTest({
        println node.getNodeMetaData(org.codehaus.groovy.transform.stc.StaticTypesMarker.INFERRED_TYPE)
    })
    def two_times = one_two_three.collect(new HashSet<Number>()) { it * 2 }
}
test()
```
This script prints "java.util.HashSet<java.lang.Number>" for the inferred type of
I found your example hard to follow @eric-milles, but it is promising if collect preserves the type. |
Pull request for https://jira.codehaus.org/browse/GROOVY-6992.
Collecting over sets without passing in a collection to add to returns a list. We really want it to return a set.
I added support for this to DefaultGroovyMethods for sets, using LinkedHashSet as the default set, since I noticed this was already the existing default. It also worked with HashSet, but I have kept the current default. I initially tried TreeSet, but I consider this class broken in Java: it should always require an ordering for its elements.
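A warning: this may break existing code that relies on collecting over a set returning a list. To get a list, toList() can be called either before or after the collect method; calling it before keeps the old behaviour exactly, while calling it after returns a list from which duplicates have already been removed.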
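The behaviour change described in this PR can be mirrored in a small Java sketch. `CollectSketch.collect` is a hypothetical stand-in, not Groovy's DefaultGroovyMethods code: a `Set` input yields a `LinkedHashSet`, anything else an `ArrayList` as before. The example also shows the breaking case, where a transform produces duplicates, and that copying to a list before collecting keeps the old list semantics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class CollectSketch {
    // Mirrors the behaviour described in this PR (not Groovy's actual code):
    // a Set input yields a LinkedHashSet, anything else an ArrayList as before.
    static <T, R> Collection<R> collect(Collection<T> self, Function<T, R> transform) {
        Collection<R> out;
        if (self instanceof Set) {
            out = new LinkedHashSet<>();
        } else {
            out = new ArrayList<>();
        }
        for (T item : self) {
            out.add(transform.apply(item));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> words = new LinkedHashSet<>(Arrays.asList("a", "B", "A"));
        // New behaviour: duplicates produced by the transform are collapsed.
        System.out.println(collect(words, String::toLowerCase)); // [a, b]
        // Converting to a list first keeps the old list semantics.
        List<String> asList = new ArrayList<>(words);
        System.out.println(collect(asList, String::toLowerCase)); // [a, b, a]
    }
}
```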