This repository has been archived by the owner on Feb 22, 2023. It is now read-only.
0.3.0
Major changes:
- We now provide two different extension array implementations.
  There is now the simpler `FletcherContinuousArray`, which is backed by a `pyarrow.Array` instance and is thus always a single contiguous memory segment.
  The initial `FletcherArray`, which is backed by a `pyarrow.ChunkedArray`, is now renamed to `FletcherChunkedArray`.
  While `pyarrow.ChunkedArray` allows for more flexibility in how the data is stored, implementing algorithms on it is more complex.
  As this hinders contributions and also adoption in downstream libraries, we now provide both implementations with an equal level of support.
  We no longer provide the more generally named class `FletcherArray`, as there is no clear opinion on whether it should point to `FletcherContinuousArray` or `FletcherChunkedArray`.
  As usage increases, we might provide such an alias class again in the future.
- Support for `ArithmeticOps` and `ComparisonOps` on numerical data as well as numeric reductions such as `sum`.
  This should allow the use of nullable int and float types for many use cases.
  Performance of nullable integer columns is on the same level as in `pandas.IntegerArray`, as we have similar implementations of the masked arithmetic.
  In future versions, we plan to delegate the workload to the C++ code of `pyarrow` and expect significant performance improvements through the use of bitmasks instead of bytemasks.
- `any` and `all` are now efficiently implemented on boolean arrays.
  We blogged about this and about how it is roughly twice as fast while using only 1/16 to 1/32 of the RAM of the reference boolean array with missing values in `pandas`.
  This is because prior to pandas 1.0 you had to use a float array to get a boolean array that can deal with missing values.
  In pandas 1.0, a new `BooleanArray` class was added that improves this situation but also changes some of the logic.
  We will adapt to this class in the next release and also publish new benchmarks.
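
The bitmask argument above can be illustrated with a small pure-Python sketch. It is not fletcher's implementation (which works on `pyarrow` buffers), and all function names are hypothetical. It assumes the Arrow layout for a boolean column with missing values: one bit-packed data bitmap plus one validity bitmap, both least-significant-bit first. `any` and `all` then skip missing entries by combining eight elements per byte operation.

```python
from typing import Iterator, List, Optional, Tuple

# Hypothetical helpers for illustration; not part of fletcher or pyarrow.


def pack_bits(flags: List[bool]) -> bytearray:
    """Pack booleans into a bitmap, least-significant bit first (Arrow order)."""
    out = bytearray((len(flags) + 7) // 8)  # zero-initialised, so padding bits are 0
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return out


def from_optional(values: List[Optional[bool]]) -> Tuple[bytearray, bytearray, int]:
    """Split [True, None, False, ...] into (data bitmap, validity bitmap, length)."""
    data = pack_bits([bool(v) for v in values])  # missing entries (None) stored as 0
    validity = pack_bits([v is not None for v in values])
    return data, validity, len(values)


def _byte_masks(length: int) -> Iterator[Tuple[int, int]]:
    """Yield (byte index, mask) so that bits beyond `length` are ignored."""
    for i in range((length + 7) // 8):
        n_bits = min(8, length - 8 * i)
        yield i, (1 << n_bits) - 1


def any_skipna(data: bytearray, validity: bytearray, length: int) -> bool:
    # A valid True is a bit set in both the data and the validity bitmap.
    return any(data[i] & validity[i] & m for i, m in _byte_masks(length))


def all_skipna(data: bytearray, validity: bytearray, length: int) -> bool:
    # A valid False is a bit cleared in the data bitmap but set in validity.
    return not any(~data[i] & validity[i] & m for i, m in _byte_masks(length))


def boolean_bitmap_bytes(length: int) -> int:
    """Bytes for the data + validity bitmaps, ignoring Arrow's buffer padding."""
    return 2 * ((length + 7) // 8)


if __name__ == "__main__":
    data, validity, n = from_optional([None, False, True, None, False])
    print(any_skipna(data, validity, n), all_skipna(data, validity, n))  # True False
    # Ratio against 8 bytes per element for a float64 column.
    print(8 * 1_000_000 // boolean_bitmap_bytes(1_000_000))  # 32
```

At two bits per element, such a column needs 1/32 of the memory of a `float64` column and 1/16 of a `float32` one (ignoring buffer padding). This matches the range quoted in the note, although which float width the benchmark compared against is an assumption here.
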
New features / performance improvements:
- For `FletcherContinuousArray` in general and all `FletcherChunkedArray` instances with a single chunk, we now provide an efficient implementation of `take`.
- Support for Python 3.8 and Pandas 1.0.
- We now check typing in CI using `mypy` and have annotated the code with type hints.
  We only plan to mark the package as `py.typed` once `pandas` is also marked as `py.typed`.
- You can query `fletcher` for its version via `fletcher.__version__`.
- Implemented `.str.cat` as `.fr_text.cat` for arrays with `pa.string()` dtype.
- `unique` is now supported on all array types where `pyarrow` provides a `unique` implementation.
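
The note on `take` reflects why chunked storage complicates algorithms: with several chunks, every requested position first has to be mapped to a chunk and an offset within it. The following pure-Python sketch does that mapping with cumulative chunk offsets and a binary search. It is one plausible approach, not fletcher's implementation, and `take_chunked` is a hypothetical name. It follows the `pandas` `ExtensionArray.take` convention that, with `allow_fill=True`, an index of `-1` marks a missing value.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Any, List, Optional, Sequence


def take_chunked(
    chunks: Sequence[Sequence[Any]],
    indices: Sequence[int],
    allow_fill: bool = False,
    fill_value: Optional[Any] = None,
) -> List[Any]:
    """Gather elements at global `indices` from a list of chunks (hypothetical helper)."""
    # offsets[k] is the global position of the first element of chunk k;
    # offsets[-1] is the total length.
    offsets = [0, *accumulate(len(chunk) for chunk in chunks)]
    total = offsets[-1]
    result: List[Any] = []
    for idx in indices:
        if allow_fill and idx < 0:
            if idx != -1:
                raise ValueError("only -1 may mark a missing value when allow_fill=True")
            result.append(fill_value)
            continue
        if idx < 0:  # Python-style negative indexing when not filling
            idx += total
        if not 0 <= idx < total:
            raise IndexError(f"index {idx} out of bounds for length {total}")
        # bisect_right skips over empty chunks that share the same offset.
        chunk_no = bisect_right(offsets, idx) - 1
        result.append(chunks[chunk_no][idx - offsets[chunk_no]])
    return result
```

With a single chunk, `offsets` is just `[0, n]` and the lookup reduces to plain indexing, which is the case the note singles out. An array-based version would vectorise the same steps, for example with `numpy.searchsorted(offsets, indices, side="right")`.
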