Merge pull request #105 from xhochy/changelog

Prepare Changelog for 0.3.0
xhochy · Feb 7, 2020 · d31bc1b
2 parents d320591 + a36b9e5
commit d31bc1b
Showing 1 changed file with 37 additions and 0 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,6 +1,43 @@
 Changelog
 =========
 
+0.3.0
+-----
+
+Major changes:
+ * We now provide two different extension array implementations.
+   There is now the simpler `FletcherContinuousArray`, which is backed by a `pyarrow.Array` instance and is thus always a single contiguous memory segment.
+   The initial `FletcherArray`, which is backed by a `pyarrow.ChunkedArray`, has been renamed to `FletcherChunkedArray`.
+   While `pyarrow.ChunkedArray` allows for more flexibility in how the data is stored, implementing algorithms on it is more complex.
+   As this hinders contributions and also the adoption in downstream libraries, we now provide both implementations with an equal level of support.
+   We no longer provide the more generally named class `FletcherArray`, as there is no clear opinion on whether it should point to `FletcherContinuousArray` or `FletcherChunkedArray`.
+   As usage increases, we might provide such an alias class again in the future.
+ * Support for `ArithmeticOps` and `ComparisonOps` on numerical data as well as numeric reductions such as `sum`.
+   This should allow the use of nullable int and float types for many use cases.
+   Performance of nullable integer columns is on the same level as in `pandas.IntegerArray`, as we have similar implementations of the masked arithmetic.
+   In future versions, we plan to delegate the workload to the C++ code of `pyarrow` and expect significant performance improvements through the usage of bitmasks over bytemasks.
+ * `any` and `all` are now efficiently implemented on boolean arrays.
+   We [blogged about this](https://uwekorn.com/2019/09/02/boolean-array-with-missings.html) and how it is about twice as fast while using only 1/16 to 1/32 of the RAM of the reference boolean array with missing values in `pandas`.
+   This is because, prior to `pandas=1.0`, you had to use a float array to have a boolean array that can deal with missing values.
+   In `pandas=1.0`, a new `BooleanArray` class was added that improves this situation but also changes some of the logic.
+   We will adapt to this class in the next release and also publish new benchmarks.
+
+New features / performance improvements:
+ * For `FletcherContinuousArray` in general and all `FletcherChunkedArray` instances with a single chunk, we now provide an efficient implementation of `take`.
+ * Support for Python 3.8 and pandas 1.0.
+ * We now check typing in CI using `mypy` and have annotated the code with type hints.
+   We only plan to mark the packages as `py.typed` when `pandas` is also marked as `py.typed`.
+ * You can query `fletcher` for its version via `fletcher.__version__`.
+ * Implemented `.str.cat` as `.fr_text.cat` for arrays with `pa.string()` dtype.
+ * `unique` is now supported on all array types where `pyarrow` provides a `unique` implementation.
+
+0.2.0
+-----
+
+ * Drop Python 2 support
+ * Support for Python 3.7
+ * Fixed handling of `date` columns due to new default behaviours in `pyarrow`.
+
 0.1.2
 -----