Support ffill and bfill operations while remaining sparse #285

p-d-moore · 2019-09-04T07:19:28Z

I would like to add a request for sparse: Support ffill and bfill operations while maintaining the sparse level of data density.

The challenge to overcome is that performing ffill operations on sparse data quickly creates data that is no longer "sparse" in practice and makes dealing with the data challenging.

My suggested implementation (and the way I have previously done this in another programming environment) is to represent the data as rows of contiguous regions with a single (non-sparse) value rather than rows of single points. That is, the data then is represented as a list of values + coordinate ranges rather than a list of values + coordinates. This request might make more sense in the particular context of the sparse value being NaN.

The idea is that you can easily compute operations like ffill without changing the sparsity of the matrix, and thus support typical aggregating functions you might like to apply to the data before you collapse the data and convert to a non-sparse form (e.g. perform a lag difference or a cross-sectional mean). These types of operations can be more useful when the data is "fuller" such as after a forward fill, but often not useful when the data is very sparsely populated (as the cross-sectional operations are unlikely to hit the sparse data among the different dimensions).

Care must be taken to avoid "collisions" between sparse blocks of data, that is, to ensure that the blocks in the list never overlap. The implementation can get tricky, but I believe the goal is worthwhile. It may be a large enough change to warrant a separate class, at least initially.
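To make the proposal above concrete, here is a minimal 1-D sketch of the "values + coordinate ranges" idea. It is not part of sparse; the `Run` class and the `ffill_runs` function are hypothetical names invented for illustration, and the runs are assumed to be sorted and non-overlapping on input.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    """One contiguous block holding a single value (hypothetical structure)."""
    start: int    # first index covered (inclusive)
    stop: int     # one past the last index covered (exclusive)
    value: float


def ffill_runs(runs: List[Run], length: int, limit: Optional[int] = None) -> List[Run]:
    """Forward-fill an array of size `length` stored as sorted, non-overlapping runs.

    Each run is stretched forward by at most `limit` positions (no cap if None),
    but never past the start of the next run or the end of the array, so the
    output runs cannot collide.
    """
    filled = []
    for i, run in enumerate(runs):
        # The next run's start (or the array end) is the barrier for this fill.
        barrier = runs[i + 1].start if i + 1 < len(runs) else length
        stop = barrier if limit is None else min(run.stop + limit, barrier)
        filled.append(Run(run.start, stop, run.value))
    return filled


# Two observations in an array of length 10; every other position is NaN.
observed = [Run(2, 3, 1.0), Run(5, 6, 2.0)]
unlimited = ffill_runs(observed, length=10)
limited = ffill_runs(observed, length=10, limit=1)
```

In this sketch the number of stored runs is the same before and after the fill; only the `stop` coordinates change, which is the property the request is after.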

hameerabbasi · 2019-09-04T07:27:31Z

Hi! Would making the "fill-value" (as we call it) a reverse-broadcasted version of the "dense" part help? This seems to be a very niche feature.

p-d-moore · 2019-09-04T07:51:40Z

Hi @hameerabbasi, I have to confess I am not sure what you mean by a reverse-broadcasted version of the dense part. The request is related to this xarray discussion.

The request is really to generalise the current Sparse class to represent data where (non fill-value) values are repeated consecutively. Such data often arises from ffill / fillna type operations.

I agree the feature is in danger of being somewhat niche, unless it finds wider support. The usage case is where we have sparse observations of some data which we want to aggregate along a given dimension. Because the data is sparse, it becomes difficult to aggregate unless a ffill / fillna type operation can be first applied, but performing these operations tends to lead to data that is no longer sparse and increases memory usage (the purpose of using sparse to begin with).

hameerabbasi · 2019-09-04T08:49:26Z

I believe in that case making broadcast_to a view would suit your needs, and be of more use generally.

hameerabbasi · 2019-09-04T09:06:14Z

Wait, I just read the documentation for ffill/fillna. This should be possible by initially keeping the fill-value as NaN and later changing it to a suitable value via, for example, np.where(np.isnan(arr), value_to_replace_nan_with, arr).
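For reference, this is what that expression does on a plain dense NumPy array. The sample data and the replacement value 0.0 are made up for illustration, and the sketch makes no claim about how the same call behaves on a sparse array.

```python
import numpy as np

# Dense stand-in for an array whose missing entries are NaN.
arr = np.array([np.nan, 1.0, np.nan, np.nan, 2.0])
value_to_replace_nan_with = 0.0  # illustrative choice

# Every NaN becomes the same constant; observed values are kept as-is.
filled = np.where(np.isnan(arr), value_to_replace_nan_with, arr)
```

Every missing entry receives the same constant here, which is a fillna-style replacement rather than a forward fill.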

p-d-moore · 2019-09-04T09:46:44Z

Sorry, I mean ffill as opposed to fillna (a red herring there).

By ffill I mean the operation as found in Pandas or xarray.

That is, copying each point of non-NaN data forward along a given dimension (or down the rows of a dataframe) a set number of times or until it collides with another data point. The data then consists of regions in which a single value is repeated; each region may take a different value.
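The behaviour described above can be sketched in pure Python for the 1-D case. The `ffill` function below is an illustrative stand-in written for this thread, not the Pandas or xarray implementation, and its `limit` parameter is assumed to mean the maximum number of consecutive positions a value is copied into.

```python
import math
from typing import List, Optional


def ffill(values: List[float], limit: Optional[int] = None) -> List[float]:
    """Copy each non-NaN value forward until the next observation or the limit."""
    out = []
    last = math.nan   # most recent observed value, NaN until one is seen
    copies = 0        # how many times `last` has been copied so far
    for v in values:
        if not math.isnan(v):
            # A real observation stops the previous fill and starts a new region.
            last, copies = v, 0
            out.append(v)
        elif not math.isnan(last) and (limit is None or copies < limit):
            copies += 1
            out.append(last)
        else:
            out.append(math.nan)
    return out


data = [math.nan, 1.0, math.nan, math.nan, math.nan, 2.0, math.nan]
result = ffill(data, limit=2)

# Number of non-NaN entries before and after the fill.
dense_before = sum(not math.isnan(v) for v in data)
dense_after = sum(not math.isnan(v) for v in result)
```

Comparing `dense_before` with `dense_after` shows how a point-wise representation grows after a forward fill, even though the number of distinct regions is unchanged.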

hameerabbasi · 2019-09-04T09:48:08Z

This is a lot more niche than I initially assumed... I really doubt this is in scope here.

p-d-moore · 2019-09-05T01:45:03Z

On further consideration, what I want to do might not fit well in this project. I would like to replicate something I built in the past, but it probably takes a different, more specialised form than what sparse aims for.

hameerabbasi added the enhancement label Sep 4, 2019

p-d-moore closed this as completed Sep 5, 2019
