Terrible performance on datasets which have a high percentage of rows containing PII #204

olorin · 2017-04-26T23:56:22Z

Not sure which datasets yet; something to do with unicode? Details TBA.

olorin · 2017-04-27T01:36:32Z

It's the PII observation state. Once the number of observations exceeds some threshold (presumably governed by the ratio of PII to non-PII rows), performance falls off a cliff. Not sure by how much exactly yet, but it's definitely superlinear. Related to #202.

olorin · 2017-04-27T01:46:37Z

Got it: `updatePIIObservations` is accidentally O(n) when it should be constant-time.
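To illustrate why a linear-time update produces the superlinear slowdown described above, here is a hypothetical sketch. It is not the project's code: the class names, the `steps` counter, and the assumption that the tracker stores one entry per PII hit and rebuilds its per-column tally on every call are all made up for illustration. If an O(k) update runs once per PII observation, a scan with k observations costs 1 + 2 + ... + k = k(k+1)/2 steps, so the damage grows with the share of rows that contain PII.

```python
from collections import defaultdict


class LinearPIIObservations:
    """Hypothetical buggy tracker: O(k) per update, k = PII observations so far."""

    def __init__(self):
        self.observations = []  # one entry (the column name) per PII hit
        self.counts = {}
        self.steps = 0  # elements visited, used as a proxy for running time

    def update(self, column):
        self.observations.append(column)
        counts = defaultdict(int)
        for col in self.observations:  # bug: rescans everything seen so far
            self.steps += 1
            counts[col] += 1
        self.counts = dict(counts)


class ConstantPIIObservations:
    """Hypothetical fixed tracker: amortized O(1) per update."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.steps = 0

    def update(self, column):
        self.steps += 1
        self.counts[column] += 1  # incremental update, no rescan


def scan(tracker, rows):
    """Feed each row's PII columns to the tracker; rows without PII cost nothing."""
    for pii_columns in rows:
        for column in pii_columns:
            tracker.update(column)
    return tracker


# 1000 rows, every other one containing an email address (500 PII observations).
rows = [["email"] if i % 2 == 0 else [] for i in range(1000)]
slow = scan(LinearPIIObservations(), rows)
fast = scan(ConstantPIIObservations(), rows)
print(slow.steps, fast.steps)
```

Both trackers end with the same counts, but the linear one visits 125,250 elements against 500 for the constant-time one. Making every row contain PII (1000 observations) roughly quadruples the linear cost to 500,500 while the fixed version only doubles, which matches the pattern of datasets with a high percentage of PII rows being hit hardest.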

olorin changed the title ~~Terrible performance on some datasets~~ Terrible performance on datasets which have a high percentage of rows containing PII Apr 27, 2017
