Add `use_sparse_hessian` functionality to prevent a dense Hessian from being instantiated #543

AlanTuQC · 2022-05-19T20:54:02Z

When `use_sparse_hessian=True`, `state.hessian` is stored as a sparse COO matrix. Partially addresses Issue #485: the dataset with 4M+ columns can now run without a memory error.
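To put the 4M-column figure in perspective, here is a back-of-the-envelope sketch comparing a dense Hessian with COO storage. It assumes float64 values and 32-bit row/column indices. The 50-nonzeros-per-column density is a made-up illustration, not a number taken from #485.

```python
def dense_hessian_bytes(n_cols: int, itemsize: int = 8) -> int:
    """Bytes for a dense n_cols x n_cols Hessian (float64 by default)."""
    return n_cols * n_cols * itemsize


def coo_hessian_bytes(nnz: int, value_size: int = 8, index_size: int = 4) -> int:
    """Bytes for a COO matrix: one value plus a row index and a column index per stored entry."""
    return nnz * (value_size + 2 * index_size)


n_cols = 4_000_000
print(dense_hessian_bytes(n_cols) / 1e12, "TB")       # 128.0 TB
# Hypothetical density: about 50 stored entries per column.
print(coo_hessian_bytes(50 * n_cols) / 1e9, "GB")     # 3.2 GB
```

The dense cost grows with the square of the column count regardless of sparsity, while the COO cost grows only with the number of stored entries, which is why keeping `state.hessian` sparse avoids the memory error.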

abudis · 2023-02-08T10:16:16Z

Hey Alan, thanks for looking into this! Is this planned to be merged in the near future? I'm experiencing a similar problem with a large sparse matrix.

lbittarello · 2023-03-13T10:06:55Z

Please reopen when it's ready for review.

zmbc · 2024-10-29T21:01:56Z

My data is not much wider than it is long -- it is roughly square (and large). My sense is that this PR, if completed and merged, would make glum work for me -- or should I expect that glum's solvers wouldn't work even if they could be run?

Even if not, @tbenthompson said on #485 that

> the implementation of a solver more appropriate for [K >> N] would not be a huge undertaking and 75% of the pieces necessary already exist within glum

What is missing? I'd be happy to contribute something here, as I have yet to find a Python library that can handle this case without regularization.

zmbc · 2024-10-29T22:03:35Z

I started on this in #869, but I ran into this: Quantco/tabmat#17. I wonder if @ElizabethSantorellaQC might be able to offer some guidance. The text in that issue seems to mostly be focused on how to tell whether or not a sandwich product will be sparse, but even the option for the user to say "I know it will be, represent it as a sparse matrix" would be enough for my purposes.
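For a GLM, the Hessian (or Fisher information) has the sandwich form H = X^T diag(d) X. A minimal pure-Python sketch of the "user says it will be sparse" option: compute that product directly into COO triplets from a CSR matrix, never allocating a dense K x K array. The function name `sparse_sandwich_coo` is hypothetical; this is not tabmat's `sandwich` implementation.

```python
from collections import defaultdict


def sparse_sandwich_coo(indptr, indices, data, d):
    """Compute H = X^T diag(d) X for a CSR matrix X and return H as COO triplets.

    X is given by CSR arrays (indptr, indices, data); d holds one weight per row.
    Only entries that receive at least one contribution are stored.
    """
    acc = defaultdict(float)
    n_rows = len(indptr) - 1
    for i in range(n_rows):
        start, end = indptr[i], indptr[i + 1]
        # Every pair of nonzeros (j, k) in row i contributes d[i] * x_ij * x_ik to H[j, k].
        for a in range(start, end):
            j, xj = indices[a], data[a]
            for b in range(start, end):
                k, xk = indices[b], data[b]
                acc[(j, k)] += d[i] * xj * xk
    keys = sorted(acc)
    rows = [j for j, _ in keys]
    cols = [k for _, k in keys]
    vals = [acc[key] for key in keys]
    return rows, cols, vals


# X = [[1, 0, 2],
#      [0, 3, 0]] in CSR form, with row weights d = [0.5, 2.0].
rows, cols, vals = sparse_sandwich_coo([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0], [0.5, 2.0])
print(rows, cols, vals)  # [0, 0, 1, 2, 2] [0, 2, 1, 0, 2] [0.5, 1.0, 18.0, 1.0, 2.0]
```

The triplets can be handed to `scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(K, K))`. The work is proportional to the sum over rows of (nonzeros per row)^2, so a few dense rows make the output dense too; that is why guessing sparsity automatically is the hard part of Quantco/tabmat#17, while a user-supplied flag sidesteps it.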

stanmart · 2024-10-30T09:05:25Z

Thank you so much for looking into this feature! I agree, as a first step I would not worry about using heuristics to try to guess whether the hessian will be sparse, but just rely on the user telling glum whether it should be. (Btw I doubt Elizabeth will chime in on this as she hasn't worked on glum in a while.)

AlanTuQC added 9 commits May 13, 2022 15:57

- Add option to make state.hessian a sparse COO matrix (b148845)
- fisher_diag seems to work (2bddab6)
- Pass sparse array into _cd_fast.pyx instead of dense array. (01fab32)
- Cleaned and ready for review. (149f890)
- Address comments; about to split into two PRs (a58dfa1)
- Address comments; about to split into two PRs (c5f5667)
- Put back some old comments that were mistakenly removed (684de67)
- One-letter fix (4d628a4)
- use_sparse_hessian code isolated in this branch (39c2b10)
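The commit that passes a sparse array into `_cd_fast.pyx` implies the inner solver must read the Hessian without densifying it. A minimal sketch of why that is feasible, assuming (for illustration only) that the inner step minimizes a quadratic model 0.5 x^T H x + g^T x + lam * ||x||_1 by cyclic coordinate descent: each coordinate update only needs column j of H, which CSC storage provides directly. `cd_quadratic_l1` and `soft_threshold` are hypothetical names; this is not glum's `_cd_fast.pyx`.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |x|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def cd_quadratic_l1(indptr, indices, data, g, lam, n_sweeps=100):
    """Minimize 0.5 x^T H x + g^T x + lam * ||x||_1 by cyclic coordinate descent.

    H is symmetric and stored in CSC form: column j has values data[indptr[j]:indptr[j+1]]
    at rows indices[indptr[j]:indptr[j+1]]. Assumes every diagonal entry is stored and positive.
    """
    n = len(g)
    x = [0.0] * n
    hx = [0.0] * n  # running value of H @ x, updated one column at a time
    diag = [0.0] * n
    for j in range(n):
        for p in range(indptr[j], indptr[j + 1]):
            if indices[p] == j:
                diag[j] = data[p]
    for _ in range(n_sweeps):
        for j in range(n):
            # Linear coefficient for x_j, excluding x_j's own contribution to (H x)_j.
            q = g[j] + hx[j] - diag[j] * x[j]
            new = soft_threshold(-q, lam) / diag[j]
            delta = new - x[j]
            if delta != 0.0:
                # Only column j of H is touched, so no dense Hessian is needed.
                for p in range(indptr[j], indptr[j + 1]):
                    hx[indices[p]] += delta * data[p]
                x[j] = new
    return x


# H = diag(2, 4), g = [-4, 1], lam = 1: the first coordinate is shrunk, the second is zeroed.
print(cd_quadratic_l1([0, 1, 2], [0, 1], [2.0, 4.0], [-4.0, 1.0], lam=1.0))  # [1.5, 0.0]
```

Maintaining `hx` incrementally keeps each update proportional to the number of nonzeros in one column, so the per-sweep cost scales with the nonzeros of H rather than with K^2.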

AlanTuQC requested review from MarcAntoineSchmidtQC and BenBarrettQC May 19, 2022 21:08

lbittarello closed this Mar 13, 2023

zmbc added a commit to zmbc/glum that referenced this pull request Oct 29, 2024

- Rebase changes from Quantco#543 (df10408)

zmbc mentioned this pull request Nov 1, 2024 in Support sparse output for sandwich products Quantco/tabmat#17 (Open)
