I am interested in generating nicheformer embeddings using the available checkpoint. Is there an end-to-end tutorial showing how to tokenize a data source into the format expected by the model? I've tried the following, but it seems like the `data/model_means/model.h5ad` file contains only a single observation (which appears to be unexpected given the related tokenization notebooks).
I installed the `nicheformer` package and downloaded the pretrained model weights from Mendeley.
I downloaded and preprocessed the exemplar spatial and dissociated datasets using the `download_*` and `preprocess_*` scripts in `data/spatialcorpus-110M/spatial/examplary-Xenium` and `data/spatialcorpus-110M/dissociated/Lu_2021`, respectively. I updated the default paths in the constants file.
Following `nicheformer/tree/main/notebooks/tokenization/xenium_human_lung.ipynb`, I tried to run the tokenization process for `.../spatial/preprocessed/Xenium_Preview_Human_Non_diseased_Lung_With_Add_on_FFPE_outs.h5ad`. I mapped `DATA_PATH` to this `h5ad` (corresponds to `healthy` in the notebook?), `xenium_mean` to `data/model_means/xenium_mean_script.npy`, and `model` to `data/model_means/model.h5ad`.
My `xenium` object appears to contain the expected `obs`, `var`, etc. sub-objects, albeit with fewer samples than the shapes logged in the notebook. However, `model` seems to be missing observations.
I believe this breaks the inner join in the following block, since the post-join `xenium` object ends up with a shape of `AnnData object with n_obs × n_vars = 295883 × 391` rather than the notebook's logged `AnnData object with n_obs × n_vars = 827048 × 20310`:

```python
adata = ad.concat([model, xenium], join='inner', axis=0)
# dropping the first observation
xenium = adata[1:].copy()
# for memory efficiency
del adata
```
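For reference, the join semantics I expect here can be illustrated with plain pandas, which mirrors the behavior of `ad.concat([model, xenium], join='inner', axis=0)`: stacking along observations keeps only the intersection of variables, so a one-row `model` template carrying the full gene vocabulary should preserve all of `xenium`'s genes. (The `model`/`xenium` objects below are toy stand-ins, not the real files.)

```python
import pandas as pd

# Toy "model" template: a single observation spanning the full gene vocabulary.
model = pd.DataFrame([[0, 0, 0, 0]], columns=["g1", "g2", "g3", "g4"])
# Toy "xenium" data: three observations over a smaller gene panel.
xenium = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["g2", "g3"])

# Inner join along axis=0 stacks observations and keeps only shared genes.
adata = pd.concat([model, xenium], join="inner", axis=0)
print(adata.shape)  # (4, 2): all observations, intersected genes

# Dropping the template's first row recovers the data aligned to shared genes.
xenium_aligned = adata.iloc[1:]
print(xenium_aligned.shape)  # (3, 2)
```

So if `model.h5ad` really carried the full 20310-gene vocabulary, the inner join should not shrink `xenium`'s variable set; the 391-var result above is what made me suspect the shipped file is incomplete.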
Would it be possible to add an updated `model.h5ad` file and a more detailed end-to-end example, so that we can format our datasets to match the tokenized representations expected by the `Nicheformer.get_embeddings()` method?
Hi @AnnaChristina,