ENH: Generate Linux ARM wheels #190

Merged
merged 1 commit into from
Jan 15, 2023

thewtex
Member

@thewtex thewtex commented Jan 9, 2023

The features come from the updated GitHub Action repository.

@thewtex thewtex requested review from tbirdso and N-Dekker January 9, 2023 15:53
@thewtex
Member Author

thewtex commented Jan 9, 2023

@ntatsisk we are seeing a MONAI notebook testing error. Is this related to a MONAI update or something else?

=================================== FAILURES ===================================
_ /home/runner/work/ITKElastix/ITKElastix/examples/ITK_Example17_MONAIWithPreregistration.ipynb _
---------------------------------------------------------------------------
dataset = CacheDataset(data=training_datadict, transform=transforms)
dataloader = DataLoader(dataset=dataset, batch_size=16, shuffle=True)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/monai/transforms/transform.py:102, in apply_transform(transform, data, map_items, unpack_items, log_stats)
    101         return [_apply_transform(transform, item, unpack_items) for item in data]
--> 102     return _apply_transform(transform, data, unpack_items)
    103 except Exception as e:
    104     # if in debug mode, don't swallow exception so that the breakpoint
    105     # appears where the exception was raised.

File /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/monai/transforms/transform.py:66, in _apply_transform(transform, parameters, unpack_parameters)
     64     return transform(*parameters)
---> 66 return transform(parameters)

File /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/monai/transforms/utility/dictionary.py:328, in EnsureChannelFirstd.__call__(self, data)
    327 for key, meta_key, meta_key_postfix in self.key_iterator(d, self.meta_keys, self.meta_key_postfix):
--> 328     d[key] = self.adjuster(d[key], d.get(meta_key or f"{key}_{meta_key_postfix}"))  # type: ignore
    329 return d

File /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/monai/transforms/utility/array.py:231, in EnsureChannelFirst.__call__(self, img, meta_dict)
    230 if self.strict_check:
--> 231     raise ValueError(msg)
    232 warnings.warn(msg)

ValueError: Metadata not available and channel_dim=None, EnsureChannelFirst is not in use.

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Cell In[5], line 1
----> 1 dataset = CacheDataset(data=training_datadict, transform=transforms)
      2 dataloader = DataLoader(dataset=dataset, batch_size=16, shuffle=True)

File /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/monai/data/dataset.py:814, in CacheDataset.__init__(self, data, transform, cache_num, cache_rate, num_workers, progress, copy_cache, as_contiguous, hash_as_key, hash_func, runtime_cache)
    812 self._cache: Union[List, ListProxy] = []
    813 self._hash_keys: List = []
--> 814 self.set_data(data)

File /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/monai/data/dataset.py:841, in CacheDataset.set_data(self, data)
    838     indices = list(range(self.cache_num))
    840 if self.runtime_cache in (False, None):  # prepare cache content immediately
--> 841     self._cache = self._fill_cache(indices)
    842     return
    843 if isinstance(self.runtime_cache, str) and "process" in self.runtime_cache:
    844     # this must be in the main process, not in dataloader's workers

File /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/monai/data/dataset.py:870, in CacheDataset._fill_cache(self, indices)
    868 with ThreadPool(self.num_workers) as p:
    869     if self.progress and has_tqdm:
--> 870         return list(tqdm(p.imap(self._load_cache_item, indices), total=len(indices), desc="Loading dataset"))
    871     return list(p.imap(self._load_cache_item, indices))

File /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/tqdm/std.py:1195, in tqdm.__iter__(self)
   1192 time = self._time
   1194 try:
-> 1195     for obj in iterable:
   1196         yield obj
   1197         # Update and possibly print the progressbar.
   1198         # Note: does not call self.update(1) for speed optimisation.

File /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/multiprocessing/pool.py:870, in IMapIterator.next(self, timeout)
    868 if success:
    869     return value
--> 870 raise value

File /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/multiprocessing/pool.py:125, in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
    123 job, i, func, args, kwds = task
    124 try:
--> 125     result = (True, func(*args, **kwds))
    126 except Exception as e:
    127     if wrap_exception and func is not _helper_reraises_exception:

File /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/monai/data/dataset.py:884, in CacheDataset._load_cache_item(self, idx)
    882         break
    883     _xform = deepcopy(_transform) if isinstance(_transform, ThreadUnsafe) else _transform
--> 884     item = apply_transform(_xform, item)
    885 if self.as_contiguous:
    886     item = convert_to_contiguous(item, memory_format=torch.contiguous_format)

File /opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/monai/transforms/transform.py:129, in apply_transform(transform, data, map_items, unpack_items, log_stats)
    127     else:
    128         _log_stats(data=data)
--> 129 raise RuntimeError(f"applying transform {transform}") from e

RuntimeError: applying transform <monai.transforms.utility.dictionary.EnsureChannelFirstd object at 0x7fa66789afa0>
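
For readers following the log: the traceback is a chained exception, where the final RuntimeError wraps the original ValueError through `raise ... from e`. The sketch below shows that pattern with the standard library only. The functions `apply_transform`, `ensure_channel_first`, and `root_cause` are hypothetical stand-ins written for illustration, not MONAI code.

```python
def ensure_channel_first(data):
    """Hypothetical transform that fails when metadata is missing."""
    raise ValueError("Metadata not available and channel_dim=None")


def apply_transform(transform, data):
    """Hypothetical wrapper that re-raises failures with extra context."""
    try:
        return transform(data)
    except Exception as e:
        # `from e` stores the original error on the new exception's __cause__,
        # which produces the "direct cause of the following exception" banner.
        raise RuntimeError(f"applying transform {transform.__name__}") from e


def root_cause(exc):
    """Follow the __cause__ chain back to the first exception raised."""
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc


try:
    apply_transform(ensure_channel_first, {"image": None})
except RuntimeError as err:
    caught = err
    origin = root_cause(err)
    print(type(origin).__name__, "->", type(caught).__name__)
```

Read this way, the error to act on in the log is the ValueError raised in `EnsureChannelFirst`, not the outer RuntimeError.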

@ntatsisk
Collaborator

Hi @thewtex, I managed to reproduce the error locally. I think it was caused by a MONAI update, and I suspect it was introduced here: Project-MONAI/MONAI@7f16a15. I opened #191 to fix it. However, I noticed that another error appears while pip installing itk-elastix==0.15.0 - I had to use version 0.14.4 for my local test.

@tbirdso
Collaborator

tbirdso commented Jan 10, 2023

> However, I noticed that another error appears while pip installing itk-elastix==0.15.0 - I had to use version 0.14.4 for my local test.

If I understand correctly, you are referring to the macOS notebook failure in #191, which seems to have been caused by a bad HTTP connection. I have restarted the notebook check in that PR.

@thewtex
Member Author

thewtex commented Jan 12, 2023

Rebased on the MONAI notebook fix from @ntatsisk 🙏

@thewtex thewtex merged commit 4742884 into main Jan 15, 2023
@thewtex thewtex deleted the arm-wheels branch January 15, 2023 20:52
@tbirdso
Collaborator

tbirdso commented Jan 16, 2023

I have restarted the ARM builds in main after they timed out at 6 hours. Looking at the checks in this PR, it seems that the ARM wheels always run very close to the timeout, taking around 5.5 hours to build. It takes a long time to build both Elastix and then the ITKElastix wrappings.

If we see recurring timeouts, or if ARM build time becomes a problem for iterating on changes in PRs, we could consider 1) running ARM builds only in the main branch for changes that are merged, 2) self-hosting an ARM runner on an ARM platform instance via Cirun, or 3) removing ARM wheels from the build schedule.
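
As a rough illustration of option 1, the gating decision could look like the sketch below. It assumes the standard GitHub Actions environment variables `GITHUB_EVENT_NAME` and `GITHUB_REF`. The function names and target labels (`linux-x64`, `linux-aarch64`) are hypothetical and are not taken from the ITKElastix workflow.

```python
import os


def should_build_arm(event_name, ref, main_ref="refs/heads/main"):
    """Return True only for pushes to the main branch (merged changes)."""
    return event_name == "push" and ref == main_ref


def wheel_targets(event_name, ref):
    """List the wheel targets to build for a given workflow trigger."""
    targets = ["linux-x64"]  # hypothetical label for the fast native build
    if should_build_arm(event_name, ref):
        targets.append("linux-aarch64")  # hypothetical label for the slow ARM build
    return targets


# In a workflow step, the trigger would be read from the environment.
print(wheel_targets(os.environ.get("GITHUB_EVENT_NAME", ""),
                    os.environ.get("GITHUB_REF", "")))
```

With this gating, pull requests skip the multi-hour ARM job, and only merged changes pay for it.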

@N-Dekker
Collaborator

N-Dekker commented Feb 6, 2023

@thewtex Matt, do you have a clue why the CI on main still has failures after this PR was merged, at commit 4742884?

@tbirdso
Collaborator

tbirdso commented Feb 6, 2023

@N-Dekker It looks like ARM builds are still consistently taking 5.5 to 6 hours, while GitHub Actions runners time out at 6 hours. https://github.com/InsightSoftwareConsortium/ITKElastix/actions/runs/3925249143/jobs/6827215695

ARM builds take a long time because we are using emulation tools to build for the aarch64 target architecture on an x64 platform. See my previous comment for general ways to address this. @thewtex may have additional input on how to streamline the ITKElastix build process to avoid the timeout.
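
To make the host/target mismatch concrete, here is a minimal sketch that checks whether a target architecture differs from the machine running the build. When it does, the build has to be either emulated (slow) or cross-compiled. The alias table and the function names are assumptions for illustration; only `platform.machine()` is a real standard-library call.

```python
import platform

# Common spellings reported by different operating systems (illustrative, not exhaustive).
_ALIASES = {
    "x86_64": "x64",
    "amd64": "x64",
    "x64": "x64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def normalize_arch(name):
    """Map an architecture name to a canonical label."""
    key = name.lower()
    return _ALIASES.get(key, key)


def is_foreign_target(target, host=None):
    """Return True when the target differs from the host architecture."""
    if host is None:
        host = platform.machine()
    return normalize_arch(host) != normalize_arch(target)


# An x64 runner building aarch64 wheels is the slow case described above.
print(is_foreign_target("aarch64", host="x86_64"))
```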

@thewtex
Member Author

thewtex commented Feb 15, 2023

We should have much faster native cross-compilation builds if InsightSoftwareConsortium/ITKRemoteModuleBuildTestPackageAction#56 works.

@tbirdso
Collaborator

tbirdso commented Feb 15, 2023

> We should have much faster native cross-compilation builds if InsightSoftwareConsortium/ITKRemoteModuleBuildTestPackageAction#56 works.

That feature request is for the manylinux2014 toolset; I don't think it would affect the existing ITKElastix manylinux_2_28 builds?

@thewtex
Member Author

thewtex commented Feb 15, 2023

The manylinux2014 build would replace the current manylinux_2_28 build. We could also create a manylinux_2_28 equivalent by adjusting the glibc / C++ standard library versions in the dockcross crosstool-ng configuration.
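
For context on the two tags: under PEP 600, manylinux2014 is an alias for manylinux_2_17 (glibc 2.17), while manylinux_2_28 requires glibc 2.28, so a manylinux2014 wheel installs on a wider range of systems. The sketch below extracts that glibc floor from a platform tag. It is a simplified illustration: the helper names are hypothetical, and real installers also check the architecture and other details.

```python
import re

# Legacy aliases and their glibc versions, as defined in PEP 600.
_LEGACY_GLIBC = {
    "manylinux1": (2, 5),
    "manylinux2010": (2, 12),
    "manylinux2014": (2, 17),
}

_PEP600 = re.compile(r"^manylinux_(\d+)_(\d+)_(\w+)$")
_LEGACY = re.compile(r"^(manylinux(?:1|2010|2014))_(\w+)$")


def glibc_floor(platform_tag):
    """Return the minimum glibc (major, minor) implied by a manylinux tag."""
    match = _PEP600.match(platform_tag)
    if match:
        return int(match.group(1)), int(match.group(2))
    match = _LEGACY.match(platform_tag)
    if match:
        return _LEGACY_GLIBC[match.group(1)]
    raise ValueError(f"not a manylinux platform tag: {platform_tag!r}")


def runs_on(platform_tag, system_glibc):
    """Check only the glibc requirement; architecture is ignored here."""
    return system_glibc >= glibc_floor(platform_tag)


print(glibc_floor("manylinux2014_aarch64"))
print(glibc_floor("manylinux_2_28_aarch64"))
```

For example, on a system with glibc 2.26, `runs_on("manylinux2014_aarch64", (2, 26))` holds while `runs_on("manylinux_2_28_aarch64", (2, 26))` does not.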

@tbirdso
Collaborator

tbirdso commented Feb 15, 2023

Moving discussion to that thread.
