Notes:
- The CPU offloads a functor with a substantial member footprint to the GPU.
- The kernel is 'single shot' style, meaning it contains no loop.
- The kernel reads the functor's members to calculate tensor array strides and dimension edges.
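
The notes above can be illustrated with a minimal host-only sketch. It is plain C++ rather than SYCL, and it is not the kernel from this project: StridedCopy, launch, transpose_2x3_demo, the rank of 4, and the group size of 8 are all hypothetical names and values chosen for illustration. It also assumes that 'no loop' means each work-item handles exactly one element, with no loop over elements inside the kernel.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical rank; real tensors may differ.
constexpr std::size_t kRank = 4;

// Hypothetical functor standing in for a SYCL kernel object.
// All members are held by value, so the whole struct travels to the device.
struct StridedCopy {
    std::array<std::size_t, kRank> dims;        // extent of each output dimension
    std::array<std::size_t, kRank> src_stride;  // source element stride per dimension
    const float* src;                           // strided source
    float* dst;                                 // dense, row-major destination

    std::size_t total() const {
        std::size_t n = 1;
        for (std::size_t d : dims) n *= d;
        return n;
    }

    // Single shot: one call handles one element. The only loop is the
    // fixed-length walk over dimensions, not a loop over elements.
    void operator()(std::size_t linear) const {
        if (linear >= total()) return;  // dimension edge: ignore padded work-items
        std::size_t rem = linear;
        std::size_t offset = 0;
        for (std::size_t d = kRank; d-- > 0;) {
            offset += (rem % dims[d]) * src_stride[d];
            rem /= dims[d];
        }
        dst[linear] = src[offset];
    }
};

// Host stand-in for a device launch: one call per work-item, with the
// range rounded up to a multiple of the group size.
template <typename Kernel>
void launch(const Kernel& kernel, std::size_t n, std::size_t group = 8) {
    assert(group > 0);
    const std::size_t padded = (n + group - 1) / group * group;
    for (std::size_t i = 0; i < padded; ++i) kernel(i);
}

// Transposes a 2x3 row-major matrix by swapping the source strides.
std::vector<float> transpose_2x3_demo() {
    const std::vector<float> src = {0, 1, 2, 3, 4, 5};
    std::vector<float> dst(6, -1.0f);
    StridedCopy kernel{{1, 1, 3, 2}, {0, 0, 1, 3}, src.data(), dst.data()};
    launch(kernel, kernel.total());
    return dst;
}
```

Calling transpose_2x3_demo() returns {0, 3, 1, 4, 2, 5}. The launch covers 8 work-items for 6 elements, so the last two calls hit the edge guard and write nothing. In a real SYCL build the functor would be submitted through a queue instead of the host loop used here.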
The Makefile splits SYCL compilation into multiple stages.
- Object stage: compile source files into offloading byte-code and host objects, generating integration headers.
- Bundle stage: bundle the host objects and byte-code into a single object.
- Debug the device compilation process by running
    make <appname>.debug
  which generates the linking and device compilation commands for IGC.