NanoSNAP is a small and portable signal, audio and noise processing library in C++11. 🤞 NanoSNAP depends only on C++11 STL.
- For running TTS (text-to-speech) and ASR (automatic speech recognition) on C++ embedded devices.
- Image processing with neural network inference on C++ and embedded devices.
- Implement audio and speech features (e.g. using `rfft`, `mfcc`, `stft`, `istft`) in your C++ machine learning library.
Simply copy the `include` and `src` folders to your platform.
- CMake (for building examples and tests, and for building NanoSNAP as a submodule)
- C++11 compiler
- Windows
- Linux
- macOS
- Android (not tested yet, but should work)
- Raspberry Pi (not tested yet, but should work)
- RISC-V (not tested yet, but should work)
If you want to build the tests (building tests is enabled by default), you need to check out the submodules.
$ git submodule update --init --recursive --depth 1
$ mkdir build
$ cd build
$ cmake ..
$ make
> vcsetup.bat
Open `build/nanosnap.sln` and build it.
$ mkdir build
$ cd build
$ cmake -DNANOSNAP_ENABLE_TESTS=On ..
$ make
$ ./bin/test_nanosnap
`NANOSNAP_NO_STDIO`: Disable file I/O. For example, `wav_read` is not available. This option is useful when you want to use NanoSNAP on Android or embedded devices.
NanoSNAP takes a raw pointer for input array values, followed by its length information (or shape information).
bool proc(const float *input, int n);
The output array is usually a `std::vector` so that NanoSNAP can allocate the output buffer internally.
The output array is a function argument when the function needs to return a status.
bool proc(int n, std::vector<float> *output);
Otherwise, the output array is the return value.
std::vector<float> proc(int n);
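The conventions above can be sketched as follows. This is only an illustration: `normalize` and `ramp` are hypothetical helpers, not NanoSNAP functions, and the error conditions they check are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration only; they are not NanoSNAP functions.

// Input: a raw pointer followed by its length.
// Output: passed as an argument because the function reports a status.
bool normalize(const float *input, int n, std::vector<float> *output) {
  if (!input || n <= 0 || !output) return false;  // invalid arguments

  float peak = 0.0f;
  for (int i = 0; i < n; i++) {
    peak = std::fmax(peak, std::fabs(input[i]));
  }
  if (peak == 0.0f) return false;  // an all-zero signal cannot be normalized

  output->resize(static_cast<std::size_t>(n));  // buffer is allocated here
  for (int i = 0; i < n; i++) {
    (*output)[i] = input[i] / peak;
  }
  return true;
}

// No status to report, so the output array is the return value.
std::vector<float> ramp(int n) {
  assert(n >= 0);
  std::vector<float> output;
  for (int i = 0; i < n; i++) output.push_back(static_cast<float>(i));
  return output;
}
```

A caller would pass `values.data()` and the element count for the input, and the address of a `std::vector<float>` for the output.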
No API function keeps internal state.
The NanoSNAP API is re-entrant because it has no internal state, so it should be safe to use in multi-threaded programs as long as input/output memory addresses do not overlap between threads.
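A minimal sketch of this usage pattern, assuming a stateless function in the same style: `scale` and `scale_in_two_threads` are hypothetical names, not NanoSNAP functions. Each thread works on its own non-overlapping slice of the input and output buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stateless function: it only touches the memory it is given
// and keeps no global or static state.
void scale(const float *input, std::size_t n, float gain, float *output) {
  for (std::size_t i = 0; i < n; i++) output[i] = gain * input[i];
}

// Process the two halves of a buffer on two threads. The halves do not
// overlap, so the threads never write to the same address.
std::vector<float> scale_in_two_threads(const std::vector<float> &input,
                                        float gain) {
  std::vector<float> output(input.size());
  const std::size_t half = input.size() / 2;

  std::thread first(scale, input.data(), half, gain, output.data());
  std::thread second(scale, input.data() + half, input.size() - half, gain,
                     output.data() + half);

  first.join();
  second.join();
  return output;
}
```

On some toolchains, programs that use `std::thread` must be linked with the platform thread library (for example `-pthread` with GCC or Clang).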
`-DSANITIZE_ADDRESS=On`: Enable Address Sanitizer (for developers).
NanoSNAP processes 2D and higher-dimensional (ND) array data as a flattened 1D array.
The ordering of array data follows the C language (this is the same behavior as a `numpy` array in C order). For example, `img[H][W]` has `W` pixels in width (columns) and `H` pixels in height (rows).
-> memory address increases
+-----------+-----------+ +-------------+-----------+ +---------------+
| img[0][0] | img[0][1] | ... | img[0][W-1] | img[1][0] | ... | img[H-1][W-1] |
+-----------+-----------+ +-------------+-----------+ +---------------+
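The layout in the diagram corresponds to the usual row-major offset `y * W + x`. A small sketch, where `flat_index` and `make_offset_image` are hypothetical helpers and not part of NanoSNAP:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offset of img[y][x] in a flattened, row-major [H][W] array.
inline std::size_t flat_index(std::size_t x, std::size_t y, std::size_t W) {
  assert(x < W);
  return y * W + x;
}

// Build an H x W image in which each pixel stores its own flattened offset.
std::vector<float> make_offset_image(std::size_t W, std::size_t H) {
  std::vector<float> img(W * H);
  for (std::size_t y = 0; y < H; y++) {
    for (std::size_t x = 0; x < W; x++) {  // x is the fastest-varying index
      img[flat_index(x, y, W)] = static_cast<float>(flat_index(x, y, W));
    }
  }
  return img;
}
```

Moving one pixel to the right advances the address by one element, and moving one row down advances it by `W` elements.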
Contrary to `numpy` or vision/ML libraries such as OpenCV, the dimensional arguments in a function signature start from the innermost dimension (the right-most array dimension). This is a rather common notation in the C language and graphics communities, i.e.,
// `output` has the shape of [h][w]
void create_image(size_t w, size_t h, float *output);
// `output` has the shape of [d][h][w]
void create_3d_tensor(size_t w, size_t h, size_t d, float *output);
// `input` has the shape of [nrows][nframes].
void rfft(const float *input, size_t nframes, size_t nrows, ...);
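To make the argument order concrete, here is a hedged sketch of a function body that follows the same convention. `fill_3d_tensor` is a hypothetical name and the values it writes are arbitrary; only the `(w, h, d)` argument order and the `[d][h][w]` layout come from the signatures above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dimensions are passed innermost first (w, h, d), while the data is laid
// out as [d][h][w]. `output` must hold at least w * h * d elements.
void fill_3d_tensor(std::size_t w, std::size_t h, std::size_t d,
                    float *output) {
  assert(output != nullptr);
  for (std::size_t z = 0; z < d; z++) {
    for (std::size_t y = 0; y < h; y++) {
      for (std::size_t x = 0; x < w; x++) {
        // Element [z][y][x] of a [d][h][w] array; the value encodes its
        // coordinates so that the layout is easy to inspect.
        output[(z * h + y) * w + x] = static_cast<float>(100 * z + 10 * y + x);
      }
    }
  }
}
```

With `w = 4`, `h = 3`, `d = 2`, the caller allocates 24 floats and calls `fill_3d_tensor(4, 3, 2, buffer.data())`.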
NanoSNAP | Description | Python equivalent |
---|---|---|
reshape_with_strides | Create an array with the given shape and strides. | numpy.lib.stride_tricks.as_strided |
convolve | 1D convolution | numpy.convolve |
loadtxt | Load 1D or 2D array | numpy.loadtxt |
savetxt | Save 1D or 2D array | numpy.savetxt |
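For reference, the default ("full") mode of `numpy.convolve` can be written as the naive loop below. This is a sketch of the operation only: `convolve_full` is a hypothetical name, and the actual signature and supported modes of NanoSNAP's `convolve` are not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive "full" 1D convolution: out[k] = sum over i of a[i] * v[k - i].
// The output has n + m - 1 elements. Unlike numpy.convolve, which raises an
// error for empty input, this sketch returns an empty vector.
std::vector<float> convolve_full(const float *a, std::size_t n,
                                 const float *v, std::size_t m) {
  if (n == 0 || m == 0) return std::vector<float>();
  assert(a != nullptr && v != nullptr);

  std::vector<float> out(n + m - 1, 0.0f);
  for (std::size_t i = 0; i < n; i++) {
    for (std::size_t j = 0; j < m; j++) {
      out[i + j] += a[i] * v[j];
    }
  }
  return out;
}
```

For `a = {1, 2, 3}` and `v = {0, 1, 0.5}` this yields `{0, 1, 2.5, 4, 1.5}`, matching the example in the `numpy.convolve` documentation.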
NanoSNAP | Description | Python equivalent |
---|---|---|
random_uniform | Uniform random number | numpy.random.rand |
random_shuffle | Randomly shuffle array | numpy.random.shuffle |
NanoSNAP | Description | Python equivalent |
---|---|---|
rfft | Real 1D FFT | numpy.fft.rfft |
ifft | Inverse Complex FFT | numpy.fft.ifft |
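As a reminder of what a real FFT returns, the sketch below computes the same quantity as `numpy.fft.rfft` for a single row with a naive O(n²) DFT. It is not how NanoSNAP computes it (the library uses pocketfft), and `naive_rfft` is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive real-input DFT. For n real samples, only the first n/2 + 1 bins are
// returned, because the remaining bins are complex conjugates of these.
std::vector<std::complex<float>> naive_rfft(const float *input, std::size_t n) {
  if (n == 0) return std::vector<std::complex<float>>();
  assert(input != nullptr);

  const double kPi = 3.14159265358979323846;
  std::vector<std::complex<float>> out(n / 2 + 1);
  for (std::size_t k = 0; k < out.size(); k++) {
    double re = 0.0, im = 0.0;
    for (std::size_t t = 0; t < n; t++) {
      // X[k] = sum over t of x[t] * exp(-2*pi*i*k*t/n)
      const double angle = -2.0 * kPi * static_cast<double>(k) *
                           static_cast<double>(t) / static_cast<double>(n);
      re += input[t] * std::cos(angle);
      im += input[t] * std::sin(angle);
    }
    out[k] = std::complex<float>(static_cast<float>(re), static_cast<float>(im));
  }
  return out;
}
```

An impulse of length 4 produces three bins that are all equal to 1, and bin 0 is always the sum of the input samples.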
NanoSNAP | Description | Python equivalent |
---|---|---|
lfilter | Filter data along one dimension with an IIR or FIR filter. | scipy.signal.lfilter |
medfilt | Median filter | scipy.signal.medfilt |
wav_read | Read .WAV file | scipy.io.wavfile.read |
wav_write | Save .WAV file | scipy.io.wavfile.write |
NanoSNAP | Description | Python equivalent |
---|---|---|
mel2hz | Mel to Hz | mel2hz |
hz2mel | Hz to Mel | hz2mel |
lifter | Apply a cepstral lifter to the matrix of cepstra | lifter |
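The Mel/Hz conversions in `python_speech_features` are `2595 * log10(1 + hz / 700)` and its inverse. A small sketch of those two formulas follows; `hz_to_mel` and `mel_to_hz` are hypothetical names, and the signatures of NanoSNAP's own `hz2mel` and `mel2hz` may differ.

```cpp
#include <cassert>
#include <cmath>

// Hz -> Mel, using the formula from python_speech_features.
inline float hz_to_mel(float hz) {
  assert(hz >= 0.0f);
  return 2595.0f * std::log10(1.0f + hz / 700.0f);
}

// Mel -> Hz, the inverse of hz_to_mel.
inline float mel_to_hz(float mel) {
  return 700.0f * (std::pow(10.0f, mel / 2595.0f) - 1.0f);
}
```

Converting a frequency to Mel and back should return the original value up to floating-point rounding.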
NanoSNAP | Description | Python equivalent |
---|---|---|
mfcc | Mel Frequency Cepstral Coefficients | mfcc |
fbank | Filterbank Energies | fbank |
logfbank | Log Filterbank Energies | logfbank |
ssc | Spectral Subband Centroids | ssc |
NanoSNAP | Description | Python equivalent |
---|---|---|
stft | Short-time Fourier transform | librosa.stft |
istft | Inverse STFT | librosa.istft |
mel | Create a filterbank matrix to combine FFT bins into Mel-frequency bins | librosa.filters.mel |
NanoSNAP | Description | Python equivalent |
---|---|---|
resize_bilinear | Resize image with bilinear interpolation | cv2.resize |
imread | Load LDR image | cv2.imread |
imsave | Save image as LDR format | cv2.imwrite |
- `get_window`: 'hann' only. `scipy.signal.get_window` equivalent.
- Better error handling (report error messages)
- Multithreading using C++11 `thread`.
  - Use `StackVector` as much as possible.
- Read/write WAV from/to buffer (memory)
- Integrate with `NanoNumCp`
- FFT
  - Implement more FFT functions defined in `scipy.fft`.
  - 2D FFT
  - Replace pocketfft with our own C++11 FFT routine or muFFT https://github.com/Themaister/muFFT.
- Port more functions in `python_speech_features`
- Implement more speech features implemented in sox, librosa, etc.
- Plot and save figure/image in JPG/PNG/EXR
- Write a testvector generator and put it in `tests/gen/`
  - Generate the testvector file (`.inc`)
- Add a .cc file to `tests`. Add it to CMakeLists.txt.
NanoSNAP is licensed under the MIT license.
- stack_vector.h : Copyright (c) 2006-2008 The Chromium Authors. All rights reserved. Use of this source code is governed by a BSD-style license.
- doctest : The MIT License (MIT). Copyright (c) 2016-2019 Viktor Kirilov
- dr_wav : Public domain or MIT-0. By David Reid.
- python_speech_features : The MIT License (MIT). Copyright (c) 2013 James Lyons. https://github.com/jameslyons/python_speech_features
- pocketfft : FFT library used in numpy. Copyright (C) 2004-2018 Max-Planck-Society. 3-clause BSD-style license. https://gitlab.mpcdf.mpg.de/mtr/pocketfft
- c_speech_features : Copyright (c) 2017 Chris Lord. MIT license. https://github.com/Cwiiis/c_speech_features
- STB image : Public domain. https://github.com/nothings/stb
- sRGB transform : Copyright (c) 2017 Project Nayuki. (MIT License) https://www.nayuki.io/page/srgb-transform-library
- fastBPE : Copyright (c) 2019 Guillaume Lample, Timothée Lacroix. (MIT License) https://github.com/glample/fastBPE