can this module reach state-of-the-art performance?
the F1 score is near SOTA, based on GloVe(100)+ELMo+CNN(char)+BiLSTM+CRF
92.83% (best, experiments 10, test 16), 92.45% (average, 10 runs, experiments 10, test 15)
how can the BiLSTM be made faster?
the solution is LSTMBlockFusedCell().
3.13 times faster than LSTMCell() during training time.
1.26 times faster than LSTMCell() during inference time.
can the Transformer achieve results competitive with the BiLSTM? and how much faster is it?
contextual encoding by the Transformer encoder yields competitive results.
in sequence-to-sequence models such as translation, the multi-head attention mechanism might be very powerful for alignments.
however, for sequence tagging, the power comes from the point-wise feed forward net with a wide range of kernel sizes, not from the multi-head attention alone.
if you use kernel size 1, the performance will be much worse than you expect.
it seems that the point-wise feed forward net collects contextual information layer by layer.
this is very similar to a hierarchical convolutional neural network.
i'd like to say "Attention is Not All you need".
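The layer-by-layer point above can be illustrated with a small sketch. It is not code from this repository: it models only a stack of same-padded 1D convolutions (the feed forward path) with an odd kernel size, ignores attention entirely, and the function name `positions_seen` is made up for this example. It tracks which input positions can influence one output position after a given number of layers.

```python
def positions_seen(seq_len, kernel_size, num_layers, position):
    """Return the input positions that can influence `position` after
    `num_layers` same-padded 1D convolutions (odd kernel_size assumed)."""
    half = kernel_size // 2
    seen = {position}
    for _ in range(num_layers):
        widened = set()
        for p in seen:
            # One conv layer lets position p read its neighbours within `half`.
            for q in range(p - half, p + half + 1):
                if 0 <= q < seq_len:
                    widened.add(q)
        seen = widened
    return sorted(seen)


# Compare kernel sizes for the middle token of a 21-token sentence, 4 layers.
for kernel_size in (1, 3, 5):
    width = len(positions_seen(21, kernel_size, 4, 10))
    print(kernel_size, width)
```

With kernel size 1 the set never grows beyond the token itself, while larger kernels widen the context with every layer, which matches the hierarchical CNN comparison made above.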
see the evaluation results below.
a multi-layer BiLSTM using LSTMBlockFusedCell() is slightly faster than the Transformer with 4 layers on GPU.
moreover, the BiLSTM is 2 times faster in a multi-threaded CPU environment than on GPU.
LSTMBlockFusedCell() is well optimized for multi-core CPUs via multi-threading.
i guess there might be an overhead when copying between GPU memory and main memory.
the BiLSTM is 3 ~ 4 times faster than the Transformer version on 1 CPU (single-thread).
during inference time, 1 layer BiLSTM on 1 CPU takes just 4.2 msec per sentence on average.
how to use a trained model from C++? is it much faster?
freeze model, convert to memory mapped format and load it via tensorflow C++ API.
1 layer BiLSTM on multi CPU takes 2.04 msec per sentence on average.
1 layer BiLSTM on single CPU takes 2.68 msec per sentence on average.
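As a rough check on the numbers above, the ratios can be computed directly. This assumes the 4.2 msec figure was measured through the Python inference path, which the text does not state explicitly. The helper name `speedup` is made up for illustration.

```python
def speedup(baseline_msec, faster_msec):
    """Return how many times faster `faster_msec` is than `baseline_msec`."""
    return baseline_msec / faster_msec


# msec per sentence for a 1 layer BiLSTM, taken from the figures above.
python_single_cpu = 4.2   # assumed to be the Python inference path
cc_single_cpu = 2.68      # C++ API, single CPU
cc_multi_cpu = 2.04       # C++ API, multi CPU

print(round(speedup(python_single_cpu, cc_single_cpu), 2))  # 1.57
print(round(speedup(cc_single_cpu, cc_multi_cpu), 2))       # 1.31
```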
$ cd etagger
$ ls embeddings
embeddings/elmo_2x4096_512_2048cnn_2xhighway_5.5B_options.json embeddings/elmo_2x4096_512_2048cnn_2xhighway_5.5B_weights.hdf5
$ python --mode line --emb_path embeddings/glove.6B.100d.txt.pkl --wrd_dim 100 --restore checkpoint/ner_model
Obama left office in January 2017 with a 60% approval rating and currently resides in Washington, D.C.
left VBD O O O
office NN O O O
in IN O O O
January NNP O B-DATE O
2017 CD O I-DATE O
with IN O O O
a DT O O O
approval NN O O O
rating NN O O O
and CC O O O
currently RB O O O
resides VBZ O O O
in IN O O O
Washington NNP O B-GPE B-LOC
, , O I-GPE O
The Beatles were an English rock band formed in Liverpool in 1960.
The DT O O O
were VBD O O O
an DT O O O
rock NN O O O
band NN O O O
formed VBN O O O
in IN O O O
Liverpool NNP O B-GPE B-LOC
in IN O O O
1960 CD O B-DATE O
. . O I-DATE O
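The tagged output above can be turned into entity spans with a few lines of Python. This is only a sketch: the README does not say what each column means, so the column index is passed in by the caller, and the function name `extract_spans` is hypothetical. An `I-` tag without a preceding `B-` is treated leniently as the start of a new span.

```python
def extract_spans(lines, tag_column):
    """Group BIO tags from whitespace-separated lines into (label, text) spans.

    tag_column is the 0-based column holding the BIO tag (an assumption
    the caller must make, since the column layout is not documented here).
    """
    spans = []
    label, tokens = None, []
    for line in lines:
        fields = line.split()
        # A blank line is treated as a sentence boundary, i.e. an "O" tag.
        token, tag = (fields[0], fields[tag_column]) if fields else (None, "O")
        if tag.startswith("I-") and label == tag[2:]:
            tokens.append(token)          # continue the open span
            continue
        if label is not None:
            spans.append((label, " ".join(tokens)))  # close the open span
            label, tokens = None, []
        if tag.startswith(("B-", "I-")):
            label, tokens = tag[2:], [token]          # open a new span
    if label is not None:
        spans.append((label, " ".join(tokens)))
    return spans


sample = [
    "left VBD O O O",
    "January NNP O B-DATE O",
    "2017 CD O I-DATE O",
    "with IN O O O",
    "Washington NNP O B-GPE B-LOC",
    ", , O I-GPE O",
]
print(extract_spans(sample, 3))
print(extract_spans(sample, 4))
```

Reading the fourth column (index 3) groups `January 2017` as one DATE span, while the fifth column (index 4) yields only `Washington` as LOC for these sample lines.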
inference(bucket) using frozen model, tensorRT, C++
* create virtual env `python -m venv python3.6_tfsrc` and activate it.
$ python -m venv python3.6_tfsrc
$ source /home/python3.6_tfsrc/bin/activate
* install bazel
* ex) bazel 0.15.0 for tensorflow 1.11.0, tensorflow 1.12.0
$ ./bazel-${bazel-version} --user
$ source /data1/index.shin/.bazel/bin/bazel-complete.bash
* build tensorflow from source.
$ git clone tensorflow-src-cpu
$ cd tensorflow-src-cpu
* you should check out the same tensorflow version as the pip package used for training.
$ git checkout r1.11
* modify a source file for memory mapped graph(convert_graphdef_memmapped_format)
./tensorflow/core/platform/posix/ mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0); in 'NewReadOnlyMemoryRegionFromFile'
* configure without CUDA
$ ./configure
* build pip package (for FMA, AVX and SSE optimization).
$ python -m pip install --upgrade pip
$ python -m pip install --upgrade setuptools
$ python -m pip install keras_applications --no-deps
$ python -m pip install keras_preprocessing --no-deps
$ bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
* install pip package
$ python -m pip uninstall tensorflow
$ python -m pip install /tmp/tensorflow_pkg/tensorflow-1.11.0-cp36-cp36m-linux_x86_64.whl
* build libraries and binaries we need.
$ bazel build --config=opt //
$ bazel build --config=opt //
$ bazel build --config=opt //
$ bazel build --config=opt //tensorflow/python/tools:optimize_for_inference
$ bazel build --config=opt //tensorflow/tools/quantization:quantize_graph
$ bazel build --config=opt //tensorflow/contrib/util:convert_graphdef_memmapped_format
$ bazel build --config=opt //tensorflow/tools/graph_transforms:transform_graph
* copy libraries to dist directory, export dist and includes directory.
$ export TENSORFLOW_SOURCE_DIR='/home/tensorflow-src-cpu'
$ export TENSORFLOW_BUILD_DIR='/home/tensorflow-dist-cpu'
$ cp -rf ${TENSORFLOW_SOURCE_DIR}/bazel-bin/tensorflow/*.so ${TENSORFLOW_BUILD_DIR}/
* for LSTMBlockFusedCell()
$ rnn_path=`python -c "import tensorflow; print(tensorflow.contrib.rnn.__path__[0])"`
$ rnn_ops_lib=${rnn_path}/python/ops/
$ cp -rf ${rnn_ops_lib} ${TENSORFLOW_BUILD_DIR}
* for QRNN [optional]
$ qrnn_path=`python -c "import tensorflow as tf; print(tf.__path__[0])"`
$ qrnn_lib=${qrnn_path}/../
$ cp -rf ${qrnn_lib} ${TENSORFLOW_BUILD_DIR}
.bashrc sample
# tensorflow so, header dist
export TENSORFLOW_SOURCE_DIR='/home/tensorflow-src-cpu'
export TENSORFLOW_BUILD_DIR='/home/tensorflow-dist-cpu'
# for loading,
test build sample model and inference by C++
$ cd /home/etagger
* build and save sample model
$ cd inference
$ python
* inference using python
$ python python/
* inference using c++
* edit etagger/inference/cc/CMakeLists.txt
find_package(TensorFlow 1.11 EXACT REQUIRED)
$ cd etagger/inference/cc
$ mkdir build
$ cd build
* cmake >= 3.11, set -DPYTHON_EXECUTABLE to an absolute path
$ cmake .. -DPYTHON_EXECUTABLE=/usr/local/bin/python3.6m
$ make
$ cd ../..
$ ./cc/build/inference_example
test build iris model, freezing and inference by C++
$ cd /home/etagger
* build and save iris model
$ cd inference
$ python
* freeze graph
$ python --model_dir exported --output_node_names logits --frozen_model_name iris_frozen.pb
* inference using python
$ python python/
* inference using C++
* edit etagger/inference/cc/CMakeLists.txt
find_package(TensorFlow 1.11 EXACT REQUIRED)
* cmake >= 3.11, set -DPYTHON_EXECUTABLE to an absolute path
$ cd etagger/inference/cc
$ mkdir build
$ cd build
$ cmake .. -DPYTHON_EXECUTABLE=/usr/local/bin/python3.6m
$ make
$ cd ../..
$ ./cc/build/inference_iris
export etagger model, freezing and inference by C++
$ cd inference
* let's assume that we have a saved model :
* <note> BiLSTM, LSTMBlockFusedCell()
* : if you can't find `BlockLSTM` when using import_meta_graph()
* : similar issue =>
: how to fix? =>
: what about C++? =>
we can load '' for LSTMBlockFusedCell().
* restore the model to check the list of operations, placeholders and tensors for mapping, and export it to another place.
$ python --restore ../checkpoint/ner_model --export exported/ner_model --export-pb exported
* freeze graph
$ python --model_dir exported --output_node_names logits_indices,sentence_lengths --frozen_model_name ner_frozen.pb
* freeze graph for bert
$ python --model_dir exported --output_node_names logits_indices,sentence_lengths,bert_embeddings_subgraph --frozen_model_name ner_frozen.pb
$ ln -s ../embeddings embeddings
$ ln -s ../data data
* inference using python
$ python python/ --emb_path embeddings/glove.6B.100d.txt.pkl --wrd_dim 100 --frozen_path exported/ner_frozen.pb < ../data/test.txt > pred.txt
$ python python/ --emb_path embeddings/glove.6B.300d.txt.pkl --wrd_dim 300 --frozen_path exported/ner_frozen.pb < ../data/test.txt > pred.txt
$ python python/ --emb_path embeddings/glove.840B.300d.txt.pkl --wrd_dim 300 --frozen_path exported/ner_frozen.pb < ../data/test.txt > pred.txt
* you may need to modify build_input_feed_dict() in 'python/' for emb_class='bert',
* since some of the input tensors might not exist in the frozen graph. ex) 'input_data_chk_ids'
* inference using python with optimized graph_def via tensorRT (only for GPU)
$ python python/ --emb_path embeddings/glove.6B.100d.txt.pkl --wrd_dim 100 --frozen_path exported/ner_frozen.pb < ../data/test.txt > pred.txt
$ python python/ --emb_path embeddings/glove.6B.300d.txt.pkl --wrd_dim 300 --frozen_path exported/ner_frozen.pb < ../data/test.txt > pred.txt
$ python python/ --emb_path embeddings/glove.840B.300d.txt.pkl --wrd_dim 300 --frozen_path exported/ner_frozen.pb < ../data/test.txt > pred.txt
* inspect `pred.txt` to check whether the predictions are the same.
$ perl ../etc/conlleval < pred.txt
* for inference by C++, only emb_class='glove' is implemented.
* inference using C++
$ ./cc/build/inference exported/ner_frozen.pb embeddings/vocab.txt < ../data/test.txt > pred.txt
* inspect `pred.txt` to check whether the predictions are the same.
$ perl ../etc/conlleval < pred.txt
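Besides comparing the conlleval scores, the Python and C++ predictions can be compared line by line. The sketch below assumes each run's output was saved under a different name (the commands above all write to `pred.txt`) and that the last column holds the predicted tag, which is the convention conlleval expects. The function name `diff_predictions` is hypothetical.

```python
def diff_predictions(lines_a, lines_b):
    """Return (line_number, line_a, line_b) for every line whose token
    (first column) or predicted tag (last column) differs."""
    if len(lines_a) != len(lines_b):
        raise ValueError("prediction files have different line counts")
    mismatches = []
    for number, (a, b) in enumerate(zip(lines_a, lines_b), start=1):
        fields_a, fields_b = a.split(), b.split()
        if not fields_a and not fields_b:
            continue  # both blank: sentence boundary
        if (not fields_a or not fields_b
                or fields_a[0] != fields_b[0]
                or fields_a[-1] != fields_b[-1]):
            mismatches.append((number, a.rstrip("\n"), b.rstrip("\n")))
    return mismatches


# In-memory stand-ins for two prediction files.
python_pred = ["Liverpool NNP O B-GPE B-LOC\n", "in IN O O O\n", "\n"]
cc_pred = ["Liverpool NNP O B-GPE B-LOC\n", "in IN O O B-LOC\n", "\n"]
print(diff_predictions(python_pred, python_pred))
print(diff_predictions(python_pred, cc_pred))
```

For real files, pass `open(path).readlines()` for each of the two saved outputs. An empty result means the two runs agree on every token.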
optimizing graph for inference, convert it to memory mapped format and inference by C++
$ cd inference
* optimize graph for inference
# not working properly
$ ${TENSORFLOW_SOURCE_DIR}/bazel-bin/tensorflow/python/tools/optimize_for_inference --input=exported/ner_frozen.pb --output=exported/ner_frozen.pb.optimized --input_names=is_train,sentence_length,input_data_pos_ids,input_data_chk_ids,input_data_word_ids,input_data_wordchr_ids --output_names=logits_indices,sentence_lengths
* quantize graph
# not working properly
$ ${TENSORFLOW_SOURCE_DIR}/bazel-bin/tensorflow/tools/quantization/quantize_graph --input=exported/ner_frozen.pb --output=exported/ner_frozen.pb.rounded --output_node_names=logits_indices,sentence_lengths --mode=weights_rounded
* transform graph
$ ${TENSORFLOW_SOURCE_DIR}/bazel-bin/tensorflow/tools/graph_transforms/transform_graph --in_graph=exported/ner_frozen.pb --out_graph=exported/ner_frozen.pb.transformed --inputs=is_train,sentence_length,input_data_pos_ids,input_data_chk_ids,input_data_word_ids,input_data_wordchr_ids --outputs=logits_indices,sentence_lengths --transforms='strip_unused_nodes merge_duplicate_nodes round_weights(num_steps=256) sort_by_execution_order'
* convert to memory mapped format
$ ${TENSORFLOW_SOURCE_DIR}/bazel-bin/tensorflow/contrib/util/convert_graphdef_memmapped_format --in_graph=exported/ner_frozen.pb --out_graph=exported/ner_frozen.pb.memmapped
$ ${TENSORFLOW_SOURCE_DIR}/bazel-bin/tensorflow/contrib/util/convert_graphdef_memmapped_format --in_graph=exported/ner_frozen.pb.transformed --out_graph=exported/ner_frozen.pb.memmapped
* inference using C++
$ ./cc/build/inference exported/ner_frozen.pb.memmapped embeddings/vocab.txt 1 < ../data/test.txt > pred.txt
* inspect `pred.txt` to check whether the predictions are the same.
$ perl ../etc/conlleval < pred.txt
* check that the memory mapped graph is opened with MAP_SHARED
$ cat /proc/pid/maps
7fae40522000-7fae4a000000 r--s 00000000 08:11 749936602 /root/etagger/inference/exported/ner_frozen.pb.memmapped
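The same check can be scripted. In `/proc/<pid>/maps` the second field holds the permissions, and its fourth character is `s` for a shared mapping or `p` for a private one. The sketch below parses one such line; the function name `is_shared_mapping` is made up for this example.

```python
def is_shared_mapping(maps_line, path_suffix):
    """Return True if `maps_line` (one line of /proc/<pid>/maps) maps a file
    ending in `path_suffix` with the shared flag set."""
    # Fields: address range, perms, offset, device, inode, pathname.
    fields = maps_line.split(None, 5)
    if len(fields) < 6:
        return False  # anonymous mapping, no pathname
    perms, path = fields[1], fields[5].strip()
    return path.endswith(path_suffix) and perms[3] == "s"


line = ("7fae40522000-7fae4a000000 r--s 00000000 08:11 749936602 "
        "/root/etagger/inference/exported/ner_frozen.pb.memmapped")
print(is_shared_mapping(line, "ner_frozen.pb.memmapped"))
```

To check a running process, iterate over the lines of `/proc/<pid>/maps` for the actual pid of the inference binary and apply the function to each line.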