This microservice converts any PDF that contains text to outline form, or converts every page inside the PDF to JPG using the desired quality and density.
- Architecture
- OS dependencies
- How to check health
- How to use
- How to contribute
- How to run it on Docker
- Next steps
You must run it on Linux and ensure that the packages ghostscript, pdftoppm (provided by poppler-utils) and img2pdf are installed.
$ sudo apt install ghostscript poppler-utils img2pdf
$ gs --version && pdftoppm -v && img2pdf --version
### OUTPUT ###
9.55.0
pdftoppm version 22.02.0
Copyright 2005-2022 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC
img2pdf 0.4.2
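The same check can be scripted before the service starts. The following is a minimal sketch, not part of the project: the names `REQUIRED_TOOLS` and `findMissingTools` are hypothetical, and it only verifies that each binary can be started, not that its version is compatible.

```javascript
const { spawnSync } = require('node:child_process');

// Commands and version flags taken from the manual check above.
const REQUIRED_TOOLS = [
  ['gs', '--version'],
  ['pdftoppm', '-v'],
  ['img2pdf', '--version'],
];

// Returns the names of the tools that could not be started.
// Only spawn failures (for example ENOENT) count as missing; exit codes are
// ignored because version flags do not always exit with status 0.
// `run` is injectable so the function can be exercised without the binaries.
function findMissingTools(tools = REQUIRED_TOOLS, run = spawnSync) {
  return tools
    .filter(([cmd, flag]) => Boolean(run(cmd, [flag], { stdio: 'ignore' }).error))
    .map(([cmd]) => cmd);
}

const missing = findMissingTools();
console.log(missing.length ? `Missing: ${missing.join(', ')}` : 'All tools found');
```

Running this at startup would let the service fail early with a clear message instead of failing on the first conversion request.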
Note: NODE_ENV is not mandatory. If it is not set, the default values are used and the service runs in production mode.
- To build for production, export the environment variables:
  # These are the default values if no environment variable is set
  NODE_ENV=production
  NODE_PORT=3000
  NODE_ROOT='/'
  NODE_IN_FOLDER=/tmp/pdf-outliner/input
  NODE_OUT_FOLDER=/tmp/pdf-outliner/output
  # In minutes
  NODE_TIMEOUT=10
- Install the dependencies and run:
  $ yarn deploy:prod
  $ yarn prod
  ### OUTPUT ###
  yarn run v1.22.17
  $ node src
  Server on port 3000
- To run in development mode, export the environment variables:
  NODE_ENV=development
  # Defaults to 3000
  NODE_PORT=3000
  NODE_ROOT='/'
  # Place to read and save the file, usually a shared one
  NODE_IN_FOLDER=/tmp/pdf-outliner/input
  NODE_OUT_FOLDER=/tmp/pdf-outliner/output
  # In minutes
  NODE_TIMEOUT=10
- Install the dependencies and start in nodemon mode:
  $ yarn dev
  ### OUTPUT ###
  yarn run v1.22.17
  $ yarn install && nodemon src
  [1/4] Resolving packages...
  success Already up-to-date.
  [nodemon] 2.0.20
  [nodemon] to restart at any time, enter `rs`
  [nodemon] watching path(s): *.*
  [nodemon] watching extensions: js,mjs,json
  [nodemon] starting `node src`
  Server on port 3000
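To illustrate how the environment variables and their defaults fit together, here is a minimal sketch of a configuration loader. It is an assumption about how such a loader could look, not the service's actual code: `DEFAULTS`, `loadConfig` and the derived `timeoutMs` field are hypothetical names, and the default output folder is assumed to be /tmp/pdf-outliner/output.

```javascript
// Hypothetical defaults mirroring the documented production values.
const DEFAULTS = {
  NODE_ENV: 'production',
  NODE_PORT: '3000',
  NODE_ROOT: '/',
  NODE_IN_FOLDER: '/tmp/pdf-outliner/input',
  NODE_OUT_FOLDER: '/tmp/pdf-outliner/output',
  NODE_TIMEOUT: '10', // minutes
};

// Builds a config object from an environment map (process.env by default).
function loadConfig(env = process.env) {
  // Unset or empty variables fall back to the documented defaults.
  const read = (name) => (env[name] ? env[name] : DEFAULTS[name]);
  const timeoutMinutes = Number(read('NODE_TIMEOUT'));
  return {
    env: read('NODE_ENV'),
    port: Number(read('NODE_PORT')),
    root: read('NODE_ROOT'),
    inFolder: read('NODE_IN_FOLDER'),
    outFolder: read('NODE_OUT_FOLDER'),
    timeoutMinutes,
    timeoutMs: timeoutMinutes * 60 * 1000, // convenience value for timers
  };
}

console.log(loadConfig({ NODE_ENV: 'development', NODE_PORT: '8080' }));
```

Passing the environment as a parameter keeps the function easy to test, since no real environment variables have to be changed.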
Use this endpoint as the container health check. To check whether the API is healthy:
$ curl -f localhost:3000/ping
### OUTPUT ###
{"message":"pong","success":true,"time":0}
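The same check can be done from Node, for example in a monitoring script. This is a minimal sketch under a few assumptions: `checkHealth` is a hypothetical helper, it relies on the global `fetch` available in Node 18 and later, and it assumes NODE_ROOT is '/' so that the endpoint lives at /ping.

```javascript
// Hypothetical helper: resolves to true only when /ping answers with the
// documented payload ({"message":"pong","success":true,...}).
// `fetchImpl` is injectable so the logic can be tested without a server.
async function checkHealth(baseUrl, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(new URL('/ping', baseUrl));
    if (!res.ok) return false;
    const body = await res.json();
    return body.success === true && body.message === 'pong';
  } catch {
    // Connection refused, timeout, or invalid JSON all count as unhealthy.
    return false;
  }
}

// Example usage (needs a running service):
// checkHealth('http://localhost:3000').then((ok) => process.exit(ok ? 0 : 1));
```

Exiting with a non-zero status when the check fails mirrors what `curl -f` does, so the script could replace the curl call in a container health check.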
- Test a failed API call:
  $ curl -X POST localhost:3000/outline -d file=test.pdf
  ### OUTPUT ###
  {"message":"file not found","success":false,"time":0}
- Add a test.pdf file to NODE_IN_FOLDER and test a successful API call to /outline:
  $ curl -X POST localhost:3000/outline -d file=test.pdf
  ### OUTPUT ###
  {"message":"/tmp/pdf-outliner/output/test.pdf","success":true,"time":1.606}
- Check that the file was created as expected:
  $ ls -lh /tmp/pdf-outliner/*
  ### OUTPUT ###
  /tmp/pdf-outliner/input:
  total 636K
  -rw-rw-r-- 1 thyarles thyarles 633K Oct 14 20:29 test.pdf
  /tmp/pdf-outliner/output:
  total 988K
  -rw-rw-r-- 1 thyarles thyarles 988K Oct 15 20:40 test.pdf
- Add a test.pdf file to NODE_IN_FOLDER and test a successful API call to /frozen:
  $ curl -X POST localhost:3000/frozen -d file=test.pdf
  ### OUTPUT ###
  {"message":"/tmp/pdf-outliner/output/test.pdf","success":true,"time":35.689}
- You can tune the options. The default values are jpgQuality = 10 and jpgResolution = 300 DPI.
  $ curl -X POST -H 'Content-Type: application/json' localhost:3000/frozen -d '{ "file": "test.pdf", "options": { "jpgQuality": 10, "jpgResolution": 150 }}'
  ### OUTPUT ###
  {"message":"/tmp/pdf-outliner/output/test.pdf","success":true,"time":11.748}
- As you can see, at half the original resolution the file was processed in about 1/3 of the original time (12 seconds against 36 seconds), so the resolution affects the processing time. It also affects the size of the file: the call without options produced a 14.2 MB file, while the call at half the resolution produced only 4.6 MB.
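The curl calls above can also be issued from Node. The sketch below is an illustration only: `buildRequest` and `convert` are hypothetical names, the global `fetch` requires Node 18 or later, and sending JSON to /outline is an assumption, since only /frozen is shown with a JSON body.

```javascript
// Builds the path and fetch options for a conversion call.
// Field names (file, options, jpgQuality, jpgResolution) come from the curl examples.
function buildRequest(endpoint, file, options) {
  const payload = options ? { file, options } : { file };
  return {
    path: `/${endpoint}`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
    },
  };
}

// Calls the service and resolves to the path of the generated file.
// Rejects with the service's message (for example "file not found") on failure.
async function convert(baseUrl, endpoint, file, options, fetchImpl = fetch) {
  const { path, init } = buildRequest(endpoint, file, options);
  const res = await fetchImpl(new URL(path, baseUrl), init);
  const body = await res.json();
  if (!body.success) throw new Error(body.message);
  return body.message;
}

// Example usage (needs a running service and test.pdf in NODE_IN_FOLDER):
// convert('http://localhost:3000', 'frozen', 'test.pdf', { jpgQuality: 10, jpgResolution: 150 })
//   .then(console.log)
//   .catch((err) => console.error(err.message));
```

Because the service reports errors inside the JSON body, the helper checks the `success` field rather than relying on the HTTP status alone.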
Let's keep the application as simple as possible by following the best practices for code style. We know every developer has their own way of working, but when working together the code must follow style standards shared by everyone.
To do this job, we rely on ESLint, and you should lint your application before opening a pull request, otherwise your PR will be deleted. To do so, install the development packages with yarn and work as usual.
When you try to commit the code, Husky runs a lint check and, if it finds any error, your commit is denied.
$ yarn install
### OUTPUT ###
yarn install v1.22.17
[1/4] Resolving packages...
success Already up-to-date.
Done in 0.22s.
$ git commit -m "test: checking if husky will block the commit due a unused const"
### OUTPUT ###
yarn run v1.22.17
$ eslint --ext js,jsx,ts,tsx .
/media/thyarles/LinuxData/bitcot/pdf-outliner/src/index.js
22:7 error 'testLint' is assigned a value but never used no-unused-vars
✖ 1 problem (1 error, 0 warnings)
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
husky - pre-commit hook exited with code 1 (error)
So you need to look into the error, fix it and try another commit. Most common IDEs will warn you about those errors in real time.
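For illustration, this is the kind of code that triggers the no-unused-vars error shown in the log, followed by one way to resolve it. The snippet is a made-up example, not the project's src/index.js.

```javascript
// Before: ESLint's no-unused-vars rule reports this line, because the
// constant is assigned a value but never read anywhere.
const testLint = 'checking husky';

// After: either delete the unused constant above or actually use the value.
const message = 'Server on port 3000';
console.log(message);
```

The code runs either way, since an unused constant is a style problem and not a runtime error, which is why the check has to happen at commit time.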
Most code style errors can be fixed automatically. You just need to run:
$ yarn lint:format
### OUTPUT ###
yarn run v1.22.17
$ prettier --write '**/*.{ts,tsx,js,jsx,json}'
.eslintrc.json 54ms
package.json 7ms
src/index.js 73ms
Done in 0.51s.
If you believe in me and don't want to install a thing, just use the Docker image:
$ docker run --init -p 3000:3000 -v /tmp/pdf-outliner:/tmp/pdf-outliner thyarles/pdf-outliner:latest
### OUTPUT ###
yarn run v1.22.19
$ node src
Server on port 3000
To run it detached and have it restart automatically unless you stop it:
$ docker run --detach --restart unless-stopped --publish 3000:3000 --volume /tmp/pdf-outliner:/tmp/pdf-outliner thyarles/pdf-outliner:latest
If you don't believe in me (you shouldn't) you can read the code, change it and generate your own image:
$ docker build -t thyarles/pdf-outliner .
### OUTPUT ###
Sending build context to Docker daemon 95.23kB
Step 1/8 : FROM node:18-alpine
---> 867dce98a500
Step 2/8 : WORKDIR /app
---> Using cache
---> 93fd34abb1ea
Step 3/8 : COPY . .
---> 1e233562376f
Step 4/8 : RUN apk update && apk add curl ghostscript && mkdir /efs && chown node:node -R /app /efs && yarn deploy:prod
---> Running in 8f92ab604ac7
fetch https://dl-cdn.alpinelinux.org/alpine/v3.16/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.16/community/x86_64/APKINDEX.tar.gz
v3.16.2-299-ga2e2d92ef8 [https://dl-cdn.alpinelinux.org/alpine/v3.16/main]
v3.16.2-299-ga2e2d92ef8 [https://dl-cdn.alpinelinux.org/alpine/v3.16/community]
OK: 17036 distinct packages available
(1/29) Installing ca-certificates (20220614-r0)
(2/29) Installing brotli-libs (1.0.9-r6)
(3/29) Installing nghttp2-libs (1.47.0-r0)
(4/29) Installing libcurl (7.83.1-r3)
(5/29) Installing curl (7.83.1-r3)
(6/29) Installing dbus-libs (1.14.4-r0)
(7/29) Installing libintl (0.21-r2)
(8/29) Installing avahi-libs (0.8-r6)
(9/29) Installing gmp (6.2.1-r2)
(10/29) Installing nettle (3.7.3-r0)
(11/29) Installing libffi (3.4.2-r1)
(12/29) Installing p11-kit (0.24.1-r0)
(13/29) Installing libtasn1 (4.18.0-r0)
(14/29) Installing libunistring (1.0-r0)
(15/29) Installing gnutls (3.7.7-r0)
(16/29) Installing cups-libs (2.4.2-r0)
(17/29) Installing expat (2.4.9-r0)
(18/29) Installing libbz2 (1.0.8-r1)
(19/29) Installing libpng (1.6.37-r1)
(20/29) Installing freetype (2.12.1-r0)
(21/29) Installing fontconfig (2.14.0-r0)
(22/29) Installing jbig2dec (0.19-r0)
(23/29) Installing libjpeg-turbo (2.1.3-r1)
(24/29) Installing lcms2 (2.13.1-r0)
(25/29) Installing xz-libs (5.2.5-r1)
(26/29) Installing libwebp (1.2.3-r0)
(27/29) Installing zstd-libs (1.5.2-r1)
(28/29) Installing tiff (4.4.0-r0)
(29/29) Installing ghostscript (9.56.1-r0)
Executing busybox-1.35.0-r17.trigger
Executing ca-certificates-20220614-r0.trigger
Executing fontconfig-2.14.0-r0.trigger
OK: 91 MiB in 45 packages
yarn run v1.22.19
$ yarn install --production --frozen-lockfile
[1/4] Resolving packages...
[2/4] Fetching packages...
[3/4] Linking dependencies...
[4/4] Building fresh packages...
Done in 5.34s.
Removing intermediate container 8f92ab604ac7
---> 57d0b208c8f9
Step 5/8 : USER node
---> Running in 3b8da584016b
Removing intermediate container 3b8da584016b
---> 4c75d08f78d8
Step 6/8 : EXPOSE 3000
---> Running in 375207bb9c59
Removing intermediate container 375207bb9c59
---> 5f5c7a435d19
Step 7/8 : VOLUME /efs
---> Running in b79da9e57105
Removing intermediate container b79da9e57105
---> 591fff7f0b5f
Step 8/8 : CMD ["yarn", "prod"]
---> Running in 7cf971a682f2
Removing intermediate container 7cf971a682f2
---> 2cddd5495a66
Successfully built 2cddd5495a66
Successfully tagged thyarles/pdf-outliner:latest
- Enforce lint as a branch protection check on GitHub Actions (unfortunately Husky allows developers to bypass the local check, so it is best to make sure)
- Run static analysis with SonarCloud and block code that fails the quality gate or has any bugs or security issues
- Add unit tests
- Add integration tests
- Block pull requests that don't meet the minimum of 60% coverage using SonarCloud