Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What is the form of the output? #147

Open
papb opened this issue May 21, 2021 · 3 comments
Open

What is the form of the output? #147

papb opened this issue May 21, 2021 · 3 comments
Labels

Comments

@papb
Copy link

papb commented May 21, 2021

Hello! I noticed that this module returns an array of numbers.

  • What can I expect of the sizes of each number? The largest I got was 303 on a small example, and the smallest was 34. Is there a guaranteed lower bound? And upper bound?

  • What can I expect of the length of this array?

@floydpink
Copy link
Owner

@papb - It looks like you might have confused the compression implemented in this library for something else. This library is useful for applying the popular Lempel–Ziv–Welch lossless compression algorithm on large JSON objects, so that it can be persisted into NoSQL databases or into the IndexedDB on the browser (for offline use).

As mentioned on the README, the output of the pack() call cannot be consumed directly - but if you persist this "packed" output instead of the original JSON into a NoSQL database, you should see the difference in the space utilization.

When you want to consume the "packed", large JSON object - you will then retrieve it from your persistence layer and the call unpack() from the library.

Does that help?

@papb
Copy link
Author

papb commented May 21, 2021

Hello, thanks for the fast reply! I already knew what you said, sorry for not being clear on the question. I don't have any knowledge on how the LZW algorithm works - what I know is that it is a compression algorithm. The fact that the output of this algorithm consists of an array of numbers was something I didn't know, and learned by running this module. Now I want to know what can I expect about the bounds in these numbers and the array length, in order to think better on the persistence layer that I will use. For example, if I have a way to persist an array of numbers from 0 to 1023, can I use it? Or is it possible that this algorithm generates a number above 1023 at some point? I couldn't find this in a quick look into Wikipedia on LZW so I thought I'd ask here. I also looked at the source code to see if I could deduce something but couldn't.

@floydpink
Copy link
Owner

I don't think we can definitively say what the upper bound is, for the numbers in the "compressed" array - and the reason it is like that is because the numbers will depend on the specific input passed into the core LZW compress function (which takes a random string and converts it into an array).

At the core (as can be seen on the function), the LZW algorithm starts with the ASCII character set loaded into a dictionary/map. And during the processing of a given input, it will start accumulating string segments that are repeated within.

Here is an example input of "awesome awesome", that is being processed by this core compress function along with the data structure within the function visualized as it is processed: Link to JavaScript Visualizer in PythonTutor.com

As you can see, for this input - the numbers go all the way up to 260 - hopefully this gives some insight into what you are dealing with.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants