A spell checker in Golang

Constructed a spell checker in Golang powered by the very efficient bloom filters.

Bloom Filters

A Bloom filter is a probabilistic data structure that provides a space-efficient way to test whether an element is a member of a set. Bloom filters are particularly useful for applications where the amount of data is large, and memory efficiency is crucial. More about bloom filters and the math behind them can be found here. A few useful features of bloom filters are as follows:

Bloom filters are very space efficient compared to other data structures like hash tables.
Bloom filters are probabilistic in nature. Bloom filters can yield false positives but never false negatives. This means they can tell us if an element is definitely not in the set or if it is possibly in the set.
- We will use this feature for the spell checker. All the common english words are inserted into the bloom filter. For a new word, if it is not present in the bloom filter, we conclude that it is misspelt. Otherwise the word may be correctly spelt ('maybe' because of possible hashing colisions).

Usage

Clone this repository and navigate to it.

git clone https://github.com/manavgondalia/Bloom-Filter-Golang.git
cd Bloom-Filter-Golang

Get the required third-party packages using the command go mod tidy
The program is stored in main.go. The accepted flags are load (default 0, use 1 for loading bloom filter from disk), build (Path to dictionary file to build Bloom filter), bf (Path to Bloom filter file), fp (False probability rate), n (Number of elements to be inserted).
Using in non-loading mode: The load flag is by default 0. Specify any other requirements as stated above and hit enter to run. A new bloom filter will be saved in words.bf file This bloom filter can later be used without the need of making a new bloom filter for existing data
Using in loading mode: Run the program using the command go run main.go -load=1 This will load the bloom filter from the file and apply it on the words present in testing.txt

Key Features

The number of hash functions required is calculated from the bitset size and the number of elements and is dynamic. More hash functions are generated by chaning the seed with respect to the i'th hash function.

The ability to add/load the bloom filter to/from disk saves a lot of calculation time.

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
hashes		hashes
README.md		README.md
compiled_bloom_filter.bf		compiled_bloom_filter.bf
generator.py		generator.py
go.mod		go.mod
go.sum		go.sum
main.go		main.go
testing.txt		testing.txt
words.txt		words.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

A spell checker in Golang

Bloom Filters

Usage

Key Features

About

Releases

Packages

Languages

manavgondalia/Bloom-Filter-Golang

Folders and files

Latest commit

History

Repository files navigation

A spell checker in Golang

Bloom Filters

Usage

Key Features

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages