Skip to content

manavgondalia/Bloom-Filter-Golang

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

A spell checker in Golang

Constructed a spell checker in Golang powered by the very efficient bloom filters.

Bloom Filters

A Bloom filter is a probabilistic data structure that provides a space-efficient way to test whether an element is a member of a set. Bloom filters are particularly useful for applications where the amount of data is large, and memory efficiency is crucial. More about bloom filters and the math behind them can be found here. A few useful features of bloom filters are as follows:

  1. Bloom filters are very space efficient compared to other data structures like hash tables.
  2. Bloom filters are probabilistic in nature. Bloom filters can yield false positives but never false negatives. This means they can tell us if an element is definitely not in the set or if it is possibly in the set.
    • We will use this feature for the spell checker. All the common english words are inserted into the bloom filter. For a new word, if it is not present in the bloom filter, we conclude that it is misspelt. Otherwise the word may be correctly spelt ('maybe' because of possible hashing colisions).

Usage

  1. Clone this repository and navigate to it.
git clone https://github.com/manavgondalia/Bloom-Filter-Golang.git
cd Bloom-Filter-Golang
  1. Get the required third-party packages using the command go mod tidy

  2. The program is stored in main.go. The accepted flags are load (default 0, use 1 for loading bloom filter from disk), build (Path to dictionary file to build Bloom filter), bf (Path to Bloom filter file), fp (False probability rate), n (Number of elements to be inserted).

  3. Using in non-loading mode: The load flag is by default 0. Specify any other requirements as stated above and hit enter to run. A new bloom filter will be saved in words.bf file This bloom filter can later be used without the need of making a new bloom filter for existing data

  4. Using in loading mode: Run the program using the command go run main.go -load=1 This will load the bloom filter from the file and apply it on the words present in testing.txt

Key Features

The number of hash functions required is calculated from the bitset size and the number of elements and is dynamic. More hash functions are generated by chaning the seed with respect to the i'th hash function.

The ability to add/load the bloom filter to/from disk saves a lot of calculation time.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published