wordtreefoundation/TextGrams.jl: A Julia library for ngram analysis

TextGrams is a Julia library that slices documents into ngrams and counts how often each one occurs. Given a baseline of ngram frequencies, it can then compare two or more documents for similarity by matching the rare ngrams they share.
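
As a rough illustration of what "slicing and counting" means, here is a minimal sketch in plain, current Julia. It does not use TextGrams, and `count_ngrams` is a hypothetical helper name chosen for this example. It assumes the text is already normalized and that words are separated by whitespace.

```julia
# Hypothetical helper, not part of TextGrams:
# count every ngram of 1..maxn consecutive words in `text`.
function count_ngrams(text::AbstractString, maxn::Integer)
    words = split(text)                  # whitespace-separated tokens
    counts = Dict{String,Int}()
    for n in 1:maxn, i in 1:(length(words) - n + 1)
        gram = join(words[i:i+n-1], " ") # n consecutive words starting at i
        counts[gram] = get(counts, gram, 0) + 1
    end
    return counts
end

counts = count_ngrams("this really sorta works", 2)
```

For this four-word input the dictionary holds four single words and three word pairs, each with a count of 1.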

Example Usage

using TextGrams
using MutableStrings

# Normalize some text
text1 = MutableASCIIString("This real-\nly, sorta, works!")
clean!(text1)
println(text1) # => "this really sorta works"


# Create a document
doc1 = Document(text1, "A Real Document")
# Note: you can also open 
doc1.title # => "A Real Document"
doc1.content # => "this really sorta works"

# Slice the document into Ngrams of size 2 or fewer
ng1 = Ngrams(doc1, 2)

# => Dict{ASCIIString,Integer} with 7 entries:
#   "sorta works"  => 1
#   "this really"  => 1
#   "this"         => 1
#   "sorta"        => 1
#   "really"       => 1
#   "really sorta" => 1
#   "works"        => 1

# Create another document
doc2 = Document("I guess it sorta works.")
ng2 = Ngrams(doc2, 2)

# Find the intersection of ngrams
intersectAdd(ng1, ng2)

# => Dict{ASCIIString,Integer} with 3 entries:
#   "sorta works" => 2
#   "sorta"       => 2
#   "works"       => 2
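
Judging only from the output above, `intersectAdd` appears to keep the ngrams present in both dictionaries and add their counts. The following is a minimal sketch of that behaviour in plain, current Julia. `intersect_add` is a hypothetical name, and this is not the library's actual implementation.

```julia
# Hypothetical stand-in for the behaviour shown above, not TextGrams code:
# keep only the keys found in both dictionaries and sum their counts.
function intersect_add(a::Dict{String,Int}, b::Dict{String,Int})
    result = Dict{String,Int}()
    for (gram, n) in a
        if haskey(b, gram)
            result[gram] = n + b[gram]
        end
    end
    return result
end

a = Dict("this" => 1, "sorta" => 1, "works" => 1, "sorta works" => 1)
b = Dict("guess" => 1, "sorta" => 1, "works" => 1, "sorta works" => 1)
shared = intersect_add(a, b)
```

Here `shared` contains only "sorta", "works", and "sorta works", each with a count of 2.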

More Info

See our blog post on using TextGrams.jl.
