
Commit

Add string::similarity::fuzzy function (#742)
Dhghomon authored Aug 22, 2024
1 parent 3cd6469 commit 20933b8
Showing 1 changed file with 112 additions and 23 deletions.
@@ -143,6 +143,10 @@ These functions can be used when working with and manipulating text and string v
<td scope="row" data-label="Function"><a href="#stringisnumeric"><code>string::is::numeric()</code></a></td>
<td scope="row" data-label="Description">Checks whether a value has only numeric characters</td>
</tr>
<tr>
<td scope="row" data-label="Function"><a href="#stringisrecord"><code>string::is::record()</code></a></td>
<td scope="row" data-label="Description">Checks whether a string is a Record ID, optionally of a certain table</td>
</tr>
<tr>
<td scope="row" data-label="Function"><a href="#stringissemver"><code>string::is::semver()</code></a></td>
<td scope="row" data-label="Description">Checks whether a value matches a semver version</td>
@@ -155,10 +159,6 @@ These functions can be used when working with and manipulating text and string v
<td scope="row" data-label="Function"><a href="#stringisuuid"><code>string::is::uuid()</code></a></td>
<td scope="row" data-label="Description">Checks whether a string is a UUID</td>
</tr>
<tr>
<td scope="row" data-label="Function"><a href="#stringisrecord"><code>string::is::record()</code></a></td>
<td scope="row" data-label="Description">Checks whether a string is a Record ID, optionally of a certain table</td>
</tr>
<tr>
<td scope="row" data-label="Function"><a href="#stringsemvercompare"><code>string::semver::compare()</code></a></td>
<td scope="row" data-label="Description">Performs a comparison between two semver strings</td>
@@ -199,6 +199,10 @@ These functions can be used when working with and manipulating text and string v
<td scope="row" data-label="Function"><a href="#stringsemversetpatch"><code>string::semver::set::patch()</code></a></td>
<td scope="row" data-label="Description">Set the patch version of a semver string</td>
</tr>
<tr>
<td scope="row" data-label="Function"><a href="#stringsimilarityfuzzy"><code>string::similarity::fuzzy()</code></a></td>
<td scope="row" data-label="Description">Returns the similarity score of fuzzy-matching strings</td>
</tr>
</tbody>
</table>

@@ -793,23 +797,6 @@ true

<br />

## `string::is::uuid`

The `string::is::uuid` function checks whether a string is a UUID.

```surql title="API DEFINITION"
string::is::uuid(string) -> bool
```
The following example shows this function, and its output, when used in a [`RETURN`](/docs/surrealdb/surrealql/statements/return) statement:

```surql
RETURN string::is::uuid("018a6680-bef9-701b-9025-e1754f296a0f");
true
```

<br />

## `string::is::record`

The `string::is::record` function checks whether a string is a Record ID.
@@ -832,6 +819,23 @@ RETURN string::is::record("not a record id"); -- false

<br />

## `string::is::uuid`

The `string::is::uuid` function checks whether a string is a UUID.

```surql title="API DEFINITION"
string::is::uuid(string) -> bool
```
The following example shows this function, and its output, when used in a [`RETURN`](/docs/surrealdb/surrealql/statements/return) statement:

```surql
RETURN string::is::uuid("018a6680-bef9-701b-9025-e1754f296a0f");
true
```

<br />

## `string::semver::compare`
<Since v="v1.2.0" />

@@ -1020,7 +1024,92 @@ RETURN string::semver::set::patch("1.2.3", 9);
"1.2.9"
```

<br /><br />
<br />

## `string::similarity::fuzzy`

While [the ~ operator](/docs/surrealdb/surrealql/operators#matches) is a quick way to check whether two strings are a fuzzy match, it returns only a boolean, which does not indicate how similar the strings are.

```surql
RETURN "SurrealDB" ~ "db";
-- true
RETURN "SurrealDB" ~ "surrealdb";
-- true
```

The `string::similarity::fuzzy` function returns a numeric score, which allows degrees of similarity to be compared. Any value greater than 0 is considered a fuzzy match.

```surql
-- returns 51
RETURN string::similarity::fuzzy("DB", "DB");
-- returns 47
RETURN string::similarity::fuzzy("DB", "db");
```
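
As a minimal sketch of the "greater than 0" rule, the score can be compared against zero to get a boolean result. The two inputs are taken from the longer example later in this section, where "DB" scores 41 against "SurrealDB" and "basebase" is filtered out by the `$score > 0` check. The queries below have not been run separately, so treat the expected behaviour as an inference from that example.

```surql
-- "DB" scores 41 against "SurrealDB" in the longer example, so this should be a match
RETURN string::similarity::fuzzy("SurrealDB", "DB") > 0;
-- "basebase" does not pass the `$score > 0` filter in that example, so this should not be a match
RETURN string::similarity::fuzzy("SurrealDB", "basebase") > 0;
```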

The similarity score is not normalized to a fixed range such as 1 to 100. It is built up over the course of the algorithm used to compare one string with another, and is higher for longer strings. As a result, scores are only meaningful when a single string is compared against a number of possible matches. Scores obtained for different source strings cannot be compared with one another.

Although the first two calls in the following example each compare a string with an identical copy of itself, the longer string returns a much higher score.

```surql
-- returns 51
RETURN string::similarity::fuzzy("DB", "DB");
-- returns 2997
RETURN string::similarity::fuzzy(
"Surreal Cloud Beta is now live! We are excited to announce that we are inviting users from the waitlist to join. Stay tuned for your invitation!", "Surreal Cloud Beta is now live! We are excited to announce that we are inviting users from the waitlist to join. Stay tuned for your invitation!"
);
-- returns 151 despite being nowhere close to an exact match
RETURN string::similarity::fuzzy(
"Surreal Cloud Beta is now live! We are excited to announce that we are inviting users from the waitlist to join. Stay tuned for your invitation!", "Surreal"
);
```

A longer example showing a comparison of similarity scores to one another:

```surql
LET $original = "SurrealDB";
LET $strings = ["SurralDB", "surrealdb", "DB", "Surreal", "real", "basebase", "eel", "eal"];
FOR $string IN $strings {
LET $score = string::similarity::fuzzy($original, $string);
IF $score > 0 {
CREATE comparison SET of = $original + '\t' + $string,
score = $score
};
};
SELECT of, score FROM comparison ORDER BY score DESC;
```

```bash title="Response"
[
{
of: 'SurrealDB surrealdb',
score: 187
},
{
of: 'SurrealDB SurralDB',
score: 165
},
{
of: 'SurrealDB Surreal',
score: 151
},
{
of: 'SurrealDB real',
score: 75
},
{
of: 'SurrealDB eal',
score: 55
},
{
of: 'SurrealDB DB',
score: 41
}
]
```

<br />

## Method chaining

@@ -1062,4 +1151,4 @@ string::concat(

```bash title="Response"
'I'LL SEND YOU A CHEQUE FOR THE CATALOGUE!!!!'
```
```
