Fulltext-Search (FTS) contains function #5899
-
Hey, thanks for bringing that up. First, let's remember that
To some extent, yes. Though Hibernate Search also provides a way to offload your FTS to another service, as well as various other features such as scoring or faceting. There are reasons for solutions like Elasticsearch/OpenSearch/Lucene to exist even though relational databases are better in non-FTS areas: they provide advanced features that relational databases don't.
I'm not a fan of the idea, mainly because But even if we ignore that, Index definition involves things like tokenizing ( So, while we could probably translate
Another thing to note is that Elasticsearch/OpenSearch/Lucene don't support cross-index joins. So if you were to translate SQL from to Elasticsearch/OpenSearch/Lucene, either you wouldn't support joins at all (limiting queries to a single table) or you would have to define joins at indexing time, like we do in Hibernate Search with And then you have concepts that only exist on Elasticsearch/OpenSearch/Lucene, which you would miss out on, because you're using Hibernate ORM and its SQL-specific APIs:
To sum up:
-
Do I understand correctly that you intend to accept multiple paths? Do all DB dialects support that? I'm not sure about PostgreSQL... Also, I think ideally the analyzer config would be optional? I expect the string literal has a specific syntax (surrounded by quotes), so this might not lead to ambiguity?
Just so you know, it's technically possible (at least with ES/OS/Lucene) to tokenize on something other than whitespace. While that's certainly an exotic use case, we can keep in mind that a way to escape spaces could be useful in a future version of the grammar. Probably not something for V1 though.
As far as I can see, the features exposed by this grammar all have a relatively direct equivalent in ES/OS/Lucene, so we should be fine if we want to implement it one day in Hibernate Search. My only doubt is about the "NEAR" operator: ES/OS/Lucene have a phrase query with a "slop" option, which seems equivalent when the phrase contains only two words, but I'm not 100% sure. Tests would help for sure.
-
Adding some notes based on a discussion with @gavinking and @marko-bekhta about this:
A few open questions that came up:
-
Note that even the Lucene query language exposes only a small part of what Lucene can do. Which is why Solr/Elasticsearch have their own language - using JSON for Elasticsearch, and I think XML for Solr. It's really a whole world, so reducing it to a "standardized" predicate or query language is likely to only address the most common use cases. It may be enough, though.
-
Add support for the full-text function `contains`, which is documented nicely for SQL Server: https://docs.microsoft.com/en-us/sql/t-sql/queries/contains-transact-sql?view=sql-server-ver15
Also see https://en.wikibooks.org/wiki/Structured_Query_Language/Like_Predicate
Determine if there is overlap with Hibernate Search here and if we can translate `contains()` for Elasticsearch/OpenSearch and Lucene.
Some links:
Possible grammar for the query syntax based on an adapted form of https://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc01268.1520/html/iquda/BABDFAJI.htm:
`BEFORE` is like `NEAR`, except that the order of terms matters.
`~` is the operator for `NEAR`.
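To make the difference between `NEAR` and `BEFORE` concrete, here is a minimal Java sketch over whitespace-tokenized text. The class name `ProximitySketch`, the `maxDistance` parameter, and the definition of distance (difference between token positions) are assumptions made for illustration only. Each database, and Lucene's phrase query with slop, defines proximity in its own way, so this is not how any of them actually matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration; not part of Hibernate ORM, Hibernate Search, or any database.
final class ProximitySketch {

    // 0-based token positions at which `term` occurs in `text`,
    // using naive lowercase whitespace tokenization and no stemming.
    static List<Integer> positions(String text, String term) {
        String[] tokens = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        String needle = term.toLowerCase(Locale.ROOT);
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(needle)) {
                result.add(i);
            }
        }
        return result;
    }

    // NEAR: both terms occur within maxDistance positions of each other, in any order.
    static boolean near(String text, String a, String b, int maxDistance) {
        List<Integer> positionsOfB = positions(text, b);
        for (int pa : positions(text, a)) {
            for (int pb : positionsOfB) {
                if (pa != pb && Math.abs(pa - pb) <= maxDistance) {
                    return true;
                }
            }
        }
        return false;
    }

    // BEFORE: like NEAR, but `a` must occur earlier in the text than `b`.
    static boolean before(String text, String a, String b, int maxDistance) {
        List<Integer> positionsOfB = positions(text, b);
        for (int pa : positions(text, a)) {
            for (int pb : positionsOfB) {
                if (pa < pb && pb - pa <= maxDistance) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With the text `quick brown fox`, `near(text, "fox", "quick", 2)` matches because order is ignored, while `before(text, "fox", "quick", 2)` does not because `fox` comes after `quick`.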
As far as I understand, this syntax is very near to what the various DBs support, so translation should be mostly 1:1.
After skimming through https://www.postgresql.org/docs/current/datatype-textsearch.html#DATATYPE-TSQUERY, translation for PostgreSQL looks like it should mostly work. It seems, though, that PG stems all terms, and there is no way to search for a phrase string from within the `tsquery`.
We will have to write some tests to understand exactly how other databases do the matching, but AFAIU phrase string matches will mostly work for words that can't be stemmed, so this should not be a big problem on PG.
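As a rough idea of what a "mostly 1:1" translation to PostgreSQL could look like, here is a minimal Java sketch that rewrites a flat boolean expression into `tsquery` operator syntax (`&`, `|`, `!`). The class name `TsQuerySketch` and the input keywords `AND`, `OR` and `NOT` are assumptions for illustration, not the proposed grammar. The sketch handles neither phrases nor `NEAR`, and it does not validate operator placement. The resulting string would still go through something like `to_tsquery`, which is where PG's stemming applies.

```java
import java.util.Locale;

// Hypothetical illustration; the accepted input syntax is an assumption.
final class TsQuerySketch {

    // Rewrites e.g. "fast AND NOT slow" into "fast & !slow".
    static String toTsQuery(String query) {
        StringBuilder out = new StringBuilder();
        boolean negatePending = false;
        for (String token : query.trim().split("\\s+")) {
            String piece;
            switch (token.toUpperCase(Locale.ROOT)) {
                case "AND":
                    piece = "&";
                    break;
                case "OR":
                    piece = "|";
                    break;
                case "NOT":
                    // Remember the negation and attach it to the next term.
                    negatePending = true;
                    continue;
                default:
                    // Only plain letter/digit terms are accepted, so no escaping is needed.
                    if (!token.matches("[\\p{L}\\p{N}]+")) {
                        throw new IllegalArgumentException("Unsupported term: " + token);
                    }
                    piece = (negatePending ? "!" : "") + token;
                    negatePending = false;
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(piece);
        }
        return out.toString();
    }
}
```

Restricting terms to letters and digits sidesteps quoting entirely; a real translation would need to quote or escape lexemes and bind the result as a parameter.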
Something that is still missing in this proposal but definitely important is affecting scoring/weighting/ranking, but all DBs have different ways to affect this.
`ISABOUT` with explicit `WEIGHT (float)`
`>` and `<` operators to indicate relevance increase or decrease of options
`^` operator to specify the weight
`*` operator to affect the score
Also see HHH-11252