Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: no downtime reindex, second attempt #37

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

David-Guillot
Copy link
Contributor

@David-Guillot David-Guillot commented Mar 29, 2023

Fixes #2

Hi @qcoumes ,

This PR supersedes #23 in my attempts to implement #2 . This is a very different approach, based on the Document class much more than on Index class:

  • It simply allows a Document subclass to have multiple indices, by leveraging the ability of opensearch_dsl Document class to do so
  • As far as the management command is concerned, the interface almost stays the same: obviously index rebuild was removed because it was not making any sense anymore, and document migrate has been added, but it doesn't do anything more than switching an alias (you need to call index create then document index to actually put documents into your new index)
  • The issue of existing concrete index was solved: index create is working, as well as document migrate, which delete the concrete index if it exists

I'll let you judge if this architecture fits your vision of this lib. If it doesn't, I'll stop trying to implement #2 because it's been a lot of energy on my end, and if it needs another rewrite I'll just find a way to integrate this solution somewhere in our codebase, as an extension of this lib.

... instead of opensearch_dsl's one. Because we might need to add some logic in there :)
@David-Guillot David-Guillot marked this pull request as draft March 29, 2023 14:28
* The Document Index name does not match a concrete OpenSearch index anymore
* Concrete OpenSearch indices are named after the Document Index name, but with a suffix (which can be set explicitely or automatically)
* The Document Index name is set as an alias to a chosen concrete OpenSearch index
* This allows multiple indices living alongside for a same Document, and the ability to switch the alias from an index to another
* Index creation is now done through the Document.init() to allow naming through a given suffix
* Index rebuild makes no sense anymore
@David-Guillot David-Guillot force-pushed the feat/no-downtime-reindex-v2 branch from e884701 to 4328341 Compare March 29, 2023 14:34
@David-Guillot David-Guillot marked this pull request as ready for review March 29, 2023 16:30
@codecov-commenter
Copy link

Codecov Report

Merging #37 (de27e73) into master (fe1a773) will decrease coverage by 1.40%.
The diff coverage is 95.94%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##           master      #37      +/-   ##
==========================================
- Coverage   95.65%   94.26%   -1.40%     
==========================================
  Files          12       12              
  Lines         576      628      +52     
  Branches       80       92      +12     
==========================================
+ Hits          551      592      +41     
- Misses         21       30       +9     
- Partials        4        6       +2     
Impacted Files Coverage Δ
django_opensearch_dsl/documents.py 96.52% <93.75%> (-1.54%) ⬇️
...o_opensearch_dsl/management/commands/opensearch.py 94.02% <100.00%> (-5.98%) ⬇️
django_opensearch_dsl/management/enums.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@David-Guillot
Copy link
Contributor Author

@qcoumes did you have some time to think about this?

@qcoumes
Copy link
Member

qcoumes commented May 17, 2023

Sorry, had a lot of work last month, I'll look at this tomorrow

Copy link
Member

@qcoumes qcoumes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the two questions below, everything seems good.

I tested the CLI, and everything worked perfectly. It does not work with already existing indices, but it was to be expected.

I will work on this project in the coming month to refactor the CLI and allow more customization on the indexing process (see #39). This will probably lead to other breaking changes so this feature will not be released immediately if you're OK with it.

Comment on lines +114 to +115
if len(actions_on_aliases) == 1 and cls._index.exists():
cls._index.delete()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand what you're doing here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is supposed to handle the case of a pre-existing non-versioned index: if there is only one action to apply (no index alias to remove from a versioned index), but the index exists, let's delete it. I have seen it working, but it's not unit-tested, and you seem to have seen it not working 😅

Comment on lines +372 to +377
subparser.add_argument(
"--index-suffix",
type=str,
default=None,
help="The suffix for the index name (if you don't provide one, the current index will be used). Required for `migrate` subcommand.",
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it not make more sense for the migrate action to be in the index subcommand, since it doesn't really after the documents?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it's actually plainly related to Documents: a Document is represented by one or more indices, and this migrate subcommand is all about switching from one representation to another. Plus, it doesn't do anything to indices themselves except setting/unsetting aliases, which is an implementation detail of switching representations of a Document.

@David-Guillot
Copy link
Contributor Author

this feature will not be released immediately if you're OK with it.

Of course I'm okay with this, we're already relying on our fork's branch to benefit from this feature, we can continue doing so for a couple of months.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature proposal: zero-downtime reindex
3 participants