Unable to index MiniCPAN #47
I can confirm that this keeps happening:
From the error message:
I suppose this has to do with the index structure. From inside the Docker cluster, Elasticsearch shows several different indices:
The index … means that the actual index has only 11 documents indexed:
those with an even version number.
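To compare the indices and their document counts (such as the 11 documents above) without reading raw output, Elasticsearch's `_cat/indices` endpoint can be queried with `format=json`. A minimal Python sketch, assuming the cluster is reachable at http://localhost:9200 (a hypothetical address; use whatever port the docker-compose setup exposes). The function names are illustrative and not part of the MetaCPAN code.

```python
import json
import urllib.request


def index_doc_counts(cat_indices: list[dict]) -> dict[str, int]:
    """Map index name -> document count from a `_cat/indices?format=json` reply.

    The _cat APIs return every value as a string; a missing or null
    `docs.count` is treated as 0 here.
    """
    return {
        entry["index"]: int(entry.get("docs.count") or 0) for entry in cat_indices
    }


def fetch_doc_counts(base_url: str = "http://localhost:9200") -> dict[str, int]:
    """Fetch the index list from the (assumed) Elasticsearch URL and summarize it."""
    url = f"{base_url}/_cat/indices?format=json"
    with urllib.request.urlopen(url, timeout=5.0) as resp:
        return index_doc_counts(json.load(resp))
```

Running `fetch_doc_counts()` inside and outside the test environment would make it easy to see whether both point at the same cluster and which index actually received the documents.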
It's noteworthy that in the production environment the responses are generated from the … Furthermore, I'm experiencing an …
As this exception says:
I wonder if there's something going on here because you've run the test and the production environments at various times? Obviously that shouldn't be an issue, but the tests pass in CI running in a fresh environment. I suppose it's possible that something is getting routed to the wrong ES instance? Can you confirm that when you're running the tests, you don't also have a production ES container up and running? It might be helpful to exclude that as a possibility.
Thank you for taking an interest in this issue. Intending to start everything over, I deleted the conflicting indices with:
Then I restarted the cluster and the indexing:
I interrupted the endless sequence of failures and inspected the created indices:
Also, I cannot find the 804 files on my system:
SELinux is not enabled:
I deleted the index once more, but it still runs into the same error:
The index script always created the …
Has this been resolved?
I updated the repositories and rebuilt the images completely.
I inserted some activity notices, which show that the indices are rebuilt, but apparently incorrectly.
Do you have the same issue if the …
When you stop the …
I managed to set up a GitHub Actions test that reproduces what happens when trying to index with … It documents the same error as commented above at …
Interestingly, it clearly states that it is using the index …
Analysing the activity in …
So after this command the wrongly built … Understanding this, I concluded that deleting all indices manually should make the indices get created correctly:
And indeed, I can now observe that the packages are imported correctly:
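The manual workaround of deleting every index so that it gets recreated correctly could be scripted roughly as follows. This is a sketch under stated assumptions: Elasticsearch is reachable at the hypothetical address http://localhost:9200, and every index whose name does not start with a dot is disposable. It is destructive, so it is only meant for a throwaway development cluster; the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: adjust to the port published by the docker-compose setup.
ES_URL = "http://localhost:9200"


def indices_to_delete(cat_indices: list[dict]) -> list[str]:
    """Pick user indices from a `_cat/indices?format=json` reply.

    Names starting with "." are treated as system indices and kept.
    """
    return sorted(
        entry["index"] for entry in cat_indices if not entry["index"].startswith(".")
    )


def delete_all_indices(base_url: str = ES_URL) -> list[str]:
    """List all indices, then send DELETE /<index> for each user index."""
    with urllib.request.urlopen(f"{base_url}/_cat/indices?format=json") as resp:
        names = indices_to_delete(json.load(resp))
    for name in names:
        req = urllib.request.Request(f"{base_url}/{name}", method="DELETE")
        with urllib.request.urlopen(req) as resp:
            resp.read()  # urlopen raises HTTPError if the DELETE is rejected
    return names
```

The list of deleted names is returned so that a caller can log exactly what was removed before the indices are recreated.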
I suppose that creating the …
Strangely, though, this does not explain why the test case shows indexing errors and the data is not indexed correctly.
The test result differs drastically from the expected result, but it effectively illustrates the issue discussed here:
So the definition of a successful test would be:
as seen in the local environment:
I was wondering why the activity messages don't appear in the log in all test runs.
To confirm this, I inserted a check of the Elasticsearch service into the test before starting the mapping,
but in the test run the check failed with:
This confirms the conclusion that Elasticsearch is not yet ready for connections. Therefore, by the time the indexing starts importing the packages into the Elasticsearch service, the service has become ready for connections, but NO index has been created yet. So, where does the infamous index … Understanding this scenario, the steps to reproduce it can be described as follows:
Still, this might depend strongly on the computing capacity and disk speed of the local desktop. On the other hand, the steps for a manual workaround for local development could be:
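The readiness check described above (is Elasticsearch accepting connections before the mapping starts?) boils down to an HTTP probe of the cluster's root URL. A minimal Python sketch, assuming that an HTTP 200 from GET / is a good enough signal that the node is up; `elasticsearch_is_up` is a hypothetical name, not an existing MetaCPAN helper.

```python
import http.client
import urllib.request


def elasticsearch_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/ answers with HTTP 200.

    Connection refused, DNS errors, timeouts, malformed URLs and error
    statuses (urllib raises HTTPError for those) all count as "not up yet".
    """
    try:
        with urllib.request.urlopen(f"{base_url}/", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

A workaround script would call this in a loop and only start indexing once it returns True. Note that "up" only means the node answers; as discussed above, the index itself may still be missing at that point.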
Now, for the automated test to succeed and to close this issue for good, some new features are needed.
Is it possible that …
Looking at the test logs, it actually works:
Right, good call. That's the same thing we're doing to work around …
I think the script in the command, which is also used in the project, is quite OK. As another useful feature, I would also report the timestamp at which the service became ready or the timeout was reached, so that it can be checked against the reports from other dependent components, as seen in this incident.
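A sketch of the proposed timestamp reporting: poll a readiness check until it succeeds or a timeout expires, and record the UTC time at which either happened so it can be lined up against the logs of the other containers. It is written in Python for illustration only and is not the project's own wait script; `wait_for_service`, `WaitResult` and `report` are hypothetical names, and the clock and sleep functions are injectable so the loop can be tested without real waiting.

```python
import time
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass
class WaitResult:
    ready: bool     # True if the check succeeded before the timeout
    at: datetime    # UTC wall-clock time when readiness or the timeout was seen
    elapsed: float  # seconds measured by the (monotonic) clock


def wait_for_service(
    check: Callable[[], bool],
    timeout: float = 15.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> WaitResult:
    """Call `check` every `interval` seconds until it returns True or
    at least `timeout` seconds have elapsed."""
    start = clock()
    while True:
        if check():
            ready = True
            break
        if clock() - start >= timeout:
            ready = False
            break
        sleep(interval)
    return WaitResult(ready, datetime.now(timezone.utc), clock() - start)


def report(service: str, result: WaitResult) -> str:
    """Format one log line: ISO-8601 timestamp, service, outcome, duration."""
    outcome = "ready" if result.ready else "timeout"
    return f"{result.at.isoformat()} {service}: {outcome} after {result.elapsed:.1f}s"
```

For example, `print(report("elasticsearch", wait_for_service(probe)))`, where `probe` is any zero-argument function that returns True once the service answers, prints one line with the timestamp, the service name, and whether it became ready or timed out.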
Those features sound good. If you'd like to implement them, that would be great.
So, for the first sprint I would work on …
Sounds great!
While working on the script … Furthermore, I found that the index health state information is crucial for performing any self-checks in … To implement the self-check, I used the functionality provided by the … To prevent index corruption, the script … To ease the recovery of broken indices for users, as explained before, the …
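The details of the self-check are cut off above, so the following only illustrates the general idea of gating work on index health. Elasticsearch reports health through GET _cluster/health/<index>, whose JSON body contains a `status` of green, yellow or red and a `timed_out` flag. A minimal Python sketch; `health_ok` and `fetch_index_health` are hypothetical names, and treating yellow as acceptable is an assumption aimed at single-node development clusters.

```python
import json
import urllib.request

# Elasticsearch health states ordered from worst to best.
_RANK = {"red": 0, "yellow": 1, "green": 2}


def health_ok(health: dict, minimum: str = "yellow") -> bool:
    """Return True if a `_cluster/health` reply is at least `minimum`.

    Assumption: on a single-node development cluster, replica shards cannot be
    assigned, so indices with replicas stay "yellow"; hence the default.
    A reply with timed_out set, or without a known status, is never OK.
    """
    if health.get("timed_out"):
        return False
    return _RANK.get(health.get("status"), -1) >= _RANK[minimum]


def fetch_index_health(base_url: str, index: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/_cluster/health/<index> and decode the JSON body."""
    url = f"{base_url}/_cluster/health/{index}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Elasticsearch can also do the waiting server-side through the `wait_for_status` and `timeout` query parameters of the same endpoint (for example `?wait_for_status=yellow&timeout=15s`), which avoids a client-side polling loop.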
As the test reports document, the Elasticsearch engine needs about 2 to 5 seconds to start up, depending on the host's capabilities:
The default timeout parameter is set to 15 seconds, which seems generous given the documented measurements. If the Elasticsearch engine cannot start up, the Perl application will set the error code
and rethrow the exception from the Elasticsearch Perl client:
As seen with:
What if, instead of requiring everyone interested in development to build the test image, we just create an Elasticsearch test image that already contains the index? It's small enough that it won't be a problem for image size or download speed. The index would then be updated by the build process with every merge.
According to the documented numbers, 823 packages from 14165 authors produce 406137 documents in Elasticsearch, so I suspect that this approach does not really scale.
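For scale, dividing the documented totals gives the average number of documents per indexed package (plain arithmetic on the figures above):

$$\frac{406137\ \text{documents}}{823\ \text{packages}} \approx 493.5\ \text{documents per package}$$

Assuming other releases are of similar size, a prebuilt index would grow roughly linearly with the number of packages it contains.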
Also, for development purposes it is of great value to have tools that can load any package published on MetaCPAN into the development environment and inspect it.
With that many documents for testing, we're likely indexing too much. My goal is to lower the barrier to entry, and if developers are going to get stuck loading the index just to run a simple test, I'd like to eliminate that as much as possible. Tools like the ones you describe could still be used, and I agree they provide tremendous value to the development process for the developers who need them.
I have a minicpan volume and wanted to index it via index-cpan.sh, but it seems to fail:
It happens somewhere in the lib/MetaCPAN/Model/Release.pm code, but I'm unable to pinpoint it.
The command that I execute is:
Perhaps this is the wrong repository for this bug report, but I read a large disclaimer about filing issues on the metacpan-api repo, and my guess is that you don't have this issue there, since otherwise things would be broken on the live platform. So it either has something to do with the docker-compose infrastructure or it is an actual bug.