Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to index MiniCPAN #47

Open
waterkip opened this issue Oct 26, 2019 · 27 comments
Open

Unable to index MiniCPAN #47

waterkip opened this issue Oct 26, 2019 · 27 comments

Comments

@waterkip
Copy link

I have a minicpan volume and wanted to index it via index-cpan.sh, but it seems to fail:

2019/10/26 15:20:04 I release: Processing /CPAN/authors/id/M/MI/MIYAGAWA/Acme-Module-Authors-0.01.tar.gz
2019/10/26 15:20:04 F release: /CPAN/authors/id/M/MI/MIYAGAWA/Acme-Module-Authors-0.01.tar.gz [Request] ** [http://elasticsearch:9200]-[400] [mapper_parsing_exception] failed to parse [version], called from sub Search::Elasticsearch::Role::Client::Direct::__ANON__ at /usr/local/lib/perl5/site_perl/5.28.1/ElasticSearchX/Model/Document/Role.pm line 94. With vars: {'body' => {'status' => 400,'error' => {'root_cause' => [{'type' => 'mapper_parsing_exception','reason' => 'failed to parse [version]'}],'type' => 'mapper_parsing_exception','caused_by' => {'reason' => 'For input string: "0.01"','type' => 'number_format_exception'},'reason' => 'failed to parse [version]'}},'request' => {'serialize' => 'std','path' => '/cpan/release/FKmYJKWUEMj4UpDcy2b__v58iho','ignore' => [],'mime_type' => 'application/json','body' => {'maturity' => 'released','dependency' => [],'resources' => {},'license' => ['unknown'],'stat' => {'mode' => 33188,'size' => 1724,'mtime' => 1037810324},'id' => 'FKmYJKWUEMj4UpDcy2b__v58iho','version_numified' => '0.01','deprecated' => bless( do{\(my $o = 0)},'JSON::PP::Boolean' ),'archive' => 'Acme-Module-Authors-0.01.tar.gz','date' => '2002-11-20T16:38:44','version' => '0.01','author' => 'MIYAGAWA','metadata' => {'abstract' => 'unknown','license' => ['unknown'],'release_status' => 'stable','dynamic_config' => 1,'prereqs' => {},'name' => 'Acme-Module-Authors','meta-spec' => {'version' => 2,'url' => 'http://search.cpan.org/perldoc?CPAN::Meta::Spec'},'generated_by' => 'CPAN::Meta::Converter version 2.150010','author' => ['unknown'],'version' => '0.01','no_index' => {'directory' => ['t','xt','inc','local','perl5','fatlib','example','blib','examples','eg']}},'status' => 'cpan','authorized' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' ),'name' => 'Acme-Module-Authors-0.01','first' => $VAR1->{'request'}{'body'}{'deprecated'},'distribution' => 'Acme-Module-Authors','provides' => []},'qs' => {'refresh' => 1},'method' => 'POST'},'status_code' => 400}

It happens somewhere in the lib/MetaCPAN/Model/Release.pm code, but I'm unable to pinpoint it.

The command that I execute is:

./bin/metacpan release /CPAN/authors/id/
# and/or
./bin/metacpan release /CPAN/authors/id/W/WA/WATERKIP # Myself
./bin/metacpan release /CPAN/authors/id/M/MS/MSTREEK # just one release, easily testable

Perhaps this is the wrong repository for the bug report, but I read a large disclaimer on the metacpan-api repo for filing issues and my guess is you don't have this issue there otherwise things would be broken on the live platform. So it either has something to do with the docker-compose infra or it is an actual bug.

@bodo-hugo-barwich
Copy link
Contributor

I can confirm that this continues to happening:

2021/08/07 12:58:52 I release: Processing /CPAN/authors/id/N/NE/NEILB/enum-1.12.tar.gz
2021/08/07 12:58:52 F release: /CPAN/authors/id/N/NE/NEILB/enum-1.12.tar.gz [Request] ** [http://elasticsearch:9200]-[400] [mapper_parsing_exception] failed to parse [version], called from sub Search::Elasticsearch::Role::Client::Direct::__ANON__ at /usr/local/lib/perl5/site_perl/5.30.1/ElasticSearchX/Model/Document/Role.pm line 94. With vars: {'body' => {'status' => 400,'error' => {'type' => 'mapper_parsing_exception','caused_by' => {'reason' => 'For input string: "1.12"','type' => 'number_format_exception'},'root_cause' => [{'reason' => 'failed to parse [version]','type' => 'mapper_parsing_exception'}],'reason' => 'failed to parse [version]'}},'request' => {'method' => 'POST','ignore' => [],'mime_type' => 'application/json','qs' => {'refresh' => 1},'body' => {'checksum_sha256' => '69a7a891cd3888ed8b02c5e99a1f489f92a62ec351c0bc7894fd7bd7d1447ea9','license' => ['perl_5'],'authorized' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' ),'version' => '1.12','checksum_md5' => 'a89225b8c1f5925fefc7788baeedcaa9','resources' => {'homepage' => 'https://github.com/neilb/enum','bugtracker' => {'web' => 'https://github.com/neilb/enum/issues'},'repository' => {'type' => 'git','web' => 'https://github.com/neilb/enum','url' => 'https://github.com/neilb/enum.git'}},'metadata' => {'generated_by' => 'Dist::Zilla version 6.017, CPAN::Meta::Converter version 2.150010','x_serialization_backend' => 'Cpanel::JSON::XS version 4.24','author' => ['Byron Brummer <[email protected]>'],'abstract' => 'C style enumerated types and bitmask flags in Perl','release_status' => 'stable','x_generated_by_perl' => 'v5.28.2','name' => 'enum','no_index' => {'directory' => ['t','xt','inc','local','perl5','fatlib','example','blib','examples','eg']},'dynamic_config' => 0,'resources' => {'homepage' => 'https://github.com/neilb/enum','bugtracker' => {'web' => 'https://github.com/neilb/enum/issues'},'repository' => {'type' => 'git','web' => 'https://github.com/neilb/enum','url' => 'https://github.com/neilb/enum.git'}},'prereqs' => {'test' => {'requires' => {'Test::More' => '0.88','vars' => '0'}},'runtime' => {'requires' => {'strict' => '0','Carp' => '0','warnings' => '0','perl' => '5.006'}},'configure' => {'requires' => {'ExtUtils::MakeMaker' => '0'}}},'x_spdx_expression' => 'Artistic-1.0-Perl OR GPL-1.0-or-later','license' => ['perl_5'],'meta-spec' => {'url' => 'http://search.cpan.org/perldoc?CPAN::Meta::Spec','version' => 2},'version' => '1.12'},'deprecated' => bless( do{\(my $o = 0)}, 'JSON::PP::Boolean' ),'dependency' => [{'relationship' => 'requires','module' => 'warnings','phase' => 'runtime','version' => '0'},{'version' => '5.006','phase' => 'runtime','module' => 'perl','relationship' => 'requires'},{'module' => 'Carp','relationship' => 'requires','phase' => 'runtime','version' => '0'},{'version' => '0','phase' => 'runtime','relationship' => 'requires','module' => 'strict'},{'phase' => 'configure','relationship' => 'requires','module' => 'ExtUtils::MakeMaker','version' => '0'},{'version' => '0.88','phase' => 'test','module' => 'Test::More','relationship' => 'requires'},{'phase' => 'test','relationship' => 'requires','module' => 'vars','version' => '0'}],'stat' => {'mtime' => 1628062109,'size' => 14908,'mode' => 33204},'provides' => [],'distribution' => 'enum','maturity' => 'released','date' => '2021-08-04T07:28:29','archive' => 'enum-1.12.tar.gz','status' => 'cpan','id' => 'Rw43bHK51O5F928NMaSOeCsxP34','name' => 'enum-1.12','version_numified' => '1.12','first' => $VAR1->{'request'}{'body'}{'deprecated'},'abstract' => 'C style enumerated types and bitmask flags in Perl','author' => 'NEILB'},'path' => '/cpan/release/Rw43bHK51O5F928NMaSOeCsxP34','serialize' => 'std'},'status_code' => 400}

From the error message

{'body' => {
'status' => 400,
'error' => {
'type' => 'mapper_parsing_exception',
'caused_by' => {
'reason' => 'For input string: "1.12"',
'type' => 'number_format_exception'
},
'root_cause' => [{
'reason' => 'failed to parse [version]',
'type' => 'mapper_parsing_exception'
}],
'reason' => 'failed to parse [version]'}}

I suppose this has to do with the Index Structure.
Numeric Fields in ElasticSearch

From inside the Docker Cluster the ElasticSearch shows that it has different indices:

root@8c2828626bf5:/metacpan-api# curl -XGET 'elasticsearch_test:9200/_cat/indices'
yellow open cover       1 1  0 0   159b   159b 
yellow open cpan_v1_01  1 1  0 0   159b   159b 
yellow open contributor 1 1  0 0   159b   159b 
yellow open cpan        5 1 11 5 79.9kb 79.9kb 
yellow open user        1 1  0 0   159b   159b 

The index cpan happens to have a field cpan.file.properties.version which is a long which is supposed to accept Integer Values while the module version is mostly a Float Value like version = 1.12.

Which results that the actual index has only 11 documents indexed:

root@8c2828626bf5:/metacpan-api# curl -XGET 'elasticsearch_test:9200/cpan_v1_01/_count?pretty=true'
{
  "count" : 0,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  }
}
root@8c2828626bf5:/metacpan-api# curl -XGET 'elasticsearch_test:9200/cpan/_count?pretty=true'
{
  "count" : 11,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  }
}

those with a even Version Number

@bodo-hugo-barwich
Copy link
Contributor

bodo-hugo-barwich commented Aug 8, 2021

It's noteworthy that in the Production Environment the responses are generated from the cpan_v1_01 Index
like seen at:
Type File Mapping

Furthermore I'm experiencing an invalid_alias_name_exception Exception in elasticsearch_test:

$ docker-compose exec api_test prove -vl t/00_setup.t
t/00_setup.t .. 
ok 1 - /metacpan-api/var/t/tmp exists for testing
ok 2 - got ElasticSearch object
# {
#   'Elasticsearch info' => {
#     'cluster_name' => 'elasticsearch',
#     'cluster_uuid' => 'MfFCEpVLTkmWGMnMAToPDg',
#     'name' => 'Shriker',
#     'tagline' => 'You Know, for Search',
#     'version' => {
#       'build_hash' => '5376dca9f70f3abef96a77f4bb22720ace8240fd',
#       'build_snapshot' => bless( do{\(my $o = 0)}, 'JSON::PP::Boolean' ),
#       'build_timestamp' => '2017-07-18T12:17:44Z',
#       'lucene_version' => '5.5.4',
#       'number' => '2.4.6'
#     }
#   }
# }
# [...]
[Request] ** [http://elasticsearch_test:9200]-[400] [invalid_alias_name_exception] Invalid alias name [cpan], an index exists with the same name as the alias, with: {"index":"cpan_v1_01"}, called from sub MetaCPAN::Script::Mapping::deploy_mapping at /metacpan-api/lib/MetaCPAN/Script/Mapping.pm line 131. With vars: {'body' => {'error' => {'reason' => 'Invalid alias name [cpan], an index exists with the same name as the alias','root_cause' => [{'type' => 'invalid_alias_name_exception','index' => 'cpan_v1_01','reason' => 'Invalid alias name [cpan], an index exists with the same name as the alias'}],'type' => 'invalid_alias_name_exception','index' => 'cpan_v1_01'},'status' => 400},'request' => {'serialize' => 'std','qs' => {},'path' => '/cpan_v1_01/_alias/cpan','body' => undef,'ignore' => [],'method' => 'PUT'},'status_code' => 400}
 at /usr/local/lib/perl5/site_perl/5.30.1/Search/Elasticsearch/Util.pm line 71.

As this Exception says cpan is an invalid alias for cpan_v1_01 I suppose that the cpan Index should not exist rather it should be an Alias of cpan_v1_01 for the Docker Cluster to be able to index correctly.

@oalders
Copy link
Member

oalders commented Aug 8, 2021

I wonder if there's something going on here because you've run the test and the production environments at various times? Obviously that shouldn't be an issue, but the tests pass in CI running an a fresh environment. I suppose it's possible that something is getting routed to the wrong ES instance? Can you confirm that when you're running the tests that you don't also have a production ES container up and running? Might be helpful to exclude that as a possibility.

@bodo-hugo-barwich
Copy link
Contributor

Thank you for taking interest in this issue.

in the intent to start everything over I deleted the conflicting indices with:

$ docker-compose exec api_test /bin/bash
root@6133962c5f5d:/metacpan-api# curl -XGET 'elasticsearch_test:9200/_cat/indices'
yellow open cover       1 1  0 0   159b   159b 
yellow open cpan_v1_01  1 1  0 0   159b   159b 
yellow open contributor 1 1  0 0   159b   159b 
yellow open cpan        5 1 11 5 79.9kb 79.9kb 
yellow open user        1 1  0 0   159b   159b 
root@6133962c5f5d:/metacpan-api# curl -XDELETE 'elasticsearch_test:9200/cpan'
{"acknowledged":true}
root@6133962c5f5d:/metacpan-api# 
root@6133962c5f5d:/metacpan-api# curl -XDELETE 'elasticsearch_test:9200/cpan_v1_01'
{"acknowledged":true}
root@6133962c5f5d:/metacpan-api# 
root@6133962c5f5d:/metacpan-api# 
root@6133962c5f5d:/metacpan-api# curl -XGET 'elasticsearch_test:9200/_cat/indices'
yellow open cover       1 1 0 0 159b 159b 
yellow open user        1 1 0 0 159b 159b 
yellow open contributor 1 1 0 0 159b 159b 

then I restarted the cluster and restarted the indexing:

# docker-compose up -d api
Creating network "metacpan_traefik-network" with the default driver
Creating network "metacpan_default" with the default driver
Creating network "metacpan_web-network" with the default driver
Creating network "metacpan_elasticsearch" with the default driver
Creating network "metacpan_database" with the default driver
Creating network "metacpan_mongo" with the default driver
Creating metacpan_elasticsearch_1 ... done
Creating metacpan_traefik_1       ... done
Creating metacpan_pgdb_1          ... done
Creating metacpan_api_1           ... done
# docker-compose exec api index-cpan.sh
*** Warning ***: this will delete EVERYTHING and re-create the (empty) indexes
Are you sure you want to do this (type "YES" to confirm) ? yes
bye.
receiving incremental file list
authors/

sent 34 bytes  received 2,998 bytes  673.78 bytes/sec
total size is 1,651,912  speedup is 544.83
receiving incremental file list

sent 31 bytes  received 14,980 bytes  4,288.86 bytes/sec
total size is 36,346,492  speedup is 2,421.32
receiving incremental file list

sent 31 bytes  received 46,480 bytes  13,288.86 bytes/sec
total size is 39,190,648  speedup is 842.61
receiving incremental file list

sent 32 bytes  received 10,471 bytes  2,334.00 bytes/sec
total size is 2,988,452  speedup is 284.53
receiving incremental file list

sent 31 bytes  received 16,718 bytes  4,785.43 bytes/sec
total size is 6,490,026  speedup is 387.49
receiving incremental file list
authors/02STAMP

sent 52 bytes  received 312 bytes  104.00 bytes/sec
total size is 5,145,490  speedup is 14,135.96
receiving incremental file list
modules/

sent 30 bytes  received 464 bytes  109.78 bytes/sec
total size is 61,659,249  speedup is 124,816.29
receiving incremental file list

sent 27 bytes  received 98 bytes  35.71 bytes/sec
total size is 763  speedup is 6.10
2021/08/09 19:27:58 I release: Looking for archives in /CPAN/authors/id/
2021/08/09 19:27:58 I release: 804 archives found
2021/08/09 19:28:07 I release: Processing /CPAN/authors/id/L/LL/LLAP/Apache-AuthCookieNTLM.tar.gz
2021/08/09 19:28:08 F release: /CPAN/authors/id/L/LL/LLAP/Apache-AuthCookieNTLM.tar.gz [Conflict] ** [http://elasticsearch:9200]-[409] [document_already_exists_exception] [distribution][Apache-AuthCookieNTLM]: document already exists, with: {"shard":"2","index":"cpan"}, called from sub Search::Elasticsearch::Role::Client::Direct::Main::create at /usr/local/lib/perl5/site_perl/5.30.1/ElasticSearchX/Model/Document/Role.pm line 94. With vars: {'request' => {'qs' => {'op_type' => 'create'},'method' => 'POST','serialize' => 'std','ignore' => [],'path' => '/cpan/distribution/Apache-AuthCookieNTLM','mime_type' => 'application/json','body' => {'river' => {'bucket' => 0,'total' => 0,'bus_factor' => 1,'immediate' => 0},'name' => 'Apache-AuthCookieNTLM'}},'status_code' => 409,'body' => {'status' => 409,'error' => {'shard' => '2','root_cause' => [{'shard' => '2','index' => 'cpan','type' => 'document_already_exists_exception','reason' => '[distribution][Apache-AuthCookieNTLM]: document already exists'}],'index' => 'cpan','type' => 'document_already_exists_exception','reason' => '[distribution][Apache-AuthCookieNTLM]: document already exists'}}}
2021/08/09 19:28:08 I release: Processing /CPAN/authors/id/L/LL/LLAP/Apache-AuthCookieNTLM-0.07.tar.gz
2021/08/09 19:28:08 F release: /CPAN/authors/id/L/LL/LLAP/Apache-AuthCookieNTLM-0.07.tar.gz [Request] ** [http://elasticsearch:9200]-[400] [mapper_parsing_exception] failed to parse [version], called from sub Search::Elasticsearch::Role::Client::Direct::__ANON__ at /usr/local/lib/perl5/site_perl/5.30.1/ElasticSearchX/Model/Document/Role.pm line 94. With vars: {'body' => {'error' => {'root_cause' => [{'reason' => 'failed to parse [version]','type' => 'mapper_parsing_exception'}],'type' => 'mapper_parsing_exception','caused_by' => {'reason' => 'For input string: "0.07"','type' => 'number_format_exception'},'reason' => 'failed to parse [version]'},'status' => 400},'request' => {'body' => {'name' => 'Apache-AuthCookieNTLM-0.07','provides' => [],'license' => ['unknown'],'version' => '0.07','first' => bless( do{\(my $o = 0)}, 'JSON::PP::Boolean' ),'deprecated' => $VAR1->{'request'}{'body'}{'first'},'metadata' => {'x_version_from' => 'lib/Apache/AuthCookieNTLM.pm','prereqs' => {'runtime' => {'requires' => {'Test::More' => '0.01','Apache::Cookie' => '0.01','Apache::Request' => '0.01','Apache::AuthenNTLM' => '2.07','Apache::Constants' => '0.01'}}},'meta-spec' => {'url' => 'http://search.cpan.org/perldoc?CPAN::Meta::Spec','version' => 2},'release_status' => 'stable','no_index' => {'directory' => ['t','xt','inc','local','perl5','fatlib','example','blib','examples','eg']},'x_installdirs' => 'site','license' => ['unknown'],'version' => '0.07','name' => 'Apache-AuthCookieNTLM','dynamic_config' => 1,'abstract' => 'unknown','author' => ['unknown'],'generated_by' => 'ExtUtils::MakeMaker version 6.12, CPAN::Meta::Converter version 2.150010'},'checksum_sha256' => 'c447411dc44725d958285e12dac56ca832b4bc6097cfdfdb11a611783fca9c84','author' => 'LLAP','authorized' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' ),'distribution' => 'Apache-AuthCookieNTLM','checksum_md5' => '11259c43bd9f3e96ccefdcb259751ff7','maturity' => 'released','status' => 'cpan','id' => 'XPQWVIBm1L__vZdwC76Z8yJsxVU','version_numified' => '0.07','date' => '2004-10-03T10:08:11','stat' => {'mode' => 33204,'mtime' => 1096798091,'size' => 5056},'dependency' => [{'phase' => 'runtime','module' => 'Apache::Request','version' => '0.01','relationship' => 'requires'},{'phase' => 'runtime','module' => 'Apache::Constants','version' => '0.01','relationship' => 'requires'},{'version' => '2.07','module' => 'Apache::AuthenNTLM','phase' => 'runtime','relationship' => 'requires'},{'relationship' => 'requires','version' => '0.01','phase' => 'runtime','module' => 'Test::More'},{'relationship' => 'requires','version' => '0.01','phase' => 'runtime','module' => 'Apache::Cookie'}],'archive' => 'Apache-AuthCookieNTLM-0.07.tar.gz','resources' => {}},'mime_type' => 'application/json','path' => '/cpan/release/XPQWVIBm1L__vZdwC76Z8yJsxVU','serialize' => 'std','method' => 'POST','qs' => {'refresh' => 1},'ignore' => []},'status_code' => 400}

I interrupted the endless failing sequence and inspected the created indices:

# docker-compose exec api /bin/bash
root@06e3fe5f6379:/metacpan-api# curl -XGET 'elasticsearch_test:9200/_cat/indices'
curl: (6) Could not resolve host: elasticsearch_test
root@06e3fe5f6379:/metacpan-api# curl -XGET 'elasticsearch:9200/_cat/indices'
yellow open cpan 5 1 11 10 117.9kb 117.9kb 
root@06e3fe5f6379:/metacpan-api# curl -XGET 'elasticsearch:9200/cpan/_count'
{"count":11,"_shards":{"total":5,"successful":5,"failed":0}}
root@06e3fe5f6379:/metacpan-api# 
root@06e3fe5f6379:/metacpan-api# 
root@06e3fe5f6379:/metacpan-api# curl -XGET 'elasticsearch:9200/cpan/_count?pretty=true'
{
  "count" : 11,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  }
}

also I cannot find the 804 files on my system:

# ls -lah cpan | wc -l
3

SELinux is not enabled:

# getsebool -a|wc -l
bash: getsebool: orden no encontrada
0
# ls -lahZ
total 76K
drwxr-xr-x 13 bodo bodo ? 4,0K ago  9 20:21 .
drwxr-xr-x 18 bodo bodo ?  325 ago  7 12:33 ..
drwxr-xr-x  2 bodo bodo ?   80 ago  7 12:33 bin
drwxr-xr-x  2 bodo bodo ?   24 ago  7 12:33 .circleci
drwxr-xr-x  4 bodo bodo ?   46 ago  7 12:33 configs
drwxr-xr-x  2 bodo bodo ?    6 ago  9 20:21 cpan
-rw-r--r--  1 bodo bodo ?  13K ago  7 12:33 docker-compose.yml
drwxr-xr-x  3 bodo bodo ?   57 ago  7 12:33 elasticsearch
-rw-r--r--  1 bodo bodo ?  150 ago  7 12:33 .env
drwxr-xr-x  8 bodo bodo ?  163 ago  7 12:33 .git
drwxr-xr-x  3 bodo bodo ?   23 ago  7 12:33 .github
drwxr-xr-x  2 bodo bodo ?   30 ago  7 12:33 github-meets-cpan
-rw-r--r--  1 bodo bodo ?  198 ago  7 12:33 .gitignore
-rw-r--r--  1 bodo bodo ?  112 ago  7 12:33 grep.env
-rw-r--r--  1 bodo bodo ? 8,7K ago  7 12:33 LICENSE
-rw-r--r--  1 bodo bodo ?  185 ago  7 12:33 localapi.env
-rw-r--r--  1 bodo bodo ?  284 ago  7 12:33 localapi_test.env
-rw-r--r--  1 bodo bodo ?  166 ago  7 12:33 logging.env
drwxr-xr-x  2 bodo bodo ?   41 ago  7 12:33 orepan2
drwxr-xr-x  3 bodo bodo ?   80 ago  7 12:33 pg
-rw-r--r--  1 bodo bodo ?  20K ago  7 12:33 README.md
drwxr-xr-x  6 bodo bodo ?  144 ago  7 12:40 src
# ls -lah cpan
total 4,0K
drwxr-xr-x  2 bodo bodo    6 ago  9 20:21 .
drwxr-xr-x 13 bodo bodo 4,0K ago  9 20:21 ..

@bodo-hugo-barwich
Copy link
Contributor

I deleted the index once more but it continues to run into the same error:

# docker-compose exec api /bin/bash
root@06e3fe5f6379:/metacpan-api# curl -XDELETE 'elasticsearch:9200/cpan?pretty=true' 
{
  "acknowledged" : true
}
root@06e3fe5f6379:/metacpan-api# exit
exit
# docker-compose exec api index-cpan.sh
*** Warning ***: this will delete EVERYTHING and re-create the (empty) indexes
Are you sure you want to do this (type "YES" to confirm) ? yes
bye.
receiving incremental file list
authors/

sent 34 bytes  received 2,998 bytes  673.78 bytes/sec
total size is 1,651,912  speedup is 544.83
receiving incremental file list

sent 31 bytes  received 14,980 bytes  4,288.86 bytes/sec
total size is 36,346,492  speedup is 2,421.32
receiving incremental file list
authors/

sent 34 bytes  received 46,483 bytes  10,337.11 bytes/sec
total size is 39,190,648  speedup is 842.50
receiving incremental file list
authors/

sent 35 bytes  received 10,474 bytes  3,002.57 bytes/sec
total size is 2,988,452  speedup is 284.37
receiving incremental file list

sent 31 bytes  received 16,718 bytes  4,785.43 bytes/sec
total size is 6,490,026  speedup is 387.49
receiving incremental file list
authors/02STAMP

sent 52 bytes  received 312 bytes  104.00 bytes/sec
total size is 5,145,490  speedup is 14,135.96
receiving incremental file list
modules/
modules/07mirror.json
modules/07mirror.yml

sent 80 bytes  received 911 bytes  180.18 bytes/sec
total size is 61,659,249  speedup is 62,219.22
receiving incremental file list

sent 27 bytes  received 98 bytes  27.78 bytes/sec
total size is 763  speedup is 6.10
2021/08/09 19:49:51 I release: Looking for archives in /CPAN/authors/id/
2021/08/09 19:49:51 I release: 804 archives found
2021/08/09 19:50:00 I release: Processing /CPAN/authors/id/L/LL/LLAP/Apache-AuthCookieNTLM.tar.gz
2021/08/09 19:50:04 I release: Processing /CPAN/authors/id/L/LL/LLAP/Apache-AuthCookieNTLM-0.07.tar.gz
2021/08/09 19:50:04 F release: /CPAN/authors/id/L/LL/LLAP/Apache-AuthCookieNTLM-0.07.tar.gz [Request] ** [http://elasticsearch:9200]-[400] [mapper_parsing_exception] failed to parse [metadata.version], called from sub Search::Elasticsearch::Role::Client::Direct::__ANON__ at /usr/local/lib/perl5/site_perl/5.30.1/ElasticSearchX/Model/Document/Role.pm line 94. With vars: {'request' => {'mime_type' => 'application/json','body' => {'author' => 'LLAP','deprecated' => bless( do{\(my $o = 0)}, 'JSON::PP::Boolean' ),'dependency' => [{'module' => 'Apache::Request','phase' => 'runtime','version' => '0.01','relationship' => 'requires'},{'version' => '2.07','relationship' => 'requires','phase' => 'runtime','module' => 'Apache::AuthenNTLM'},{'phase' => 'runtime','version' => '0.01','relationship' => 'requires','module' => 'Apache::Constants'},{'phase' => 'runtime','version' => '0.01','relationship' => 'requires','module' => 'Apache::Cookie'},{'relationship' => 'requires','version' => '0.01','phase' => 'runtime','module' => 'Test::More'}],'distribution' => 'Apache-AuthCookieNTLM','authorized' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' ),'date' => '2004-10-03T10:08:11','version_numified' => '0.07','license' => ['unknown'],'provides' => [],'resources' => {},'metadata' => {'abstract' => 'unknown','name' => 'Apache-AuthCookieNTLM','x_installdirs' => 'site','generated_by' => 'ExtUtils::MakeMaker version 6.12, CPAN::Meta::Converter version 2.150010','version' => '0.07','license' => ['unknown'],'meta-spec' => {'url' => 'http://search.cpan.org/perldoc?CPAN::Meta::Spec','version' => 2},'dynamic_config' => 1,'prereqs' => {'runtime' => {'requires' => {'Test::More' => '0.01','Apache::Constants' => '0.01','Apache::AuthenNTLM' => '2.07','Apache::Request' => '0.01','Apache::Cookie' => '0.01'}}},'x_version_from' => 'lib/Apache/AuthCookieNTLM.pm','release_status' => 'stable','no_index' => {'directory' => ['t','xt','inc','local','perl5','fatlib','example','blib','examples','eg']},'author' => ['unknown']},'id' => 'XPQWVIBm1L__vZdwC76Z8yJsxVU','archive' => 'Apache-AuthCookieNTLM-0.07.tar.gz','status' => 'cpan','stat' => {'mode' => 33204,'mtime' => 1096798091,'size' => 5056},'version' => '0.07','maturity' => 'released','checksum_sha256' => 'c447411dc44725d958285e12dac56ca832b4bc6097cfdfdb11a611783fca9c84','first' => $VAR1->{'request'}{'body'}{'deprecated'},'name' => 'Apache-AuthCookieNTLM-0.07','checksum_md5' => '11259c43bd9f3e96ccefdcb259751ff7'},'qs' => {'refresh' => 1},'serialize' => 'std','path' => '/cpan/release/XPQWVIBm1L__vZdwC76Z8yJsxVU','method' => 'POST','ignore' => []},'status_code' => 400,'body' => {'status' => 400,'error' => {'root_cause' => [{'reason' => 'failed to parse [metadata.version]','type' => 'mapper_parsing_exception'}],'reason' => 'failed to parse [metadata.version]','type' => 'mapper_parsing_exception','caused_by' => {'type' => 'number_format_exception','reason' => 'For input string: "0.07"'}}}}
2021/08/09 19:50:04 I release: Processing /CPAN/authors/id/L/LL/LLAP/Mac-Glue-Apps-AddressBookExport-0.2.tar.gz
2021/08/09 19:50:04 F release: /CPAN/authors/id/L/LL/LLAP/Mac-Glue-Apps-AddressBookExport-0.2.tar.gz [Request] ** [http://elasticsearch:9200]-[400] [mapper_parsing_exception] failed to parse [version], called from sub Search::Elasticsearch::Role::Client::Direct::__ANON__ at /usr/local/lib/perl5/site_perl/5.30.1/ElasticSearchX/Model/Document/Role.pm line 94. With vars: {'body' => {'error' => {'root_cause' => [{'reason' => 'failed to parse [version]','type' => 'mapper_parsing_exception'}],'reason' => 'failed to parse [version]','type' => 'mapper_parsing_exception','caused_by' => {'type' => 'number_format_exception','reason' => 'For input string: "0.2"'}},'status' => 400},'status_code' => 400,'request' => {'body' => {'stat' => {'mtime' => 1106516177,'mode' => 33204,'size' => 5544},'status' => 'cpan','id' => 'eaGzdAZSDO9NcJ4oj1ryus770aE','archive' => 'Mac-Glue-Apps-AddressBookExport-0.2.tar.gz','name' => 'Mac-Glue-Apps-AddressBookExport-0.2','checksum_md5' => '1a3f7e792a01b7a8e32433b96e511489','first' => bless( do{\(my $o = 0)}, 'JSON::PP::Boolean' ),'version' => '0.2','maturity' => 'released','checksum_sha256' => '69055f866decd0cf8476aca91a347fd630bd9a971a234f1395c6434950e49465','authorized' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' ),'date' => '2005-01-23T21:36:17','dependency' => [{'relationship' => 'requires','version' => '1.22','phase' => 'runtime','module' => 'Mac::Glue'},{'module' => 'Text::vCard','relationship' => 'requires','version' => '1.9','phase' => 'runtime'},{'module' => 'Encode','phase' => 'runtime','version' => '0.1','relationship' => 'requires'},{'phase' => 'runtime','relationship' => 'requires','version' => '2','module' => 'Template'},{'phase' => 'runtime','relationship' => 'requires','version' => '0.1','module' => 'Test::More'}],'distribution' => 'Mac-Glue-Apps-AddressBookExport','deprecated' => $VAR1->{'request'}{'body'}{'first'},'author' => 'LLAP','metadata' => {'version' => '0.2','generated_by' => 'ExtUtils::MakeMaker version 6.12, CPAN::Meta::Converter version 2.150010','x_installdirs' => 'site','name' => 'Mac-Glue-Apps-AddressBookExport','abstract' => 'unknown','prereqs' => {'runtime' => {'requires' => {'Test::More' => '0.1','Template' => '2','Mac::Glue' => '1.22','Text::vCard' => '1.9','Encode' => '0.1'}}},'dynamic_config' => 1,'meta-spec' => {'version' => 2,'url' => 'http://search.cpan.org/perldoc?CPAN::Meta::Spec'},'license' => ['unknown'],'author' => ['unknown'],'no_index' => {'directory' => ['t','xt','inc','local','perl5','fatlib','example','blib','examples','eg']},'release_status' => 'stable','x_version_from' => 'lib/Mac/Glue/Apps/AddressBookExport.pm'},'provides' => [],'license' => ['unknown'],'resources' => {},'version_numified' => '0.2'},'mime_type' => 'application/json','qs' => {'refresh' => 1},'serialize' => 'std','path' => '/cpan/release/eaGzdAZSDO9NcJ4oj1ryus770aE','method' => 'POST','ignore' => []}}
^C
# docker-compose exec api /bin/bash
root@06e3fe5f6379:/metacpan-api# curl -XGET 'elasticsearch:9200/cpan/_count?pretty=true'
{
  "count" : 11,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  }
}
root@06e3fe5f6379:/metacpan-api# curl -XGET 'elasticsearch:9200/_cat/indices?pretty=true'
yellow open cpan 5 1 11 1 71.4kb 71.4kb 
root@06e3fe5f6379:/metacpan-api# curl -XGET 'elasticsearch:9200/_cat/indices'
yellow open cpan 5 1 11 1 71.4kb 71.4kb 

the index script created always the cpan index and inserts only the first distribution.

@oalders
Copy link
Member

oalders commented Sep 16, 2021

Has this been resolved?

@bodo-hugo-barwich
Copy link
Contributor

bodo-hugo-barwich commented Sep 16, 2021

I updated the repositories and rebuilt the images completely
but it just keeps falling as before:

$ docker-compose up -d api
metacpan_pgdb_1 is up-to-date
Creating metacpan_traefik_1       ... done
Creating metacpan_elasticsearch_1 ... done
Creating metacpan_api_1           ... done

$ docker-compose exec api index-cpan.sh
Mapping: deleting ...
*** Warning ***: this will delete EVERYTHING and re-create the (empty) indexes
Are you sure you want to do this (type "YES" to confirm) ? yes
bye.
receiving incremental file list
authors/

sent 34 bytes  received 2,998 bytes  673.78 bytes/sec
total size is 1,651,912  speedup is 544.83
receiving incremental file list

sent 31 bytes  received 14,980 bytes  4,288.86 bytes/sec
total size is 36,346,492  speedup is 2,421.32
receiving incremental file list
authors/id/O/OA/OALDERS/
authors/id/O/OA/OALDERS/App-perlimports-0.000018.meta
authors/id/O/OA/OALDERS/App-perlimports-0.000018.readme
authors/id/O/OA/OALDERS/App-perlimports-0.000018.tar.gz
authors/id/O/OA/OALDERS/CHECKSUMS
authors/id/O/OA/OALDERS/Open-This-0.000025.meta
authors/id/O/OA/OALDERS/Open-This-0.000025.readme
authors/id/O/OA/OALDERS/Open-This-0.000025.tar.gz
authors/id/O/OA/OALDERS/libwww-perl-6.56.meta
authors/id/O/OA/OALDERS/libwww-perl-6.56.readme
authors/id/O/OA/OALDERS/libwww-perl-6.56.tar.gz

sent 2,370 bytes  received 471,482 bytes  86,154.91 bytes/sec
total size is 39,611,154  speedup is 83.59
receiving incremental file list

sent 32 bytes  received 10,471 bytes  3,000.86 bytes/sec
total size is 2,988,452  speedup is 284.53
receiving incremental file list

sent 31 bytes  received 16,718 bytes  6,699.60 bytes/sec
total size is 6,490,026  speedup is 387.49
receiving incremental file list
authors/
authors/00whois.html
authors/00whois.xml
authors/01mailrc.txt.gz
authors/02STAMP
authors/02authors.txt.gz

sent 23,051 bytes  received 632,589 bytes  100,867.69 bytes/sec
total size is 5,148,154  speedup is 7.85
receiving incremental file list
modules/
modules/01modules.index.html
modules/01modules.mtime.html
modules/01modules.mtime.rss
modules/02STAMP
modules/02packages.details.txt
modules/02packages.details.txt.gz
modules/06perms.txt
modules/06perms.txt.gz
modules/07mirror.json
modules/07mirror.yml

sent 95,458 bytes  received 5,761,117 bytes  468,526.00 bytes/sec
total size is 61,705,699  speedup is 10.54
receiving incremental file list

sent 27 bytes  received 98 bytes  35.71 bytes/sec
total size is 763  speedup is 6.10
Mapping: rebuilding ...
2021/09/16 19:40:55 I release: Looking for archives in /CPAN/authors/id/
2021/09/16 19:40:55 I release: 807 archives found
2021/09/16 19:41:05 I release: Processing /CPAN/authors/id/L/LL/LLAP/Apache-AuthCookieNTLM.tar.gz
2021/09/16 19:41:06 F release: /CPAN/authors/id/L/LL/LLAP/Apache-AuthCookieNTLM.tar.gz [Conflict] ** [http://elasticsearch:9200]-[409] [document_already_exists_exception] [distribution][Apache-AuthCookieNTLM]: document already exists, with: {"shard":"2","index":"cpan"}, called from sub Search::Elasticsearch::Role::Client::Direct::Main::create at /usr/local/lib/perl5/site_perl/5.30.1/ElasticSearchX/Model/Document/Role.pm line 94. With vars: {'request' => {'qs' => {'op_type' => 'create'},'serialize' => 'std','ignore' => [],'method' => 'POST','path' => '/cpan/distribution/Apache-AuthCookieNTLM','mime_type' => 'application/json','body' => {'river' => {'immediate' => 0,'bucket' => 0,'total' => 0,'bus_factor' => 1},'name' => 'Apache-AuthCookieNTLM'}},'body' => {'error' => {'index' => 'cpan','reason' => '[distribution][Apache-AuthCookieNTLM]: document already exists','root_cause' => [{'shard' => '2','type' => 'document_already_exists_exception','index' => 'cpan','reason' => '[distribution][Apache-AuthCookieNTLM]: document already exists'}],'type' => 'document_already_exists_exception','shard' => '2'},'status' => 409},'status_code' => 409}
2021/09/16 19:41:06 I release: Processing /CPAN/authors/id/L/LL/LLAP/Apache-AuthCookieNTLM-0.07.tar.gz
2021/09/16 19:41:07 F release: /CPAN/authors/id/L/LL/LLAP/Apache-AuthCookieNTLM-0.07.tar.gz [Request] ** [http://elasticsearch:9200]-[400] [mapper_parsing_exception] failed to parse [metadata.version], called from sub Search::Elasticsearch::Role::Client::Direct::__ANON__ at /usr/local/lib/perl5/site_perl/5.30.1/ElasticSearchX/Model/Document/Role.pm line 94. With vars: {'body' => {'error' => {'type' => 'mapper_parsing_exception','caused_by' => {'type' => 'number_format_exception','reason' => 'For input string: "0.07"'},'reason' => 'failed to parse [metadata.version]','root_cause' => [{'reason' => 'failed to parse [metadata.version]','type' => 'mapper_parsing_exception'}]},'status' => 400},'request' => {'ignore' => [],'serialize' => 'std','qs' => {'refresh' => 1},'body' => {'dependency' => [{'phase' => 'runtime','module' => 'Apache::AuthenNTLM','version' => '2.07','relationship' => 'requires'},{'phase' => 'runtime','relationship' => 'requires','version' => '0.01','module' => 'Apache::Cookie'},{'phase' => 'runtime','relationship' => 'requires','version' => '0.01','module' => 'Test::More'},{'module' => 'Apache::Request','version' => '0.01','relationship' => 'requires','phase' => 'runtime'},{'phase' => 'runtime','relationship' => 'requires','version' => '0.01','module' => 'Apache::Constants'}],'distribution' => 'Apache-AuthCookieNTLM','maturity' => 'released','resources' => {},'metadata' => {'generated_by' => 'ExtUtils::MakeMaker version 6.12, CPAN::Meta::Converter version 2.150010','abstract' => 'unknown','x_version_from' => 'lib/Apache/AuthCookieNTLM.pm','version' => '0.07','prereqs' => {'runtime' => {'requires' => {'Apache::Request' => '0.01','Apache::Constants' => '0.01','Test::More' => '0.01','Apache::Cookie' => '0.01','Apache::AuthenNTLM' => '2.07'}}},'license' => ['unknown'],'x_installdirs' => 'site','author' => ['unknown'],'dynamic_config' => 1,'name' => 'Apache-AuthCookieNTLM','release_status' => 'stable','no_index' => {'directory' => ['t','xt','inc','local','perl5','fatlib','example','blib','examples','eg']},'meta-spec' => {'version' => 2,'url' => 'http://search.cpan.org/perldoc?CPAN::Meta::Spec'}},'version' => '0.07','license' => ['unknown'],'checksum_md5' => '11259c43bd9f3e96ccefdcb259751ff7','author' => 'LLAP','archive' => 'Apache-AuthCookieNTLM-0.07.tar.gz','date' => '2004-10-03T10:08:11','version_numified' => '0.07','deprecated' => bless( do{\(my $o = 0)}, 'JSON::PP::Boolean' ),'first' => $VAR1->{'request'}{'body'}{'deprecated'},'id' => 'XPQWVIBm1L__vZdwC76Z8yJsxVU','name' => 'Apache-AuthCookieNTLM-0.07','provides' => [],'status' => 'cpan','checksum_sha256' => 'c447411dc44725d958285e12dac56ca832b4bc6097cfdfdb11a611783fca9c84','authorized' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' ),'stat' => {'mode' => 33204,'mtime' => 1096798091,'size' => 5056}},'mime_type' => 'application/json','method' => 'POST','path' => '/cpan/release/XPQWVIBm1L__vZdwC76Z8yJsxVU'},'status_code' => 400}
2021/09/16 19:41:07 I release: Processing /CPAN/authors/id/L/LL/LLAP/Mac-Glue-Apps-AddressBookExport-0.2.tar.gz
2021/09/16 19:41:07 F release: /CPAN/authors/id/L/LL/LLAP/Mac-Glue-Apps-AddressBookExport-0.2.tar.gz [Request] ** [http://elasticsearch:9200]-[400] [mapper_parsing_exception] failed to parse [metadata.version], called from sub Search::Elasticsearch::Role::Client::Direct::__ANON__ at /usr/local/lib/perl5/site_perl/5.30.1/ElasticSearchX/Model/Document/Role.pm line 94. With vars: {'request' => {'ignore' => [],'qs' => {'refresh' => 1},'serialize' => 'std','body' => {'maturity' => 'released','resources' => {},'metadata' => {'author' => ['unknown'],'license' => ['unknown'],'x_installdirs' => 'site','prereqs' => {'runtime' => {'requires' => {'Template' => '2','Mac::Glue' => '1.22','Encode' => '0.1','Test::More' => '0.1','Text::vCard' => '1.9'}}},'version' => '0.2','x_version_from' => 'lib/Mac/Glue/Apps/AddressBookExport.pm','abstract' => 'unknown','generated_by' => 'ExtUtils::MakeMaker version 6.12, CPAN::Meta::Converter version 2.150010','meta-spec' => {'url' => 'http://search.cpan.org/perldoc?CPAN::Meta::Spec','version' => 2},'no_index' => {'directory' => ['t','xt','inc','local','perl5','fatlib','example','blib','examples','eg']},'release_status' => 'stable','name' => 'Mac-Glue-Apps-AddressBookExport','dynamic_config' => 1},'dependency' => [{'relationship' => 'requires','version' => '1.22','module' => 'Mac::Glue','phase' => 'runtime'},{'version' => '0.1','relationship' => 'requires','module' => 'Encode','phase' => 'runtime'},{'phase' => 'runtime','version' => '2','relationship' => 'requires','module' => 'Template'},{'phase' => 'runtime','module' => 'Text::vCard','version' => '1.9','relationship' => 'requires'},{'phase' => 'runtime','version' => '0.1','relationship' => 'requires','module' => 'Test::More'}],'distribution' => 'Mac-Glue-Apps-AddressBookExport','author' => 'LLAP','archive' => 'Mac-Glue-Apps-AddressBookExport-0.2.tar.gz','version_numified' => '0.2','date' => '2005-01-23T21:36:17','deprecated' => bless( do{\(my $o = 0)}, 'JSON::PP::Boolean' ),'first' => $VAR1->{'request'}{'body'}{'deprecated'},'id' => 'eaGzdAZSDO9NcJ4oj1ryus770aE','version' => '0.2','license' => ['unknown'],'checksum_md5' => '1a3f7e792a01b7a8e32433b96e511489','provides' => [],'name' => 'Mac-Glue-Apps-AddressBookExport-0.2','authorized' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' ),'stat' => {'mode' => 33204,'mtime' => 1106516177,'size' => 5544},'status' => 'cpan','checksum_sha256' => '69055f866decd0cf8476aca91a347fd630bd9a971a234f1395c6434950e49465'},'mime_type' => 'application/json','path' => '/cpan/release/eaGzdAZSDO9NcJ4oj1ryus770aE','method' => 'POST'},'body' => {'status' => 400,'error' => {'reason' => 'failed to parse [metadata.version]','root_cause' => [{'type' => 'mapper_parsing_exception','reason' => 'failed to parse [metadata.version]'}],'caused_by' => {'reason' => 'For input string: "0.2"','type' => 'number_format_exception'},'type' => 'mapper_parsing_exception'}},'status_code' => 400}
2021/09/16 19:41:07 I release: Processing /CPAN/authors/id/L/LL/LLAP/SOAP-Lite-Simple-1.9.tar.gz
2021/09/16 19:41:07 F release: /CPAN/authors/id/L/LL/LLAP/SOAP-Lite-Simple-1.9.tar.gz [Request] ** [http://elasticsearch:9200]-[400] [mapper_parsing_exception] failed to parse [version], called from sub Search::Elasticsearch::Role::Client::Direct::__ANON__ at /usr/local/lib/perl5/site_perl/5.30.1/ElasticSearchX/Model/Document/Role.pm line 94. With vars: {'body' => {'error' => {'type' => 'mapper_parsing_exception','caused_by' => {'type' => 'number_format_exception','reason' => 'For input string: "1.9"'},'root_cause' => [{'reason' => 'failed to parse [version]','type' => 'mapper_parsing_exception'}],'reason' => 'failed to parse [version]'},'status' => 400},'request' => {'ignore' => [],'qs' => {'refresh' => 1},'serialize' => 'std','body' => {'checksum_md5' => 'e7ad6f86f5985d69a6cc076cbe94285d','version' => '1.9','license' => ['unknown'],'date' => '2006-08-30T15:31:43','first' => bless( do{\(my $o = 0)}, 'JSON::PP::Boolean' ),'deprecated' => $VAR1->{'request'}{'body'}{'first'},'version_numified' => '1.9','id' => '7hjNc_7ix7WeB3FcZLLp2skmYms','author' => 'LLAP','archive' => 'SOAP-Lite-Simple-1.9.tar.gz','distribution' => 'SOAP-Lite-Simple','dependency' => [{'module' => 'SOAP::XML::Client','relationship' => 'requires','version' => '1.9','phase' => 'runtime'}],'resources' => {},'metadata' => {'release_status' => 'stable','name' => 'SOAP-Lite-Simple','dynamic_config' => 1,'no_index' => {'directory' => ['t','xt','inc','local','perl5','fatlib','example','blib','examples','eg']},'meta-spec' => {'url' => 'http://search.cpan.org/perldoc?CPAN::Meta::Spec','version' => 2},'abstract' => 'unknown','generated_by' => 'ExtUtils::MakeMaker version 6.17, CPAN::Meta::Converter version 2.150010','author' => ['unknown'],'x_version_from' => undef,'prereqs' => {'runtime' => {'requires' => {'SOAP::XML::Client' => '1.9'}}},'license' => ['unknown'],'x_installdirs' => 'site','version' => '1.9'},'maturity' => 'released','checksum_sha256' => '35f4a0a396b6e6e3477ad42d7d8b558c4c9d78c96c13d0c3eb2d055da09291d8','status' => 'cpan','authorized' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' ),'stat' => {'size' => 3324,'mtime' => 1156951903,'mode' => 33204},'name' => 'SOAP-Lite-Simple-1.9','provides' => []},'mime_type' => 'application/json','path' => '/cpan/release/7hjNc_7ix7WeB3FcZLLp2skmYms','method' => 'POST'},'status_code' => 400}
2021/09/16 19:41:07 I release: Processing /CPAN/authors/id/P/PE/PERLER/DBIx-Class-SaltedPasswords-0.03001.tar.gz
2021/09/16 19:41:07 F release: /CPAN/authors/id/P/PE/PERLER/DBIx-Class-SaltedPasswords-0.03001.tar.gz [Request] ** [http://elasticsearch:9200]-[400] [mapper_parsing_exception] failed to parse [version], called from sub Search::Elasticsearch::Role::Client::Direct::__ANON__ at /usr/local/lib/perl5/site_perl/5.30.1/ElasticSearchX/Model/Document/Role.pm line 94. With vars: {'status_code' => 400,'body' => {'error' => {'caused_by' => {'reason' => 'For input string: "0.03001"','type' => 'number_format_exception'},'type' => 'mapper_parsing_exception','root_cause' => [{'type' => 'mapper_parsing_exception','reason' => 'failed to parse [version]'}],'reason' => 'failed to parse [version]'},'status' => 400},'request' => {'qs' => {'refresh' => 1},'serialize' => 'std','ignore' => [],'mime_type' => 'application/json','method' => 'POST','path' => '/cpan/release/RV9lvhVXosw_smNsI4_KHqtnzk4','body' => {'stat' => {'mtime' => 1168279102,'mode' => 33204,'size' => 3324},'authorized' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' ),'status' => 'cpan','checksum_sha256' => 'fa7cea9e5aaa59ac9fa4fd7ee37e8a0b3618351dc609a8cb19c8e57c74164391','provides' => [],'name' => 'DBIx-Class-SaltedPasswords-0.03001','archive' => 'DBIx-Class-SaltedPasswords-0.03001.tar.gz','author' => 'PERLER','first' => bless( do{\(my $o = 0)}, 'JSON::PP::Boolean' ),'date' => '2007-01-08T17:58:22','deprecated' => $VAR1->{'request'}{'body'}{'first'},'version_numified' => '0.03001','id' => 'RV9lvhVXosw_smNsI4_KHqtnzk4','license' => ['unknown'],'version' => '0.03001','checksum_md5' => '82436086c50a34c441a0f98ef21d1e5f','maturity' => 'released','metadata' => {'meta-spec' => {'url' => 'http://search.cpan.org/perldoc?CPAN::Meta::Spec','version' => 2},'no_index' => {'directory' => ['t','xt','inc','local','perl5','fatlib','example','blib','examples','eg']},'release_status' => 'stable','name' => 'DBIx-Class-SaltedPasswords','dynamic_config' => 1,'author' => ['unknown'],'prereqs' => {'runtime' => {'requires' => {'DBIx::Class' => '0.07003','Digest::MD5' => '2.36'}}},'license' => ['unknown'],'x_installdirs' => 'site','version' => '0.03001','x_version_from' => 'lib/DBIx/Class/SaltedPasswords.pm','abstract' => 'unknown','generated_by' => 'ExtUtils::MakeMaker version 6.30_01, CPAN::Meta::Converter version 2.150010'},'resources' => {},'dependency' => [{'phase' => 'runtime','module' => 'Digest::MD5','relationship' => 'requires','version' => '2.36'},{'version' => '0.07003','relationship' => 'requires','module' => 'DBIx::Class','phase' => 'runtime'}],'distribution' => 'DBIx-Class-SaltedPasswords'}}}
# [...]

@bodo-hugo-barwich
Copy link
Contributor

I inserted some activity notices which show the indices are rebuilt but they are apparently wrongly rebuilt

$ git diff origin/master
diff --git a/bin/index-cpan.sh b/bin/index-cpan.sh
index c3d99ea..50cd84d 100755
--- a/bin/index-cpan.sh
+++ b/bin/index-cpan.sh
@@ -1,9 +1,13 @@
 #!/bin/sh
 
+echo "Mapping: deleting ..."
+
 ./bin/run bin/metacpan mapping --delete
 
 /bin/partial-cpan-mirror.sh
 
+echo "Mapping: rebuilding ..."
+
 ./bin/run bin/metacpan release /CPAN/authors/id/
 ./bin/run bin/metacpan latest
 ./bin/run bin/metacpan author

@oalders
Copy link
Member

oalders commented Sep 16, 2021

Do you have the same issue if the elasticsearch_test container is not running while you rebuild?

@waterkip
Copy link
Author

When you stop the elasticsearch container you get a warning that the elasticsearch host is down when you run the index script. When the elasticsearch_test container is down (or up), the bug is present.

@bodo-hugo-barwich
Copy link
Contributor

I managed to set up an GitHub Action test to represent what is happening when trying to index with docker-compose
docker-compose Indexation Test
This way we clearly see what we are speaking about and can reproduce any actions.

It documents the same error as commented above at
error number_format_exception on indexing

Bulk error [index]: {"status":400,"_index":"cpan","_id":"_h_ZcHpv5S4tKQtoa6NtJgBA0mE","error":{"caused_by":{"type":"number_format_exception","reason":"For input string: \"1.08\""},"type":"mapper_parsing_exception","reason":"failed to parse [version]"},"_type":"file"} at /usr/local/lib/perl5/site_perl/5.30.1/Search/Elasticsearch/Role/Bulk.pm line 50.

interestingly it clearly states it is using the index cpan

@bodo-hugo-barwich
Copy link
Contributor

bodo-hugo-barwich commented Sep 18, 2021

Analysing the activity in MetaCPAN::Script::Mapping::deploy_mapping I found that the command ./bin/run bin/metacpan mapping --delete in index-cpan.sh does not delete all indices in the ElasticSearch service but only those which it knows.
Therefore it is unable to fix wrongly built indices:

$ docker-compose exec api index-cpan.sh
Mapping: deleting ...
MetaCPAN::Script::Mapping::run - go ...
mp 0 dmp:
do {
  my $a = bless({
    _surrogate_keys_to_purge => [],
    arg_copy_type => "",
    arg_create_index => "",
    arg_delete_index => "",
    arg_deploy_mapping => 1,
    arg_list_types => 0,
    arg_update_index => "",
    ARGV => ["mapping", "--delete"],
    copy_query => "",
    copy_to_index => "",
    delete_from_type => "",
    die_on_error => 0,
    es => bless({
      api_version => "2_0",
      bulk_helper_class => "Bulk",
      logger => bless({
        log_as     => "elasticsearch.event",
        log_handle => bless({
                        adapter   => bless({ category => "elasticsearch.event" }, "Log::Any::Adapter::Null"),
                        category  => "elasticsearch.event",
                        context   => {},
                        formatter => sub { ... },
                      }, "Log::Any::Proxy::Null"),
        serializer => bless({
                        JSON => bless(do{\(my $o = "")}, "Cpanel::JSON::XS"),
                        mime_type => "application/json",
                      }, "Search::Elasticsearch::Serializer::JSON"),
        trace_as   => "elasticsearch.trace",
      }, "Search::Elasticsearch::Logger::LogAny"),
      scroll_helper_class => "Scroll",
      transport => bless({
        cxn_pool => bless({
          current_cxn_num => 0,
          cxn_factory => bless({
            _factory => sub { ... },
            cxn_class => "Search::Elasticsearch::Cxn::HTTPTiny",
            default_host => "http://localhost:9200",
            max_content_length => 104857600,
          }, "Search::Elasticsearch::Cxn::Factory"),
          cxns => [
            bless({
              _pid                  => 233,
              dead_timeout          => 60,
              default_headers       => {},
              default_qs_params     => {},
              handle_args           => {},
              host                  => "elasticsearch",
              is_https              => "",
              logger                => 'fix',
              max_content_length    => 104857600,
              max_dead_timeout      => 3600,
              next_ping             => -1,
              ping_failures         => 0,
              ping_timeout          => 2,
              port                  => 9200,
              request_timeout       => 30,
              scheme                => "http",
              serializer            => 'fix',
              sniff_request_timeout => 2,
              sniff_timeout         => 1,
              uri                   => bless(do{\(my $o = "http://elasticsearch:9200")}, "URI::http"),
              userinfo              => "",
            }, "Search::Elasticsearch::Cxn::HTTPTiny"),
          ],
          logger => 'fix',
          randomize_cxns => 1,
          retries => 0,
          seed_nodes => ["elasticsearch:9200"],
          serializer => 'fix',
        }, "Search::Elasticsearch::CxnPool::Static"),
        logger => 'fix',
        send_get_body_as => "GET",
        serializer => 'fix',
      }, "Search::Elasticsearch::Transport"),
    }, "Search::Elasticsearch::Client::2_0::Direct"),
    extra_argv => ["mapping"],
    index => "cpan",
    level => "info",
    logger => bless({
      additivity     => 1,
      ALL            => sub { ... },
      appender_names => ["app001", "app002"],
      category       => "mapping",
      DEBUG          => 'fix',
      ERROR          => sub { ... },
      FATAL          => 'fix',
      INFO           => 'fix',
      is_ALL         => sub { ... },
      is_DEBUG       => sub { ... },
      is_ERROR       => sub { ... },
      is_FATAL       => sub { ... },
      is_INFO        => sub { ... },
      is_OFF         => sub { ... },
      is_TRACE       => sub { ... },
      is_WARN        => sub { ... },
      layout         => undef,
      level          => 20000,
      num_appenders  => 2,
      OFF            => 'fix',
      TRACE          => 'fix',
      WARN           => 'fix',
    }, "Log::Log4perl::Logger"),
    patch_mapping => "{}",
    port => 5000,
    proxy => "",
    queue => 0,
    reindex => 0,
    skip_existing_mapping => 0,
    usage => bless({
      leader_text   => "usage: metacpan [-?h] [long options...]",
      options       => [
                         { constraint => {}, desc => " ", name => "quarantine", spec => "quarantine=s" },
                         {
                           constraint => {},
                           desc => "Die on errors instead of simply logging",
                           name => "die_on_error",
                           spec => "die_on_error!",
                         },
                         {
                           constraint => {},
                           desc => "Elasticsearch http connection string",
                           name => "es",
                           spec => "es",
                         },
                         { constraint => {}, desc => " ", name => "home", spec => "home=s" },
                         {
                           constraint => {},
                           desc => " ",
                           name => "cdn_services",
                           spec => "cdn_services",
                         },
                         { constraint => {}, desc => " ", name => "cdn_api", spec => "cdn_api" },
                         { constraint => {}, desc => "Log level", name => "level", spec => "level=s" },
                         { constraint => {}, desc => " ", name => "ua", spec => "ua" },
                         {
                           constraint => {},
                           desc => "add indexing jobs to the minion queue",
                           name => "queue",
                           spec => "queue!",
                         },
                         {
                           constraint => {},
                           desc => "Port for the proxy, defaults to 5000",
                           name => "port",
                           spec => "port=i",
                         },
                         {
                           constraint => {},
                           desc => "Index to use, defaults to \"cpan\" (when used: also export ES_SCRIPT_INDEX)",
                           name => "index",
                           spec => "index=s",
                         },
                         {
                           constraint => {},
                           desc => "Location of a local CPAN mirror, looks for \$ENV{MINICPAN} and ~/CPAN",
                           name => "cpan",
                           spec => "cpan=s",
                         },
                         {
                           constraint => {},
                           desc => "Prints this usage information.",
                           name => "help",
                           spec => "help|usage|?|h!",
                         },
                         { constraint => {}, desc => " ", name => "proxy", spec => "proxy=s" },
                         {
                           constraint => {},
                           desc => "delete index if it exists already",
                           name => "delete",
                           spec => "delete!",
                         },
                         {
                           constraint => {},
                           desc => "list available index type names",
                           name => "list_types",
                           spec => "list_types!",
                         },
                         {
                           constraint => {},
                           desc => "create a new empty index (copy mappings)",
                           name => "create_index",
                           spec => "create_index=s",
                         },
                         {
                           constraint => {},
                           desc => "update existing index (add mappings)",
                           name => "update_index",
                           spec => "update_index=s",
                         },
                         {
                           constraint => {},
                           desc => "type mapping patches",
                           name => "patch_mapping",
                           spec => "patch_mapping=s",
                         },
                         {
                           constraint => {},
                           desc => "do NOT copy mappings other than patch_mapping",
                           name => "skip_existing_mapping",
                           spec => "skip_existing_mapping!",
                         },
                         {
                           constraint => {},
                           desc => "index to copy type to",
                           name => "copy_to_index",
                           spec => "copy_to_index=s",
                         },
                         {
                           constraint => {},
                           desc => "type to copy",
                           name => "copy_type",
                           spec => "copy_type=s",
                         },
                         {
                           constraint => {},
                           desc => "match query (default: monthly time slices,  if provided must be a valid json query OR \"match_all\")",
                           name => "copy_query",
                           spec => "copy_query=s",
                         },
                         {
                           constraint => {},
                           desc => "reindex data from source index for exact mapping types",
                           name => "reindex",
                           spec => "reindex!",
                         },
                         {
                           constraint => {},
                           desc => "delete an existing index",
                           name => "delete_index",
                           spec => "delete_index=s",
                         },
                         {
                           constraint => {},
                           desc => "delete data from an existing type",
                           name => "delete_from_type",
                           spec => "delete_from_type=s",
                         },
                       ],
      show_defaults => undef,
    }, "Getopt::Long::Descriptive::Usage"),
  }, "MetaCPAN::Script::Mapping");
  $a->{es}{transport}{cxn_pool}{cxns}[0]{logger} = $a->{es}{logger};
  $a->{es}{transport}{cxn_pool}{cxns}[0]{serializer} = $a->{es}{logger}{serializer};
  $a->{es}{transport}{cxn_pool}{logger} = $a->{es}{logger};
  $a->{es}{transport}{cxn_pool}{serializer} = $a->{es}{logger}{serializer};
  $a->{es}{transport}{logger} = $a->{es}{logger};
  $a->{es}{transport}{serializer} = $a->{es}{logger}{serializer};
  $a->{logger}{DEBUG} = $a->{logger}{ALL};
  $a->{logger}{FATAL} = $a->{logger}{ERROR};
  $a->{logger}{INFO} = $a->{logger}{ERROR};
  $a->{logger}{OFF} = $a->{logger}{ERROR};
  $a->{logger}{TRACE} = $a->{logger}{ALL};
  $a->{logger}{WARN} = $a->{logger}{ERROR};
  $a;
}
'MetaCPAN::Script::Mapping::run' : Call on 'MetaCPAN::Script::Mapping::deploy_mapping'
'MetaCPAN::Script::Mapping::deploy_mapping' : Call on 'MetaCPAN::Script::Mapping::_delete_index'
2021/09/18 11:41:23 I mapping: Deleting index: contributor
2021/09/18 11:41:23 I mapping: Creating index: contributor
2021/09/18 11:41:23 I mapping: Adding mapping: contributor/contributor
'MetaCPAN::Script::Mapping::deploy_mapping' : Call on 'MetaCPAN::Script::Mapping::_delete_index'
2021/09/18 11:41:23 I mapping: Deleting index: cover
2021/09/18 11:41:23 I mapping: Creating index: cover
2021/09/18 11:41:23 I mapping: Adding mapping: cover/cover
'MetaCPAN::Script::Mapping::deploy_mapping' : Call on 'MetaCPAN::Script::Mapping::_delete_index'
2021/09/18 11:41:24 I mapping: Deleting index: cpan_v1_01
2021/09/18 11:41:24 I mapping: Creating index: cpan_v1_01
2021/09/18 11:41:24 I mapping: Adding mapping: cpan_v1_01/author
2021/09/18 11:41:24 I mapping: Adding mapping: cpan_v1_01/distribution
2021/09/18 11:41:24 I mapping: Adding mapping: cpan_v1_01/favorite
2021/09/18 11:41:24 I mapping: Adding mapping: cpan_v1_01/file
2021/09/18 11:41:24 I mapping: Adding mapping: cpan_v1_01/mirror
2021/09/18 11:41:24 I mapping: Adding mapping: cpan_v1_01/package
2021/09/18 11:41:24 I mapping: Adding mapping: cpan_v1_01/permission
2021/09/18 11:41:24 I mapping: Adding mapping: cpan_v1_01/rating
2021/09/18 11:41:25 I mapping: Adding mapping: cpan_v1_01/release
'MetaCPAN::Script::Mapping::deploy_mapping' : Call on 'MetaCPAN::Script::Mapping::_delete_index'
2021/09/18 11:41:25 I mapping: Deleting index: user
2021/09/18 11:41:25 I mapping: Creating index: user
2021/09/18 11:41:25 I mapping: Adding mapping: user/account
2021/09/18 11:41:25 I mapping: Adding mapping: user/identity
2021/09/18 11:41:25 I mapping: Adding mapping: user/session
[Request] ** [http://elasticsearch:9200]-[400] [invalid_alias_name_exception] Invalid alias name [cpan], an index exists with the same name as the alias, with: {"index":"cpan_v1_01"}, called from sub Search::Elasticsearch::Role::Client::Direct::__ANON__ at /metacpan-api/bin/../lib/MetaCPAN/Script/Mapping.pm line 466. With vars: {'body' => {'status' => 400,'error' => {'type' => 'invalid_alias_name_exception','index' => 'cpan_v1_01','reason' => 'Invalid alias name [cpan], an index exists with the same name as the alias','root_cause' => [{'type' => 'invalid_alias_name_exception','index' => 'cpan_v1_01','reason' => 'Invalid alias name [cpan], an index exists with the same name as the alias'}]}},'status_code' => 400,'request' => {'method' => 'PUT','path' => '/cpan_v1_01/_alias/cpan','qs' => {},'body' => undef,'ignore' => [],'serialize' => 'std'}}
Mapping: delete done.
ElasticSearch 'elasticsearch:9200': Indices showing ...
* Expire in 0 ms for 6 (transfer 0x55e7b5a06dc0)
* Expire in 1 ms for 1 (transfer 0x55e7b5a06dc0)
* Expire in 0 ms for 1 (transfer 0x55e7b5a06dc0)
* Expire in 2 ms for 1 (transfer 0x55e7b5a06dc0)
* Expire in 0 ms for 1 (transfer 0x55e7b5a06dc0)
* Expire in 1 ms for 1 (transfer 0x55e7b5a06dc0)
* Expire in 4 ms for 1 (transfer 0x55e7b5a06dc0)
* Expire in 1 ms for 1 (transfer 0x55e7b5a06dc0)
* Expire in 1 ms for 1 (transfer 0x55e7b5a06dc0)
* Expire in 4 ms for 1 (transfer 0x55e7b5a06dc0)
* Expire in 2 ms for 1 (transfer 0x55e7b5a06dc0)
* Expire in 2 ms for 1 (transfer 0x55e7b5a06dc0)
* Expire in 3 ms for 1 (transfer 0x55e7b5a06dc0)
*   Trying 172.22.0.2...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x55e7b5a06dc0)
* Connected to elasticsearch (172.22.0.2) port 9200 (#0)
> GET /_cat/indices HTTP/1.1
> Host: elasticsearch:9200
> User-Agent: curl/7.64.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: text/plain; charset=UTF-8
< Content-Length: 240
< 
yellow open cover       1 1  0 0   130b   130b 
yellow open cpan_v1_01  1 1  0 0   130b   130b 
yellow open contributor 1 1  0 0   130b   130b 
yellow open cpan        5 1 11 2 83.2kb 83.2kb 
yellow open user        1 1  0 0   130b   130b 
* Connection #0 to host elasticsearch left intact

so after this command the wrongly built cpan index is still their and the alias creation has failed

Understanding this I concluded that deleting all indices manually should make the indices being created correctly:

$ docker-compose exec api /bin/bash
root@34ecb8e99713:/metacpan-api# curl -XDELETE 'elasticsearch:9200/cpan?pretty=true'
{
  "acknowledged" : true
}
root@34ecb8e99713:/metacpan-api# exit
exit

$ docker-compose exec api index-cpan.sh
Mapping: deleting ...
MetaCPAN::Script::Mapping::run - go ...
mp 0 dmp:
do {
  my $a = bless({
    _surrogate_keys_to_purge => [],
    arg_copy_type => "",
    arg_create_index => "",
    arg_delete_index => "",
    arg_deploy_mapping => 1,
    arg_list_types => 0,
    arg_update_index => "",
    ARGV => ["mapping", "--delete"],
    copy_query => "",
    copy_to_index => "",
    delete_from_type => "",
    die_on_error => 0,
    es => bless({
      api_version => "2_0",
      bulk_helper_class => "Bulk",
      logger => bless({
        log_as     => "elasticsearch.event",
        log_handle => bless({
                        adapter   => bless({ category => "elasticsearch.event" }, "Log::Any::Adapter::Null"),
                        category  => "elasticsearch.event",
                        context   => {},
                        formatter => sub { ... },
                      }, "Log::Any::Proxy::Null"),
        serializer => bless({
                        JSON => bless(do{\(my $o = "")}, "Cpanel::JSON::XS"),
                        mime_type => "application/json",
                      }, "Search::Elasticsearch::Serializer::JSON"),
        trace_as   => "elasticsearch.trace",
      }, "Search::Elasticsearch::Logger::LogAny"),
      scroll_helper_class => "Scroll",
      transport => bless({
        cxn_pool => bless({
          current_cxn_num => 0,
          cxn_factory => bless({
            _factory => sub { ... },
            cxn_class => "Search::Elasticsearch::Cxn::HTTPTiny",
            default_host => "http://localhost:9200",
            max_content_length => 104857600,
          }, "Search::Elasticsearch::Cxn::Factory"),
          cxns => [
            bless({
              _pid                  => 264,
              dead_timeout          => 60,
              default_headers       => {},
              default_qs_params     => {},
              handle_args           => {},
              host                  => "elasticsearch",
              is_https              => "",
              logger                => 'fix',
              max_content_length    => 104857600,
              max_dead_timeout      => 3600,
              next_ping             => -1,
              ping_failures         => 0,
              ping_timeout          => 2,
              port                  => 9200,
              request_timeout       => 30,
              scheme                => "http",
              serializer            => 'fix',
              sniff_request_timeout => 2,
              sniff_timeout         => 1,
              uri                   => bless(do{\(my $o = "http://elasticsearch:9200")}, "URI::http"),
              userinfo              => "",
            }, "Search::Elasticsearch::Cxn::HTTPTiny"),
          ],
          logger => 'fix',
          randomize_cxns => 1,
          retries => 0,
          seed_nodes => ["elasticsearch:9200"],
          serializer => 'fix',
        }, "Search::Elasticsearch::CxnPool::Static"),
        logger => 'fix',
        send_get_body_as => "GET",
        serializer => 'fix',
      }, "Search::Elasticsearch::Transport"),
    }, "Search::Elasticsearch::Client::2_0::Direct"),
    extra_argv => ["mapping"],
    index => "cpan",
    level => "info",
    logger => bless({
      additivity     => 1,
      ALL            => sub { ... },
      appender_names => ["app001", "app002"],
      category       => "mapping",
      DEBUG          => 'fix',
      ERROR          => sub { ... },
      FATAL          => 'fix',
      INFO           => 'fix',
      is_ALL         => sub { ... },
      is_DEBUG       => sub { ... },
      is_ERROR       => sub { ... },
      is_FATAL       => sub { ... },
      is_INFO        => sub { ... },
      is_OFF         => sub { ... },
      is_TRACE       => sub { ... },
      is_WARN        => sub { ... },
      layout         => undef,
      level          => 20000,
      num_appenders  => 2,
      OFF            => 'fix',
      TRACE          => 'fix',
      WARN           => 'fix',
    }, "Log::Log4perl::Logger"),
    patch_mapping => "{}",
    port => 5000,
    proxy => "",
    queue => 0,
    reindex => 0,
    skip_existing_mapping => 0,
    usage => bless({
      leader_text   => "usage: metacpan [-?h] [long options...]",
      options       => [
                         {
                           constraint => {},
                           desc => "Elasticsearch http connection string",
                           name => "es",
                           spec => "es",
                         },
                         {
                           constraint => {},
                           desc => "Index to use, defaults to \"cpan\" (when used: also export ES_SCRIPT_INDEX)",
                           name => "index",
                           spec => "index=s",
                         },
                         {
                           constraint => {},
                           desc => "Die on errors instead of simply logging",
                           name => "die_on_error",
                           spec => "die_on_error!",
                         },
                         {
                           constraint => {},
                           desc => " ",
                           name => "cdn_services",
                           spec => "cdn_services",
                         },
                         { constraint => {}, desc => " ", name => "proxy", spec => "proxy=s" },
                         {
                           constraint => {},
                           desc => "Port for the proxy, defaults to 5000",
                           name => "port",
                           spec => "port=i",
                         },
                         { constraint => {}, desc => " ", name => "ua", spec => "ua" },
                         {
                           constraint => {},
                           desc => "Location of a local CPAN mirror, looks for \$ENV{MINICPAN} and ~/CPAN",
                           name => "cpan",
                           spec => "cpan=s",
                         },
                         { constraint => {}, desc => " ", name => "quarantine", spec => "quarantine=s" },
                         {
                           constraint => {},
                           desc => "add indexing jobs to the minion queue",
                           name => "queue",
                           spec => "queue!",
                         },
                         {
                           constraint => {},
                           desc => "Prints this usage information.",
                           name => "help",
                           spec => "help|usage|?|h!",
                         },
                         { constraint => {}, desc => " ", name => "cdn_api", spec => "cdn_api" },
                         { constraint => {}, desc => " ", name => "home", spec => "home=s" },
                         { constraint => {}, desc => "Log level", name => "level", spec => "level=s" },
                         {
                           constraint => {},
                           desc => "delete index if it exists already",
                           name => "delete",
                           spec => "delete!",
                         },
                         {
                           constraint => {},
                           desc => "list available index type names",
                           name => "list_types",
                           spec => "list_types!",
                         },
                         {
                           constraint => {},
                           desc => "create a new empty index (copy mappings)",
                           name => "create_index",
                           spec => "create_index=s",
                         },
                         {
                           constraint => {},
                           desc => "update existing index (add mappings)",
                           name => "update_index",
                           spec => "update_index=s",
                         },
                         {
                           constraint => {},
                           desc => "type mapping patches",
                           name => "patch_mapping",
                           spec => "patch_mapping=s",
                         },
                         {
                           constraint => {},
                           desc => "do NOT copy mappings other than patch_mapping",
                           name => "skip_existing_mapping",
                           spec => "skip_existing_mapping!",
                         },
                         {
                           constraint => {},
                           desc => "index to copy type to",
                           name => "copy_to_index",
                           spec => "copy_to_index=s",
                         },
                         {
                           constraint => {},
                           desc => "type to copy",
                           name => "copy_type",
                           spec => "copy_type=s",
                         },
                         {
                           constraint => {},
                           desc => "match query (default: monthly time slices,  if provided must be a valid json query OR \"match_all\")",
                           name => "copy_query",
                           spec => "copy_query=s",
                         },
                         {
                           constraint => {},
                           desc => "reindex data from source index for exact mapping types",
                           name => "reindex",
                           spec => "reindex!",
                         },
                         {
                           constraint => {},
                           desc => "delete an existing index",
                           name => "delete_index",
                           spec => "delete_index=s",
                         },
                         {
                           constraint => {},
                           desc => "delete data from an existing type",
                           name => "delete_from_type",
                           spec => "delete_from_type=s",
                         },
                       ],
      show_defaults => undef,
    }, "Getopt::Long::Descriptive::Usage"),
  }, "MetaCPAN::Script::Mapping");
  $a->{es}{transport}{cxn_pool}{cxns}[0]{logger} = $a->{es}{logger};
  $a->{es}{transport}{cxn_pool}{cxns}[0]{serializer} = $a->{es}{logger}{serializer};
  $a->{es}{transport}{cxn_pool}{logger} = $a->{es}{logger};
  $a->{es}{transport}{cxn_pool}{serializer} = $a->{es}{logger}{serializer};
  $a->{es}{transport}{logger} = $a->{es}{logger};
  $a->{es}{transport}{serializer} = $a->{es}{logger}{serializer};
  $a->{logger}{DEBUG} = $a->{logger}{ALL};
  $a->{logger}{FATAL} = $a->{logger}{ERROR};
  $a->{logger}{INFO} = $a->{logger}{ERROR};
  $a->{logger}{OFF} = $a->{logger}{ERROR};
  $a->{logger}{TRACE} = $a->{logger}{ALL};
  $a->{logger}{WARN} = $a->{logger}{ERROR};
  $a;
}
'MetaCPAN::Script::Mapping::run' : Call on 'MetaCPAN::Script::Mapping::deploy_mapping'
'MetaCPAN::Script::Mapping::deploy_mapping' : Call on 'MetaCPAN::Script::Mapping::_delete_index'
2021/09/18 12:54:00 I mapping: Deleting index: contributor
2021/09/18 12:54:00 I mapping: Creating index: contributor
2021/09/18 12:54:01 I mapping: Adding mapping: contributor/contributor
'MetaCPAN::Script::Mapping::deploy_mapping' : Call on 'MetaCPAN::Script::Mapping::_delete_index'
2021/09/18 12:54:01 I mapping: Deleting index: cover
2021/09/18 12:54:02 I mapping: Creating index: cover
2021/09/18 12:54:02 I mapping: Adding mapping: cover/cover
'MetaCPAN::Script::Mapping::deploy_mapping' : Call on 'MetaCPAN::Script::Mapping::_delete_index'
2021/09/18 12:54:02 I mapping: Deleting index: cpan_v1_01
2021/09/18 12:54:02 I mapping: Creating index: cpan_v1_01
2021/09/18 12:54:02 I mapping: Adding mapping: cpan_v1_01/author
2021/09/18 12:54:02 I mapping: Adding mapping: cpan_v1_01/distribution
2021/09/18 12:54:02 I mapping: Adding mapping: cpan_v1_01/favorite
2021/09/18 12:54:02 I mapping: Adding mapping: cpan_v1_01/file
2021/09/18 12:54:03 I mapping: Adding mapping: cpan_v1_01/mirror
2021/09/18 12:54:03 I mapping: Adding mapping: cpan_v1_01/package
2021/09/18 12:54:03 I mapping: Adding mapping: cpan_v1_01/permission
2021/09/18 12:54:03 I mapping: Adding mapping: cpan_v1_01/rating
2021/09/18 12:54:03 I mapping: Adding mapping: cpan_v1_01/release
'MetaCPAN::Script::Mapping::deploy_mapping' : Call on 'MetaCPAN::Script::Mapping::_delete_index'
2021/09/18 12:54:03 I mapping: Deleting index: user
2021/09/18 12:54:03 I mapping: Creating index: user
2021/09/18 12:54:03 I mapping: Adding mapping: user/account
2021/09/18 12:54:03 I mapping: Adding mapping: user/identity
2021/09/18 12:54:03 I mapping: Adding mapping: user/session
2021/09/18 12:54:04 I mapping: Done.
mp 1 dmp:
do {
  require Symbol;
  my $a = bless({
    _surrogate_keys_to_purge => [],
    arg_copy_type => "",
    arg_create_index => "",
    arg_delete_index => "",
    arg_deploy_mapping => 1,
    arg_list_types => 0,
    arg_update_index => "",
    ARGV => ["mapping", "--delete"],
    copy_query => "",
    copy_to_index => "",
    delete_from_type => "",
    die_on_error => 0,
    es => bless({
      api_version => "2_0",
      bulk_helper_class => "Bulk",
      indices => bless({
        api_version => "2_0",
        logger      => bless({
                         log_as       => "elasticsearch.event",
                         log_handle   => bless({
                                           adapter   => bless({ category => "elasticsearch.event" }, "Log::Any::Adapter::Null"),
                                           category  => "elasticsearch.event",
                                           context   => {},
                                           formatter => sub { ... },
                                         }, "Log::Any::Proxy::Null"),
                         serializer   => bless({
                                           JSON => bless(do{\(my $o = "")}, "Cpanel::JSON::XS"),
                                           mime_type => "application/json",
                                         }, "Search::Elasticsearch::Serializer::JSON"),
                         trace_as     => "elasticsearch.trace",
                         trace_handle => bless({
                                           adapter   => bless({ category => "elasticsearch.trace" }, "Log::Any::Adapter::Null"),
                                           category  => "elasticsearch.trace",
                                           context   => 'fix',
                                           formatter => 'fix',
                                         }, "Log::Any::Proxy::Null"),
                       }, "Search::Elasticsearch::Logger::LogAny"),
        transport   => bless({
                         cxn_pool => bless({
                           current_cxn_num => 0,
                           cxn_factory => bless({
                             _factory => sub { ... },
                             cxn_class => "Search::Elasticsearch::Cxn::HTTPTiny",
                             default_host => "http://localhost:9200",
                             max_content_length => 104857600,
                           }, "Search::Elasticsearch::Cxn::Factory"),
                           cxns => [
                             bless({
                               _pid                  => 264,
                               dead_timeout          => 60,
                               default_headers       => {},
                               default_qs_params     => {},
                               handle                => bless({
                                                          _has_proxy      => {},
                                                          agent           => "HTTP-Tiny/0.076",
                                                          default_headers => 'fix',
                                                          handle          => bless({
                                                                               fh => bless(Symbol::gensym(), "IO::Socket::IP"),
                                                                               host => "elasticsearch",
                                                                               keep_alive => 1,
                                                                               local_address => undef,
                                                                               max_header_lines => 64,
                                                                               max_line_size => 16384,
                                                                               peer => "elasticsearch",
                                                                               pid => 264,
                                                                               port => 9200,
                                                                               rbuf => "",
                                                                               scheme => "http",
                                                                               SSL_options => undef,
                                                                               tid => 0,
                                                                               timeout => 30,
                                                                               verify_SSL => 0,
                                                                             }, "HTTP::Tiny::Handle"),
                                                          keep_alive      => 1,
                                                          max_redirect    => 5,
                                                          no_proxy        => [],
                                                          timeout         => 30,
                                                          verify_SSL      => 0,
                                                        }, "HTTP::Tiny"),
                               handle_args           => {},
                               host                  => "elasticsearch",
                               is_https              => "",
                               logger                => 'fix',
                               max_content_length    => 104857600,
                               max_dead_timeout      => 3600,
                               next_ping             => 0,
                               ping_failures         => 0,
                               ping_timeout          => 2,
                               port                  => 9200,
                               request_timeout       => 30,
                               scheme                => "http",
                               serializer            => 'fix',
                               sniff_request_timeout => 2,
                               sniff_timeout         => 1,
                               uri                   => bless(do{\(my $o = "http://elasticsearch:9200")}, "URI::http"),
                               userinfo              => "",
                             }, "Search::Elasticsearch::Cxn::HTTPTiny"),
                           ],
                           logger => 'fix',
                           randomize_cxns => 1,
                           retries => 0,
                           seed_nodes => ["elasticsearch:9200"],
                           serializer => 'fix',
                         }, "Search::Elasticsearch::CxnPool::Static"),
                         logger => 'fix',
                         send_get_body_as => "GET",
                         serializer => 'fix',
                       }, "Search::Elasticsearch::Transport"),
      }, "Search::Elasticsearch::Client::2_0::Direct::Indices"),
      logger => 'fix',
      scroll_helper_class => "Scroll",
      transport => 'fix',
    }, "Search::Elasticsearch::Client::2_0::Direct"),
    extra_argv => ["mapping"],
    index => "cpan",
    level => "info",
    logger => bless({
      additivity     => 1,
      ALL            => sub { ... },
      appender_names => ["app001", "app002"],
      category       => "mapping",
      DEBUG          => 'fix',
      ERROR          => sub { ... },
      FATAL          => 'fix',
      INFO           => 'fix',
      is_ALL         => sub { ... },
      is_DEBUG       => sub { ... },
      is_ERROR       => sub { ... },
      is_FATAL       => sub { ... },
      is_INFO        => sub { ... },
      is_OFF         => sub { ... },
      is_TRACE       => sub { ... },
      is_WARN        => sub { ... },
      layout         => undef,
      level          => 20000,
      num_appenders  => 2,
      OFF            => 'fix',
      TRACE          => 'fix',
      WARN           => 'fix',
    }, "Log::Log4perl::Logger"),
    patch_mapping => "{}",
    port => 5000,
    proxy => "",
    queue => 0,
    reindex => 0,
    skip_existing_mapping => 0,
    usage => bless({
      leader_text   => "usage: metacpan [-?h] [long options...]",
      options       => [
                         {
                           constraint => {},
                           desc => "Elasticsearch http connection string",
                           name => "es",
                           spec => "es",
                         },
                         {
                           constraint => {},
                           desc => "Index to use, defaults to \"cpan\" (when used: also export ES_SCRIPT_INDEX)",
                           name => "index",
                           spec => "index=s",
                         },
                         {
                           constraint => {},
                           desc => "Die on errors instead of simply logging",
                           name => "die_on_error",
                           spec => "die_on_error!",
                         },
                         {
                           constraint => {},
                           desc => " ",
                           name => "cdn_services",
                           spec => "cdn_services",
                         },
                         { constraint => {}, desc => " ", name => "proxy", spec => "proxy=s" },
                         {
                           constraint => {},
                           desc => "Port for the proxy, defaults to 5000",
                           name => "port",
                           spec => "port=i",
                         },
                         { constraint => {}, desc => " ", name => "ua", spec => "ua" },
                         {
                           constraint => {},
                           desc => "Location of a local CPAN mirror, looks for \$ENV{MINICPAN} and ~/CPAN",
                           name => "cpan",
                           spec => "cpan=s",
                         },
                         { constraint => {}, desc => " ", name => "quarantine", spec => "quarantine=s" },
                         {
                           constraint => {},
                           desc => "add indexing jobs to the minion queue",
                           name => "queue",
                           spec => "queue!",
                         },
                         {
                           constraint => {},
                           desc => "Prints this usage information.",
                           name => "help",
                           spec => "help|usage|?|h!",
                         },
                         { constraint => {}, desc => " ", name => "cdn_api", spec => "cdn_api" },
                         { constraint => {}, desc => " ", name => "home", spec => "home=s" },
                         { constraint => {}, desc => "Log level", name => "level", spec => "level=s" },
                         {
                           constraint => {},
                           desc => "delete index if it exists already",
                           name => "delete",
                           spec => "delete!",
                         },
                         {
                           constraint => {},
                           desc => "list available index type names",
                           name => "list_types",
                           spec => "list_types!",
                         },
                         {
                           constraint => {},
                           desc => "create a new empty index (copy mappings)",
                           name => "create_index",
                           spec => "create_index=s",
                         },
                         {
                           constraint => {},
                           desc => "update existing index (add mappings)",
                           name => "update_index",
                           spec => "update_index=s",
                         },
                         {
                           constraint => {},
                           desc => "type mapping patches",
                           name => "patch_mapping",
                           spec => "patch_mapping=s",
                         },
                         {
                           constraint => {},
                           desc => "do NOT copy mappings other than patch_mapping",
                           name => "skip_existing_mapping",
                           spec => "skip_existing_mapping!",
                         },
                         {
                           constraint => {},
                           desc => "index to copy type to",
                           name => "copy_to_index",
                           spec => "copy_to_index=s",
                         },
                         {
                           constraint => {},
                           desc => "type to copy",
                           name => "copy_type",
                           spec => "copy_type=s",
                         },
                         {
                           constraint => {},
                           desc => "match query (default: monthly time slices,  if provided must be a valid json query OR \"match_all\")",
                           name => "copy_query",
                           spec => "copy_query=s",
                         },
                         {
                           constraint => {},
                           desc => "reindex data from source index for exact mapping types",
                           name => "reindex",
                           spec => "reindex!",
                         },
                         {
                           constraint => {},
                           desc => "delete an existing index",
                           name => "delete_index",
                           spec => "delete_index=s",
                         },
                         {
                           constraint => {},
                           desc => "delete data from an existing type",
                           name => "delete_from_type",
                           spec => "delete_from_type=s",
                         },
                       ],
      show_defaults => undef,
    }, "Getopt::Long::Descriptive::Usage"),
  }, "MetaCPAN::Script::Mapping");
  $a->{es}{indices}{logger}{trace_handle}{context} = $a->{es}{indices}{logger}{log_handle}{context};
  $a->{es}{indices}{logger}{trace_handle}{formatter} = $a->{es}{indices}{logger}{log_handle}{formatter};
  $a->{es}{indices}{transport}{cxn_pool}{cxns}[0]{handle}{default_headers} = $a->{es}{indices}{transport}{cxn_pool}{cxns}[0]{default_headers};
  *{$a->{es}{indices}{transport}{cxn_pool}{cxns}[0]{handle}{handle}{fh}} = {
    io_socket_domain         => 2,
    io_socket_ip_blocking    => 1,
    io_socket_ip_errors      => [undef, undef, undef],
    io_socket_ip_idx         => 0,
    io_socket_ip_infos       => [
                                  {
                                    family    => 2,
                                    localaddr => undef,
                                    peeraddr  => "\2\0#\xF0\xAC\26\0\2\0\0\0\0\0\0\0\0",
                                    protocol  => 6,
                                    socktype  => 1,
                                  },
                                ],
    io_socket_ip_listenqueue => undef,
    io_socket_ip_sockopts    => [],
    io_socket_ip_v6only      => undef,
    io_socket_proto          => 6,
    io_socket_timeout        => 30,
    io_socket_type           => 1,
  };
  $a->{es}{indices}{transport}{cxn_pool}{cxns}[0]{logger} = $a->{es}{indices}{logger};
  $a->{es}{indices}{transport}{cxn_pool}{cxns}[0]{serializer} = $a->{es}{indices}{logger}{serializer};
  $a->{es}{indices}{transport}{cxn_pool}{logger} = $a->{es}{indices}{logger};
  $a->{es}{indices}{transport}{cxn_pool}{serializer} = $a->{es}{indices}{logger}{serializer};
  $a->{es}{indices}{transport}{logger} = $a->{es}{indices}{logger};
  $a->{es}{indices}{transport}{serializer} = $a->{es}{indices}{logger}{serializer};
  $a->{es}{logger} = $a->{es}{indices}{logger};
  $a->{es}{transport} = $a->{es}{indices}{transport};
  $a->{logger}{DEBUG} = $a->{logger}{ALL};
  $a->{logger}{FATAL} = $a->{logger}{ERROR};
  $a->{logger}{INFO} = $a->{logger}{ERROR};
  $a->{logger}{OFF} = $a->{logger}{ERROR};
  $a->{logger}{TRACE} = $a->{logger}{ALL};
  $a->{logger}{WARN} = $a->{logger}{ERROR};
  $a;
}
Mapping: delete done.
ElasticSearch 'elasticsearch:9200': Indices showing ...
* Expire in 0 ms for 6 (transfer 0x55d920680dc0)
* Expire in 1 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 1 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 1 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 1 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 1 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 1 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 1 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 1 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 1 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 1 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 1 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 1 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 1 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 1 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 1 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 1 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 1 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 1 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 1 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 1 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 1 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 1 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 1 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 1 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 2 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 2 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 2 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 2 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 2 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 2 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 2 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 2 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 2 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 2 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
* Expire in 0 ms for 1 (transfer 0x55d920680dc0)
*   Trying 172.22.0.2...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x55d920680dc0)
* Connected to elasticsearch (172.22.0.2) port 9200 (#0)
> GET /_cat/indices HTTP/1.1
> Host: elasticsearch:9200
> User-Agent: curl/7.64.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: text/plain; charset=UTF-8
< Content-Length: 172
< 
yellow open cover       1 1 0 0 130b 130b 
yellow open cpan_v1_01  1 1 0 0 130b 130b 
yellow open contributor 1 1 0 0 130b 130b 
yellow open user        1 1 0 0 130b 130b 
* Connection #0 to host elasticsearch left intact

And actually I can observe now that the packages are correctly imported:

Mapping: rebuilding ...
2021/09/18 12:54:38 I release: Looking for archives in /CPAN/authors/id/
2021/09/18 12:54:38 I release: 808 archives found
# [...]
2021/09/18 13:32:33 I latest: Upgrading release CatalystX-Controller-ExtJS-REST-SimpleExcel-0.1.1
2021/09/18 13:32:33 I latest: Upgrading release WebService-Speechmatics-0.02
2021/09/18 13:32:51 I author: Indexing 14156 authors
2021/09/18 13:33:11 I author: done
2021/09/18 13:35:31 I permission: finished indexing 06perms

$ docker-compose exec api curl -v 'elasticsearch:9200/_cat/indices'
* Expire in 0 ms for 6 (transfer 0x55fcc44d1dc0)
* Expire in 1 ms for 1 (transfer 0x55fcc44d1dc0)
* Expire in 0 ms for 1 (transfer 0x55fcc44d1dc0)
* Expire in 1 ms for 1 (transfer 0x55fcc44d1dc0)
* Expire in 0 ms for 1 (transfer 0x55fcc44d1dc0)
* Expire in 0 ms for 1 (transfer 0x55fcc44d1dc0)
* Expire in 1 ms for 1 (transfer 0x55fcc44d1dc0)
* Expire in 0 ms for 1 (transfer 0x55fcc44d1dc0)
* Expire in 0 ms for 1 (transfer 0x55fcc44d1dc0)
* Expire in 2 ms for 1 (transfer 0x55fcc44d1dc0)
* Expire in 0 ms for 1 (transfer 0x55fcc44d1dc0)
* Expire in 0 ms for 1 (transfer 0x55fcc44d1dc0)
* Expire in 2 ms for 1 (transfer 0x55fcc44d1dc0)
* Expire in 0 ms for 1 (transfer 0x55fcc44d1dc0)
* Expire in 0 ms for 1 (transfer 0x55fcc44d1dc0)
* Expire in 2 ms for 1 (transfer 0x55fcc44d1dc0)
* Expire in 0 ms for 1 (transfer 0x55fcc44d1dc0)
* Expire in 0 ms for 1 (transfer 0x55fcc44d1dc0)
* Expire in 0 ms for 1 (transfer 0x55fcc44d1dc0)
*   Trying 172.22.0.2...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x55fcc44d1dc0)
* Connected to elasticsearch (172.22.0.2) port 9200 (#0)
> GET /_cat/indices HTTP/1.1
> Host: elasticsearch:9200
> User-Agent: curl/7.64.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: text/plain; charset=UTF-8
< Content-Length: 232
< 
yellow open cover       1 1      0     0    159b    159b 
yellow open cpan_v1_01  1 1 402391 19624 160.6mb 160.6mb 
yellow open contributor 1 1      0     0    159b    159b 
yellow open user        1 1      0     0    159b    159b 
* Connection #0 to host elasticsearch left intact

I suppose that creating the docker-compose services without starting them with the --no-start as seen in my test case should create the ElasticSearch indices correctly.

  mkdir -p cpan src ; chmod a+w cpan -R ; cd src
  git clone https://github.com/metacpan/metacpan-web.git metacpan-web
  git clone https://github.com/metacpan/metacpan-api.git metacpan-api 
  git clone https://github.com/metacpan/metacpan-grep-front-end.git metacpan-grep-front-end
  git clone https://github.com/metacpan/metacpan-cpan-extracted-lite.git metacpan-cpan-extracted-lite
  ln -s metacpan-cpan-extracted-lite metacpan-cpan-extracted
  cd ../
  docker volume create --driver local --opt device=:$(pwd)/src/metacpan-cpan-extracted --opt o=bind --opt type=none metacpan_git_shared
  echo "Docker: Volumes listing ..."
  docker volume ls
  docker volume inspect metacpan_git_shared
  echo "Docker: Images building ..."
  docker-compose up --build --no-start traefik
  docker-compose up --build --no-start elasticsearch
  docker-compose up --build --no-start pgdb
  docker-compose up --build --no-start api

But strangely this does not explain why in the test case there are errors in the indexing and it is not correctly indexed.
I can only suspect that this might have to do with the description in MetaCPAN::Script::Mapping::usage:

                         {
                           constraint => {},
                           desc => "Index to use, defaults to \"cpan\" (when used: also export ES_SCRIPT_INDEX)",
                           name => "index",
                           spec => "index=s",
                         },

@bodo-hugo-barwich
Copy link
Contributor

bodo-hugo-barwich commented Sep 19, 2021

The Test Result differs drastically from the expected result but effectively illustrates the issue which is discussed here:
The Report 8_Run the Elastic Search Indexation.txt reads:

2021-09-19T10:55:41.3629113Z Indexation: Indices showing ...^M
2021-09-19T10:55:42.2513168Z * Expire in 0 ms for 6 (transfer 0x560a26987f50)^M
2021-09-19T10:55:42.2521020Z * Expire in 1 ms for 1 (transfer 0x560a26987f50)^M
2021-09-19T10:55:42.2581731Z   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current^M
2021-09-19T10:55:42.2582528Z                                  Dload  Upload   Total   Spent    Left  Speed^M
2021-09-19T10:55:42.2583007Z ^M
2021-09-19T10:55:42.2584269Z   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Expire in 1 ms for 1 (transfer 0x560a26987f50)^M
2021-09-19T10:55:42.2585320Z * Expire in 2 ms for 1 (transfer 0x560a26987f50)^M
2021-09-19T10:55:42.2586161Z * Expire in 1 ms for 1 (transfer 0x560a26987f50)^M
2021-09-19T10:55:42.2586718Z * Expire in 1 ms for 1 (transfer 0x560a26987f50)^M
2021-09-19T10:55:42.2615350Z * Expire in 2 ms for 1 (transfer 0x560a26987f50)^M
2021-09-19T10:55:42.2615966Z * Expire in 1 ms for 1 (transfer 0x560a26987f50)^M
2021-09-19T10:55:42.2616525Z * Expire in 1 ms for 1 (transfer 0x560a26987f50)^M
2021-09-19T10:55:42.2617091Z * Expire in 4 ms for 1 (transfer 0x560a26987f50)^M
2021-09-19T10:55:42.2617649Z * Expire in 2 ms for 1 (transfer 0x560a26987f50)^M
2021-09-19T10:55:42.2618221Z * Expire in 2 ms for 1 (transfer 0x560a26987f50)^M
2021-09-19T10:55:42.2626361Z * Expire in 3 ms for 1 (transfer 0x560a26987f50)^M
2021-09-19T10:55:42.2631123Z *   Trying 172.21.0.2...^M
2021-09-19T10:55:42.2635704Z * TCP_NODELAY set^M
2021-09-19T10:55:42.2658608Z * Expire in 200 ms for 4 (transfer 0x560a26987f50)^M
2021-09-19T10:55:42.2659529Z * Connected to elasticsearch (172.21.0.2) port 9200 (#0)^M
2021-09-19T10:55:42.2660211Z > GET /_cat/indices HTTP/1.1^M
2021-09-19T10:55:42.2660693Z > Host: elasticsearch:9200^M
2021-09-19T10:55:42.2661774Z > User-Agent: curl/7.64.0^M
2021-09-19T10:55:42.2662195Z > Accept: */*^M
2021-09-19T10:55:42.2662542Z > ^M
2021-09-19T10:55:42.2989630Z < HTTP/1.1 200 OK^M
2021-09-19T10:55:42.2996243Z < Content-Type: text/plain; charset=UTF-8^M
2021-09-19T10:55:42.3002138Z < Content-Length: 45^M
2021-09-19T10:55:42.3007647Z < ^M
2021-09-19T10:55:42.3039510Z { [45 bytes data]^M
2021-09-19T10:55:42.3040014Z yellow open cpan 5 1 319856 0 64.6mb 64.6mb ^M
2021-09-19T10:55:42.3040391Z ^M
2021-09-19T10:55:42.3080941Z 100    45  100    45    0     0    918      0 --:--:-- --:--:-- --:--:--   937^M
2021-09-19T10:55:42.3082429Z * Connection #0 to host elasticsearch left intact^M

That way the Definition of a successful test would be:

  • index cpan_v1_01 > 0
  • alias cpan -> cpan_v1_01

as seen in the local environment:

$ export ES_PORT=32768
$ curl "localhost:$ES_PORT/_cat/indices"
yellow open cover       1 1      0     0    159b    159b 
yellow open cpan_v1_01  1 1 402391 19624 160.6mb 160.6mb 
yellow open contributor 1 1      0     0    159b    159b 
yellow open user        1 1      0     0    159b    159b
$ curl "localhost:$ES_PORT/_cat/aliases"
cpan cpan_v1_01 - - - 

@bodo-hugo-barwich
Copy link
Contributor

bodo-hugo-barwich commented Sep 19, 2021

I was wondering why the activity message don't appear in the log in all the test runs.
I suspected a Crash happening in the Mapping Process in MetaCPAN::Script::Mapping::deploy_mapping().
So I inserted it bit of success check logic in index-cpan.sh.
Then as seen in the test run
test run with success check
the command ./bin/run bin/metacpan mapping --delete actually turned out to crash in MetaCPAN::Script::Mapping::deploy_mapping()

Indexation: starting ...
Mapping: deleting ...
Mapping: Delete finished with [111]
Mapping: Delete Log:
'[NoNodes] ** No nodes are available: [http://elasticsearch:9200], called from sub Search::Elasticsearch::Role::Client::Direct::__ANON__ at /metacpan-api/bin/../lib/MetaCPAN/Script/Mapping.pm line 448.MetaCPAN::Script::Mapping::run - go ...
'MetaCPAN::Script::Mapping::run' : Call on 'MetaCPAN::Script::Mapping::deploy_mapping''
Mapping: Delete failed!
Error: Process completed with exit code 111.

111 - is the Linux Exit Code for "connection refused"
So the Mapping had failed because the connection to ElasticSearch had failed.

To confirm this I inserted in the test a check of the ElasticSearch Service before starting the Mapping
as seen in the test run:
check ElasticSearch before starting the Mapping
the expected output should be as seen in local development:

$ export ES_PORT=32768
$ curl "localhost:$ES_PORT" 2>&1
{
  "name" : "Tom Corsi",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "lyYO-WwMTOWud7zkAVmhdA",
  "version" : {
    "number" : "2.4.6",
    "build_hash" : "5376dca9f70f3abef96a77f4bb22720ace8240fd",
    "build_timestamp" : "2017-07-18T12:17:44Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.4"
  },
  "tagline" : "You Know, for Search"
}

but in the test run the check failed with:

ElasticSearch Port: '49153'
ElasticSearch Version:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying ::1:49153...
* TCP_NODELAY set
* Connected to localhost (::1) port 49153 (#0)
> GET / HTTP/1.1
> Host: localhost:49153
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Recv failure: Connection reset by peer

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer
Error: Process completed with exit code 56.

Which confirms the Conclusion that ElasticSearch is not ready for Connections yet.

Therefore when the Indexation starts to import the Packages into the ElasticSearch service the service has become ready for connections but there is NO Index created yet.

So, where does the infamous Index cpan come from?
The ElasticSearch index is commonly referenced throughout the Project as "cpan" which should be an Alias for the actual Index cpan_v1_01 but since the Mapping has failed there isn't any alias nor index with that name.
Therefore ElasticSearch auto-generates the cpan Index from the first document that is introduced.
This happens to be a file from the Apache-AuthCookieNTLM but which is incomplete and thus the Auto-generated cpan Index has a corrupt structure.

Understanding this scenario the steps to reproduce it can be describe as follows:

  1. delete all indices manually from within the docker cluster (as described above)
  2. shutdown the cluster with docker-compose down
  3. startup the cluster and run the indexation immediately with docker-compose up -d api ; docker-compose exec api index-cpan.sh ;

Still this might depend strongly on the computing capacity of the local desktop and the disk speed.

On the other hand steps for a Manual Workaround for local development could be:

  1. delete all indices manually from within the docker cluster (as described above)
  2. no shutdown necessary here
  3. startup the cluster (if needed) as normal and describe in the "Get Started" Manual docker-compose up -d api
  4. check manually the availability of the ElasticSearch service (as described above) docker-compose exec api curl 'elasticsearch:9200'
  5. only when the availability check succeeds proceed to the indexation step with docker-compose exec api index-cpan.sh

@bodo-hugo-barwich
Copy link
Contributor

bodo-hugo-barwich commented Sep 19, 2021

Now for the Automated Test to succeed and to close this issue definitely there are some New Features needed.
But as this issue comes to explain they are crucial to avoid accidental Index Corruption.

  • An Availabity Check for ElasticSearch before starting the Index Creation.
  • a Success Check for the Mapping Script (and for the rest of scripts too)
  • on --delete Option a logic that searches all indexes and deletes them before creating new ones in MetaCPAN::Script::Mapping::deploy_mapping() perhaps in combination with an --all option like --delete --all
  • a Integrity Check which checks if all Indices are created correctly after concluding the Index Creation
  • a Integrity Check before starting the Import to avoid the creation of corrupted Indices.
  • a --yes Option to skip all are_you_sure() calls

@oalders
Copy link
Member

oalders commented Sep 20, 2021

Is it possible that https://github.com/metacpan/metacpan-docker/blob/master/docker-compose.yml#L165 is not performing as expected?

@bodo-hugo-barwich
Copy link
Contributor

Looking at the Test Logs it shows that it actually works:
Message "Elastic Search is unavailable"
but only for the api service
which I think will be unavailable too at the time the index-cpan.sh script starts.
api service unresponsive
Certainly it is the -d option for docker-compose which makes it return before the Start-Up Sequences is actually finished.
The index-cpan.sh script does need the same logic as well to work correctly.
Calling it at the beginning of the index-cpan.sh script could actually help.

@oalders
Copy link
Member

oalders commented Sep 20, 2021

Right, good call. That's the same thing we're doing to work around -d in the CircleCI config: https://github.com/metacpan/metacpan-docker/blob/master/.circleci/config.yml#L31-L37

@bodo-hugo-barwich
Copy link
Contributor

I think the script in the command and which is also used in the project is quite ok.
ElasticSearch - known for slow startup
but it does not feature a "timeout" like promised in the comment.
In case the ElasticSearch has any issue the script will hang on calling it forever and will not proceed to do the incident analysis and report.

As another interesting feature I would also report the timestamp when it became ready or when the timeout was reached to be able to check it against the reports from other dependent components as seen now in this incident.

@oalders
Copy link
Member

oalders commented Sep 20, 2021

Those features sound good. If you'd like to implement them, that would be great.

@bodo-hugo-barwich
Copy link
Contributor

So, for the first sprint I would work on wait-for-es.sh and including it in index-cpan.sh and also introducing the --yes option in MetaCPAN::Script::Mapping. This will get most developers already running and the integration test succeed.
Next the other features I presented above will add an interesting level of resilience and observability to the project although they might not be seen that obviously.

@oalders
Copy link
Member

oalders commented Sep 21, 2021

Sounds great!

@bodo-hugo-barwich
Copy link
Contributor

When I working on the script wait-for-es.sh I found that it makes the ElasticSearch Service Availability depending on the General Health State not to be "red" but when doing the indexation test I found that this state happens when the indexation process is forcefully interrupted. Then all indices enter "red" state and the cluster as well.
But once entered this state it can only be healed re-creating the indices correctly.
So, Cluster Health State cannot be a condition for Cluster Availability.

Furthermore I found that the Index Health State information is of crucial importance to perform any self-checks in MetaCPAN::Role::Script. As this current thread documents the healthy index is a requirement for any further operations on the ElasticSearch cluster. So, a self-check can ensure the correct Index Creation.

To implement the self-check I used the functionality provided by the Search::Elasticsearch::Client library
Search::Elasticsearch::Client::2_0::Direct::ping()
Search::Elasticsearch::Client::2_0::Direct::cat()
and stored the information in MetaCPAN::Role::Script::cluster_info, MetaCPAN::Role::Script::indices_info and MetaCPAN::Role::Script::aliases_info to make the functionality available in different applications derived from it.

To prevent an index corruption the script index-cpan.sh checks the result of the MetaCPAN::Script::Mapping application and interrupts the process on any Failure Exit Code.

To ease up the recovery of broken indices for Users as explained before the index-cpan.sh accepts now parameters and passes them on to the Perl applications.
So, the parameter --all can be used to delete ALL indices from the ElasticSearch cluster.
A broken cluster can be fixed with:

$ docker-compose up -d api
$ docker-compose exec api index-cpan.sh --all

@bodo-hugo-barwich
Copy link
Contributor

bodo-hugo-barwich commented Oct 31, 2021

As document the Test Reports the ElasticSearch Engine needs about 2 - 5 seconds to start up depending on the host capabilities:

2021/10/31 12:44:58 I mapping: Awaiting Elasticsearch ...
2021/10/31 12:44:58 I mapping: Awaiting 0 / 15 : unavailable - sleeping ...
2021/10/31 12:44:59 I mapping: Awaiting 1 / 15 : unavailable - sleeping ...
2021/10/31 12:45:00 I mapping: Awaiting 2 / 15 : ready

The default timeout parameter is set to 15 seconds which seems to be a generous default according to the documented measurements.

In case the ElasticSearch Engine cannot start up the Perl Application with set the Error Code 112

$ errno -l |sed -re 's/^([a-z0-9]+) ([0-9]+) (.*)$/\2 - \1 - \3/i' |sort -n|grep -i host
112 - EHOSTDOWN - Host is down

and re-throw the Exception from the ElasticSearch Perl Client:

MetaCPAN::Role::Script::await - go ...
ping 0 ex dmp:
bless({
  msg   => "[NoNodes] ** No nodes are available: [http://elasticsearch:9200], called from sub Search::Elasticsearch::Role::Client::Direct::__ANON__ at /metacpan-api/bin/../lib/MetaCPAN/Role/Script.pm line 315.",
  stack => [
             [
               "MetaCPAN::Role::Script",
               "/metacpan-api/bin/../lib/MetaCPAN/Role/Script.pm",
               315,
               "Search::Elasticsearch::Role::Client::Direct::__ANON__",
             ],
           ],
  text  => "No nodes are available: [http://elasticsearch:9200]",
  type  => "NoNodes",
  vars  => undef,
}, "Search::Elasticsearch::Error::NoNodes")

As seen with:

$ docker-compose up -d api
Creating network "metacpan_traefik-network" with the default driver
Creating network "metacpan_default" with the default driver
Creating network "metacpan_web-network" with the default driver
Creating network "metacpan_elasticsearch" with the default driver
Creating network "metacpan_database" with the default driver
Creating network "metacpan_mongo" with the default driver
Creating metacpan_traefik_1       ... done
Creating metacpan_pgdb_1          ... done
Creating metacpan_elasticsearch_1 ... done
Creating metacpan_api_1           ... done
$ docker-compose exec api index-cpan.sh --await 0
2021-10-31 13:55:24 I index-cpan.sh: Indices re-creating ...
2021-10-31 13:55:29 I index-cpan.sh: Re-creation finished with [112]
2021-10-31 13:55:29 I index-cpan.sh: Re-creation Log:
'[NoNodes] ** No nodes are available: [http://elasticsearch:9200], called from sub Search::Elasticsearch::Role::Client::Direct::__ANON__ at /metacpan-api/bin/../lib/MetaCPAN/Role/Script.pm line 309.'
2021-10-31 13:55:29 E index-cpan.sh: Re-creation failed!

@ssoriche
Copy link
Contributor

What if instead of having to have everyone interested in doing development build the test image, we just create an elastic search test image that already contains the index? It's small enough that it isn't going to be a problem for image size/download speed. The the index would be updated with every merge by the build process.

@bodo-hugo-barwich
Copy link
Contributor

bodo-hugo-barwich commented Oct 31, 2021

According to the numbers that have been documented 823 Packages from 14165 Authors produce 406137 Documents in ElasticSearch . So, I suspect that this approach is not really scalable.

2021/10/31 12:45:19 I release: 823 archives found
2021/10/31 12:53:30 I author: Indexing 14165 authors
2021/10/31 12:53:38 I author: done
2021/10/31 12:54:16 I permission: finished indexing 06perms
Indexation: Document Count '406137'

Also, as for development purposes it is of great value to have tools to be able to introduce and inspect just any Package published on MetaCPAN into the development enviroment.
That enables the developers to reproduce just any issue observed in the production environment.

@ssoriche
Copy link
Contributor

With that many documents for testing we're likely indexing too much. My goal is to lower the bar to entry and if developers are going to get stuck loading the index so that they can run a simple test, I'd like to eliminate that as much as possible.

Tools as you describe would still be able to be used and I agree they provide tremendous value to the development process for the developers that need it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants