Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

github actions: run integration test on cassandra #339

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

fruch
Copy link

@fruch fruch commented Jun 24, 2024

to make sure we don't break compitability completly with cassandra introducing tests with one version of cassandra as well

@fruch fruch force-pushed the integration_test_cassandra branch from 4a5dea2 to 7aeeef4 Compare June 24, 2024 21:55
@fruch
Copy link
Author

fruch commented Jul 4, 2024

we need to skip the serverless tests, cause they are scylla specific

@fruch fruch force-pushed the integration_test_cassandra branch from 7aeeef4 to ac140e1 Compare July 4, 2024 10:41
'user_defined_functions_enabled': True,
'scripted_user_defined_functions_enabled': True,
'materialized_views_enabled': True,
'sasi_indexes_enabled': True,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to enable sasi_indexes_enabled ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a clue, it's just same as the driver test was doing before for configuring cassandra, but in a new syntax.

@fruch
Copy link
Author

fruch commented Jul 4, 2024

here's the failing tests:

2024-07-04T11:52:49.6219047Z FAILED tests/integration/standard/test_custom_protocol_handler.py::CustomProtocolHandlerTest::test_protocol_divergence_v5_fail_by_continuous_paging - Failed: Unexpected success
2024-07-04T11:52:49.6219804Z FAILED tests/integration/standard/test_custom_protocol_handler.py::CustomProtocolHandlerTest::test_protocol_divergence_v5_fail_by_flag_uses_int - Failed: Unexpected success
2024-07-04T11:52:49.6220556Z FAILED tests/integration/standard/test_custom_protocol_handler.py::CustomProtocolHandlerTest::test_protocol_v5_uses_flag_int - Failed: Unexpected success
2024-07-04T11:52:49.6221541Z FAILED tests/integration/standard/test_ip_change.py::TestIpAddressChange::test_change_address_during_live_session - ccmlib.node.TimeoutError: 04 Jul 2024 11:34:19 [node3] Missing: ['Starting listening for CQL clients']:
2024-07-04T11:52:49.6221750Z INFO  [main] 2024-07-04 11:24:20,202 YamlConfigura.....
2024-07-04T11:52:49.6221851Z See system.log for remainder
2024-07-04T11:52:49.6222250Z FAILED tests/integration/standard/test_metadata.py::TestCodeCoverage::test_case_sensitivity
2024-07-04T11:52:49.6222548Z FAILED tests/integration/standard/test_metadata.py::TestCodeCoverage::test_replicas
2024-07-04T11:52:49.6223767Z FAILED tests/integration/standard/test_rate_limit_exceeded.py::TestRateLimitExceededException::test_rate_limit_exceeded - cassandra.protocol.SyntaxException: <Error from server: code=2000 [Syntax error in CQL query] message="Unknown property 'per_partition_rate_limit'">
2024-07-04T11:52:49.6226136Z FAILED tests/integration/standard/test_types.py::TypeTests::test_insert_collection_with_null_fails - cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 127.0.0.2:9042 dc1>: <Error from server: code=0000 [Server error] message="java.lang.NullPointerException">, <Host: 127.0.0.3:9042 dc1>: <Error from server: code=0000 [Server error] message="java.lang.NullPointerException">, <Host: 127.0.0.1:9042 dc1>: <Error from server: code=0000 [Server error] message="java.lang.NullPointerException">})
  • test_rate_limit_exceeded : should be scylla only
  • test_custom_protocol_handler: seems like they are now passing (guessing we are missing updates from upstream about that)
  • test_ip_change: using scylla specific command line arguments
  • test_metadata.py::TestCodeCoverage::test_replicas :test_case_sensitivity : not clear what's the failure maybe xfail ?

@fruch fruch force-pushed the integration_test_cassandra branch 3 times, most recently from fe5d19e to 83dcd61 Compare July 7, 2024 09:27
@fruch fruch force-pushed the integration_test_cassandra branch 2 times, most recently from 6343573 to c68f40c Compare November 6, 2024 11:10
@fruch
Copy link
Author

fruch commented Nov 6, 2024

last failures on cassandra runs:

FAILED tests/integration/standard/test_cluster.py::ClusterTests::test_refresh_schema_no_wait - cassandra.DriverException: Schema metadata was not refreshed. See log for details.
FAILED tests/integration/standard/test_metadata.py::MetadataTimeoutTest::test_timeout - AssertionError: query `SELECT * FROM system.peers_v2` does not contain `USING TIMEOUT 2000ms`

@fruch fruch force-pushed the integration_test_cassandra branch 2 times, most recently from 99a37c9 to ab2b233 Compare November 13, 2024 16:15
@fruch
Copy link
Author

fruch commented Nov 13, 2024

asyncio:

2024-11-13T17:20:52.1674462Z FAILED tests/integration/standard/test_cluster.py::TestAddressTranslation::test_address_translator_basic - cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers', {'127.0.0.1:9042': ConnectionShutdown('CRC mismatch on header 8cd102. Received 8556", computed cc0464.')})
2024-11-13T17:20:13.9750646Z FAILED tests/integration/standard/test_cluster.py::ClusterTests::test_refresh_schema - cassandra.DriverException: Schema metadata was not refreshed. See log for details.
2024-11-13T17:20:13.9751429Z FAILED tests/integration/standard/test_cluster.py::HostStateTest::test_down_event_with_active_connection - AssertionError: False is not true : Host was not up on iteration 0
202

libev and asyncore:

2024-11-13T17:19:27.9248641Z >           start_cluster_wait_for_up(ccm_cluster)
2024-11-13T17:19:27.9248893Z 
2024-11-13T17:19:27.9249081Z tests/integration/standard/test_authentication.py:44: 
2024-11-13T17:19:27.9249561Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2024-11-13T17:19:27.9250073Z tests/integration/__init__.py:501: in start_cluster_wait_for_up
2024-11-13T17:19:27.9250608Z     cluster.start(wait_for_binary_proto=True, wait_other_notice=True)
2024-11-13T17:19:27.9251233Z .test-venv/lib/python3.10/site-packages/ccmlib/cluster.py:489: in start
2024-11-13T17:19:27.9252054Z     p = node.start(update_pid=False, jvm_args=jvm_args, profile_options=profile_options, verbose=verbose, quiet_start=quiet_start)
2024-11-13T17:19:27.9253011Z .test-venv/lib/python3.10/site-packages/ccmlib/node.py:590: in start
2024-11-13T17:19:27.9253470Z     common.check_socket_available(itf)
2024-11-13T17:19:27.9254063Z .test-venv/lib/python3.10/site-packages/ccmlib/common.py:774: in check_socket_available
2024-11-13T17:19:27.9254606Z     pid = pid_listening_on(addr, port)
2024-11-13T17:19:27.9255003Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2024-11-13T17:19:27.9255325Z 
2024-11-13T17:19:27.9255453Z addr = '127.0.0.1', port = 7000
2024-11-13T17:19:27.9255651Z 
2024-11-13T17:19:27.9255784Z     def pid_listening_on(addr: str, port: int):
2024-11-13T17:19:27.9256259Z         '''get pid of the process listening on the given port'''
2024-11-13T17:19:27.9256645Z         pid = -1
2024-11-13T17:19:27.9256921Z         if lsof := shutil.which('lsof'):
2024-11-13T17:19:27.9257415Z             result = subprocess.run([lsof, '-t', '-i', f'tcp@{addr}:{port}'],
2024-11-13T17:19:27.9257890Z                                     stdout=subprocess.PIPE,
2024-11-13T17:19:27.9258264Z                                     check=True)
2024-11-13T17:19:27.9258584Z             if result.stdout:
2024-11-13T17:19:27.9258866Z >               pid = int(result.stdout)
2024-11-13T17:19:27.9259420Z E               ValueError: invalid literal for int() with base 10: b'33926\n33950\n33975\n'
2024-11-13T17:19:27.9259804Z 
2024-11-13T17:19:27.9260108Z .test-venv/lib/python3.10/site-packages/ccmlib/common.py:741: ValueError
2024-11-13T17:19:27.9260724Z _________ ERROR at setup of AuthenticationTests.test_connect_wrong_pwd _________

the last one seems like leftover from the first run session

@fruch fruch force-pushed the integration_test_cassandra branch 2 times, most recently from 0a46a34 to 4536002 Compare November 14, 2024 08:42
@fruch
Copy link
Author

fruch commented Nov 14, 2024

multiple issues identified

one addressed in 4536002

and one in ccm (a by product of the first issue):
scylladb/scylla-ccm#626

@fruch
Copy link
Author

fruch commented Nov 14, 2024

now we have 1 failing across the board:

FAILED tests/integration/standard/test_query.py::LightweightTransactionTests::test_no_connection_refused_on_timeout - AssertionError: Unexpected exception ErrorMessage: CAS operation result is unknown - proposal accepted by 1 but not a quorum.

and those in some:

FAILED tests/integration/standard/test_rack_aware_policy.py::RackAwareRoundRobinPolicyTests::test_rack_aware - AssertionError: <cassandra.cluster.ResultSet object at 0x7f6fac5d2950> != [(0, 0, 0)]
FAILED tests/integration/standard/test_udts.py::UDTTests::test_can_insert_prepared_registered_udts - cassandra.cluster.UserTypeDoesNotExist: User type user does not exist in keyspace udttests
FAILED tests/integration/standard/test_cluster.py::HostStateTest::test_down_event_with_active_connection - AssertionError: False is not true : Host was not up on iteration 0

@fruch fruch force-pushed the integration_test_cassandra branch from dd65e7a to 374e0c7 Compare November 14, 2024 13:49
@fruch
Copy link
Author

fruch commented Nov 14, 2024

left with asyncio, not being too stable:

FAILED tests/integration/standard/test_cluster.py::HostStateTest::test_down_event_with_active_connection - AssertionError: False is not true : Host was not up on iteration 0
FAILED tests/integration/standard/test_metadata.py::SchemaMetadataTests::test_refresh_schema_metadata - cassandra.DriverException: Schema metadata was not refreshed. See log for details.
FAILED tests/integration/standard/test_connection.py::HeartbeatTest::test_heart_beat_timeout - AssertionError: None is not true
FAILED tests/integration/standard/test_metadata.py::SchemaMetadataTests::test_refresh_schema_metadata - cassandra.DriverException: Schema metadata was not refreshed. See log for details.
FAILED tests/integration/standard/test_metadata.py::SchemaMetadataTests::test_table_extensions - KeyError: 'test_table_extensionsview'
FAILED tests/integration/standard/test_metadata.py::SchemaMetadataTests::test_metadata_pagination_keyspaces - cassandra.DriverException: Schema metadata was not refreshed. See log for details.
FAILED tests/integration/standard/test_metadata.py::SchemaMetadataTests::test_refresh_schema_metadata - cassandra.DriverException: Schema metadata was not refreshed. See log for details.

@fruch
Copy link
Author

fruch commented Nov 14, 2024

@dkropachev @Lorak-mmk

seems like the only lasting issue is with asyncio, maybe we leave it out from test with cassandra ?

@fruch fruch marked this pull request as ready for review November 14, 2024 17:13
run: |
export EVENT_LOOP_MANAGER=${{ matrix.event_loop_manager }}
export SCYLLA_VERSION='release:5.1'
./scripts/run_integration_test.sh tests/integration/standard/ tests/integration/cqlengine/

- name: Test tablets
- name: Test with scylla - tablets
run: |
export EVENT_LOOP_MANAGER=${{ matrix.event_loop_manager }}
export SCYLLA_VERSION='release:6.0.2'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change it to 6.2.0, it is way faster then 6.0.2

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is unrelated to this PR

@@ -101,7 +101,7 @@ def test_get_control_connection_host(self):
self.assertNotEqual(host, new_host)

# TODO: enable after https://github.com/scylladb/python-driver/issues/121 is fixed
@unittest.skip('Fails on scylla due to the broadcast_rpc_port is None')
@pytest.mark.skip('Fails on scylla due to the broadcast_rpc_port is None')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference, why did you replace it ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned in the commit message, unittest.skip doesn't skip the setup/teardown code when using pytest as test runner, hence creating the cluster even for skipped tests.

for some of those test it was an issue, since the cassandra couldn't create the cluster.

@@ -16,6 +16,7 @@ def setup_module():
use_cluster('test_ip_change', [3], start=True)

@local
@scylla_only
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have some string or comment that will describe reason why it is only supported by scylla ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like test is setting api_address which isn't legal for cassandra
I'll try fixing the test

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just removing it, works fine of both scylla and cassnadra


LOGGER = logging.getLogger(__name__)

def setup_module():
use_multidc({'DC1': {'RC1': 2, 'RC2': 2}, 'DC2': {'RC1': 3}})

# cassandra is failing in a weird way:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please create an issue of that and link it in a comment

@@ -1207,7 +1207,7 @@ def test_export_keyspace_schema_udts(self):
cluster.shutdown()

@greaterthancass21
@pytest.mark.xfail(reason='Column name in CREATE INDEX is not quoted. It\'s a bug in driver or in Scylla')
@xfail_scylla(reason='Column name in CREATE INDEX is not quoted. It\'s a bug in driver or in Scylla')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please create and issue (in the driver repo) for that and reference it here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -1277,13 +1277,13 @@ def test_already_exists_exceptions(self):
cluster.shutdown()

@local
@pytest.mark.xfail(reason='AssertionError: \'RAC1\' != \'r1\' - probably a bug in driver or in Scylla')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefere to create an issue and investigate what is going on here, why on scylla driver get's different rack name ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's ccm code

scylla default to GossipingPropertyFileSnitch and the logic there uses RAC1
cassandra default to PropertyFileSnitch and the logic default to r1

we can align it on ccm, if you insist

@@ -1332,7 +1332,7 @@ def test_token(self):
self.assertEqual(expected_node_count, len(tmap.ring))
cluster.shutdown()


@scylla_only
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have explanation why it is ran only on scylla ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a comment, basically seems this feature didn't cover cassandra specific queries (i.e. peers_v2)


LOGGER = logging.getLogger(__name__)

def setup_module():
use_cluster('rate_limit', [3], start=True)

@scylla_only
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have explanation why to run it only on scylla ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RateLimitExceededException is a scylla only feature... it's error only scylla generate, and cassandra doesn't

@@ -45,7 +46,7 @@ def tearDown(self):
self.cluster.shutdown()

# TODO: enable after https://github.com/scylladb/python-driver/issues/121 is fixed
@unittest.skip('Fails on scylla due to the broadcast_rpc_port is None')
@pytest.mark.skip('Fails on scylla due to the broadcast_rpc_port is None')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why it is None on scylla, is it expected ? If you don't want to look into it, can you please create an issue for it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is an issue for it on the comment, you raise it yourself :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -704,6 +704,7 @@ def test_can_insert_tuples_with_nulls(self):
self.assertEqual(('', None, None, b''), result[0].t)
self.assertEqual(('', None, None, b''), s.execute(read)[0].t)

@scylla_only
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only on scylla ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test was introduce as part of fixing this issue:
#201

the behavior of scylla and cassandra isn't the same for what covered in this test

@Lorak-mmk might remember the details, as he introduced this test

@fruch fruch force-pushed the integration_test_cassandra branch from 374e0c7 to ccb5e03 Compare November 19, 2024 22:43
@fruch fruch self-assigned this Nov 24, 2024
@fruch fruch force-pushed the integration_test_cassandra branch 2 times, most recently from 7aab14c to 48666ad Compare November 25, 2024 08:27
@dkropachev
Copy link
Collaborator

@fruch, is it ready to be reviewed ?

@fruch
Copy link
Author

fruch commented Nov 26, 2024

@fruch, is it ready to be reviewed ?

yes it's ready
and now all tests are passing (but some group of test that was excluded failing with asyncio eventloop)

- name: Test with cassandra
run: |
export EVENT_LOOP_MANAGER=${{ matrix.event_loop_manager }}
export CASSANDRA_VERSION='4.1.5'
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not 4.1.7 (latest 4.1.x) ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cause the the version that was available when I started this effort

'sasi_indexes_enabled': True,
'transient_replication_enabled': True,
})
elif Version(cassandra_version) >= Version('2.2'):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we ever test earlier than 2.2 versions?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so far we didn't test cassandra at all, for sure we haven't test anything else but 4.1.5 we are testing now.

to make sure we don't break compitability completly with cassandra
introducing tests with one version of cassandra as well
those test can't be run ontop of cassandra
and we should mark them in a why we can skip them
cause some of them are missing, we are getting to situations
we fail to the the next test session on the github action builders
cassandra 4.1 and up, removed ms/min and so from name of config
options, and it need to be specific in the value itself

we have few tests using it, and need to be adapted
…n_timeout`

seems like this test was fixed in upsteam to ignore `ErrorMessage` type
I don't know the exact reason, but since it take us quite some time to sync
with upstream, taking this change on it's own

it happens only with cassandra (tested with 4.1)

Ref: datastax@1b335d4
when we use `unittest.skip` on a test function and run via pytest,
we still collect the test and run the setup/teardown even the
test function would be skipped.

when running with cassandra we get into situations where create
cluster is also failing for all kind of reasons, which we don't
need that cluster.
couldn't identified the root cause of the failures in those tests
and they seem to be failing just on asyncio (which is still exprimatal
for cassandra), so they are marked as xfail for now
@fruch fruch force-pushed the integration_test_cassandra branch from 48666ad to 4b47f4d Compare December 26, 2024 08:44
@fruch fruch requested a review from dkropachev December 26, 2024 08:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants