EM-933 Throttling errors within send-message state machine #159

Draft
wants to merge 73 commits into
base: develop
Commits
151abdf
EM-933 - dep updates
johnmarston-nhs Sep 16, 2024
a745c0d
EM-933: resolved warning about multiple event endpoints being set.
johnmarston-nhs Sep 17, 2024
da87b5b
EM-933. Added a lock table, locking mechanism and a test for a simple…
johnmarston-nhs Sep 18, 2024
39d0aa0
EM-933. Added failure to lock and associated test.
johnmarston-nhs Sep 19, 2024
deb3ffb
EM-933. Implemented lock release and tests.
johnmarston-nhs Sep 20, 2024
3ae251a
EM-933. Fixed linter issues.
johnmarston-nhs Sep 20, 2024
c2a5d6d
EM-933. TFSec pacification.
johnmarston-nhs Sep 20, 2024
8342d4d
EM-933. TFLint pacification.
johnmarston-nhs Sep 20, 2024
ff7491d
EM-933. Fixing existing tests to work with the new lock table and fie…
johnmarston-nhs Sep 20, 2024
4f69672
EM-933. Fixing existing tests to work with the new lock table and fie…
johnmarston-nhs Sep 20, 2024
8c92024
EM-933: version bumps.
johnmarston-nhs Oct 1, 2024
a07b466
EM-933: version bumps.
johnmarston-nhs Oct 1, 2024
cad5a86
EM-933. Remove obsolete tests around invocation directly from EventBri…
johnmarston-nhs Oct 1, 2024
0a8e7b5
EM-933. Merged in changes from other dev branch
johnmarston-nhs Oct 1, 2024
09d13dc
EM-933: ruff fix
johnmarston-nhs Oct 1, 2024
3936ccb
EM-933. Proper release handling and mocked tests.
johnmarston-nhs Oct 2, 2024
849f5df
EM-933. Fixed some import ordering.
johnmarston-nhs Oct 2, 2024
9ba6520
EM-933. Comment rewording.
johnmarston-nhs Oct 2, 2024
ace6aee
EM-933. Build fixes.
johnmarston-nhs Oct 3, 2024
97b8c45
EM-933. handling missing lock details when releasing... + tests
johnmarston-nhs Oct 3, 2024
c68e057
EM-933. Linting.
johnmarston-nhs Oct 3, 2024
ff66289
EM-933. Added print statement for test run.
johnmarston-nhs Oct 3, 2024
beab61a
EM-933. Added print statement for test run.
johnmarston-nhs Oct 3, 2024
bc2fb2c
EM-933. Accepting that pulling out the lock row is the only way to fi…
johnmarston-nhs Oct 4, 2024
f5eb951
EM-933. Implemented fetch locking, renamed execution_id to owner_id i…
johnmarston-nhs Oct 4, 2024
3b5c725
EM-933 reverted inadvertent change
johnmarston-nhs Oct 4, 2024
2188abc
EM-933. Fix typo in log call.
johnmarston-nhs Oct 9, 2024
c18ee39
EM-933. Fix test log extractor to use new wording.
johnmarston-nhs Oct 11, 2024
1fe1e20
EM-933. Handle non-step function invocations for the poll application…
johnmarston-nhs Oct 14, 2024
7f51205
EM-933. Optional locking for the poll function.
johnmarston-nhs Oct 15, 2024
612a116
EM-933. Better existing lock handling for the get_messages sfn
johnmarston-nhs Oct 15, 2024
c7fed8d
EM-933. Fix linter errors.
johnmarston-nhs Oct 15, 2024
fd8165f
EM-933. Fixed mocked test assertions.
johnmarston-nhs Oct 15, 2024
88fbb91
EM-933. Fixed a ruff failure.
johnmarston-nhs Oct 15, 2024
f599fb8
EM-933. Dropped an obsolete mocked test for the poll singleton check.
johnmarston-nhs Oct 15, 2024
ebb575a
EM-933. Refactored the send test to handle a log race condition.
johnmarston-nhs Oct 16, 2024
d101606
EM-933. Fix race condition in the lambda completion log searches.
johnmarston-nhs Oct 16, 2024
37bf728
EM-933. Enhanced failure logging.
johnmarston-nhs Oct 16, 2024
0335240
EM-933. Expand test coverage for the poll application.
johnmarston-nhs Oct 16, 2024
eb8b94d
EM-933. More linting fixes.
johnmarston-nhs Oct 16, 2024
f96297b
EM-933. More linting fixes.
johnmarston-nhs Oct 16, 2024
810ad60
EM-933. Refactored locking. Tests are going to be a mess.
johnmarston-nhs Oct 18, 2024
ec984cc
EM-933. Changed get lock release logic.
johnmarston-nhs Oct 21, 2024
419eb0f
EM-933. Fix ruff issues.
johnmarston-nhs Oct 21, 2024
8b57c76
EM-933. Change the fetch function to use the new common locking methods.
johnmarston-nhs Oct 21, 2024
50fa4e8
EM-933. Fix ruff issues.
johnmarston-nhs Oct 21, 2024
2a0e62e
EM-933. Log level changes.
johnmarston-nhs Oct 22, 2024
7ad35c9
EM-933. Added expected warnings to the log assertions.
johnmarston-nhs Oct 23, 2024
2227bdf
EM-933. Accept more WARN logs.
johnmarston-nhs Oct 23, 2024
94683eb
em-933: fix failing tests in send_message_test.py
james-bradley-nhs Oct 23, 2024
c605132
em-933: update hooks
james-bradley-nhs Oct 23, 2024
afdd33e
em-933: add internal ID to _get_lock_details_from_log_capture
james-bradley-nhs Oct 23, 2024
8dffeea
EM-933: Fix mocked tests.
johnmarston-nhs Oct 23, 2024
93bb698
em-933: find MESHLOCK logs from expected log name
james-bradley-nhs Oct 23, 2024
4ad0f17
EM-933: Mypy pacification.
johnmarston-nhs Oct 23, 2024
d31f1e8
EM-933: extended coverage.
johnmarston-nhs Oct 23, 2024
4c40cd1
em-933: add lock manager lambda
james-bradley-nhs Oct 30, 2024
4ee319d
em-933: assert lock does not remain in dynamoDB
james-bradley-nhs Oct 30, 2024
bb0be32
em-933: add unit tests
james-bradley-nhs Nov 4, 2024
db28c8d
em-933: add dynamo to IAM policy document
james-bradley-nhs Nov 5, 2024
c8fa23e
em-933: send messages uses lock manager lambda to release lock
james-bradley-nhs Nov 6, 2024
50b673c
em-933: remove MESHLOCK0007 assertion from pagination test
james-bradley-nhs Nov 6, 2024
a9596f0
em-933: remove MESHLOCK0007 from get handshake test
james-bradley-nhs Nov 6, 2024
96fa0d2
em-933: fix lock manager unit test
james-bradley-nhs Nov 6, 2024
88ec875
em-933: add dynamo db access to check send parameters
james-bradley-nhs Nov 13, 2024
b6eae24
em-933: give check_send_parameters UpdateItem permission
james-bradley-nhs Nov 19, 2024
007da3d
em-933: fix resource name for dynamo permissions
james-bradley-nhs Nov 19, 2024
86dbbe8
em-933: remove / in resource name for dynamo permissions
james-bradley-nhs Nov 19, 2024
876eb79
em-933: give send_message_chunk dynamodb:updateItem permission
james-bradley-nhs Nov 19, 2024
b2aae93
em-933: give step function lambda:InvokeFunction permission for the l…
james-bradley-nhs Nov 20, 2024
f8c79b7
em-933: add iam policies for fetching messages
james-bradley-nhs Nov 20, 2024
1264e40
em-933: update tflint.hcl
james-bradley-nhs Nov 21, 2024
a24c05d
em-933: clear up print statements in lock manager
james-bradley-nhs Nov 26, 2024
44 changes: 44 additions & 0 deletions module/dynamodb_table_locktable.tf
@@ -0,0 +1,44 @@
locals {
  locktable_name = "${local.name}-lock-table"
}

#tfsec:ignore:aws-dynamodb-enable-at-rest-encryption
resource "aws_dynamodb_table" "lock_table" {
  name = local.locktable_name
  billing_mode = "PROVISIONED"
  read_capacity = 20
  write_capacity = 20
  hash_key = "LockName"
  stream_enabled = false
  point_in_time_recovery {
    enabled = true
  }

  server_side_encryption {
    enabled = true
  }

  attribute {
    name = "LockName"
    type = "S"
  }

  attribute {
    name = "LockType"
    type = "S"
  }

  attribute {
    name = "LockOwner"
    type = "S"
  }

  global_secondary_index {
    name = "LockTypeOwnerTableIndex"
    hash_key = "LockType"
    range_key = "LockOwner"
    write_capacity = 5
    read_capacity = 5
    projection_type = "KEYS_ONLY"
  }
}
19 changes: 18 additions & 1 deletion module/lambda_check_send_parameters.tf
@@ -149,10 +149,25 @@ data "aws_iam_policy_document" "check_send_parameters" {
    ]
  }

  statement {
    sid = "DynamoDBAccess"
    effect = "Allow"

    actions = [
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:DeleteItem",
      "dynamodb:UpdateItem",
    ]

    resources = [
      "arn:aws:dynamodb:eu-west-2:${var.account_id}:table/${local.locktable_name}"
    ]
  }

  dynamic "statement" {
    for_each = local.vpc_enabled ? [true] : []
    content {

      sid = "EC2Interfaces"
      effect = "Allow"

@@ -216,4 +231,6 @@ data "aws_iam_policy_document" "check_send_parameters_check_sfn" {
      "${replace(aws_sfn_state_machine.send_message.arn, "stateMachine", "execution")}*"
    ]
  }


}
16 changes: 16 additions & 0 deletions module/lambda_fetch_message_chunk.tf
@@ -171,6 +171,22 @@ data "aws_iam_policy_document" "fetch_message_chunk" {
    ]
  }

  statement {
    sid = "DynamoDBAccess"
    effect = "Allow"

    actions = [
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:DeleteItem",
      "dynamodb:UpdateItem",
    ]

    resources = [
      "arn:aws:dynamodb:eu-west-2:${var.account_id}:table/${local.locktable_name}"
    ]
  }

  dynamic "statement" {
    for_each = var.use_secrets_manager ? [true] : []
    content {
147 changes: 147 additions & 0 deletions module/lambda_lock_manager.tf
@@ -0,0 +1,147 @@
locals {
  lock_manager_name = "${local.name}-lock-manager"
}

resource "aws_lambda_function" "lock_manager" {
  function_name = local.lock_manager_name
  filename = data.archive_file.app.output_path
  handler = "mesh_lock_manager_application.lambda_handler"
  runtime = local.python_runtime
  timeout = local.lambda_timeout
  source_code_hash = data.archive_file.app.output_base64sha256
  role = aws_iam_role.lock_manager.arn
  layers = [aws_lambda_layer_version.mesh_aws_client_dependencies.arn]

  publish = true

  environment {
    variables = local.common_env_vars
  }

  dynamic "vpc_config" {
    for_each = local.vpc_enabled ? [local.vpc_enabled] : []
    content {
      subnet_ids = var.subnet_ids
      security_group_ids = [aws_security_group.lock_manager[0].id]
    }
  }

  depends_on = [
    aws_cloudwatch_log_group.lock_manager,
  ]
}

resource "aws_cloudwatch_log_group" "lock_manager" {
  name = "/aws/lambda/${local.lock_manager_name}"
  retention_in_days = var.cloudwatch_retention_in_days
  kms_key_id = aws_kms_key.mesh.arn
  lifecycle {
    ignore_changes = [
      log_group_class, # localstack not currently returning this
    ]
  }
}

resource "aws_iam_role" "lock_manager" {
  name = "${local.lock_manager_name}-role"
  description = "${local.lock_manager_name}-role"
  assume_role_policy = data.aws_iam_policy_document.lock_manager_assume.json
}

data "aws_iam_policy_document" "lock_manager_assume" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type = "Service"

      identifiers = [
        "lambda.amazonaws.com",
      ]
    }
  }
}

resource "aws_iam_role_policy_attachment" "lock_manager" {
  role = aws_iam_role.lock_manager.name
  policy_arn = aws_iam_policy.lock_manager.arn
}

resource "aws_iam_policy" "lock_manager" {
  name = "${local.lock_manager_name}-policy"
  description = "${local.lock_manager_name}-policy"
  policy = data.aws_iam_policy_document.lock_manager.json
}

data "aws_iam_policy_document" "lock_manager" {
  statement {
    sid = "CloudWatchAllow"
    effect = "Allow"

    actions = [
      "logs:CreateLogGroup",
      "logs:CreateLogStream",
      "logs:PutLogEvents"
    ]

    resources = [
      "${aws_cloudwatch_log_group.lock_manager.arn}*"
    ]
  }

  statement {
    sid = "SSMDescribe"
    effect = "Allow"

    actions = [
      "ssm:DescribeParameters"
    ]

    resources = [
      "arn:aws:ssm:eu-west-2:${var.account_id}:parameter/${local.name}/*"
    ]
  }

  statement {
    sid = "SSMGet"
    effect = "Allow"

    actions = [
      "ssm:GetParameter",
      "ssm:GetParameters",
      "ssm:GetParametersByPath"
    ]

    resources = [
      "arn:aws:ssm:eu-west-2:${var.account_id}:parameter/${local.name}/*",
      "arn:aws:ssm:eu-west-2:${var.account_id}:parameter/${local.name}"
    ]
  }

  statement {
    sid = "KMSDecrypt"
    effect = "Allow"

    actions = [
      "kms:Decrypt"
    ]

    resources = concat(
      [aws_kms_alias.mesh.target_key_arn],
      var.use_secrets_manager ? local.secrets_kms_key_arns : []
    )
  }

  statement {
    sid = "DynamoDBDelete"
    effect = "Allow"

    actions = [
      "dynamodb:DeleteItem"
    ]

    resources = [
      "arn:aws:dynamodb:eu-west-2:${var.account_id}:table/${local.locktable_name}"
    ]
  }
}
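The DynamoDB statements in this change build the lock table ARN by hand with a fixed `eu-west-2` region, and the same four-action `DynamoDBAccess` statement appears in several Lambda policies. As a sketch only, one alternative would be to reference the ARN that the table resource already exports and to define the shared statement once. The names `lock_table_access` and `example_lambda` are hypothetical and not part of this PR, and `source_policy_documents` assumes an AWS provider version that supports that argument.

```hcl
# Sketch only: "lock_table_access" is a hypothetical name, not part of this change.
data "aws_iam_policy_document" "lock_table_access" {
  statement {
    sid    = "DynamoDBAccess"
    effect = "Allow"

    actions = [
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:DeleteItem",
      "dynamodb:UpdateItem",
    ]

    # The table resource exports its own ARN, so the region and account ID
    # do not have to be interpolated by hand.
    resources = [
      aws_dynamodb_table.lock_table.arn,
    ]
  }
}

# Hypothetical consumer: a Lambda policy document that merges the shared
# statement in alongside its own statements.
data "aws_iam_policy_document" "example_lambda" {
  source_policy_documents = [
    data.aws_iam_policy_document.lock_table_access.json,
  ]

  # ... the Lambda's other statements would follow here ...
}
```

If any of the Lambdas were to query `LockTypeOwnerTableIndex`, they would also need `dynamodb:Query` on the index ARN (`${aws_dynamodb_table.lock_table.arn}/index/*`), because a table-level ARN does not cover its indexes. Whether the application code does so is not visible in this diff.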
17 changes: 17 additions & 0 deletions module/lambda_poll_mailbox.tf
@@ -135,6 +135,22 @@ data "aws_iam_policy_document" "poll_mailbox" {
      var.use_secrets_manager ? local.secrets_kms_key_arns : []
    )
  }

  statement {
    sid = "DynamoDBAccess"
    effect = "Allow"

    actions = [
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:DeleteItem",
      "dynamodb:UpdateItem",
    ]

    resources = [
      "arn:aws:dynamodb:eu-west-2:${var.account_id}:table/${local.locktable_name}"
    ]
  }

  dynamic "statement" {
    for_each = var.use_secrets_manager ? [true] : []
@@ -170,6 +186,7 @@ data "aws_iam_policy_document" "poll_mailbox" {
    }
  }


}

resource "aws_iam_role_policy_attachment" "poll_mailbox_lambda_insights" {
15 changes: 15 additions & 0 deletions module/lambda_send_message_chunk.tf
@@ -154,6 +154,21 @@ data "aws_iam_policy_document" "send_message_chunk" {
      "${aws_s3_bucket.mesh.arn}/*"
    ]
  }
  statement {
    sid = "DynamoDBAccess"
    effect = "Allow"

    actions = [
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:DeleteItem",
      "dynamodb:UpdateItem",
    ]

    resources = [
      "arn:aws:dynamodb:eu-west-2:${var.account_id}:table/${local.locktable_name}"
    ]
  }

  dynamic "statement" {
    for_each = var.use_secrets_manager ? [true] : []
2 changes: 2 additions & 0 deletions module/locals.tf
@@ -21,6 +21,8 @@ locals {
    MESH_URL = local.mesh_url[var.mesh_env]
    MESH_BUCKET = aws_s3_bucket.mesh.bucket

    DDB_LOCK_TABLE_NAME = aws_dynamodb_table.lock_table.name

    CHUNK_SIZE = var.chunk_size
    CRUMB_SIZE = var.crumb_size == null ? var.chunk_size : var.crumb_size
    NEVER_COMPRESS = var.never_compress
46 changes: 42 additions & 4 deletions module/stepfunctions_get_messages.tf
@@ -30,7 +30,7 @@ resource "aws_sfn_state_machine" "get_messages" {
       "Failed?" = {
         Choices = [
           {
-            Next = "Poll complete"
+            Next = "Poll complete release lock"
             NumericEquals = 204
             Variable = "$.statusCode"
           },
@@ -99,7 +99,10 @@ resource "aws_sfn_state_machine" "get_messages" {
         OutputPath = "$.Payload"
         Parameters = {
           FunctionName = "${aws_lambda_function.poll_mailbox.arn}:${aws_lambda_function.poll_mailbox.version}"
-          "Payload.$" = "$"
+          Payload = {
+            "EventDetail.$" = "$"
+            "ExecutionId.$" = "$$.Execution.Id"
+          }
         }
         Resource = "arn:aws:states:::lambda:invoke"
         Retry = [
@@ -132,9 +135,42 @@ resource "aws_sfn_state_machine" "get_messages" {
             Variable = "$.body.message_count"
           },
         ]
-        Default = "Poll complete"
+        Default = "Poll complete release lock"
         Type = "Choice"
       }
+      "Poll complete release lock" = {
+        Next = "Poll complete"
+        OutputPath = "$.Payload"
+        Parameters = {
+          FunctionName = "${aws_lambda_function.lock_manager.arn}:${aws_lambda_function.lock_manager.version}"
+          Payload = {
+            "EventDetail.$" = "$"
+            "Operation" = "release"
+          }
+        }
+        Resource = "arn:aws:states:::lambda:invoke"
+        Retry = [
+          {
+            BackoffRate = 2
+            ErrorEquals = [
+              "Lambda.ServiceException",
+              "Lambda.AWSLambdaException",
+              "Lambda.SdkClientException",
+            ]
+            IntervalSeconds = 2
+            MaxAttempts = 3
+          },
+          {
+            ErrorEquals = [
+              "States.TaskFailed"
+            ],
+            BackoffRate = 1,
+            IntervalSeconds = 300,
+            MaxAttempts = 2
+          },
+        ]
+        Type = "Task"
+      }
     }
   })
 }
@@ -214,7 +250,9 @@ data "aws_iam_policy_document" "get_messages" {
       aws_lambda_function.fetch_message_chunk.arn,
       "${aws_lambda_function.fetch_message_chunk.arn}:*",
       aws_lambda_function.poll_mailbox.arn,
-      "${aws_lambda_function.poll_mailbox.arn}:*"
+      "${aws_lambda_function.poll_mailbox.arn}:*",
+      aws_lambda_function.lock_manager.arn,
+      "${aws_lambda_function.lock_manager.arn}:*"
     ]

     actions = ["lambda:InvokeFunction"]