feat(bedrock): implement new data source structure #668

Merged
merged 26 commits into from
Sep 25, 2024

aws-rafams
Contributor

@aws-rafams aws-rafams commented Aug 27, 2024

Fixes #655

Draft, work in progress.

Just linking the draft to get some comments on the planned structure / interface.


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

@aws-rafams
Contributor Author

aws-rafams commented Aug 28, 2024

The implementation is practically finished; it just needs extensive testing. I would love comments on the final structure and on the interfaces. Once the structure has been validated, I will proceed with:

  • further unit tests
  • further integ tests
  • documentation updates

Big update that would solve #666, #655, and #587.

Working example

const kb = new KnowledgeBase(stack, 'MyKnowledgeBase', {
  name: 'MyKnowledgeBase',
  embeddingsModel: BedrockFoundationModel.COHERE_EMBED_MULTILINGUAL_V3,
});

const bucket = new Bucket(stack, 'Bucket', {});
const lambdaFunction = new Function(stack, 'MyFunction', {
  runtime: cdk.aws_lambda.Runtime.PYTHON_3_9,
  handler: 'index.handler',
  code: cdk.aws_lambda.Code.fromInline('print("Hello, World!")'),
});

kb.addWebCrawlerDataSource({
  sourceUrls: ['https://docs.aws.amazon.com/'],
  chunkingStrategy: ChunkingStrategy.HIERARCHICAL_COHERE,
  customTransformation: CustomTransformation.lambda({
    lambdaFunction: lambdaFunction,
    s3BucketUri: `s3://${bucket.bucketName}/chunk-processor/`,
  }),
});

kb.addS3DataSource({
  bucket,
  chunkingStrategy: ChunkingStrategy.SEMANTIC,
  parsingStrategy: ParsingStategy.foundationModel({
    parsingModel: BedrockFoundationModel.ANTHROPIC_CLAUDE_SONNET_V1_0.asIModel(stack),
  }),
});
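
The `addWebCrawlerDataSource` and `addS3DataSource` calls in the example above attach data sources through helper methods on the knowledge base. A minimal, self-contained sketch of that helper-method pattern follows. Every name in it (`SketchKnowledgeBase`, `SketchDataSource`, `register`) is hypothetical and exists only for illustration; this is not the code from the PR, which works with CDK constructs rather than plain objects.

```typescript
// Hypothetical stand-ins for illustration only; not the library's classes.
type SketchSourceType = 'S3' | 'WEB_CRAWLER';

interface SketchDataSource {
  readonly knowledgeBaseName: string;
  readonly type: SketchSourceType;
  readonly settings: Record<string, unknown>;
}

class SketchKnowledgeBase {
  private readonly sources: SketchDataSource[] = [];

  constructor(public readonly name: string) {}

  // Helper: creates an S3-backed source already bound to this knowledge base.
  addS3DataSource(settings: { bucketName: string; chunking?: string }): SketchDataSource {
    return this.register('S3', settings);
  }

  // Helper: creates a web-crawler source already bound to this knowledge base.
  addWebCrawlerDataSource(settings: { sourceUrls: string[]; chunking?: string }): SketchDataSource {
    return this.register('WEB_CRAWLER', settings);
  }

  // Read-only view of everything attached so far.
  get dataSources(): readonly SketchDataSource[] {
    return this.sources;
  }

  // Shared path: the caller never has to pass the knowledge base explicitly.
  private register(type: SketchSourceType, settings: Record<string, unknown>): SketchDataSource {
    const source: SketchDataSource = { knowledgeBaseName: this.name, type, settings };
    this.sources.push(source);
    return source;
  }
}
```

The point of the pattern is that the caller configures only the source itself, while the helper supplies the link back to the owning knowledge base.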

@krokoko
Collaborator

krokoko commented Sep 4, 2024

Just FYI, CDK v2.155 has updates impacting the resources used in this PR; see aws/aws-cdk#31193

@aws-rafams aws-rafams reopened this Sep 6, 2024
@aws-rafams aws-rafams marked this pull request as ready for review September 6, 2024 15:50
@krokoko
Collaborator

krokoko commented Sep 17, 2024

Overall LGTM! Thanks for this! @aws-rafams, I am fine with the structure and interfaces. Do you still need to work on:

  • further unit tests
  • further integ tests
  • documentation updates

If not, I will run some tests with the package through our samples.
There are also a couple of breaking changes that we will need to highlight in the release notes.

@aws-rafams
Contributor Author

aws-rafams commented Sep 17, 2024

I have completed the tests and documentation updates. However, I would really appreciate a hand with the Python snippets in the docs, which I believe is the only remaining task.

The primary breaking change is in the S3DataSource resource and in ChunkingStrategy, which was an enum and is now a class. The previous structure and properties will not work with the new version, so we should highlight this in the release notes.

The changes to the KnowledgeBase are primarily the addition of helper methods, while the rest are new resources that did not exist previously.
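
To illustrate why moving from an enum to a class is a breaking change, here is a minimal sketch. The names `LegacyChunking`, `SketchChunkingStrategy`, and `FixedSizeOptions`, as well as the numeric values, are assumptions made for illustration; they are not taken from the PR or from the library's API.

```typescript
// Before (sketch): an enum can only name a preset; it cannot carry parameters.
enum LegacyChunking {
  FIXED_SIZE = 'FIXED_SIZE',
  NONE = 'NONE',
}

// Hypothetical option shape for a parameterized strategy.
interface FixedSizeOptions {
  readonly maxTokens: number;
  readonly overlapPercentage: number;
}

// After (sketch): a class keeps presets as static members and adds factories.
class SketchChunkingStrategy {
  // Preset without parameters.
  static readonly NONE = new SketchChunkingStrategy('NONE');

  // Preset built through the factory; the numbers are illustrative only.
  static readonly FIXED_SIZE = SketchChunkingStrategy.fixedSize({
    maxTokens: 300,
    overlapPercentage: 20,
  });

  // Factory for a custom strategy, something the enum could not express.
  static fixedSize(options: FixedSizeOptions): SketchChunkingStrategy {
    if (options.overlapPercentage < 0 || options.overlapPercentage >= 100) {
      throw new Error('overlapPercentage must be in the range [0, 100)');
    }
    return new SketchChunkingStrategy('FIXED_SIZE', options);
  }

  private constructor(
    public readonly kind: string,
    public readonly options?: FixedSizeOptions,
  ) {}
}
```

In this sketch, code that treated the old value as a string (for example, switching on `LegacyChunking.FIXED_SIZE`) no longer type-checks against `SketchChunkingStrategy.FIXED_SIZE`, which is an object. That is the kind of incompatibility release notes would need to call out.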

@krokoko
Collaborator

krokoko commented Sep 18, 2024

Thanks @aws-rafams, I'll run some tests with the package and update here !

@krokoko
Collaborator

krokoko commented Sep 24, 2024

Thanks @aws-rafams !
I ran several tests using our Bedrock agent sample, and everything seems to work as expected. A non-exhaustive list includes:

  • testing the different chunking strategies
  • import of KB

image

As discussed, the only issue detected is that permissions are missing for FM parsing and for the Lambda used in custom transformation. The README looks good; we will update it soon, as we need to add support for the new languages.

Could you please:

  • Fix the permission issue
  • Remove the integ tests causing the build failure

After that, we can approve and merge.

Note:

@aws-rafams
Contributor Author

aws-rafams commented Sep 25, 2024

The permission issue has been solved. When using an FM for parsing or a Lambda for custom processing, the appropriate policies are now added.
Screenshot 2024-09-25 at 14 42 31
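
As a rough sketch of the kind of conditional policy logic described above: the function below emits an IAM-style statement only when the corresponding feature is configured. The function name, the option names, and the exact actions are assumptions for illustration. `bedrock:InvokeModel` and `lambda:InvokeFunction` are standard IAM actions, but whether the PR attaches exactly these statements is not confirmed by this conversation.

```typescript
// Plain IAM-style statement shape; no CDK dependency in this sketch.
interface SketchPolicyStatement {
  readonly Effect: 'Allow';
  readonly Action: string[];
  readonly Resource: string[];
}

// Hypothetical option names for the two features mentioned in the thread.
interface SketchDataSourceOptions {
  readonly parsingModelArn?: string;
  readonly transformationLambdaArn?: string;
}

// Returns only the statements that the configured features would need.
function requiredStatements(options: SketchDataSourceOptions): SketchPolicyStatement[] {
  const statements: SketchPolicyStatement[] = [];

  if (options.parsingModelArn) {
    // FM parsing: the service role must be able to invoke the parsing model.
    statements.push({
      Effect: 'Allow',
      Action: ['bedrock:InvokeModel'],
      Resource: [options.parsingModelArn],
    });
  }

  if (options.transformationLambdaArn) {
    // Custom transformation: the service role must be able to call the Lambda.
    statements.push({
      Effect: 'Allow',
      Action: ['lambda:InvokeFunction'],
      Resource: [options.transformationLambdaArn],
    });
  }

  return statements;
}
```

Keeping the statements conditional means a data source that uses neither feature gets no extra permissions.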

@krokoko krokoko self-requested a review September 25, 2024 16:38
@krokoko krokoko merged commit 04e1efb into awslabs:main Sep 25, 2024
15 of 17 checks passed
Development

Successfully merging this pull request may close these issues.

(bedrock): support additional data sources
4 participants