Releases: truera/trulens

TruLens-Eval-0.18.2

01 Dec 23:39

Changelog

  • Unpin typing-extensions, typing-inspect (#590)
  • Make CI Pipeline run daily (#599)
  • Increase test coverage to all /quickstart* notebooks (#601)

Examples

  • Add notebook to use for dev and debugging (#605)
  • Add example for multimodal RAG eval (#617)
  • Add example for finetuning experiments in Bedrock (#618)

Bug Fixes

  • Fix helpfulness prompt (#594)
  • Serialize OpenAI Client (#595)
  • Remove extra reset cell in quickstart (#597)
  • Fix langchain prompt template imports in examples (#602)
  • Change model_id -> model_engine in Bedrock example (#612)
  • Fix prompt swapping in model agreement feedback (#615)
  • Fix > character in groundedness prompt (#623)

TruLens-Eval-0.18.0

18 Nov 16:38
2d45b6a

Evaluate and Track LLM Applications

Changelog

  • Migrate to OpenAI v1.

There are known issues with async support.

TruLens Eval v0.17.0

02 Nov 01:52

Changelog

  • Add criteria and improve chain of thought prompting for evals
  • Allow feedback functions to be in different directions with appropriate coloring/emojis
  • Filter leaderboard feedback function results to only those available for the given app id
  • Add smoke testing/benchmarking for groundedness based on SummEval dataset

Bug Fixes

  • Fix issue with LiteLLM provider
  • Allow Groundedness to run with any LLM provider

Examples

  • Using Anthropic Claude to run feedback functions

TruLens Eval v0.16.0

20 Oct 23:42
cacdbff

Library containing evaluations of LLM Applications

Changelog

Bug Fixes

  • Fix App UI, links, icons

Full Changelog: trulens-eval-0.15.3...trulens-eval-0.16.0

TruLens Eval v0.15.3

11 Oct 16:04
1b21016

Bug Fixes

  • Fixed OpenAI provider issues for feedback functions

TruLens Eval v0.15.1

06 Oct 21:13

Changelog

  • PII Detection Feedback Function
  • Embedding Distance Feedback Function
  • App UI Playground
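
An embedding-distance feedback function typically scores how close one embedding (e.g. a retrieved context) is to another (e.g. the query). A minimal sketch of the underlying idea using cosine distance — illustrative only, not the trulens-eval API, and the embedding values below are hypothetical:

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two equal-length vectors.

    0.0 means the vectors point the same way; 1.0 means orthogonal.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy embeddings standing in for real model output (hypothetical values)
query_emb = [0.1, 0.9, 0.2]
chunk_emb = [0.12, 0.88, 0.25]
score = cosine_distance(query_emb, chunk_emb)  # small score = close match
```

A real feedback function would obtain the vectors from an embedding model and may aggregate scores over several retrieved chunks.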

Examples

  • All new User Guides Docs Section
  • Language Verification
  • PII Detection
  • Hallucination Detection
  • Retrieval Quality

Bug Fixes

  • Fix Unicode issue on Windows

TruLens Eval v0.14.0

28 Sep 22:36

Changelog

  • Added a stereotypes feedback function
  • Added a summarization feedback function
  • Added litellm as a provider
  • Support for Llama Index agent instrumentation
  • Added an interactive UI for jupyter notebooks to explore the App structure

Bug Fixes

  • Fixed an issue with langchain async not logging

TruLens Eval v0.13.0

22 Sep 03:08

Changelog

  • Updated all documentation to show context recorder usage
  • Smoke tests now run against trulens-eval

Examples

  • Examples are restructured for better discoverability
  • Added a Milvus Vector DB Example

Bug Fixes

  • Removed metadata_fn in examples

TruLens Eval v0.12.0

08 Sep 00:11

Changelog

  • Added chain of thought and reason metadata to LLM based feedback functions
  • Feedback function docs upgrade
    • Feedback Function APIs now showing actual APIs with code
    • App wrappers (TruChain/TruLLama/etc) docs with code
    • More concise selector documentation with code

Examples

  • Updated examples to use context recording

Bug Fixes

  • Fix for basic app with multiple args
  • Fix aggregation bug in multi context groundedness introduced in 0.11.0
  • Now shows index of json path if available in timeline UI
  • No longer overwrites user changes to streamlit .toml files
  • Slow or hanging thread bug fix

TruLens Eval v0.11.0

31 Aug 23:19

Changelog

  • Add ability to add metadata to records
  • Add Feedback functions for bertscore, rouge, and bleu scores
  • More instrumentation for Langchain Agents
  • Added capability to instrument more than the default calls, such as Langchain Prompt Templates
  • Added support for tracking via python context managers
  • Added badges showing test results on documentation page
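
Context-manager based tracking generally works by recording every app call made while a `with` block is open. A generic sketch of that pattern in plain Python — the `Recorder` class and `echo_app` below are hypothetical stand-ins, not the trulens-eval API:

```python
class Recorder:
    """Collects (input, output) pairs for calls made inside a `with` block."""

    def __init__(self, app):
        self.app = app        # the wrapped application callable
        self.records = []     # accumulated call records

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # do not swallow exceptions raised inside the block

    def __call__(self, prompt):
        result = self.app(prompt)
        self.records.append({"input": prompt, "output": result})
        return result

def echo_app(prompt):
    """Hypothetical stand-in for an LLM application."""
    return f"answer to: {prompt}"

with Recorder(echo_app) as rec:
    rec("What is TruLens?")
# rec.records now holds one record with the call's input and output
```

The appeal of the context-manager form is that instrumentation is scoped: anything called inside the block is tracked, and tracking stops automatically when the block exits.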

Examples

  • Added Llama Index RAG application with a vector store using Milvus

Bug Fixes

  • Fix for multi-result introduced in 0.10.0
  • Allow FeedbackCall to have JSON args
  • Fix error for OpenAI Chat LLM with ChatPromptTemplate