September 27, 2023

In this release, we’ve added support for large language models (LLMs) to enhance the capabilities of the ValidMind Library in preparation for the closed beta,1 along with a number of new demo notebooks that you can try out.

Other enhancements improve the developer experience and our documentation site.

Release highlights

ValidMind Library (v1.19.0)

Large language model (LLM) support

We added initial support for large language models (LLMs) in ValidMind via the new FoundationModel class.

  • You can now create an instance of FoundationModel by specifying a predict_fn and a prompt, and then pass it into any test suite (see the sketch after this list).
  • The predict_fn must be defined by you and implement the logic for calling the foundation LLM, usually via the provider’s Python library API.
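
For example, here is a minimal sketch of that flow. It assumes the OpenAI Python client as the provider and that FoundationModel can be imported from validmind.models; the import path, model name, and prompt handling are illustrative, so consult the demo notebooks for the exact interface.

from openai import OpenAI

from validmind.models import FoundationModel  # import path is an assumption

client = OpenAI()

prompt = (
    "Classify the sentiment of the following financial news sentence "
    "as positive, negative, or neutral:\n{Sentence}"
)

def predict_fn(sentence: str) -> str:
    # User-defined prediction logic: format the prompt and call the LLM.
    # Any provider client works here; OpenAI is used only as an example.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt.format(Sentence=sentence)}],
    )
    return response.choices[0].message.content

# Wrap the foundation model so it can be passed into any test suite.
# (Whether prompt accepts a plain string or a dedicated Prompt object may
# depend on the library version; see the prompt validation notebook.)
vm_model = FoundationModel(predict_fn=predict_fn, prompt=prompt)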

To demonstrate the capabilities of LLM support, this release also includes new demo notebooks:

Prompt validation demo notebook for LLMs

As a proof of concept, we added initial native prompt validation tests to the library, along with a notebook and a simple template for trying out these metrics on a sentiment analysis LLM we built.

Text summarization model demo notebook for LLMs

We added a new notebook to the library that loads the financial news dataset, initializes a Hugging Face summarization model through the init_model interface, and implements relevant metrics for testing. It demonstrates how to run a text summarization metrics test suite for an LLM instructed to act as a financial news summarizer.
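
The pattern the notebook follows looks roughly like this sketch; the checkpoint name is illustrative and the init_model arguments may differ between library versions:

import validmind as vm
from transformers import pipeline

# Load a pre-trained summarization model from the Hugging Face Hub
# (illustrative checkpoint; the demo notebook uses its own choice of model).
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Register the model with ValidMind so test suites can call it.
vm_model = vm.init_model(summarizer)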

Support for Hugging Face models

ValidMind can now validate pre-trained models from the Hugging Face Hub, including any language model compatible with the Hugging Face transformers API.

To illustrate this new feature, we have included a financial news sentiment analysis demo that runs documentation tests for a Hugging Face text classification model using the financial_phrasebank dataset.2
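
The end-to-end flow of that demo looks roughly like the following sketch; the checkpoint name, the target column, and the init_dataset and run_documentation_tests argument names are assumptions, and the demo notebook shows the exact calls:

import validmind as vm
from datasets import load_dataset
from transformers import pipeline

# Load the financial_phrasebank dataset from the Hugging Face Hub.
df = load_dataset(
    "financial_phrasebank", "sentences_allagree", split="train"
).to_pandas()

# A pre-trained financial sentiment classifier (illustrative checkpoint).
classifier = pipeline("text-classification", model="ProsusAI/finbert")

# Register the dataset and model with ValidMind.
vm_dataset = vm.init_dataset(dataset=df, target_column="label")
vm_model = vm.init_model(classifier)

# Run the documentation tests defined by the project's template.
vm.run_documentation_tests(dataset=vm_dataset, model=vm_model)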

A better developer experience with run_test()

We added a new run_test() helper function that streamlines running tests. It lets you execute any individual test independently of a test suite or a documentation template, so a one-line command can run a test with whatever parameters and options you need.

For example:

run_test("ClassImbalance", dataset=dataset, params=params, send=True)

We also updated the QuickStart notebook to provide a consistent experience with run_test().

Example usage for run_test

Discover existing tests by calling list_tests() or describe_test():

list_tests():

A screenshot showing examples for `list_tests()`

describe_test():

A screenshot showing examples for `describe_test()`
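
In code, the discovery step looks like this sketch; the test ID passed to describe_test() is illustrative and matches the run_test examples that follow:

import validmind as vm

# Browse all available tests.
vm.tests.list_tests()

# Show the documentation, required inputs, and default parameters for one test.
vm.tests.describe_test("class_imbalance")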

View the tests associated with a documentation template by running preview_template():

A screenshot showing examples for `preview_template()`

Examples for preview_template()

Using the test ID, run a given test and pass in additional configuration parameters and inputs:

# No params
test_results = vm.tests.run_test(
    "class_imbalance",
    dataset=vm_dataset
)

# Custom params
test_results = vm.tests.run_test(
    "class_imbalance",
    params={"min_percent_threshold": 30},
    dataset=vm_dataset
)

Output:

A screenshot showing the example output `test_results`

Send the results of the test to ValidMind by calling .log():

test_results.log()

Enhancements

Multi-class test improvements

We made a number of changes to tests to improve the developer experience:

  • A new fail_fast argument can be passed to run_test_plan(), run_test_suite(), and run_documentation_tests() to fail and raise an exception on the first error encountered, which is useful for debugging (see the example after this list).
  • The ClassifierPerformance test now detects whether you are testing a binary or a multi-class model. For multi-class models, we now report additional per-class, macro-averaged, and weighted-average metrics.
  • Fixed the F1 score test so that it works correctly for both binary and multi-class models.
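
A sketch of the fail_fast option; the dataset and model keyword arguments reuse the inputs from the earlier examples and are assumptions about your setup:

# Stop on the first failing test and raise the error instead of continuing.
vm.run_documentation_tests(
    dataset=vm_dataset,
    model=vm_model,
    fail_fast=True,
)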

Added multi-class classification support

  • The library now supports a multi-class version of some of the existing tests, such as confusion matrix, accuracy, precision, recall, and more.
  • The dataset and model interfaces now also support multiple targets.

Implemented classification model comparison tests

  • Added a model performance comparison test for classification tasks.
  • The test includes metrics such as accuracy, F1, precision, recall, and roc_auc score.

Track additional test metadata

  • Added a metadata property to every ValidMind test class.
  • The metadata property includes a task_types field and a tags field, which together categorize tests by the data and model types they work with, the category of test they fall into, and more.

Filter tests by task type and tags

We added a new search feature to the validmind.tests.list_tests function to allow for better test discoverability.

The list_tests function in the tests module now supports the following arguments:

  • filter: If set, matches tests by ID, task_types, or tags using a combination of substring and fuzzy string matching. Defaults to None.
  • task: If set (and assuming filter has been passed), further narrows the matching tests by exact-matching the task against the test’s task_types metadata. Defaults to None.
  • tags: If a list is passed, further narrows the matched tests by exact matches on tags. Defaults to None.
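
For example, here is a sketch combining these arguments; the filter string, task value, and tag names are illustrative:

import validmind as vm

# Fuzzy match on test ID, task types, or tags.
vm.tests.list_tests(filter="classification")

# Narrow further by exact task type and tags.
vm.tests.list_tests(
    filter="classification",
    task="classification",
    tags=["binary_classification"],
)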

Documentation updates

User journey improvements

We enhanced the architecture and content of our external docs site to make the user journey more efficient for model developers and model validators who are new to our products:

  • Reworked the “Get Started” section to include more conceptual information and an overview of the high-level workflows.
  • Revised the “What is the ValidMind Library?” section to provide an end-to-end overview of the workflow that model developers should follow as they adopt the library.

Docs site improvements

We made a number of incremental improvements to our user guide:

  • New dropdown for the ValidMind Library that gives faster access to the most important content, such as our code samples and the reference documentation. Click Developers in the top navigation bar to see it in action!
  • Publication date for each page that reflects the last time the source file was touched.
  • Previous and next topic footers for related topics that make it easier to keep reading.
  • Expanded overview for key ValidMind concepts with some additional information.
  • Lighter background for diagrams that improves legibility.

How to upgrade

ValidMind Platform

To access the latest version of the ValidMind Platform,3 hard refresh your browser tab:

  • Windows: Ctrl + Shift + R OR Ctrl + F5
  • macOS: ⌘ Cmd + Shift + R OR hold down ⌘ Cmd and click the Reload button

ValidMind Library

To upgrade the ValidMind Library:4

  1. In your Jupyter Notebook:

    • Using JupyterHub:5 Hard refresh your browser tab.
    • In your own developer environment:6 Restart your notebook.
  2. Then within a code cell or your terminal, run:

    %pip install --upgrade validmind

You may need to restart your kernel after upgrading the package for the changes to take effect.