January 26, 2024

This release includes numerous improvements to the library, including new features for model and dataset initialization, easier testing, support for custom inputs and the Azure OpenAI API, updated notebooks, bug fixes, and much more.

Release highlights

ValidMind Library (v1.25.3)

Improvements to init_model()

  • When initializing a model, you can now pass a dataset with pre-computed model predictions if they are available.
  • By default, if no prediction column is specified when calling init_model(), the ValidMind Library will compute the model predictions on the entire dataset.

To illustrate how passing a dataset that includes a prediction column can help, consider the following example without a prediction column:

vm_model = vm.init_model(
    model,
    train_ds=vm_train_ds,
    test_ds=vm_test_ds,
)

Internally, this example invokes the predict() method of the model for the training and test datasets when the model is initialized.

This approach can be problematic with large datasets: init_model() can simply take too long to compute the predictions.

You can now avoid this issue by providing a dataset that includes a column of pre-computed predictions, similar to the following example. If init_model() detects this column, it does not generate model predictions at runtime.

x1     x2     target_column    prediction_column
0.1    0.2    0                0
0.2    0.4    1                1
…      …      …                …

Usage example with a prediction column:

vm.init_dataset(
    dataset=df,
    feature_columns=[...],
    target_column=...,
    extra_columns={
        "prediction_column": "NAME-OF-PREDICTION-COLUMN",
    },
)
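
With the prediction column registered on the dataset, initializing the model is unchanged; the difference is that init_model() detects the pre-computed predictions and skips calling predict() at runtime. A minimal sketch, assuming vm_train_ds and vm_test_ds were initialized with the prediction_column entry shown above:

vm_model = vm.init_model(
    model,
    train_ds=vm_train_ds,
    test_ds=vm_test_ds,
)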

Improvements to init_dataset()

When initializing a dataset, the new feature_columns argument lets you specify a list of feature names for prediction to improve efficiency. Internally, the function filters the dataset to retain only these specified features for prediction-related tasks, leaving the remaining dataset available for other purposes, such as segmentation.

  • This improvement replaces the previous behavior of init_dataset(), which loaded the entire dataset and used all available features for prediction tasks.
  • While this approach worked well, it could impose limitations when generating segmented tests and proved somewhat inefficient with large datasets containing numerous features, of which only a small subset were relevant for prediction.

Usage example:

feature_columns = ['CreditScore', 'Age', 'Balance', 'NumOfProducts', 'EstimatedSalary']

vm_train_ds = vm.init_dataset(
    dataset=train_df,
    target_column=demo_dataset.target_column,
    feature_columns=feature_columns
)

A new notebook illustrates how you can configure these dataset features:

  • How to utilize the feature_columns parameter when initializing ValidMind datasets and model objects
  • How feature_columns can be used to report by segment

Improvements to run_documentation_tests()

The run_documentation_tests() function, used to collect and run all the tests associated with a template, now supports running multiple sections at a time.

  • This means that you no longer need to call the same function twice for two different sections, reducing the potential for errors and enabling you to use a single config object.
  • Previously, you could run only one section at a time.
  • This change maintains backward compatibility with the existing syntax, requiring no updates to your code.

Existing example usage: Multiple function calls are needed to run multiple sections.

full_suite = vm.run_documentation_tests(
    inputs={
        ...
    },
    section="section_1",
    config={
        "validmind.tests.data_validation.ClassImbalance": ...
    } 
)

full_suite = vm.run_documentation_tests(
    inputs={
        ...
    },
    section="section_2",
    config={
        "validmind.tests.data_validation.Duplicates": ...
    } 
)

New example usage: A single function call runs multiple sections.

full_suite = vm.run_documentation_tests(
    inputs={
        ...
    },
    section=["section_1", "section_2"],
    config={
        "validmind.tests.data_validation.ClassImbalance": ...,
        "validmind.tests.data_validation.Duplicates": ...
    } 
)

Support for custom inputs

The ValidMind Library now supports passing custom inputs as an inputs dictionary when running individual tests or test suites. This support replaces the standard inputs dataset, model, and models, which are now deprecated.

New recommended syntax for passing inputs:

test_suite = vm.run_documentation_tests(
    inputs={
        "dataset": vm_dataset,
        "model": vm_model,
    },
)
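
The inputs dictionary works the same way for individual tests. A minimal sketch, assuming an initialized vm_dataset and reusing the ClassImbalance test ID from the config examples above (run_test here refers to vm.tests.run_test):

result = vm.tests.run_test(
    "validmind.tests.data_validation.ClassImbalance",
    inputs={
        "dataset": vm_dataset,
    },
)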

To make it easier for you to adopt custom inputs, we have updated our how-to notebooks and code samples to use the new recommended syntax.

Enhancements

Support for Azure OpenAI Service

The ValidMind Library now supports running LLM-powered tests via the Azure OpenAI Service API, in addition to the previously supported OpenAI API.

To learn more about configuring Azure OpenAI Service, see Authentication in the official Microsoft documentation.

To work with Azure OpenAI API endpoints, you need to set the following environment variables before calling init():

  • AZURE_OPENAI_KEY: API key for authentication
  • AZURE_OPENAI_ENDPOINT: API endpoint URL
  • AZURE_OPENAI_MODEL: Specifies the language model or service to use
  • AZURE_OPENAI_VERSION (optional): Allows specifying a specific version of the service if available
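
For example, a minimal sketch of setting these variables from Python before calling init(); all values below are placeholders, and the init() arguments mirror your usual setup:

import os

import validmind as vm

# Placeholder values for illustration only
os.environ["AZURE_OPENAI_KEY"] = "<your-azure-openai-api-key>"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://<your-resource>.openai.azure.com/"
os.environ["AZURE_OPENAI_MODEL"] = "<your-model-deployment-name>"
# Optional: pin a specific service version
# os.environ["AZURE_OPENAI_VERSION"] = "<api-version>"

vm.init(
    api_host="<your-api-host>",
    api_key="<your-api-key>",
    api_secret="<your-api-secret>",
    project="<your-project-identifier>",
)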

Bug fixes

Fixed support for OpenAI library >=1.0

  • We have updated our demonstration notebooks for large language models (LLMs) to provide the correct support for openai >= 1.0.0.
  • Previously, some notebooks were using an older version of the OpenAI client API (see the sketch of the updated interface below).
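
A minimal sketch of the openai >= 1.0.0 client interface that the updated notebooks target; the model name and prompt are illustrative:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarize the model's limitations."}],
)
print(response.choices[0].message.content)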

Deprecations

Standard inputs are deprecated

The ValidMind Library now supports passing custom inputs as an inputs dictionary when running individual tests or test suites.

  • As a result, the standard inputs dataset, model, and models are deprecated and might be removed in a future release.
  • You should update your code to use the new, recommended syntax.

Deprecated legacy usage for passing inputs:

  test_suite = vm.run_documentation_tests(
      dataset=vm_dataset,
      model=vm_model
  )

New recommended usage for passing inputs:

  test_suite = vm.run_documentation_tests(
      inputs={
          "dataset": vm_dataset,
          "model": vm_model,
      },
  )

Removed deprecated Python Library API methods

The Python API methods run_template and run_test_plan were previously deprecated and have now been removed from the ValidMind Library.

You’ll need to update your code to use the recommended high-level API methods instead.

User guide

Updated Python requirements

  • We have updated our user guide to clarify the Python versions supported by the ValidMind Library.
  • We now support Python ≥3.8 and <3.11.

How to upgrade

ValidMind Platform

To access the latest version of the ValidMind Platform, hard refresh your browser tab:

  • Windows: Ctrl + Shift + R OR Ctrl + F5
  • macOS: ⌘ Cmd + Shift + R OR hold down ⌘ Cmd and click the Reload button

ValidMind Library

To upgrade the ValidMind Library:

  1. In your Jupyter Notebook:

    • Using JupyterHub: Hard refresh your browser tab.
    • In your own developer environment: Restart your notebook.
  2. Then within a code cell or your terminal, run:

    %pip install --upgrade validmind

You may need to restart your kernel after upgrading the package for the changes to take effect.