January 26, 2024

Release highlights

This release includes numerous improvements to the developer framework, including new features for model and dataset initialization, easier testing, support for additional inputs and the Azure OpenAI API, updated notebooks, bug fixes, and much more.

ValidMind Developer Framework (v1.25.3)

Improvements to init_model

When initializing a model, you can now pass a dataset with pre-computed model predictions if they are available. By default, if no prediction column is specified when calling init_model, the ValidMind Developer Framework will compute the model predictions on the entire dataset.

To illustrate how passing a dataset that includes a prediction column can help, consider the following example without a prediction column:

vm_model = vm.init_model(
    model,
    train_ds=vm_train_ds,
    test_ds=vm_test_ds,
)

Internally, this example invokes the model's predict() method on the training and test datasets when the model is initialized. With large datasets, this approach can be problematic: init_model can simply take too long to run.

You can now avoid this issue by providing a dataset with a column containing pre-computed predictions, similar to the following example. If init_model detects this column, it will not generate model predictions at runtime.

x1     x2     target_column    prediction_column
0.1    0.2    0                0
0.2    0.4    1                1
…      …      …                …
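
For instance, the prediction column might be produced ahead of time as in the following sketch, where model is assumed to be any fitted estimator that exposes a predict() method and the column names match the illustration above:

# Sketch: pre-compute predictions once, offline, and store them in the
# dataset so that init_model does not need to call predict() at runtime.
df['prediction_column'] = model.predict(df[['x1', 'x2']])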

Usage example with a prediction column:

vm.init_dataset(
    dataset=df,
    feature_columns=[...],
    target_column=...,
    extra_columns={
        'prediction_column': 'NAME-OF-PREDICTION-COLUMN',
    },
)
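
With the prediction column in place, the earlier init_model call works unchanged (reusing the example datasets from above, now assumed to carry the prediction column): init_model detects the pre-computed predictions and skips generating them at runtime.

# init_model detects the pre-computed prediction column in the datasets
# and does not compute predictions at runtime.
vm_model = vm.init_model(
    model,
    train_ds=vm_train_ds,
    test_ds=vm_test_ds,
)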

Improvements to init_dataset

When initializing a dataset, the new feature_columns argument lets you specify the list of feature names to use for prediction, improving efficiency. Internally, the function filters the dataset to retain only these specified features for prediction-related tasks, leaving the remaining dataset available for other purposes, such as segmentation.

This improvement replaces the existing behavior of init_dataset, which loaded the entire dataset, incorporating all available features for prediction tasks. While this approach worked well, it could impose limitations when generating segmented tests and proved somewhat inefficient with large datasets containing numerous features, of which only a small subset were relevant for prediction.

Usage example:

feature_columns = ['CreditScore', 'Age', 'Balance', 'NumOfProducts', 'EstimatedSalary']

vm_train_ds = vm.init_dataset(
    dataset=train_df,
    target_column=demo_dataset.target_column,
    feature_columns=feature_columns
)
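
Note that train_df can contain more columns than those listed in feature_columns; the extra columns stay in the dataset and remain available for segmentation and reporting. A quick way to see this (the Geography and Exited columns below are hypothetical):

# Columns not listed in feature_columns are kept in the dataset and
# remain available for other purposes, such as segmented reporting.
extra_columns = [col for col in train_df.columns if col not in feature_columns]
print(extra_columns)  # e.g. ['Geography', 'Exited'] (hypothetical)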

A new notebook illustrates how you can configure these dataset features:

  • How to utilize the feature_columns parameter when initializing ValidMind datasets and model objects
  • How feature_columns can be used to report by segment

Try it: Configuring and Using Dataset Features

Improvements to run_documentation_tests()

The run_documentation_tests() function, which collects and runs all the tests associated with a template, now supports running multiple sections at a time. Previously, only one section could be run per call, so running two sections meant calling the same function twice. You can now make a single call with a single config object, reducing the potential for errors. The change is backward compatible, so no updates to your existing code are required.

Existing example usage: Multiple function calls are needed to run multiple sections

full_suite = vm.run_documentation_tests(
    inputs={
        ...
    },
    section="section_1",
    config={
        "validmind.tests.data_validation.ClassImbalance": ...
    }
)

full_suite = vm.run_documentation_tests(
    inputs={
        ...
    },
    section="section_2",
    config={
        "validmind.tests.data_validation.Duplicates": ...
    }
)

New example usage: A single function call runs multiple sections

full_suite = vm.run_documentation_tests(
    inputs={
        ...
    },
    section=["section_1", "section_2"],
    config={
        "validmind.tests.data_validation.ClassImbalance": ...,
        "validmind.tests.data_validation.Duplicates": ...
    }
)

Try it: Running Individual Documentation Sections

Support for custom inputs

The ValidMind Developer Framework now supports passing custom inputs as an inputs dictionary when running individual tests or test suites. These custom inputs replace the standard inputs dataset, model, and models, which are now deprecated.

New recommended syntax for passing inputs:

test_suite = vm.run_documentation_tests(
    inputs={
        "dataset": vm_dataset,
        "model": vm_model,
    },
)
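
The same inputs dictionary can also be passed when running an individual test. As a sketch, assuming the individual-test runner in this release accepts inputs (the test ID below is illustrative):

# Sketch (assumed API): run a single test with custom inputs.
result = vm.tests.run_test(
    "validmind.data_validation.ClassImbalance",
    inputs={"dataset": vm_dataset},
)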

To make it easier for you to adopt custom inputs, we have updated our how-to notebooks and code samples to use the new recommended syntax.

Also check Standard inputs are deprecated.

Enhancements

  • Support for Azure OpenAI Service. The ValidMind Developer Framework now supports running LLM-powered tests with the Azure OpenAI Service via API, in addition to the previously supported OpenAI API. To work with Azure OpenAI API endpoints, you need to set the following environment variables before calling vm.init(), as shown in the sketch after this list:

    • AZURE_OPENAI_KEY: API key for authentication
    • AZURE_OPENAI_ENDPOINT: API endpoint URL
    • AZURE_OPENAI_MODEL: Specifies the language model or service to use
    • AZURE_OPENAI_VERSION (optional): Specifies a particular version of the service, if available
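
A minimal sketch of the setup, with placeholder values throughout:

import os
import validmind as vm

# Sketch: configure Azure OpenAI via environment variables before
# initializing the ValidMind Developer Framework. All values below
# are placeholders.
os.environ["AZURE_OPENAI_KEY"] = "<your-api-key>"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://<your-resource>.openai.azure.com/"
os.environ["AZURE_OPENAI_MODEL"] = "<your-deployment-or-model-name>"
os.environ["AZURE_OPENAI_VERSION"] = "<api-version>"  # optional

vm.init()  # pass your usual vm.init() connection arguments here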

To learn more about configuring Azure OpenAI Service, see Authentication in the official Microsoft documentation.

Bug fixes

  • Fixed support for OpenAI library >=1.0. We have updated our demonstration notebooks for large language models (LLMs) to provide the correct support for openai >= 1.0.0. Previously, some notebooks were using an older version of the OpenAI client API.
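
For reference, a minimal sketch of the client style used by openai >= 1.0 (the model name is a placeholder):

# Sketch: the openai >= 1.0 client interface. OPENAI_API_KEY is read
# from the environment by default.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)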

Deprecations

  • Standard inputs are deprecated. The ValidMind Developer Framework now supports passing custom inputs as an inputs dictionary when running individual tests or test suites. As a result, the standard inputs dataset, model, and models are deprecated and might be removed in a future release. If you are a developer, you should update your code to use the new, recommended syntax.

    Deprecated legacy usage for passing inputs:

    test_suite = vm.run_documentation_tests(
        dataset=vm_dataset,
        model=vm_model
    )

    New recommended usage for passing inputs:

    test_suite = vm.run_documentation_tests(
        inputs={
            "dataset": vm_dataset,
            "model": vm_model,
        },
    )

    Also check Support for custom inputs.

  • Removed deprecated high-level API methods: The API methods run_template and run_test_plan were previously deprecated and have now been removed from the ValidMind Developer Framework.

    If you are a developer, you should update your code to use the recommended high-level API methods.
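
    For example, a former run_template call can typically be replaced with run_documentation_tests(), which, as described above, collects and runs all the tests associated with a template. A minimal sketch, with placeholder inputs:

    # Sketch: run the tests associated with a documentation template
    # using the current high-level API.
    full_suite = vm.run_documentation_tests(
        inputs={
            "dataset": vm_dataset,
            "model": vm_model,
        },
    )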

User guide

Updated Python requirements. We have updated our user guide to clarify the Python versions supported by the ValidMind Developer Framework. We now support Python ≥3.8 and <3.11.

How to upgrade

To access the latest version of the ValidMind Platform UI, reload your browser tab.

To upgrade the ValidMind Developer Framework:
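
In a notebook environment, the upgrade typically looks like the following sketch (assumes the validmind package was installed with pip):

# Sketch: upgrade the package, then restart the kernel so the new
# version is picked up.
%pip install --upgrade validmind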