January 26, 2024
This release includes numerous improvements to the library, including new features for model and dataset initialization, easier testing, support for additional inputs and the Azure OpenAI API, updated notebooks, bug fixes, and much more.
Release highlights
ValidMind Library (v1.25.3)
Improvements to init_model()
- When initializing a model, you can now pass a dataset with pre-computed model predictions if they are available.
- By default, if no prediction column is specified when calling
init_model()
, the ValidMind Library will compute the model predictions on the entire dataset.
To illustrate how passing a dataset that includes a prediction column can help, consider the following example without a prediction column:
= vm.init_model(
vm_model
model,=vm_train_ds,
train_ds=vm_test_ds,
test_ds )
Internally, this example invokes the predict()
method of the model for the training and test datasets when the model is initialized.
This approach can be problematic with large datasets: init_model
can simply take too long to compute.
You can now avoid this issue by providing a dataset with a column containing pre-computed predictions, similar to the following example.
If init_model
detects this column, it will not generate model predictions at runtime.
x1 | x2 | … | target_column | prediction_column |
---|---|---|---|---|
0.1 | 0.2 | … | 0 | 0 |
0.2 | 0.4 | 1… | 1 | 1 |
Usage example with a prediction column:
vm.init_dataset(=df,
dataset=[...],
feature_columns= ...,
target_column={
extra_columns'NAME-OF-PREDICTION-COLUMN',
prediction_column:
}, )
Improvements to init_dataset()
When initializing a dataset, the new feature_columns
argument lets you specify a list of feature names for prediction to improve efficiency. Internally, the function filters the dataset to retain only these specified features for prediction-related tasks, leaving the remaining dataset available for other purposes, such as segmentation.
- This improvement replaces the existing behavior of
init_dataset()
, which loaded the entire dataset, incorporating all available features for prediction tasks. - While this approach worked well, it could impose limitations when generating segmented tests and proved somewhat inefficient with large datasets containing numerous features, of which only a small subset were relevant for prediction.
Usage example:
= ['CreditScore', 'Age', 'Balance', 'NumOfProducts', 'EstimatedSalary']
feature_columns
= vm.init_dataset(
vm_train_ds =train_df,
dataset=demo_dataset.target_column,
target_column=feature_columns
feature_columns )
A new notebook illustrates how you can configure these dataset features:
- How to utilize the
feature_columns
parameter when initizalizingvalidmind
datasets and model objects - How
feature_columns
can be used to report by segment
Improvements to run_documentation_tests()
The run_documentation_tests()
function, used to collect and run all the tests associated with a template, now supports running multiple sections at a time.
- This means that you no longer need to call the same function twice for two different sections, reducing the potential for errors and enabling you to use a single
config
object. - The previous behavior was to allow running only one section at a time.
- This change maintains backward compatibility with the existing syntax, requiring no updates to your code.
Existing example usage: Multiple function calls are needed to run multiple sections.
= vm.run_documentation_tests(
full_suite = {
inputs
...
},="section_1",
section={
config"validmind.tests.data_validation.ClassImbalance": ...
}
)
= vm.run_documentation_tests(
full_suite = {
inputs
...
},="section_2",
section={
config"validmind.tests.data_validation.Duplicates": ...
} )
New example usage: A single function call runs multiple sections.
= vm.run_documentation_tests(
full_suite = {
inputs
...
},=["section_1", "section_2"],
section={
config"validmind.tests.data_validation.ClassImbalance": ...,
"validmind.tests.data_validation.Duplicates": ...
} )
Support for custom inputs
The ValidMind Library now supports passing custom inputs as an inputs
dictionary when running individual tests or test suites. This support replaces the standard inputs dataset
, model
, and models
, which are now deprecated.1
New recommended syntax for passing inputs:
= vm.run_documentation_tests(
test_suite ={
inputs"dataset": vm_dataset,
"model": vm_model,
}, )
To make it easier for you to adopt custom inputs, we have updated our how-to notebooks and code samples to use the new recommended syntax:
Enhancements
ValidMind Library (v1.25.3)
Support for Azure OpenAI Service
The ValidMind Library now supports running LLM-powered tests with the Azure OpenAI Service via API,2 in addition to the previously supported OpenAI API.
2 To learn more about configuring Azure OpenAI Service, see Authentication in the official Microsoft documentation.
To work with Azure OpenAI API endpoints, you need to set the following environment variables before calling init()
:
AZURE_OPENAI_KEY
: API key for authenticationAZURE_OPENAI_ENDPOINT
: API endpoint URLAZURE_OPENAI_MODEL
: Specifies the language model or service to useAZURE_OPENAI_VERSION
(optional): Allows specifying a specific version of the service if available
Bug fixes
ValidMind Library (v1.25.3)
Fixed support for OpenAI library >=1.0
- We have updated our demonstration notebooks for large language models (LLMs) to provide the correct support for
openai >= 1.0.0
. - Previously, some notebooks were using an older version of the OpenAI client API.
Deprecations
ValidMind Library (v1.25.3)
Standard inputs are deprecated
The ValidMind Library now supports passing custom inputs3 as an inputs
dictionary when running individual tests or test suites.
- As a result, the standard inputs
dataset
,model
, andmodels
are deprecated and might be removed in a future release. - You should update your code to use the new, recommended syntax.
Deprecated legacy usage for passing inputs:
= vm.run_documentation_tests(
test_suite =vm_dataset,
dataset=vm_model
model )
New recommended usage for passing inputs:
= vm.run_documentation_tests(
test_suite ={
inputs"dataset": vm_dataset,
"model": vm_model,
}, )
Removed deprecated Python Library API methods
The Python API methods run_template
and run_test_plan
had been deprecated previously. They have now been removed from the ValidMind Library.
You’ll need to update your code to use the recommended high-level API methods:
run_template
(removed): Userun_documentation_tests()
run_test_plan
(removed): Userun_test_suite()
Documentation
User guide updates
Updated Python requirements
- We have updated our user guide to clarify the Python versions supported by the ValidMind Library.
- We now support Python ≧3.8 and <3.11.
How to upgrade
ValidMind Platform
To access the latest version of the ValidMind Platform,4 hard refresh your browser tab:
- Windows:
Ctrl
+Shift
+R
ORCtrl
+F5
- MacOS:
⌘ Cmd
+Shift
+R
OR hold down⌘ Cmd
and click theReload
button
ValidMind Library
To upgrade the ValidMind Library:5
In your Jupyter Notebook:
Then within a code cell or your terminal, run:
%pip install --upgrade validmind
You may need to restart your kernel after running the upgrade package for changes to be applied.