EU AI Act Compliance — Read our original regulation brief on how the EU AI Act aims to balance innovation with safety and accountability, setting standards for responsible AI use
ValidMind for model validation 2 — Start the model validation process
Learn how to use ValidMind for your end-to-end model validation process with our series of four introductory notebooks. In this second notebook, independently verify the data quality tests performed on the dataset used to train the champion model.
You'll learn how to run relevant validation tests with ValidMind, log the results of those tests to the ValidMind Platform, and insert your logged test results as evidence into your validation report. You'll become familiar with the tests available in ValidMind, as well as how to run them. Running tests during model validation is crucial to the effective challenge process, as we want to independently evaluate the evidence and assessments provided by the model development team.
While running our tests in this notebook, we'll focus on:
Ensuring that data used for training and testing the model is of appropriate data quality
Ensuring that the raw data has been preprocessed appropriately and that the resulting final datasets reflect this
Our course tailor-made for validators new to ValidMind combines this series of notebooks with a more in-depth introduction to the ValidMind Platform — Validator Fundamentals
Prerequisites
In order to independently assess the quality of your datasets with this notebook, you'll need to first have:
# Make sure the ValidMind Library is installed
%pip install -q validmind

# Load your model identifier credentials from an `.env` file
%load_ext dotenv
%dotenv .env

# Or replace with your code snippet

import validmind as vm

vm.init(
    # api_host="...",
    # api_key="...",
    # api_secret="...",
    # model="...",
)
Note: you may need to restart the kernel to use updated packages.
2025-06-05 01:24:32,316 - INFO(validmind.api_client): 🎉 Connected to ValidMind!
📊 Model: [ValidMind Academy] Model validation (ID: cmalguc9y02ok199q2db381ib)
📁 Document Type: validation_report
Load the sample dataset
Let's first import the public Bank Customer Churn Prediction dataset from Kaggle, which was used to develop the dummy champion model.
We'll use this dataset to review steps that should have been conducted during the initial development and documentation of the model to ensure that the model was built correctly. By independently performing steps taken by the model development team, we can confirm whether the model was built using appropriate and properly processed data.
In the example below, note that:
The target column, Exited, has a value of 1 when a customer has churned and 0 otherwise.
The ValidMind Library provides a wrapper to automatically load the dataset as a Pandas DataFrame object. A Pandas DataFrame is a two-dimensional tabular data structure that organizes data into rows and columns.
from validmind.datasets.classification import customer_churn as demo_dataset

print(
    f"Loaded demo dataset with: \n\n\t• Target column: '{demo_dataset.target_column}' \n\t• Class labels: {demo_dataset.class_labels}"
)

raw_df = demo_dataset.load_data()
raw_df.head()
Loaded demo dataset with:
• Target column: 'Exited'
• Class labels: {'0': 'Did not exit', '1': 'Exited'}
   CreditScore Geography  Gender  Age  Tenure    Balance  NumOfProducts  HasCrCard  IsActiveMember  EstimatedSalary  Exited
0          619    France  Female   42       2       0.00              1          1               1        101348.88       1
1          608     Spain  Female   41       1   83807.86              1          0               1        112542.58       0
2          502    France  Female   42       8  159660.80              3          1               0        113931.57       1
3          699    France  Female   39       1       0.00              2          0               0         93826.63       0
4          850     Spain  Female   43       2  125510.82              1          1               1         79084.10       0
Verifying data quality adjustments
Let's say that, thanks to the documentation submitted by the model development team (Learn more ...), we know the sample dataset was modified before being used to train the champion model: after performing data quality assessments on the raw dataset, the team determined that the dataset required rebalancing and that highly correlated features needed to be removed.
Identify qualitative tests
During model validation, we use the same data processing logic and training procedure to confirm that the model's results can be reproduced independently, so let's start with some data quality assessments by running a few individual tests, just like the development team did. ValidMind tests are organized by tasks and tags, which we can use to identify the tests relevant to our use case:
tasks represent the kind of modeling task associated with a test. Here we'll focus on classification tasks.
tags are free-form descriptions providing more details about the test, for example, what category the test falls into. Here we'll focus on the data_quality tag.
# Get the list of available task types
sorted(vm.tests.list_tasks())
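With the task types listed, we can filter for the individual data quality tests we want to reproduce. The sketch below assumes that list_tags() is available and that list_tests() accepts task and tags filters, as covered in the Explore tests notebook:

# Get the list of available tags
sorted(vm.tests.list_tags())

# List the tests that apply to classification tasks and carry the `data_quality` tag
vm.tests.list_tests(task="classification", tags=["data_quality"])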
Want to learn more about navigating ValidMind tests?
Refer to our notebook outlining the utilities available for viewing and understanding available ValidMind tests: Explore tests
Initialize the ValidMind datasets
With the individual tests we want to run identified, the next step is to connect your data with a ValidMind Dataset object. This step is necessary whenever you want to connect a dataset to documentation and produce test results through ValidMind, but you only need to do it once per dataset.
Initialize a ValidMind dataset object using the init_dataset function from the ValidMind (vm) module. For this example, we'll pass in the following arguments:
dataset — The raw dataset that you want to provide as input to tests.
input_id — A unique identifier that allows tracking what inputs are used when running each individual test.
target_column — A required argument if tests require access to true values. This is the name of the target column in the dataset.
# vm_raw_dataset is now a VMDataset object that you can pass to any ValidMind test
vm_raw_dataset = vm.init_dataset(
    dataset=raw_df,
    input_id="raw_dataset",
    target_column="Exited",
)
Run data quality tests
Now that we know how to initialize a ValidMind dataset object, we're ready to run some tests!
You run individual tests by calling the run_test function provided by the validmind.tests module. For the examples below, we'll pass in the following arguments:
test_id — The ID of the test to run, as seen in the ID column when you run list_tests.
params — A dictionary of parameters for the test. These will override any default_params set in the test definition.
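For example, to reproduce the development team's class imbalance assessment on the raw dataset, we can run the ClassImbalance test against vm_raw_dataset. The min_percent_threshold of 30 below is an assumption that mirrors the call used for the balanced dataset later in this notebook:

# Run the class imbalance test against the raw dataset
result = vm.tests.run_test(
    test_id="validmind.data_validation.ClassImbalance",
    inputs={"dataset": vm_raw_dataset},
    params={"min_percent_threshold": 30},
)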
The output above shows that the class imbalance test did not pass according to the value we set for min_percent_threshold — great, this matches what was reported by the model development team.
To address this issue, we'll re-run the test on some processed data. In this case, let's apply a very simple rebalancing technique to the dataset:
import pandas as pd

raw_copy_df = raw_df.sample(frac=1)  # Create a copy of the raw dataset

# Create a balanced dataset with the same number of exited and not exited customers
exited_df = raw_copy_df.loc[raw_copy_df["Exited"] == 1]
not_exited_df = raw_copy_df.loc[raw_copy_df["Exited"] == 0].sample(n=exited_df.shape[0])

balanced_raw_df = pd.concat([exited_df, not_exited_df])
balanced_raw_df = balanced_raw_df.sample(frac=1, random_state=42)
With this new balanced dataset, you can re-run the individual test to see if it now passes the class imbalance test requirement.
As this is technically a different dataset, remember to first initialize a new ValidMind Dataset object to pass in as input as required by run_test():
# Register new data and now 'balanced_raw_dataset' is the new dataset object of interest
vm_balanced_raw_dataset = vm.init_dataset(
    dataset=balanced_raw_df,
    input_id="balanced_raw_dataset",
    target_column="Exited",
)
# Pass the initialized `balanced_raw_dataset` as input into the test run
result = vm.tests.run_test(
    test_id="validmind.data_validation.ClassImbalance",
    inputs={"dataset": vm_balanced_raw_dataset},
    params={"min_percent_threshold": 30},
)
Remove highly correlated features
Next, let's also remove highly correlated features from our dataset as outlined by the development team. Removing highly correlated features helps make the model simpler, more stable, and easier to understand.
You can use the output of a ValidMind test for further processing. In the example below, we retrieve the list of features with the highest correlation coefficients and use it to reduce the final list of features for modeling.
Let's remove the highly correlated features and create a new VM dataset object.
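To identify those features, we first run the High Pearson Correlation test on the balanced dataset and keep the returned result object. This sketch mirrors the HighPearsonCorrelation call logged later in this notebook; the max_threshold of 0.3 is assumed:

# Run the correlation test on the balanced dataset and keep the result for inspection
corr_result = vm.tests.run_test(
    test_id="validmind.data_validation.HighPearsonCorrelation",
    params={"max_threshold": 0.3},
    inputs={"dataset": vm_balanced_raw_dataset},
)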
We'll begin by checking out the table in the result and extracting a list of features that failed the test:
# Extract table from `corr_result.tables`
features_df = corr_result.tables[0].data
features_df
   Columns                           Coefficient  Pass/Fail
0  (Age, Exited)                          0.3473       Fail
1  (IsActiveMember, Exited)              -0.1917       Pass
2  (Balance, NumOfProducts)              -0.1820       Pass
3  (Balance, Exited)                      0.1490       Pass
4  (Age, Balance)                         0.0611       Pass
5  (NumOfProducts, Exited)               -0.0539       Pass
6  (NumOfProducts, IsActiveMember)        0.0489       Pass
7  (Age, NumOfProducts)                  -0.0456       Pass
8  (HasCrCard, IsActiveMember)           -0.0427       Pass
9  (CreditScore, EstimatedSalary)        -0.0404       Pass
# Extract list of features that failed the test
high_correlation_features = features_df[features_df["Pass/Fail"] == "Fail"]["Columns"].tolist()
high_correlation_features
['(Age, Exited)']
Next, extract the feature names from the list of strings (example: (Age, Exited) > Age):
high_correlation_features = [
    feature.split(",")[0].strip("()") for feature in high_correlation_features
]
high_correlation_features
['Age']
Now, it's time to re-initialize the dataset with the highly correlated features removed.
Note the use of a different input_id. This allows tracking the inputs used when running each individual test.
# Remove the highly correlated features from the dataset
balanced_raw_no_age_df = balanced_raw_df.drop(columns=high_correlation_features)

# Re-initialize the dataset object
vm_raw_dataset_preprocessed = vm.init_dataset(
    dataset=balanced_raw_no_age_df,
    input_id="raw_dataset_preprocessed",
    target_column="Exited",
)
Re-running the correlation test with the reduced feature set should now pass:
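A minimal sketch of that re-run, again assuming a max_threshold of 0.3:

# Re-run the correlation test on the preprocessed dataset
result = vm.tests.run_test(
    test_id="validmind.data_validation.HighPearsonCorrelation",
    params={"max_threshold": 0.3},
    inputs={"dataset": vm_raw_dataset_preprocessed},
)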
Now that we've done some analysis on two different datasets, we can use ValidMind to document why certain changes were made to our raw data, with test results to support them. Every test result returned by the run_test() function has a .log() method that can be used to send the test results to the ValidMind Platform.
When logging validation test results to the platform, you'll need to manually add those results to the desired section of the validation report. To demonstrate how to add test results to your validation report, we'll log our data quality tests and insert the results via the ValidMind Platform.
Configure and run comparison tests
Below, we'll perform comparison tests between the original raw dataset (raw_dataset) and the final preprocessed dataset (raw_dataset_preprocessed), again logging the results to the ValidMind Platform.
We can specify all the tests we'd like to run in a dictionary called test_config, and we'll pass in the following arguments for each test:
params: Individual test parameters.
input_grid: Individual test inputs to compare. In this case, we'll input our two datasets for comparison.
Note here that the input_grid expects the input_id of the dataset as the value rather than the variable name we specified:
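The contents of test_config aren't shown above; a minimal sketch consistent with the test IDs and input_ids used in this notebook might look like the following (the parameter values mirror the individual runs above and are assumptions):

test_config = {
    "validmind.data_validation.ClassImbalance": {
        "input_grid": {"dataset": ["raw_dataset", "raw_dataset_preprocessed"]},
        "params": {"min_percent_threshold": 30},
    },
    "validmind.data_validation.HighPearsonCorrelation": {
        "input_grid": {"dataset": ["raw_dataset", "raw_dataset_preprocessed"]},
        "params": {"max_threshold": 0.3},
    },
}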
for t in test_config:
    print(t)
    try:
        # Check if test has input_grid
        if "input_grid" in test_config[t]:
            # For tests with input_grid, pass the input_grid configuration
            if "params" in test_config[t]:
                vm.tests.run_test(
                    t,
                    input_grid=test_config[t]["input_grid"],
                    params=test_config[t]["params"],
                ).log()
            else:
                vm.tests.run_test(t, input_grid=test_config[t]["input_grid"]).log()
        else:
            # Original logic for regular inputs
            if "params" in test_config[t]:
                vm.tests.run_test(
                    t,
                    inputs=test_config[t]["inputs"],
                    params=test_config[t]["params"],
                ).log()
            else:
                vm.tests.run_test(t, inputs=test_config[t]["inputs"]).log()
    except Exception as e:
        print(f"Error running test {t}: {str(e)}")
validmind.data_validation.ClassImbalance
2025-06-05 01:26:20,463 - INFO(validmind.vm_models.result.result): Test driven block with result_id validmind.data_validation.ClassImbalance does not exist in model's document
validmind.data_validation.HighPearsonCorrelation
2025-06-05 01:26:39,436 - INFO(validmind.vm_models.result.result): Test driven block with result_id validmind.data_validation.HighPearsonCorrelation does not exist in model's document
Note the output returned indicating that a test-driven block doesn't currently exist in your model's documentation for some test IDs.
That's expected: when we run validation tests, the logged results need to be manually added to your report as part of your compliance assessment process within the ValidMind Platform.
Log tests with unique identifiers
Next, we'll use the previously initialized vm_balanced_raw_dataset (that still has a highly correlated Age column) as input to run an individual test, then log the result to the ValidMind Platform.
When running individual tests, you can use a custom result_id to tag the individual result with a unique identifier:
This result_id can be appended to test_id with a : separator.
The balanced_raw_dataset result identifier will correspond to the balanced_raw_dataset input, the dataset that still has the Age column.
result = vm.tests.run_test(
    test_id="validmind.data_validation.HighPearsonCorrelation:balanced_raw_dataset",
    params={"max_threshold": 0.3},
    inputs={"dataset": vm_balanced_raw_dataset},
)
result.log()
2025-06-05 01:27:00,143 - INFO(validmind.vm_models.result.result): Test driven block with result_id validmind.data_validation.HighPearsonCorrelation:balanced_raw_dataset does not exist in model's document
Add test results to reporting
With some test results logged, let's head to the model we connected to at the beginning of this notebook and learn how to insert a test result into our validation report (Need more help?).
While the example below focuses on a specific test result, you can follow the same general procedure for your other results:
From the Inventory in the ValidMind Platform, go to the model you connected to earlier.
In the left sidebar that appears for your model, click Validation Report.
Locate the Data Preparation section and click on 2.2.1. Data Quality to expand that section.
Under the Class Imbalance Assessment section, locate Validator Evidence then click Link Evidence to Report:
Select the Class Imbalance test results we logged: ValidMind Data Validation Class Imbalance
Click Update Linked Evidence to add the test results to the validation report.
Confirm that the results for the Class Imbalance test you inserted have been correctly added to section 2.2.1. Data Quality of the report:
Note that these test results are flagged as Requires Attention — as they include comparative results from our initial raw dataset.
Click See evidence details to review the LLM-generated description that summarizes the test results, confirming that our final preprocessed dataset passes our test:
Here in this text editor, you can make qualitative edits to the draft that ValidMind generated to finalize the test results.
With our raw dataset rebalanced and highly correlated features removed, let's now split our dataset into train and test sets in preparation for model evaluation testing.
To start, let's grab the first few rows from the balanced_raw_no_age_df dataset we initialized earlier:
balanced_raw_no_age_df.head()
      CreditScore Geography  Gender  Tenure    Balance  NumOfProducts  HasCrCard  IsActiveMember  EstimatedSalary  Exited
5854          651    France  Female       4   38617.20              1          1               1        104876.80       0
3394          782   Germany    Male       7   98556.89              2          1               0        117644.36       0
1646          595   Germany  Female       9  150463.11              2          0               1         81548.38       0
4793          659    France  Female       9   23503.31              1          0               1        169862.01       1
6920          765    France  Female       1       0.00              1          1               0         13228.93       1
Before training the model, we need to encode the categorical features in the dataset:
Use the OneHotEncoder class from the sklearn.preprocessing module to encode the categorical features.
The categorical features in the dataset are Geography and Gender.
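A minimal sketch of that encoding step, assuming scikit-learn 1.2 or later (which uses the sparse_output argument); dropping the first category of each feature is an illustrative choice rather than the development team's documented approach:

from sklearn.preprocessing import OneHotEncoder
import pandas as pd

categorical_features = ["Geography", "Gender"]

# Fit a one-hot encoder on the categorical columns (assumed choice: drop the first level)
encoder = OneHotEncoder(sparse_output=False, drop="first")
encoded = encoder.fit_transform(balanced_raw_no_age_df[categorical_features])
encoded_df = pd.DataFrame(
    encoded,
    columns=encoder.get_feature_names_out(categorical_features),
    index=balanced_raw_no_age_df.index,
)

# Replace the original categorical columns with their one-hot encoded counterparts
balanced_raw_no_age_df = pd.concat(
    [balanced_raw_no_age_df.drop(columns=categorical_features), encoded_df], axis=1
)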
Splitting our dataset into training and testing is essential for proper validation testing, as this helps assess how well the model generalizes to unseen data:
We start by dividing our balanced_raw_no_age_df dataset into training and test subsets using train_test_split, with 80% of the data allocated to training (train_df) and 20% to testing (test_df).
From each subset, we separate the features (all columns except "Exited") into X_train and X_test, and the target column ("Exited") into y_train and y_test.
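A minimal sketch of that split, following the 80/20 proportions described above (a fixed random_state could be added for reproducibility):

from sklearn.model_selection import train_test_split

# Split the balanced, encoded dataset into 80% training and 20% test data
train_df, test_df = train_test_split(balanced_raw_no_age_df, test_size=0.20)

# Separate the features and target column for each subset
X_train = train_df.drop("Exited", axis=1)
y_train = train_df["Exited"]
X_test = test_df.drop("Exited", axis=1)
y_test = test_df["Exited"]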