Dataset Column Filters when Running Tests

To run a test on only certain columns of a dataset, use the new columns option. Instead of passing the dataset object or the dataset input ID directly as the dataset input for the test, pass a dictionary with the following keys: "input_id", the ID of the dataset input to use, and "columns", the list of column names to include in the test.

This mechanism is intended to be general enough to support other kinds of options that users may want to set on datasets or models for a specific test or set of tests. Such options can be implemented in the with_options() method of the VMDataset or VMModel class, which is called when the user passes a dictionary like the one described above as the dataset or model input. The with_options() method should return a new instance of the class with the specified options set.
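The pattern described above can be illustrated with a minimal, self-contained sketch. It does not use or reproduce the ValidMind implementation: SketchDataset, resolve_input, visible_columns, and the registry dictionary are hypothetical names invented for this illustration, and the real VMDataset.with_options() may differ in signature and behavior. The sketch only shows the idea of returning a new instance with the options applied while leaving the original object untouched.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SketchDataset:
    """Hypothetical stand-in for VMDataset; not the real ValidMind class."""

    input_id: str
    data: Dict[str, List[int]]
    columns: Optional[List[str]] = None  # None means "use all columns"

    def with_options(self, **options) -> "SketchDataset":
        """Return a new instance with the given options set."""
        unknown = set(options) - {"columns"}
        if unknown:
            raise ValueError(f"Unsupported options: {sorted(unknown)}")
        missing = [c for c in options.get("columns") or [] if c not in self.data]
        if missing:
            raise ValueError(f"Columns not in dataset: {missing}")
        # replace() builds a copy, so the original dataset is left unchanged
        return replace(self, **options)

    def visible_columns(self) -> List[str]:
        """Columns a test would see after any filter is applied."""
        return list(self.columns) if self.columns is not None else list(self.data)


def resolve_input(value, registry):
    """Hypothetical dispatcher: accept an object, an input ID, or an options dict."""
    if isinstance(value, dict):
        options = dict(value)  # copy so the caller's dictionary is not mutated
        base = registry[options.pop("input_id")]
        return base.with_options(**options)
    if isinstance(value, str):
        return registry[value]
    return value


base = SketchDataset(
    "my_dataset", {"col1": [1, 1, 3], "col2": [4, 5, 6], "target": [0, 1, 0]}
)
registry = {"my_dataset": base}
filtered = resolve_input({"input_id": "my_dataset", "columns": ["col1"]}, registry)
print(filtered.visible_columns())
print(base.visible_columns())
```

Returning a copy rather than mutating the dataset in place keeps the option scoped to the test that requested it, which matches the stated intent that options apply to a specific test or set of tests.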

import pandas as pd

from validmind import init, init_dataset

# Initialize the ValidMind client before creating datasets or running tests
init()

my_df = pd.DataFrame(
    {
        "col1": [1, 1, 3],
        "col2": [4, 5, 6],
        "target": [0, 1, 0],
    }
)

# Register the DataFrame as a dataset input with the ID "my_dataset"
my_dataset = init_dataset(my_df, target_column="target", input_id="my_dataset")

from validmind.tests import run_test

result = run_test(
    test_id="validmind.data_validation.DatasetDescription",
    inputs={"dataset": {"input_id": "my_dataset", "columns": ["col1"]}},
)

# Note that only the column "col1" appears in the test result