Dataset Column Filters when Running Tests

To run a test on a dataset while including only certain columns from it, use the new columns option. Instead of passing the dataset object or the dataset input ID directly as the test's dataset input, pass a dictionary with the following keys:

input_id: The dataset input ID for the dataset you want to use
columns: A list of the column names from the original dataset to include in the dataset that will be used for the test

This mechanism is intended to be general enough to support other options that users may want to set on a dataset or model for a specific test or set of tests. Such options can be implemented in the with_options() method of the VMDataset or VMModel class, which is called when the user passes a dictionary like the one described above as the dataset or model input. The with_options() method should return a new instance of the class with the specified options set.
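The contract described above (an options dictionary is routed to with_options(), which returns a new instance) can be illustrated with a minimal, library-independent sketch. SketchDataset and resolve_input below are hypothetical names invented for this illustration; they are not the ValidMind VMDataset class or its internals, and the real implementation may differ.

```python
from copy import copy


class SketchDataset:
    """Hypothetical stand-in for a dataset wrapper (not ValidMind's VMDataset)."""

    def __init__(self, input_id, data):
        self.input_id = input_id
        # Maps each column name to its list of values.
        self.data = dict(data)

    @property
    def columns(self):
        return list(self.data)

    def with_options(self, **options):
        """Return a new instance with the options applied; self is left unchanged."""
        new = copy(self)
        new.data = dict(self.data)
        if "columns" in options:
            requested = options.pop("columns")
            missing = [c for c in requested if c not in self.data]
            if missing:
                raise ValueError(f"Unknown columns: {missing}")
            new.data = {c: self.data[c] for c in requested}
        if options:
            # Any option this sketch does not understand is rejected explicitly.
            raise NotImplementedError(f"Unsupported options: {sorted(options)}")
        return new


def resolve_input(registry, value):
    """Hypothetical helper: turn a test input into a dataset instance."""
    if isinstance(value, dict):
        options = dict(value)
        input_id = options.pop("input_id")
        return registry[input_id].with_options(**options)
    if isinstance(value, str):
        return registry[value]
    return value


full = SketchDataset(
    "my_dataset",
    {"col1": [1, 1, 3], "col2": [4, 5, 6], "target": [0, 1, 0]},
)
registry = {full.input_id: full}

filtered = resolve_input(registry, {"input_id": "my_dataset", "columns": ["col1"]})
print(filtered.columns)  # ['col1']
print(full.columns)      # ['col1', 'col2', 'target']
```

Returning a copy rather than mutating the original means the same registered dataset can be reused by other tests with different options or with none at all.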

# Initialize the ValidMind client library
from validmind import init

init()

import pandas as pd

from validmind import init_dataset

my_df = pd.DataFrame(
    {
        "col1": [1, 1, 3],
        "col2": [4, 5, 6],
        "target": [0, 1, 0],
    }
)

# Register the DataFrame as a dataset with the input ID "my_dataset"
my_dataset = init_dataset(my_df, target_column="target", input_id="my_dataset")

from validmind.tests import run_test

# Pass a dictionary instead of the dataset object to restrict the test to "col1"
result = run_test(
    test_id="validmind.data_validation.DatasetDescription",
    inputs={"dataset": {"input_id": "my_dataset", "columns": ["col1"]}},
)

# Note that only the column "col1" is shown in the test result.