from validmind import init
init()
Dataset Column Filters when Running Tests
To run a test on a dataset but include only certain columns from that dataset, you can use the new columns option. Instead of passing the dataset object or dataset input ID directly, pass a dictionary as the dataset input for the test. This dictionary should have the following keys:
input_id: The dataset input ID for the dataset you want to use
columns: A list of the column names from the original dataset to include in the dataset that will be used for the test
This mechanism is intended to be general enough to support many other types of options that users may want to set on datasets or models for a specific test or set of tests. These options can be implemented in the with_options() method of the VMDataset or VMModel classes, which is called when the user passes the dictionary described above for the dataset or model input. The with_options() method should return a new instance of the class with the specified options set.
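The contract described above (an options dictionary is routed to with_options(), which returns a new instance rather than modifying the original) can be sketched in plain Python. This is only an illustration under stated assumptions: SketchDataset and resolve_input are hypothetical names invented for this example, the data is held in a plain dict instead of a DataFrame, and none of it reflects the actual VMDataset or VMModel implementation.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SketchDataset:
    """Hypothetical stand-in for a dataset wrapper; not the real VMDataset."""

    input_id: str
    data: Dict[str, List[float]]
    target_column: Optional[str] = None

    def with_options(self, **options) -> "SketchDataset":
        """Return a new instance with the given options applied."""
        columns = options.pop("columns", None)
        if options:
            # This sketch only understands the "columns" option.
            raise ValueError(f"Unsupported options: {sorted(options)}")
        if columns is None:
            return replace(self)
        missing = [name for name in columns if name not in self.data]
        if missing:
            raise ValueError(f"Unknown columns: {missing}")
        subset = {name: list(self.data[name]) for name in columns}
        # Drop the target if it was filtered out of the column list.
        target = self.target_column if self.target_column in subset else None
        return replace(self, data=subset, target_column=target)


def resolve_input(value, registry):
    """Hypothetical resolver: accept an object, an input ID, or an options dict."""
    if isinstance(value, dict):
        options = dict(value)  # copy so the caller's dict is not mutated
        base = registry[options.pop("input_id")]
        return base.with_options(**options)
    if isinstance(value, str):
        return registry[value]
    return value


original = SketchDataset(
    input_id="my_dataset",
    data={"col1": [1, 1, 3], "col2": [4, 5, 6], "target": [0, 1, 0]},
    target_column="target",
)
registry = {original.input_id: original}
spec = {"input_id": "my_dataset", "columns": ["col1"]}
filtered = resolve_input(spec, registry)
```

Returning a new instance keeps the registered dataset untouched, so the same input ID can be reused with different options across tests.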
import pandas as pd
from validmind import init_dataset
my_df = pd.DataFrame(
    {
        "col1": [1, 1, 3],
        "col2": [4, 5, 6],
        "target": [0, 1, 0],
    }
)
my_dataset = init_dataset(my_df, target_column="target", input_id="my_dataset")
from validmind.tests import run_test
result = run_test(
    test_id="validmind.data_validation.DatasetDescription",
    inputs={"dataset": {"input_id": "my_dataset", "columns": ["col1"]}},
)
# notice that only the column "col1" is shown in the test result