validmind.vm_models
Models entrypoint
R_MODEL_TYPES
R_MODEL_TYPES = ['LogisticRegression', 'LinearRegression', 'XGBClassifier', 'XGBRegressor']
VMInput
class VMInput(ABC):
Base class for ValidMind Input types.
with_options
def with_options(self, **kwargs: Dict[str, Any]) → validmind.vm_models.VMInput:
Allows for setting options on the input object that are passed by the user when using the input to run a test or set of tests.
To allow options, just override this method in the subclass (see VMDataset) and ensure that it returns a new instance of the input with the specified options set.
Arguments
**kwargs
: Arbitrary keyword arguments that will be passed to the input object.
Returns
- A new instance of the input with the specified options set.
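The override pattern described above can be sketched as follows. This is only an illustration and does not import ValidMind: `SketchInput` and `SketchDataset` are hypothetical stand-ins for `VMInput` and a subclass, and both the base-class behaviour (rejecting options) and the `columns` option are assumptions made for the example.

```python
import copy
from abc import ABC
from typing import Any, List


class SketchInput(ABC):
    """Hypothetical stand-in for VMInput; not the ValidMind class."""

    def with_options(self, **kwargs: Any) -> "SketchInput":
        # Assumed base behaviour for this sketch: options are rejected
        # unless a subclass overrides the method.
        if kwargs:
            raise NotImplementedError("this input type does not accept options")
        return self


class SketchDataset(SketchInput):
    """Hypothetical subclass that accepts a `columns` option."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = list(columns)

    def with_options(self, **kwargs: Any) -> "SketchDataset":
        unknown = set(kwargs) - {"columns"}
        if unknown:
            raise ValueError(f"unsupported options: {sorted(unknown)}")
        new = copy.copy(self)  # shallow copy: a new instance, original untouched
        if "columns" in kwargs:
            new.columns = [c for c in self.columns if c in kwargs["columns"]]
        return new


original = SketchDataset(["age", "income", "target"])
filtered = original.with_options(columns=["age", "target"])
print(filtered.columns)  # ['age', 'target']
print(original.columns)  # ['age', 'income', 'target']
```

Returning a copy rather than mutating `self` means the same input object can be reused across tests with different options.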
VMDataset
class VMDataset(VMInput):
Base class for VM datasets.
Child classes should be used to support new dataset types (tensor, polars, etc.) by converting the user's dataset into a NumPy array, collecting metadata such as column names, and then calling this (parent) class's __init__ method.
This way we can support multiple dataset types, but under the hood this class only needs to work with NumPy arrays and pandas DataFrames.
Arguments
raw_dataset (np.ndarray)
: The raw dataset as a NumPy array.
input_id (str)
: Identifier for the dataset.
index (np.ndarray)
: The raw dataset index as a NumPy array.
columns (Set[str])
: The column names of the dataset.
target_column (str)
: The target column name of the dataset.
feature_columns (List[str])
: The feature column names of the dataset.
feature_columns_numeric (List[str])
: The numeric feature column names of the dataset.
feature_columns_categorical (List[str])
: The categorical feature column names of the dataset.
text_column (str)
: The text column name of the dataset for NLP tasks.
target_class_labels (Dict)
: The class labels for the target columns.
df (pd.DataFrame)
: The dataset as a pandas DataFrame.
extra_columns (Dict)
: Extra columns to include in the dataset.
VMDataset
VMDataset(raw_dataset: np.ndarray, input_id: str = None, model: validmind.vm_models.VMModel = None, index: np.ndarray = None, index_name: str = None, date_time_index: bool = False, columns: list = None, target_column: str = None, feature_columns: list = None, text_column: str = None, extra_columns: dict = None, target_class_labels: dict = None)
Initializes a VMDataset instance.
Arguments
raw_dataset (np.ndarray)
: The raw dataset as a NumPy array.
input_id (str)
: Identifier for the dataset.
model (VMModel)
: Model associated with the dataset.
index (np.ndarray)
: The raw dataset index as a NumPy array.
index_name (str)
: The name of the raw dataset index.
date_time_index (bool)
: Whether the index is a datetime index.
columns (List[str], optional)
: The column names of the dataset. Defaults to None.
target_column (str, optional)
: The target column name of the dataset. Defaults to None.
feature_columns (List[str], optional)
: The feature column names of the dataset. Defaults to None.
text_column (str, optional)
: The text column name of the dataset for NLP tasks. Defaults to None.
target_class_labels (Dict, optional)
: The class labels for the target columns. Defaults to None.
add_extra_column
def add_extra_column(self, column_name, column_values=None):
Adds an extra column to the dataset without modifying the dataset features and target columns.
Arguments
column_name (str)
: The name of the extra column.
column_values (np.ndarray)
: The values of the extra column.
assign_predictions
def assign_predictions(self, model: validmind.vm_models.VMModel, prediction_column: Optional[str] = None, prediction_values: Optional[List[Any]] = None, probability_column: Optional[str] = None, probability_values: Optional[List[float]] = None, prediction_probabilities: Optional[List[float]] = None, **kwargs: Dict[str, Any]):
Assign predictions and probabilities to the dataset.
Arguments
model (VMModel)
: The model used to generate the predictions.
prediction_column (Optional[str])
: The name of the column containing the predictions.
prediction_values (Optional[List[Any]])
: The values of the predictions.
probability_column (Optional[str])
: The name of the column containing the probabilities.
probability_values (Optional[List[float]])
: The values of the probabilities.
prediction_probabilities (Optional[List[float]])
: DEPRECATED: The values of the probabilities.
**kwargs
: Additional keyword arguments that will get passed through to the model's predict method.
prediction_column
def prediction_column(self, model: validmind.vm_models.VMModel, column_name: str = None) → str:
Get or set the prediction column for a model.
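A "get or set" accessor of this kind can be sketched as below. This is a guess at the general pattern, not ValidMind's implementation: `PredictionColumns` is a hypothetical helper, and it is keyed by a plain string ID for simplicity, whereas the documented method takes a `VMModel`.

```python
from typing import Dict, Optional


class PredictionColumns:
    """Hypothetical bookkeeping helper; not part of ValidMind."""

    def __init__(self) -> None:
        self._by_model: Dict[str, str] = {}

    def prediction_column(
        self, model_id: str, column_name: Optional[str] = None
    ) -> Optional[str]:
        # Setter path: a column name was supplied, so record it for this model.
        if column_name is not None:
            self._by_model[model_id] = column_name
        # Getter path (also the return value after setting).
        return self._by_model.get(model_id)


cols = PredictionColumns()
cols.prediction_column("model_a", "model_a_prediction")  # set
current = cols.prediction_column("model_a")              # get
missing = cols.prediction_column("model_b")              # never set
```

Here `current` holds the name that was set and `missing` is `None`, because nothing was registered for `"model_b"`.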
probability_column
def probability_column(self, model: validmind.vm_models.VMModel, column_name: str = None) → str:
Get or set the probability column for a model.
target_classes
def target_classes(self):
Returns the target class labels or unique values of the target column.
with_options
def with_options(self, **kwargs: Dict[str, Any]) → validmind.vm_models.VMDataset:
Support options provided when passing an input to run_test or run_test_suite.
Arguments
**kwargs
: Options:
- columns: Filter columns in the dataset.
Returns
- A new instance of the dataset with only the specified columns.
x_df
def x_df(self):
Returns a dataframe containing only the feature columns.
y_df
def y_df(self) → pd.DataFrame:
Returns a dataframe containing the target column.
y_pred
def y_pred(self, model) → np.ndarray:
Returns the predictions for a given model.
Attempts to stack complex prediction types (e.g., embeddings) into a single, multi-dimensional array.
Arguments
model (VMModel)
: The model whose predictions are sought.
Returns
- The predictions for the model.
y_pred_df
def y_pred_df(self, model) → pd.DataFrame:
Returns a dataframe containing the predictions for a given model.
y_prob
def y_prob(self, model) → np.ndarray:
Returns the probabilities for a given model.
Arguments
model (str)
: The ID of the model whose probabilities are sought.
Returns
- The probability variables.
y_prob_df
def y_prob_df(self, model) → pd.DataFrame:
Returns a dataframe containing the probabilities for a given model.
df
df():
Returns the dataset as a pandas DataFrame.
Returns
- The dataset as a pandas DataFrame.
x
x():
Returns the input features (X) of the dataset.
Returns
- The input features.
y
y():
Returns the target variables (y) of the dataset.
Returns
- The target variables.
VMModel
class VMModel(VMInput):
A base class that wraps a trained model instance and its associated data.
Arguments
model (object, optional)
: The trained model instance. Defaults to None.
input_id (str, optional)
: The input ID for the model. Defaults to None.
attributes (ModelAttributes, optional)
: The attributes of the model. Defaults to None.
name (str, optional)
: The name of the model. Defaults to the class name.
VMModel
VMModel(input_id: str = None, model: object = None, attributes: validmind.vm_models.ModelAttributes = None, name: str = None, **kwargs)
predict
@abstractmethod
def predict(self, *args, **kwargs):
Predict method for the model. This is a wrapper around the model's
predict_proba
def predict_proba(self, *args, **kwargs):
Predict probabilities. Must be implemented by the subclass if needed.
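The split between a required `predict` and an optional `predict_proba` can be illustrated with a small stand-in hierarchy. `SketchModel` and `FunctionModel` are hypothetical names used only for this sketch; raising `NotImplementedError` from the default `predict_proba` is an assumption, not confirmed ValidMind behaviour.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Optional


class SketchModel(ABC):
    """Hypothetical stand-in for VMModel; not the ValidMind class."""

    def __init__(self, input_id: Optional[str] = None, model: Any = None) -> None:
        self.input_id = input_id
        self.model = model

    @abstractmethod
    def predict(self, *args: Any, **kwargs: Any) -> Any:
        """Every concrete wrapper must provide predictions."""

    def predict_proba(self, *args: Any, **kwargs: Any) -> Any:
        # Optional: only wrappers for probabilistic models override this.
        raise NotImplementedError("predict_proba is not implemented for this model")


class FunctionModel(SketchModel):
    """Wraps a plain Python callable that scores one row at a time."""

    def predict(self, rows: List[Any]) -> List[Any]:
        return [self.model(row) for row in rows]


doubler = FunctionModel(input_id="doubler", model=lambda x: 2 * x)
predictions = doubler.predict([1, 2, 3])
print(predictions)  # [2, 4, 6]
```

Because `predict` is abstract, `SketchModel` itself cannot be instantiated; only subclasses that implement it can.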
serialize
def serialize(self):
Serializes the model to a dictionary so it can be sent to the API.
Figure
@dataclass
class Figure:
Figure objects track the schema supported by the ValidMind API.
Figure
Figure(key: str, figure: Union[matplotlib.figure.Figure, go.Figure, go.FigureWidget, bytes], ref_id: str, _type: str = 'plot')
serialize
def serialize(self):
Serializes the Figure to a dictionary so it can be sent to the API.
serialize_files
def serialize_files(self):
Creates a requests-compatible files object to be sent to the API.
to_widget
def to_widget(self):
Returns the ipywidget compatible representation of the figure. Ideally we would render images as-is, but Plotly FigureWidgets don't work well on Google Colab when they are combined with ipywidgets.
ModelAttributes
@dataclass
class ModelAttributes:
Model attributes definition.
ModelAttributes
ModelAttributes(architecture: str = None, framework: str = None, framework_version: str = None, language: str = None, task: validmind.vm_models.ModelTask = None)
from_dict
@classmethod
def from_dict(cls, data):
Creates a ModelAttributes instance from a dictionary.
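A `from_dict` constructor on a dataclass is commonly written along these lines. `SketchAttributes` is a hypothetical stand-in that mirrors only the string fields listed above (the `task` field is omitted), and the handling of unknown or missing keys is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class SketchAttributes:
    """Hypothetical stand-in mirroring the string fields of ModelAttributes."""

    architecture: Optional[str] = None
    framework: Optional[str] = None
    framework_version: Optional[str] = None
    language: Optional[str] = None

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SketchAttributes":
        # Assumption for this sketch: unknown keys are ignored and
        # missing keys fall back to None.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


attrs = SketchAttributes.from_dict(
    {"architecture": "gradient boosting", "framework": "XGBoost", "owner": "ignored"}
)
```

In this sketch `attrs.framework` is `"XGBoost"`, `attrs.language` stays `None`, and the extra `owner` key is dropped.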
ResultTable
@dataclass
class ResultTable:
A dataclass that holds the table summary of a result.
ResultTable
ResultTable(data: Union[List[Any], pd.DataFrame], title: Optional[str] = None)
serialize
def serialize(self):
TestResult
@dataclass
class TestResult(Result):
Test result.
TestResult
TestResult(result_id: str = None, name: str = 'Test Result', ref_id: str = None, title: Optional[str] = None, doc: Optional[str] = None, description: Optional[Union[str, validmind.vm_models.DescriptionFuture]] = None, metric: Optional[Union[int, float]] = None, tables: Optional[List[validmind.vm_models.ResultTable]] = None, raw_data: Optional[validmind.vm_models.RawData] = None, figures: Optional[List[Figure]] = None, passed: Optional[bool] = None, params: Optional[Dict[str, Any]] = None, inputs: Optional[Dict[str, Union[List[validmind.vm_models.VMInput], validmind.vm_models.VMInput]]] = None, metadata: Optional[Dict[str, Any]] = None, _was_description_generated: bool = False, _unsafe: bool = False, _client_config_cache: Optional[Any] = None)
add_figure
def add_figure(self, figure: Union[matplotlib.figure.Figure, go.Figure, go.FigureWidget, bytes, Figure]):
Add a new figure to the result.
Arguments
figure
: The figure to add. Can be one of:
- matplotlib.figure.Figure: A matplotlib figure
- plotly.graph_objs.Figure: A plotly figure
- plotly.graph_objs.FigureWidget: A plotly figure widget
- bytes: A PNG image as raw bytes
- validmind.vm_models.figure.Figure: A ValidMind figure object
Returns
- None.
add_table
def add_table(self, table: Union[validmind.vm_models.ResultTable, pd.DataFrame, List[Dict[str, Any]]], title: Optional[str] = None):
Add a new table to the result.
Arguments
table (Union[ResultTable, pd.DataFrame, List[Dict[str, Any]]])
: The table to add.
title (Optional[str])
: The title of the table (can optionally be provided for pd.DataFrame and List[Dict[str, Any]] tables).
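One way to read the `title` rule above is that raw data gets wrapped in a table object on the way in, while an existing table object keeps its own title. The sketch below shows that idea with hypothetical stand-ins (`SketchTable`, `SketchResult`); it leaves out `pd.DataFrame` input to stay dependency-free and is not ValidMind's implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union

Rows = List[Dict[str, Any]]


@dataclass
class SketchTable:
    """Hypothetical stand-in for ResultTable."""

    data: Rows
    title: Optional[str] = None


@dataclass
class SketchResult:
    """Hypothetical stand-in holding only the tables of a result."""

    tables: List[SketchTable] = field(default_factory=list)

    def add_table(
        self, table: Union[SketchTable, Rows], title: Optional[str] = None
    ) -> None:
        # Raw rows are wrapped so that every stored entry has the same type;
        # `title` only applies in that case.
        if not isinstance(table, SketchTable):
            table = SketchTable(data=table, title=title)
        self.tables.append(table)

    def remove_table(self, index: int) -> None:
        del self.tables[index]


result = SketchResult()
result.add_table([{"metric": "accuracy", "value": 0.9}], title="Scores")
result.add_table(SketchTable(data=[{"metric": "auc", "value": 0.8}], title="AUC"))
result.remove_table(0)
```

After these calls only the "AUC" table remains, since index 0 referred to the "Scores" table that was added first.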
check_result_id_exist
def check_result_id_exist(self):
Check if the result_id exists in any test block across all sections.
log
def log(self, section_id: str = None, position: int = None, unsafe: bool = False):
Log the result to ValidMind.
Arguments
section_id (str)
: The section ID within the model document to insert the test result.
position (int)
: The position (index) within the section to insert the test result.
unsafe (bool)
: If True, log the result even if it contains sensitive data, i.e. raw data from input datasets.
log_async
async def log_async(self, section_id: str = None, position: int = None, unsafe: bool = False):
remove_figure
def remove_figure(self, index: int = 0):
Remove a figure from the result by index.
Arguments
index (int)
: The index of the figure to remove (default is 0).
remove_table
def remove_table(self, index: int):
Remove a table from the result by index.
Arguments
index (int)
: The index of the table to remove (default is 0).
serialize
def serialize(self):
Serialize the result for the API.
to_widget
def to_widget(self):
test_name
test_name():
Get the test name, using custom title if available.
TestSuite
@dataclass
class TestSuite:
Base class for test suites. Test suites define a grouping of tests that can be run as a suite against datasets and models. A test suite is defined by inheriting from this base class and declaring the list of tests as a class variable.
Tests can be a flat list of strings or may be nested into sections by using a dict.
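The mixed flat and nested layout can be pictured with a small flattening helper. Everything here is illustrative: the test IDs are made up, and the dict keys `section_id` and `section_tests` are assumed names for the section format rather than confirmed ones.

```python
from typing import Any, Dict, List, Union

TestEntry = Union[str, Dict[str, Any]]


def flatten_tests(tests: List[TestEntry]) -> List[str]:
    """Collect test IDs from a list that mixes plain IDs and section dicts."""
    flat: List[str] = []
    for entry in tests:
        if isinstance(entry, str):
            flat.append(entry)
        else:
            # Assumed key name; a section holds its own list of tests.
            flat.extend(flatten_tests(entry["section_tests"]))
    return flat


# Hypothetical test IDs, for illustration only.
tests = [
    "my_tests.RowCount",
    {
        "section_id": "quality",
        "section_tests": ["my_tests.MissingValues", "my_tests.Duplicates"],
    },
]
test_ids = flatten_tests(tests)
print(test_ids)       # ['my_tests.RowCount', 'my_tests.MissingValues', 'my_tests.Duplicates']
print(len(test_ids))  # 3
```

A helper like this gives the same kind of answers that `get_tests` and `num_tests` are documented to provide, although the real methods may be implemented differently.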
TestSuite
TestSuite(sections: List[validmind.vm_models.TestSuiteSection] = None)
get_default_config
def get_default_config(self) → dict:
Returns the default configuration for the test suite.
Each test in a test suite can accept parameters and those parameters can have default values. Both the parameters and their defaults are set in the test class and a config object can be passed to the test suite's run method to override the defaults. This function returns a dictionary containing the parameters and their default values for every test to allow users to view and set values.
Returns
- A dictionary of test names and their default parameters.
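The relationship between default parameters and a user-supplied config can be sketched as a simple layered merge. The config shape used here (test ID mapped to a flat parameter dict), the function name `apply_overrides`, and the test IDs and parameter names are all assumptions made for illustration.

```python
import copy
from typing import Any, Dict, Optional

Config = Dict[str, Dict[str, Any]]


def apply_overrides(defaults: Config, overrides: Optional[Config] = None) -> Config:
    """Return the defaults with user-supplied values layered on top."""
    merged = copy.deepcopy(defaults)  # never mutate the defaults themselves
    for test_id, params in (overrides or {}).items():
        merged.setdefault(test_id, {}).update(params)
    return merged


# Hypothetical test IDs and parameters.
defaults = {
    "my_tests.MissingValues": {"min_threshold": 1},
    "my_tests.Duplicates": {"min_threshold": 1},
}
config = apply_overrides(defaults, {"my_tests.MissingValues": {"min_threshold": 5}})
```

Only the overridden parameter changes; tests that are not mentioned in the override keep their defaults, and the `defaults` dictionary itself is left unmodified.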
get_tests
def get_tests(self) → List[str]:
Get all test suite test objects from all sections.
num_tests
def num_tests(self) → int:
Returns the total number of tests in the test suite.
TestSuiteRunner
class TestSuiteRunner:
Runs a test suite.
TestSuiteRunner
TestSuiteRunner(suite: validmind.vm_models.TestSuite, config: dict = None, inputs: dict = None)
log_results
async def log_results(self):
Logs the results of the test suite to ValidMind.
This method is called after the test suite has been run and all results have been collected.
run
def run(self, send: bool = True, fail_fast: bool = False):
Runs the test suite, renders the summary and sends the results to ValidMind.
Arguments
send (bool, optional)
: Whether to send the results to ValidMind. Defaults to True.
fail_fast (bool, optional)
: Whether to stop running tests after the first failure. Defaults to False.
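The effect of a `fail_fast` flag can be shown with a generic runner loop. This sketch is independent of ValidMind: `run_tests` and the toy test callables are hypothetical, and stopping by breaking out of the loop (rather than re-raising the error) is an assumption about how such a flag might behave.

```python
from typing import Callable, Dict


def run_tests(
    tests: Dict[str, Callable[[], None]], fail_fast: bool = False
) -> Dict[str, str]:
    """Run each test callable and record 'passed' or 'failed' per test."""
    outcomes: Dict[str, str] = {}
    for name, test in tests.items():
        try:
            test()
            outcomes[name] = "passed"
        except Exception:
            outcomes[name] = "failed"
            if fail_fast:
                break  # remaining tests are not run
    return outcomes


def ok() -> None:
    pass


def broken() -> None:
    raise ValueError("boom")


suite = {"first": ok, "second": broken, "third": ok}
all_outcomes = run_tests(suite)
short_outcomes = run_tests(suite, fail_fast=True)
```

With the default setting all three tests are attempted; with `fail_fast=True` the run stops at `"second"` and `"third"` never appears in the outcomes.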
summarize
def summarize(self, show_link: bool = True):