```mermaid
flowchart LR
    subgraph Inputs
        Model[Model Object]
        Dataset[Dataset]
        Precomputed[Precomputed Predictions]
    end
    Model --> predict["predict()"]
    Model --> predict_proba["predict_proba()"]
    Precomputed --> Dataset
    subgraph TestTypes[Test Types]
        ModelTests[Model Tests]
        DatasetTests[Dataset Tests]
        ClassificationTests[Classification Metrics]
    end
    predict --> ModelTests
    predict --> DatasetTests
    predict_proba --> ClassificationTests
    Dataset --> DatasetTests
```
Supported models and frameworks
Learn about the wide range of models and frameworks for testing and documentation supported by the ValidMind Library. Understand which frameworks are supported, what your model needs to provide for tests to run, and how to work with models that don't fit standard patterns.
What does supported mean?
Supported here means the ValidMind Library provides dedicated wrappers, install options, and guidance for these models and frameworks.
You can also use other code with the library for models that don't fit standard framework patterns. For example, you can wrap any Python callable with FunctionModel or pass precomputed predictions when you don't have a model object.
- Framework-agnostic support: Compatible with any Python object that implements a predict() method. Framework-specific wrappers provide extra features and convenience, but they are not required for generic Python models.
- Vendor model compatibility: Works with both first-party models you create and third-party models from external vendors, enabling flexible integration of proprietary and external sources.
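As a minimal sketch of the framework-agnostic contract, any object exposing predict() can serve as a model. The ThresholdModel class below is hypothetical, invented for illustration:

```python
# Hypothetical rule-based model: any Python object with a predict()
# method satisfies the framework-agnostic contract described above.
class ThresholdModel:
    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def predict(self, rows):
        # rows: an iterable of dicts carrying a "score" feature
        return [1 if row["score"] >= self.threshold else 0 for row in rows]

model = ThresholdModel(threshold=0.5)
print(model.predict([{"score": 0.9}, {"score": 0.2}]))  # [1, 0]

# Registering such an object with ValidMind would then look like:
#   import validmind as vm
#   vm_model = vm.init_model(model=model, input_id="threshold_model")
```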
AI systems
LLM-based classification models via OpenAI-compatible APIs.
- Test suite: LLMClassifierFullSuite, which includes prompt validation, text data quality, and classifier metrics
Hugging Face and other NLP classification models.
- Test suite: NLPClassifierFullSuite, covering text classification, sentiment analysis, and named entity recognition
Models that generate concise summaries from longer texts.
- Test suite: SummarizationMetrics, covering BERT, BLEU, METEOR, and ROUGE score evaluation
Text and vector embedding models.
- Test suite: EmbeddingsFullSuite, covering similarity metrics, stability analysis, and visualization
Retrieval-augmented generation pipelines.
- Tests: RAGAS integration
- Refer to RAG evaluation for dataset requirements and available tests
Machine learning models
NLP and tabular models using Hugging Face Transformers.
- Test suite: NLPClassifierFullSuite (for NLP), ClassifierFullSuite or RegressionFullSuite (for tabular), covering text classification and tabular classification/regression
Any PyTorch neural network architecture, including LSTM, RNN, CNN, transformers, and custom architectures.
- Test suite: ClassifierFullSuite or RegressionFullSuite (depending on task), covering model performance, feature importance, and diagnosis
Gradient boosting and ensemble tree methods (XGBoost, CatBoost, random forest).
- Test suite: ClassifierFullSuite or RegressionFullSuite, covering confusion matrix, ROC/AUC, feature importance, and SHAP values
Distance-based classification and regression.
- Test suite: ClassifierFullSuite or RegressionFullSuite, covering performance metrics and model comparison
Any sklearn-compatible clustering algorithm, including K-means, DBSCAN, and hierarchical clustering.
- Test suite: ClusterFullSuite, covering silhouette score, homogeneity, completeness, and cluster distribution
Traditional statistical models
Models the relationship between a scalar response and one or more explanatory variables.
- Test suite: RegressionFullSuite, covering R² score, regression errors, and feature importance
Models the probability of a binary outcome based on one or more predictor variables.
- Test suite: ClassifierFullSuite, covering confusion matrix, ROC/AUC, precision/recall, and feature importance
ARIMA, VAR, and other statsmodels time series models.
- Test suite: TimeSeriesModelValidation, covering stationarity tests, autocorrelation analysis, and forecast evaluation
Frameworks
The ValidMind Library provides wrapper classes for common frameworks. When you call vm.init_model(), the library automatically detects your model's framework from its module, such as sklearn, torch, or transformers, and selects the appropriate wrapper. You don't need to specify the wrapper class manually.
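As a conceptual sketch of how module-based detection can work (this is illustrative only, not the library's actual code; detect_framework and FakeEstimator are hypothetical):

```python
# Conceptual sketch of module-based framework detection: inspect the
# root module of the model's class and map it to a wrapper name.
# This is NOT ValidMind's implementation, just the general idea.
def detect_framework(model):
    root = type(model).__module__.split(".")[0]
    wrappers = {
        "sklearn": "SKlearnModel",
        "torch": "PyTorchModel",
        "transformers": "HFModel",
        "xgboost": "XGBoostModel",
        "catboost": "CatBoostModel",
        "statsmodels": "StatsModelsModel",
    }
    # Fall back to a generic wrapper for unrecognized frameworks
    return wrappers.get(root, "FunctionModel")

class FakeEstimator:
    pass

# Pretend the class came from sklearn by faking its module name.
FakeEstimator.__module__ = "sklearn.linear_model._logistic"
print(detect_framework(FakeEstimator()))  # SKlearnModel
```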
Gradient boosting on decision trees with categorical feature support.
- Wrapper: CatBoostModel
- Install: pip install validmind
Pre-trained models for text classification, summarization, translation, and generation from the Hugging Face Transformers library.
- Wrapper: HFModel
- Install: pip install validmind[huggingface]
- Supported tasks: text_classification, text2text_generation (summarization), feature_extraction (embeddings)
Access to OpenAI and Azure OpenAI models for text generation and analysis via OpenAI-compatible APIs.
- Wrapper: FoundationModel
- Install: pip install validmind[llm]
An open-source machine learning library for computer vision and natural language processing.
- Wrapper: PyTorchModel
- Install: pip install validmind[pytorch]
Integrates R's statistical computing with Python via rpy2.
- Wrapper: RModel
- Install: pip install validmind rpy2
- Supported model types: glm (including logistic regression), lm, xgb.Booster
- Initialization: Use vm.init_r_model(model_path, input_id) to load R model objects
A Python library for machine learning, offering supervised and unsupervised learning algorithms.
- Wrapper: SKlearnModel
- Install: pip install validmind
A Python module for statistical models, tests, and data exploration.
- Wrapper: StatsModelsModel
- Install: pip install validmind[stats]
An optimized gradient boosting library for classification and regression.
- Wrapper: XGBoostModel
- Install: pip install validmind[xgboost]
Wrap custom prediction functions or models from otherwise unsupported frameworks.
- Wrapper: FunctionModel
- Install: pip install validmind
To install all optional dependencies:
pip install validmind[all]

Test input requirements
Different tests require different inputs from your model and dataset. Understanding these requirements helps you run the right tests for your use case.
Most model tests call your model's predict() method to generate predictions. This includes:
- Performance metrics (accuracy, precision, recall, F1)
- Error analysis tests
- Robustness tests
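To illustrate what a performance-metric test computes from the predict() output, here is a simplified accuracy calculation (real tests handle more edge cases and metrics):

```python
# Simplified illustration of a performance-metric test: compare the
# labels returned by model.predict(X) against the target column.
def accuracy(y_true, y_pred):
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]   # as returned by model.predict(X)
print(accuracy(y_true, y_pred))  # 0.75
```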
Classification metrics that evaluate probability outputs require predict_proba():
- ROC-AUC score
- Precision-recall curves
- Calibration tests
- Probability distribution analysis
If your model doesn't have predict_proba(), these tests will be skipped or return an error.
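A sketch of the capability check this implies: probability-based tests can only run when the model exposes predict_proba(). Both classes below are hypothetical stand-ins:

```python
# Hypothetical classifiers: one returns hard labels only, the other
# also exposes class probabilities via predict_proba().
class HardClassifier:
    def predict(self, X):
        return [0 for _ in X]

class SoftClassifier(HardClassifier):
    def predict_proba(self, X):
        return [[0.7, 0.3] for _ in X]

def supports_probability_tests(model):
    # Probability-based tests need a callable predict_proba()
    return callable(getattr(model, "predict_proba", None))

print(supports_probability_tests(HardClassifier()))  # False
print(supports_probability_tests(SoftClassifier()))  # True
```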
Test input flow
The flowchart at the top of this page summarizes this flow: predict() feeds model and dataset tests, predict_proba() feeds classification metrics, and precomputed predictions enter through the dataset.
Using precomputed predictions
If you can't provide a model object because your model runs in a separate environment, you can pass precomputed predictions directly to the dataset:
```python
vm_dataset = vm.init_dataset(
    dataset=df,
    target_column="target",
    prediction_values=predictions,  # numpy array of predictions
    probability_values=probabilities  # optional: for classification
)
```
Alternatively, if predictions are already in your dataframe:
```python
vm_dataset = vm.init_dataset(
    dataset=df,
    target_column="target",
    prediction_column="predicted",  # column name in df
    probability_column="probability"  # optional: for classification
)
```
Dataset-only tests
Some tests analyze data quality and don't require a model at all:
- Missing value analysis
- Class imbalance detection
- Feature correlation
- Outlier detection
- Data drift tests
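To illustrate what a class-imbalance check computes from the target column alone, with no model involved (a simplified stand-in, not the library's implementation):

```python
from collections import Counter

# Simplified illustration of a class-imbalance check: per-class
# frequencies computed from the target column alone, no model needed.
def class_ratios(targets):
    counts = Counter(targets)
    total = len(targets)
    return {label: count / total for label, count in counts.items()}

print(class_ratios([0, 0, 0, 0, 1]))  # {0: 0.8, 1: 0.2}
```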
Supported dataset formats
The library accepts multiple data formats when initializing datasets with vm.init_dataset():
| Format | Notes |
|---|---|
| Pandas DataFrame | Primary format, used internally |
| Polars DataFrame | Converted to pandas internally |
| NumPy ndarray | Requires column names to be specified |
| PyTorch TensorDataset | Requires pip install validmind[pytorch] |
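Because a NumPy ndarray carries no column labels, you supply them when initializing the dataset. A pure-Python sketch of the pairing (the feature names below are hypothetical):

```python
# A NumPy ndarray has no column labels, so they must be supplied at
# dataset initialization. Pure-Python sketch of the pairing; the
# feature names are hypothetical.
rows = [
    [5.1, 3.5, 0],
    [4.9, 3.0, 1],
]
columns = ["sepal_length", "sepal_width", "target"]

records = [dict(zip(columns, row)) for row in rows]
print(records[0])  # {'sepal_length': 5.1, 'sepal_width': 3.5, 'target': 0}

# With ValidMind you would pass the ndarray to vm.init_dataset() together
# with the column names (see the init_dataset() reference for the exact
# parameter name).
```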
Custom model wrappers
For models that don't fit standard framework patterns, use these flexible wrappers:
Wrap any Python callable as a model:
```python
import validmind as vm
from validmind.models import FunctionModel

def my_predict(X):
    # Your prediction logic here
    return predictions

vm_model = vm.init_model(
    model=FunctionModel(predict_fn=my_predict),
    input_id="my_model"
)
```
Chain multiple models or processing steps using the | operator between VMModel instances:
```python
import validmind as vm
from validmind.models import PipelineModel

vm_model = vm.init_model(
    model=PipelineModel(vm_preprocessor | vm_model),
    input_id="my_pipeline"
)
```
Or, pass the pipeline directly:
```python
vm_model = vm.init_model(vm_preprocessor | vm_model, input_id="my_pipeline")
```
Use a metadata-only model when you have model metadata but no inference capability, for example when documenting a model deployed in an external system. Calling vm.init_model() with attributes creates a MetadataModel implicitly:
```python
import validmind as vm

vm_model = vm.init_model(
    input_id="external_model",
    attributes={
        "architecture": "ExternalModel",
        "language": "Python",
    },
)
```
With a metadata-only model, you can still run dataset-only tests and use precomputed predictions.
Setting up for LLMs and GenAI
This section covers how to configure and test large language models and generative AI systems.
Use FoundationModel for LLM endpoints; it wraps a callable predict_fn together with a Prompt:
```python
import validmind as vm
from openai import OpenAI
from validmind.models import FoundationModel, Prompt

client = OpenAI()

def call_model(prompt: str) -> str:
    return (
        client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        .choices[0]
        .message.content
    )

vm_model = vm.init_model(
    model=FoundationModel(
        predict_fn=call_model,
        prompt=Prompt(
            template="Classify the sentiment: {text}",
            variables=["text"],
        ),
    ),
    input_id="sentiment_classifier",
)
```
LLM test suites
Install LLM dependencies:
pip install validmind[llm]

Available tests for LLMs include:
- Prompt injection detection
- Output consistency
- Hallucination detection
- Toxicity analysis
- Bias evaluation
Tests such as bias, clarity, robustness, and toxicity can be applied to any text-generating model, regardless of type.
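Any callable that maps a prompt to text can stand in as a text-generating model for these tests. A hedged sketch, where fake_generate is a hypothetical stand-in for a real LLM call:

```python
# Hypothetical stand-in for any text-generating model: a callable
# mapping a prompt string to an output string.
def fake_generate(prompt: str) -> str:
    return f"Summary of: {prompt[:20]}"

output = fake_generate("The quick brown fox jumps over the lazy dog")
print(output)

# Wrapping it for ValidMind would then follow the FunctionModel pattern:
#   from validmind.models import FunctionModel
#   vm_model = vm.init_model(model=FunctionModel(predict_fn=fake_generate),
#                            input_id="my_text_model")
```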
RAG evaluation
For retrieval-augmented generation (RAG) systems, the ValidMind Library integrates with RAGAS for comprehensive evaluation.
Dataset requirements
RAG evaluation expects these logical fields, but you can map your own column names using test parameters:
| Field | Type | Description |
|---|---|---|
| user_input | str | The user's query |
| response | str | Model output |
| retrieved_contexts | List[str] | Retrieved context chunks |
| reference | str | Ground truth (required for some tests) |
To map custom column names, pass them as parameters when running the test:
```python
run_test(
    "validmind.model_validation.ragas.Faithfulness",
    inputs={"dataset": vm_test_ds},
    params={
        "user_input_column": "question",
        "response_column": "rag_model_prediction",
        "retrieved_contexts_column": "retrieval_model_prediction",
    },
).log()
```
Available RAG tests
- Faithfulness — Measures how well the response is grounded in retrieved contexts
- Context Recall — Evaluates if relevant information was retrieved
- Context Precision — Measures relevance of retrieved contexts
- Answer Relevancy — Assesses if the response addresses the query
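One evaluation record matching the logical fields above might look like the following (the values are illustrative; a RAG test dataset is a collection of such rows):

```python
# Illustrative RAG evaluation record using the logical field names
# from the dataset requirements table; values are made up.
record = {
    "user_input": "What is the capital of France?",
    "response": "The capital of France is Paris.",
    "retrieved_contexts": [
        "Paris is the capital and largest city of France.",
    ],
    "reference": "Paris",  # ground truth, required for some tests
}

# Faithfulness asks whether the response is grounded in
# retrieved_contexts; context recall compares them to the reference.
print(sorted(record))
```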
Python and dependency compatibility
The ValidMind Library requires:
- Python: >=3.9, <3.13
- Core dependencies: pandas, numpy, scikit-learn
Optional dependencies for specific frameworks:
| Extra | Includes |
|---|---|
| pytorch | PyTorch, torchvision |
| llm | OpenAI, langchain, ragas, deepeval |
| xgboost | XGBoost |
| huggingface | Transformers, sentencepiece |
| nlp | langdetect, nltk, textblob, evaluate, rouge, bert-score |
| stats | scipy, statsmodels, arch |
| explainability | SHAP |
| credit_risk | scorecardpy |
| datasets | Hugging Face datasets |
| pii-detection | presidio-analyzer, presidio-structured |
| all | All optional dependencies |