```mermaid
flowchart LR
    subgraph Inputs
        Model[Model Object]
        Dataset[Dataset]
        Precomputed[Precomputed Predictions]
    end
    Model --> predict["predict()"]
    Model --> predict_proba["predict_proba()"]
    Precomputed --> Dataset
    subgraph TestTypes[Test Types]
        ModelTests[Model Tests]
        DatasetTests[Dataset Tests]
        ClassificationTests[Classification Metrics]
    end
    predict --> ModelTests
    predict --> DatasetTests
    predict_proba --> ClassificationTests
    Dataset --> DatasetTests
```
Supported models and frameworks
Learn about the wide range of models and frameworks for testing and documentation supported by the ValidMind Library. Understand which frameworks are supported, what your model needs to provide for tests to run, and how to work with models that don't fit standard patterns.
What does supported mean?
Supported here means the ValidMind Library provides dedicated wrappers, install options, and guidance for these models and frameworks.
You can also use other code with the library for models that don't fit standard framework patterns. For example, you can wrap any Python callable with FunctionModel or pass precomputed predictions when you don't have a model object.
- Framework-agnostic support: Compatible with any Python object that implements a predict() method. Framework-specific wrappers provide extra features and convenience, but they are not required for generic Python models.
- Vendor model compatibility: Works with both first-party models you create and third-party models from external vendors, enabling flexible integration of proprietary and external sources.
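As a minimal sketch of the framework-agnostic contract, any object exposing predict() can serve as a model. The ThresholdModel class below is hypothetical, invented for illustration:

```python
# Hypothetical rule-based model: any Python object with a predict()
# method satisfies the framework-agnostic contract described above.
class ThresholdModel:
    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def predict(self, rows):
        # rows: an iterable of dicts carrying a "score" feature
        return [1 if row["score"] >= self.threshold else 0 for row in rows]

model = ThresholdModel(threshold=0.5)
print(model.predict([{"score": 0.9}, {"score": 0.2}]))  # [1, 0]

# Registering such an object with ValidMind would then look like:
#   import validmind as vm
#   vm_model = vm.init_model(model=model, input_id="threshold_model")
```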
AI systems
LLM-based classification models via OpenAI-compatible APIs.
- Test suite: LLMClassifierFullSuite, which includes prompt validation, text data quality, and classifier metrics
Hugging Face and other NLP classification models.
- Test suite: NLPClassifierFullSuite, covering text classification, sentiment analysis, and named entity recognition
Models that generate concise summaries from longer texts.
- Test suite: SummarizationMetrics, covering BERT, BLEU, METEOR, and ROUGE score evaluation
Text and vector embedding models.
- Test suite: EmbeddingsFullSuite, covering similarity metrics, stability analysis, and visualization
Retrieval-augmented generation pipelines.
- Tests: RAGAS integration
- Refer to RAG evaluation for dataset requirements and available tests
Machine learning models
NLP and tabular models using Hugging Face Transformers.
- Test suite: NLPClassifierFullSuite (for NLP), ClassifierFullSuite or RegressionFullSuite (for tabular), covering text classification and tabular classification/regression
Any PyTorch neural network architecture, including LSTM, RNN, CNN, transformers, and custom architectures.
- Test suite: ClassifierFullSuite or RegressionFullSuite (depending on task), covering model performance, feature importance, and diagnosis
Gradient boosting and ensemble tree methods (XGBoost, CatBoost, random forest).
- Test suite: ClassifierFullSuite or RegressionFullSuite, covering confusion matrix, ROC/AUC, feature importance, and SHAP values
Distance-based classification and regression.
- Test suite: ClassifierFullSuite or RegressionFullSuite, covering performance metrics and model comparison
Any sklearn-compatible clustering algorithm, including K-means, DBSCAN, and hierarchical clustering.
- Test suite: ClusterFullSuite, covering silhouette score, homogeneity, completeness, and cluster distribution
Traditional statistical models
Models the relationship between a scalar response and one or more explanatory variables.
- Test suite: RegressionFullSuite, covering R² score, regression errors, and feature importance
Models the probability of a binary outcome based on one or more predictor variables.
- Test suite: ClassifierFullSuite, covering confusion matrix, ROC/AUC, precision/recall, and feature importance
ARIMA, VAR, and other statsmodels time series models.
- Test suite: TimeSeriesModelValidation, covering stationarity tests, autocorrelation analysis, and forecast evaluation
Frameworks
The ValidMind Library provides wrapper classes for common frameworks. When you call vm.init_model(), the library automatically detects your model's framework from its module, such as sklearn, torch, or transformers, and selects the appropriate wrapper. You don't need to specify the wrapper class manually.
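As a conceptual sketch of how module-based detection can work (this is illustrative only, not the library's actual code; detect_framework and FakeEstimator are hypothetical):

```python
# Conceptual sketch of module-based framework detection: inspect the
# root module of the model's class and map it to a wrapper name.
# This is NOT ValidMind's implementation, just the general idea.
def detect_framework(model):
    root = type(model).__module__.split(".")[0]
    wrappers = {
        "sklearn": "SKlearnModel",
        "torch": "PyTorchModel",
        "transformers": "HFModel",
        "xgboost": "XGBoostModel",
        "catboost": "CatBoostModel",
        "statsmodels": "StatsModelsModel",
    }
    # Fall back to a generic wrapper for unrecognized frameworks
    return wrappers.get(root, "FunctionModel")

class FakeEstimator:
    pass

# Pretend the class came from sklearn by faking its module name.
FakeEstimator.__module__ = "sklearn.linear_model._logistic"
print(detect_framework(FakeEstimator()))  # SKlearnModel
```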
Gradient boosting on decision trees with categorical feature support.
- Wrapper: CatBoostModel
- Install: pip install validmind
Pre-trained models for text classification, summarization, translation, and generation from the Hugging Face Transformers library.
- Wrapper: HFModel
- Install: pip install validmind[huggingface]
- Supported tasks: text_classification, text2text_generation (summarization), feature_extraction (embeddings)
Access to OpenAI and Azure OpenAI models for text generation and analysis via OpenAI-compatible APIs.
- Wrapper: FoundationModel
- Install: pip install validmind[llm]
An open-source machine learning library for computer vision and natural language processing.
- Wrapper: PyTorchModel
- Install: pip install validmind[pytorch]
Integrates R's statistical computing with Python via rpy2.
- Wrapper: RModel
- Install: pip install validmind rpy2
- Supported model types: glm (including logistic regression), lm, xgb.Booster
- Initialization: Use vm.init_r_model(model_path, input_id) to load R model objects
A Python library for machine learning, offering supervised and unsupervised learning algorithms.
- Wrapper: SKlearnModel
- Install: pip install validmind
A Python module for statistical models, tests, and data exploration.
- Wrapper: StatsModelsModel
- Install: pip install validmind[stats]
An optimized gradient boosting library for classification and regression.
- Wrapper: XGBoostModel
- Install: pip install validmind[xgboost]
Wrap custom prediction functions or models from otherwise unsupported frameworks.
- Wrapper: FunctionModel
- Install: pip install validmind
To install all optional dependencies:
pip install validmind[all]

Test input requirements
Different tests require different inputs from your model and dataset. Understanding these requirements helps you run the right tests for your use case.
Most model tests call your model's predict() method to generate predictions. This includes:
- Performance metrics (accuracy, precision, recall, F1)
- Error analysis tests
- Robustness tests
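To illustrate what a performance-metric test computes from the predict() output, here is a simplified accuracy calculation (real tests handle more edge cases and metrics):

```python
# Simplified illustration of a performance-metric test: compare the
# labels returned by model.predict(X) against the target column.
def accuracy(y_true, y_pred):
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]   # as returned by model.predict(X)
print(accuracy(y_true, y_pred))  # 0.75
```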
Classification metrics that evaluate probability outputs require predict_proba():
- ROC-AUC score
- Precision-recall curves
- Calibration tests
- Probability distribution analysis
If your model doesn't have predict_proba(), these tests will be skipped or return an error.
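A sketch of the capability check this implies: probability-based tests can only run when the model exposes predict_proba(). Both classes below are hypothetical stand-ins:

```python
# Hypothetical classifiers: one returns hard labels only, the other
# also exposes class probabilities via predict_proba().
class HardClassifier:
    def predict(self, X):
        return [0 for _ in X]

class SoftClassifier(HardClassifier):
    def predict_proba(self, X):
        return [[0.7, 0.3] for _ in X]

def supports_probability_tests(model):
    # Probability-based tests need a callable predict_proba()
    return callable(getattr(model, "predict_proba", None))

print(supports_probability_tests(HardClassifier()))  # False
print(supports_probability_tests(SoftClassifier()))  # True
```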
Test input flow
The flowchart at the top of this page summarizes this flow: predict() feeds model and dataset tests, predict_proba() feeds classification metrics, and precomputed predictions enter through the dataset.
Using precomputed predictions
If you can't provide a model object because your model runs in a separate environment, you can pass precomputed predictions directly to the dataset:
```python
vm_dataset = vm.init_dataset(
    dataset=df,
    target_column="target",
    prediction_values=predictions,  # numpy array of predictions
    probability_values=probabilities  # optional: for classification
)
```
Alternatively, if predictions are already in your dataframe:
```python
vm_dataset = vm.init_dataset(
    dataset=df,
    target_column="target",
    prediction_column="predicted",  # column name in df
    probability_column="probability"  # optional: for classification
)
```
Dataset-only tests
Some tests analyze data quality and don't require a model at all:
- Missing value analysis
- Class imbalance detection
- Feature correlation
- Outlier detection
- Data drift tests
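To illustrate what a class-imbalance check computes from the target column alone, with no model involved (a simplified stand-in, not the library's implementation):

```python
from collections import Counter

# Simplified illustration of a class-imbalance check: per-class
# frequencies computed from the target column alone, no model needed.
def class_ratios(targets):
    counts = Counter(targets)
    total = len(targets)
    return {label: count / total for label, count in counts.items()}

print(class_ratios([0, 0, 0, 0, 1]))  # {0: 0.8, 1: 0.2}
```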
Supported dataset formats
The library accepts multiple data formats when initializing datasets with vm.init_dataset():
| Format | Notes |
|---|---|
| Pandas DataFrame | Primary format, used internally |
| Polars DataFrame | Converted to pandas internally |
| NumPy ndarray | Requires column names to be specified |
| PyTorch TensorDataset | Requires pip install validmind[pytorch] |
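Because a NumPy ndarray carries no column labels, you supply them when initializing the dataset. A pure-Python sketch of the pairing (the feature names below are hypothetical):

```python
# A NumPy ndarray has no column labels, so they must be supplied at
# dataset initialization. Pure-Python sketch of the pairing; the
# feature names are hypothetical.
rows = [
    [5.1, 3.5, 0],
    [4.9, 3.0, 1],
]
columns = ["sepal_length", "sepal_width", "target"]

records = [dict(zip(columns, row)) for row in rows]
print(records[0])  # {'sepal_length': 5.1, 'sepal_width': 3.5, 'target': 0}

# With ValidMind you would pass the ndarray to vm.init_dataset() together
# with the column names (see the init_dataset() reference for the exact
# parameter name).
```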
Custom model wrappers
For models that don't fit standard framework patterns, use these flexible wrappers:
Wrap any Python callable as a model:
```python
import validmind as vm
from validmind.models import FunctionModel

def my_predict(X):
    # Your prediction logic here
    return predictions

vm_model = vm.init_model(
    model=FunctionModel(predict_fn=my_predict),
    input_id="my_model"
)
```
Chain multiple models or processing steps using the | operator between VMModel instances:
```python
import validmind as vm
from validmind.models import PipelineModel

vm_model = vm.init_model(
    model=PipelineModel(vm_preprocessor | vm_model),
    input_id="my_pipeline"
)
```
Or, pass the pipeline directly:
```python
vm_model = vm.init_model(vm_preprocessor | vm_model, input_id="my_pipeline")
```
Use a metadata-only model when you have model metadata but no inference capability, for example when documenting a model deployed in an external system. Calling vm.init_model() with attributes creates a MetadataModel implicitly:
```python
import validmind as vm

vm_model = vm.init_model(
    input_id="external_model",
    attributes={
        "architecture": "ExternalModel",
        "language": "Python",
    },
)
```
With a metadata-only model, you can still run dataset-only tests and use precomputed predictions.
Setting up for LLMs and GenAI
This section covers how to configure and test large language models and generative AI systems.
Use FoundationModel for LLM endpoints; it wraps a callable predict_fn together with a Prompt:
```python
import validmind as vm
from openai import OpenAI
from validmind.models import FoundationModel, Prompt

client = OpenAI()

def call_model(prompt: str) -> str:
    return (
        client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        .choices[0]
        .message.content
    )

vm_model = vm.init_model(
    model=FoundationModel(
        predict_fn=call_model,
        prompt=Prompt(
            template="Classify the sentiment: {text}",
            variables=["text"],
        ),
    ),
    input_id="sentiment_classifier",
)
```
LLM test suites
Install LLM dependencies:
pip install validmind[llm]

Available tests for LLMs include:
- Prompt injection detection
- Output consistency
- Hallucination detection
- Toxicity analysis
- Bias evaluation
Tests such as bias, clarity, robustness, and toxicity can be applied to any text-generating model, regardless of type.
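Any callable that maps a prompt to text can stand in as a text-generating model for these tests. A hedged sketch, where fake_generate is a hypothetical stand-in for a real LLM call:

```python
# Hypothetical stand-in for any text-generating model: a callable
# mapping a prompt string to an output string.
def fake_generate(prompt: str) -> str:
    return f"Summary of: {prompt[:20]}"

output = fake_generate("The quick brown fox jumps over the lazy dog")
print(output)

# Wrapping it for ValidMind would then follow the FunctionModel pattern:
#   from validmind.models import FunctionModel
#   vm_model = vm.init_model(model=FunctionModel(predict_fn=fake_generate),
#                            input_id="my_text_model")
```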
RAG evaluation
For retrieval-augmented generation (RAG) systems, the ValidMind Library integrates with RAGAS for comprehensive evaluation.
Dataset requirements
RAG evaluation expects these logical fields, but you can map your own column names using test parameters:
| Field | Type | Description |
|---|---|---|
| user_input | str | The user's query |
| response | str | Model output |
| retrieved_contexts | List[str] | Retrieved context chunks |
| reference | str | Ground truth (required for some tests) |
To map custom column names, pass them as parameters when running the test:
```python
run_test(
    "validmind.model_validation.ragas.Faithfulness",
    inputs={"dataset": vm_test_ds},
    params={
        "user_input_column": "question",
        "response_column": "rag_model_prediction",
        "retrieved_contexts_column": "retrieval_model_prediction",
    },
).log()
```
Available RAG tests
- Faithfulness — Measures how well the response is grounded in retrieved contexts
- Context Recall — Evaluates if relevant information was retrieved
- Context Precision — Measures relevance of retrieved contexts
- Answer Relevancy — Assesses if the response addresses the query
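One evaluation record matching the logical fields above might look like the following (the values are illustrative; a RAG test dataset is a collection of such rows):

```python
# Illustrative RAG evaluation record using the logical field names
# from the dataset requirements table; values are made up.
record = {
    "user_input": "What is the capital of France?",
    "response": "The capital of France is Paris.",
    "retrieved_contexts": [
        "Paris is the capital and largest city of France.",
    ],
    "reference": "Paris",  # ground truth, required for some tests
}

# Faithfulness asks whether the response is grounded in
# retrieved_contexts; context recall compares them to the reference.
print(sorted(record))
```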
Python and dependency compatibility
The ValidMind Library requires:
- Python: >=3.9, <3.13
- Core dependencies: pandas, numpy, scikit-learn
Optional dependencies for specific frameworks:
| Extra | Includes |
|---|---|
| pytorch | PyTorch, torchvision |
| llm | OpenAI, langchain, ragas, deepeval |
| xgboost | XGBoost |
| huggingface | Transformers, sentencepiece |
| nlp | langdetect, nltk, textblob, evaluate, rouge, bert-score |
| stats | scipy, statsmodels, arch |
| explainability | SHAP |
| credit_risk | scorecardpy |
| datasets | Hugging Face datasets |
| pii-detection | presidio-analyzer, presidio-structured |
| all | All optional dependencies |