Supported models and frameworks

Published March 25, 2026

Learn about the wide range of models and frameworks for testing and documentation supported by the ValidMind Library. Understand which frameworks are supported, what your model needs to provide for tests to run, and how to work with models that don't fit standard patterns.

What does supported mean?

Supported here means the ValidMind Library provides dedicated wrappers, install options, and guidance for these models and frameworks.

You can also use other code with the library for models that don't fit standard framework patterns. For example, you can wrap any Python callable with FunctionModel or pass precomputed predictions when you don't have a model object.

  • Framework-agnostic support: Compatible with any Python object that implements a predict() method. Framework-specific wrappers provide extra features and convenience, but they are not required for generic Python models.
  • Vendor model compatibility: Works with both first-party models you create and third-party models from external vendors, enabling flexible integration with both proprietary and external sources.

AI systems

LLM classifiers
LLM-based classification models via OpenAI-compatible APIs.
  • Test suite: LLMClassifierFullSuite
  • Includes prompt validation, text data quality, classifier metrics

NLP classifiers
Hugging Face and other NLP classification models.
  • Test suite: NLPClassifierFullSuite
  • Text classification, sentiment analysis, named entity recognition

Summarization models
Models that generate concise summaries from longer texts.
  • Test suite: SummarizationMetrics
  • BERT, BLEU, METEOR, ROUGE score evaluation

Embeddings models
Text and vector embedding models.
  • Test suite: EmbeddingsFullSuite
  • Similarity metrics, stability analysis, visualization

RAG systems
Retrieval-augmented generation pipelines.
  • Tests: RAGAS integration
  • Refer to RAG evaluation for dataset requirements and available tests

Machine learning models

Hugging Face-compatible
NLP and tabular models using Hugging Face Transformers.
  • Test suite: NLPClassifierFullSuite (for NLP), ClassifierFullSuite or RegressionFullSuite (for tabular)
  • Text classification, tabular classification/regression

PyTorch
Any PyTorch neural network architecture, including LSTM, RNN, CNN, transformers, and custom architectures.
  • Test suite: ClassifierFullSuite or RegressionFullSuite (depending on task)
  • Model performance, feature importance, diagnosis

Tree-based
Gradient boosting and ensemble tree methods (XGBoost, CatBoost, random forest).
  • Test suite: ClassifierFullSuite or RegressionFullSuite
  • Confusion matrix, ROC/AUC, feature importance, SHAP values

K-nearest neighbors (KNN)
Distance-based classification and regression.
  • Test suite: ClassifierFullSuite or RegressionFullSuite
  • Performance metrics, model comparison

Clustering
Any sklearn-compatible clustering algorithm, including K-means, DBSCAN, and hierarchical clustering.
  • Test suite: ClusterFullSuite
  • Silhouette score, homogeneity, completeness, cluster distribution

Traditional statistical models

Linear regression
Models the relationship between a scalar response and one or more explanatory variables.
  • Test suite: RegressionFullSuite
  • R² score, regression errors, feature importance

Logistic regression
Models the probability of a binary outcome based on one or more predictor variables.
  • Test suite: ClassifierFullSuite
  • Confusion matrix, ROC/AUC, precision/recall, feature importance

Time series and forecasting
ARIMA, VAR, and other statsmodels time series models.
  • Test suite: TimeSeriesModelValidation
  • Stationarity tests, autocorrelation analysis, forecast evaluation

Frameworks

The ValidMind Library provides wrapper classes for common frameworks. When you call vm.init_model(), the library automatically detects your model's framework from its module, such as sklearn, torch, or transformers, and selects the appropriate wrapper. You don't need to specify the wrapper class manually.
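
For example, a minimal sketch of automatic detection with a scikit-learn model (the toy data and input_id are illustrative):

import validmind as vm
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Train a toy classifier; init_model() infers the framework from the
# model's module (sklearn) and applies the SKlearnModel wrapper
X, y = make_classification(n_samples=100, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

vm_model = vm.init_model(model=model, input_id="rf_model")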

CatBoost
Gradient boosting on decision trees with categorical feature support.
  • Wrapper: CatBoostModel
  • Install: pip install validmind

Hugging Face Transformers
Pre-trained models for text classification, summarization, translation, and generation from the Hugging Face Transformers library.
  • Wrapper: HFModel
  • Install: pip install validmind[huggingface]
  • Supported tasks: text_classification, text2text_generation (summarization), feature_extraction (embeddings)

LLMs via API
Access to OpenAI and Azure OpenAI models for text generation and analysis via OpenAI-compatible APIs.
  • Wrapper: FoundationModel
  • Install: pip install validmind[llm]

PyTorch
An open-source machine learning library for computer vision and natural language processing.
  • Wrapper: PyTorchModel
  • Install: pip install validmind[pytorch]

R models via rpy2
Integrates R's statistical computing with Python via rpy2.
  • Wrapper: RModel
  • Install: pip install validmind rpy2
  • Supported model types: glm (including logistic regression), lm, xgb.Booster
  • Initialization: Use vm.init_r_model(model_path, input_id) to load R model objects

scikit-learn
A Python library for machine learning, offering supervised and unsupervised learning algorithms.
  • Wrapper: SKlearnModel
  • Install: pip install validmind

statsmodels
A Python module for statistical models, tests, and data exploration.
  • Wrapper: StatsModelsModel
  • Install: pip install validmind[stats]

XGBoost
An optimized gradient boosting library for classification and regression.
  • Wrapper: XGBoostModel
  • Install: pip install validmind[xgboost]

Python callable
Wrap custom prediction functions or models from otherwise unsupported frameworks.
  • Wrapper: FunctionModel
  • Install: pip install validmind

To install all optional dependencies:

pip install validmind[all]

Test input requirements

Different tests require different inputs from your model and dataset. Understanding these requirements helps you run the right tests for your use case (see How to run tests and test suites).

When predict() is required
Most model tests call your model's predict() method to generate predictions. This includes:

  • Performance metrics (accuracy, precision, recall, F1)
  • Error analysis tests
  • Robustness tests

When predict_proba() is needed
Classification metrics that evaluate probability outputs require predict_proba():

  • ROC-AUC score
  • Precision-recall curves
  • Calibration tests
  • Probability distribution analysis

If your model doesn't have predict_proba(), these tests will be skipped or return an error.
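
As a sketch, running one probability-based test might look like this (the test ID follows the naming pattern in the reference section; vm_model and vm_test_ds are assumed to be previously initialized ValidMind objects):

from validmind.tests import run_test

# ROCCurve consumes probability outputs, so the wrapped model must
# implement predict_proba()
run_test(
    "validmind.model_validation.sklearn.ROCCurve",
    inputs={"model": vm_model, "dataset": vm_test_ds},
).log()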

Test input flow

The following flowchart (Mermaid source) summarizes how model and dataset inputs flow into the different test types:

flowchart LR
    subgraph Inputs
        Model[Model Object]
        Dataset[Dataset]
        Precomputed[Precomputed Predictions]
    end
    
    Model --> predict["predict()"]
    Model --> predict_proba["predict_proba()"]
    Precomputed --> Dataset
    
    subgraph TestTypes[Test Types]
        ModelTests[Model Tests]
        DatasetTests[Dataset Tests]
        ClassificationTests[Classification Metrics]
    end
    
    predict --> ModelTests
    predict --> DatasetTests
    predict_proba --> ClassificationTests
    Dataset --> DatasetTests

Using precomputed predictions

If you can't provide a model object because your model runs in a separate environment, you can pass precomputed predictions directly to the dataset:

vm_dataset = vm.init_dataset(
    dataset=df,
    target_column="target",
    prediction_values=predictions,  # numpy array of predictions
    probability_values=probabilities  # optional: for classification
)

Alternatively, if predictions are already in your dataframe:

vm_dataset = vm.init_dataset(
    dataset=df,
    target_column="target",
    prediction_column="predicted",  # column name in df
    probability_column="probability"  # optional: for classification
)
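
Conversely, when you do have a model object, a common pattern in the library's notebooks is to compute predictions and attach them to the dataset:

# Runs vm_model against the dataset and stores the resulting
# predictions, keyed by the model's input_id
vm_dataset.assign_predictions(model=vm_model)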

Dataset-only tests

Some tests analyze data quality and don't require a model at all:

  • Missing value analysis
  • Class imbalance detection
  • Feature correlation
  • Outlier detection
  • Data drift tests
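
For example, a minimal sketch of running one such test (test ID from the reference section; vm_dataset is an initialized ValidMind dataset):

from validmind.tests import run_test

# No model input is needed; the test inspects the dataset alone
run_test(
    "validmind.data_validation.ClassImbalance",
    inputs={"dataset": vm_dataset},
).log()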

Supported dataset formats

The library accepts multiple data formats when initializing datasets with vm.init_dataset():

Format                   Notes
Pandas DataFrame         Primary format, used internally
Polars DataFrame         Converted to pandas internally
NumPy ndarray            Requires column names to be specified
PyTorch TensorDataset    Requires pip install validmind[pytorch]
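
For example, a minimal sketch with a Polars frame (the column names and input_id are illustrative):

import polars as pl
import validmind as vm

pl_df = pl.DataFrame({"feature": [0.1, 0.5, 0.9], "target": [0, 1, 1]})

# The Polars frame is converted to pandas internally
vm_dataset = vm.init_dataset(
    dataset=pl_df,
    target_column="target",
    input_id="polars_dataset",
)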

Custom model wrappers

For models that don't fit standard framework patterns, use these flexible wrappers:

FunctionModel
Wrap any Python callable as a model:

from validmind.models import FunctionModel

def my_predict(X):
    # Your prediction logic here
    return predictions

vm_model = vm.init_model(
    model=FunctionModel(predict_fn=my_predict),
    input_id="my_model"
)

PipelineModel
Chain multiple models or processing steps using the | operator between VMModel instances:

from validmind.models import PipelineModel

vm_pipeline = vm.init_model(
    model=PipelineModel(vm_preprocessor | vm_model),
    input_id="my_pipeline"
)

Or, pass the pipeline directly:

vm_pipeline = vm.init_model(vm_preprocessor | vm_model, input_id="my_pipeline")

MetadataModel
Use when you have model metadata but no inference capability, for example when documenting a model deployed in an external system. Calling vm.init_model() with attributes creates a MetadataModel implicitly:

vm_model = vm.init_model(
    input_id="external_model",
    attributes={
        "architecture": "ExternalModel",
        "language": "Python",
    },
)

With a metadata-only model, you can still run dataset-only tests and use precomputed predictions.

FoundationModel
Configure and test large language models and generative AI systems. Use FoundationModel for LLM endpoints; it wraps a callable predict_fn and a Prompt:

import validmind as vm
from openai import OpenAI
from validmind.models import FoundationModel, Prompt

client = OpenAI()

def call_model(prompt: str) -> str:
    return (
        client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        .choices[0]
        .message.content
    )

vm_model = vm.init_model(
    model=FoundationModel(
        predict_fn=call_model,
        prompt=Prompt(
            template="Classify the sentiment: {text}",
            variables=["text"],
        ),
    ),
    input_id="sentiment_classifier",
)

LLM test suites

Install LLM dependencies:

pip install validmind[llm]

Available tests and test suites for LLMs cover:

  • Prompt injection detection
  • Output consistency
  • Hallucination detection
  • Toxicity analysis
  • Bias evaluation

Tests such as bias, clarity, robustness, and toxicity can be applied to any text-generating model, regardless of type.
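
As a hedged sketch, one of the prompt validation tests from the reference section can be run against a FoundationModel like this (vm_model is the FoundationModel initialized above):

from validmind.tests import run_test

# Prompt validation tests evaluate the model's prompt,
# so only a model input is passed
run_test(
    "validmind.prompt_validation.Clarity",
    inputs={"model": vm_model},
).log()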

RAG evaluation

For retrieval-augmented generation (RAG) systems, the ValidMind Library integrates with RAGAS for comprehensive evaluation.

Dataset requirements

RAG evaluation expects these logical fields, but you can map your own column names using test parameters:

Field                 Type         Description
user_input            str          The user's query
response              str          Model output
retrieved_contexts    List[str]    Retrieved context chunks
reference             str          Ground truth (required for some tests)

To map custom column names, pass them as parameters when running the test:

from validmind.tests import run_test

run_test(
    "validmind.model_validation.ragas.Faithfulness",
    inputs={"dataset": vm_test_ds},
    params={
        "user_input_column": "question",
        "response_column": "rag_model_prediction",
        "retrieved_contexts_column": "retrieval_model_prediction",
    },
).log()

Available RAG tests

  • Faithfulness — Measures how well the response is grounded in retrieved contexts
  • Context Recall — Evaluates if relevant information was retrieved
  • Context Precision — Measures relevance of retrieved contexts
  • Answer Relevancy — Assesses if the response addresses the query

Python and dependency compatibility

The ValidMind Library requires:

  • Python: >=3.9, <3.13
  • Core dependencies: pandas, numpy, scikit-learn

Optional dependencies for specific frameworks:

Extra             Includes
pytorch           PyTorch, torchvision
llm               OpenAI, langchain, ragas, deepeval
xgboost           XGBoost
huggingface       Transformers, sentencepiece
nlp               langdetect, nltk, textblob, evaluate, rouge, bert-score
stats             scipy, statsmodels, arch
explainability    SHAP
credit_risk       scorecardpy
datasets          Hugging Face datasets
pii-detection     presidio-analyzer, presidio-structured
all               All optional dependencies
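
Extras can be combined in a single install, for example (the quotes guard against shell globbing):

pip install "validmind[llm,huggingface]"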

What's next

How to run tests and test suites
ValidMind provides many built-in tests and test suites that help you produce documentation throughout the model lifecycle, ensuring your work satisfies regulatory and risk management requirements.
Test descriptions
Tests that are available as part of the ValidMind Library, grouped by type of validation or monitoring test.
How to use ValidMind Library features
Browse our range of Jupyter Notebooks demonstrating how to use the core features of the ValidMind Library. Use these how-to notebooks to get familiar with the library's capabilities and apply them to your own use cases.
Code samples
Our Jupyter Notebook code samples showcase the capabilities and features of the ValidMind Library, while also providing you with useful examples that you can build on and adapt for your own use cases.