Supported models

Published

November 20, 2024

The ValidMind Library provides out-of-the-box support for testing and documentation for an array of model types and modeling packages.

What is a supported model?

A supported model refers to a model for which predefined testing or documentation functions exist in the ValidMind Library, provided that the model you are developing is documented using a supported version of our library. These model types cover a very large portion of the models used in commercial and retail banking.

Given the rapid developments in the AI space, including the advent of large language models (LLMs), ValidMind product development has also focused on making sure that our library is extensible to support future model types or modeling packages, so that we do not limit our users to specific model types. You always have the flexibility to:

Supported model types

Traditional statistical models

  • Linear regression — Models relationship between a scalar response and one or more explanatory variables.
  • Logistic regression — Predicts the probability of a binary outcome based on one or more predictor variables.
  • Time series — Analyzes data points collected or sequenced over time.

Machine learning models

Hugging Face-compatible models
  • Natural language processing (NLP) text classification — Categorizes text into predefined classes.
  • Tabular classification — Assigns categories to tabular dataset entries.
  • Tabular regression — Predicts continuous outcomes from tabular data.
Tree-based models (XGBoost / CatBoost / random forest)
  • Classification — Predicts categorical outcomes using decision trees.
  • Regression — Predicts continuous outcomes using decision trees.
K-nearest neighbors (KNN)
  • Classification — Assigns class by majority vote of the k-nearest neighbors.
  • Regression — Predicts value based on the average of the k-nearest neighbors.
Clustering
  • K-means — Partitions n observations into k clusters in which each observation belongs to the cluster with the nearest mean.
Neural networks
  • Long short-term memory (LSTM) — Processes sequences of data, remembering inputs over long periods.
  • Recurrent neural network (RNN) — Processes sequences by maintaining a state that reflects the history of processed elements.
  • Convolutional neural network (CNN) — Primarily used for processing grid-like data such as images.

Generative AI models

Large language models (LLMs)
  • Classification — Categorizes input into predefined classes.
  • Text summarization — Generates concise summaries from longer texts.

ValidMind offers support for both first-party models and third-party vendor models.

Supported modeling libraries and other tools

  • scikit-learn — A Python library for machine learning, offering a range of supervised and unsupervised learning algorithms.
  • statsmodels — A Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests and exploring data.
  • PyTorch — An open-source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing.
  • Hugging Face Transformers — Provides thousands of pre-trained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, and text generation.
  • XGBoost — An optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable, implementing machine learning algorithms under the Gradient Boosting framework.
  • CatBoost — An open-source gradient boosting on decision trees library with categorical feature support out of the box, for ranking, classification, regression, and other ML tasks.
  • LightGBM — A fast, distributed, high-performance gradient boosting (GBDT, GBRT, GBM, or MART) framework based on decision tree algorithms, used for ranking, classification, and many other machine learning tasks.
  • R models, via rpy2 - R in Python — Facilitates the integration of R’s statistical computing and graphics capabilities with Python, allowing for R models to be called from Python.
  • Large language models (LLMs), via OpenAI-compatible APIs — Access to advanced AI models trained by OpenAI for a variety of natural language tasks, including text generation, translation, and analysis, through a compatible API interface. This support includes both the OpenAI API and the Azure OpenAI Service via API.

What’s next