validmind.ClassifierThresholdOptimization

@tags('model_validation', 'threshold_optimization', 'classification_metrics')

@tasks('classification')

def ClassifierThresholdOptimization(
    dataset: validmind.vm_models.VMDataset,
    model: validmind.vm_models.VMModel,
    methods: Optional[List[str]] = None,
    target_recall: Optional[float] = None,
) → Dict[str, Union[pd.DataFrame, go.Figure]]:

Analyzes and visualizes different threshold optimization methods for binary classification models.

Purpose

The Classifier Threshold Optimization test identifies optimal decision thresholds using various methods to balance different performance metrics. This helps adapt the model's decision boundary to specific business requirements, such as minimizing false positives in fraud detection or achieving target recall in medical diagnosis.

Test Mechanism

The test implements multiple threshold optimization methods:

  1. Youden's J statistic (maximizing sensitivity + specificity - 1)
  2. F1-score optimization (balancing precision and recall)
  3. Precision-Recall equality point
  4. Target recall achievement
  5. Naive (0.5) threshold

For each method, the test computes ROC and PR curves, identifies the optimal points, and provides comprehensive performance metrics at each threshold (a minimal sketch of the first method follows below).
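To make the Youden's J method concrete, here is a minimal sketch using scikit-learn's roc_curve on synthetic labels and scores; it illustrates the statistic itself, not the library's internal implementation.

    import numpy as np
    from sklearn.metrics import roc_curve

    # Synthetic stand-ins for real labels and model scores.
    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 2, size=1000)
    y_prob = np.clip(0.35 * y_true + 0.35 + rng.normal(0.0, 0.2, 1000), 0, 1)

    fpr, tpr, thresholds = roc_curve(y_true, y_prob)
    j = tpr - fpr  # Youden's J = sensitivity + specificity - 1
    print(f"Optimal threshold by Youden's J: {thresholds[np.argmax(j)]:.3f}")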

Signs of High Risk

  • Large discrepancies between different optimization methods
  • Optimal thresholds far from the default 0.5
  • Poor performance metrics across all thresholds
  • Significant gap between achieved and target recall
  • Unstable thresholds across different methods
  • Extreme trade-offs between precision and recall
  • Threshold optimization showing minimal impact
  • Business metrics not improving with optimization

Strengths

  • Multiple optimization strategies for different needs
  • Visual and numerical results for comparison
  • Support for business-driven optimization (target recall)
  • Comprehensive performance metrics at each threshold
  • Integration with ROC and PR curves
  • Handles class imbalance through various metrics
  • Enables informed threshold selection
  • Supports cost-sensitive decision making

Limitations

  • Assumes the costs of false positives and false negatives are known
  • May need adjustment for highly imbalanced datasets
  • Threshold might not be stable across different samples
  • Cannot handle multi-class problems directly
  • Optimization methods may conflict with business needs
  • Requires sufficient validation data
  • May not capture temporal changes in optimal threshold
  • Single threshold may not be optimal for all subgroups

Arguments

  • dataset: VMDataset containing features and target
  • model: VMModel containing predictions
  • methods: List of methods to compare (default: ['youden', 'f1', 'precision_recall'])
  • target_recall: Target recall value, required when using the 'target_recall' method

Returns

  • Dictionary containing:
    • table: DataFrame comparing different threshold optimization methods (using weighted averages for precision, recall, and f1)
    • figure: Plotly figure showing ROC and PR curves with optimal thresholds
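The test is typically invoked through the library's run_test entry point. The sketch below assumes vm_dataset and vm_model were created earlier with init_dataset and init_model; the test ID and parameter names follow this page, while the surrounding setup is illustrative.

    import validmind as vm

    # vm_dataset / vm_model are assumed to come from vm.init_dataset()
    # and vm.init_model() earlier in the notebook.
    result = vm.tests.run_test(
        "validmind.model_validation.ClassifierThresholdOptimization",
        inputs={"dataset": vm_dataset, "model": vm_model},
        params={"methods": ["youden", "f1", "target_recall"], "target_recall": 0.85},
    )
    result.log()  # log the comparison table and figure to the ValidMind Platform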

find_optimal_threshold

def find_optimal_threshold(
    y_true: np.ndarray,
    y_prob: np.ndarray,
    method: str = 'youden',
    target_recall: Optional[float] = None,
) → Dict[str, Union[str, float]]:

Find the optimal classification threshold using various methods.

Arguments

  • y_true: True binary labels
  • y_prob: Predicted probabilities
  • method: Method to use for finding the optimal threshold, e.g. 'youden', 'f1', 'precision_recall', or 'target_recall' (see Test Mechanism above)
  • target_recall: Required if method='target_recall'

Returns

  • Dictionary containing threshold and metrics
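A small direct-call sketch follows; the import path is an assumption based on this page's location (the helper may live under a submodule such as sklearn in your installed version), and the toy arrays are illustrative only.

    import numpy as np
    # Import path assumed from this page; adjust to match your installed version.
    from validmind.tests.model_validation.ClassifierThresholdOptimization import (
        find_optimal_threshold,
    )

    y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
    y_prob = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.90, 0.55])

    print(find_optimal_threshold(y_true, y_prob, method="youden"))
    print(find_optimal_threshold(y_true, y_prob, method="target_recall", target_recall=0.9))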