validmind.llm

Entrypoint for LLM datasets.

Submodules

  • rag

LLMAgentDataset

class LLMAgentDataset(VMDataset):

LLM Agent Dataset for DeepEval integration with ValidMind.

This dataset class allows you to use all DeepEval tests and metrics within the ValidMind evaluation framework. It stores LLM interaction data in a format compatible with both frameworks.

Arguments

  • test_cases (List[LLMTestCase]): List of DeepEval test cases.
  • goldens (List[Golden]): List of DeepEval golden templates.
  • deepeval_dataset (EvaluationDataset): DeepEval dataset instance.

LLMAgentDataset

LLMAgentDataset(input_id: Optional[str] = None, test_cases: Optional[List[Any]] = None, goldens: Optional[List[Any]] = None, deepeval_dataset: Optional[Any] = None, **kwargs: Any)

Initialize LLMAgentDataset.

Arguments

  • input_id (Optional[str]): Identifier for the dataset.
  • test_cases (Optional[List[LLMTestCase]]): List of DeepEval LLMTestCase objects.
  • goldens (Optional[List[Golden]]): List of DeepEval Golden objects.
  • deepeval_dataset (Optional[EvaluationDataset]): DeepEval EvaluationDataset instance.
  • **kwargs (Any): Additional arguments passed to VMDataset.

add_golden

def add_golden(self, golden: Any):

Add a DeepEval golden to the dataset.

Arguments

  • golden (Golden): DeepEval Golden instance.

add_test_case

def add_test_case(self, test_case: Any):

Add a DeepEval test case to the dataset.

Arguments

  • test_case (LLMTestCase): DeepEval LLMTestCase instance.

convert_goldens_to_test_cases

def convert_goldens_to_test_cases(self, llm_app_function: Callable[[str], Any]):

Convert stored goldens to test cases by calling the provided LLM application function on each golden's input to generate actual outputs.

Arguments

  • llm_app_function (Callable[[str], Any]): Function that takes input and returns LLM output.

evaluate_with_deepeval

def evaluate_with_deepeval(self, metrics: List[Any], **kwargs: Any) → Dict[str, Any]:

Evaluate the dataset using DeepEval metrics.

Arguments

  • metrics (List[Any]): List of DeepEval metric instances.
  • **kwargs (Any): Additional arguments passed to deepeval.evaluate().

Returns

  • Evaluation results dictionary.

from_deepeval_dataset

@classmethod
def from_deepeval_dataset(cls, deepeval_dataset: Any, input_id: str = 'llm_agent_dataset', **kwargs: Any) → validmind.vm_models.LLMAgentDataset:

Create LLMAgentDataset from DeepEval EvaluationDataset.

Arguments

  • deepeval_dataset (EvaluationDataset): DeepEval EvaluationDataset instance.
  • input_id (str): Dataset identifier.
  • **kwargs (Any): Additional arguments passed through to constructor.

Returns

  • New dataset instance.

from_goldens

@classmethod
def from_goldens(cls, goldens: List[Any], input_id: str = 'llm_agent_dataset', **kwargs: Any) → validmind.vm_models.LLMAgentDataset:

Create LLMAgentDataset from DeepEval goldens.

Arguments

  • goldens (List[Golden]): List of DeepEval Golden objects.
  • input_id (str): Dataset identifier.
  • **kwargs (Any): Additional arguments passed through to constructor.

Returns

  • New dataset instance.

from_test_cases

@classmethod
def from_test_cases(cls, test_cases: List[Any], input_id: str = 'llm_agent_dataset', **kwargs: Any) → validmind.vm_models.LLMAgentDataset:

Create LLMAgentDataset from DeepEval test cases.

Arguments

  • test_cases (List[LLMTestCase]): List of DeepEval LLMTestCase objects.
  • input_id (str): Dataset identifier.
  • **kwargs (Any): Additional arguments passed through to constructor.

Returns

  • New dataset instance.

get_deepeval_dataset

def get_deepeval_dataset(self) → Any:

Get or create a DeepEval EvaluationDataset instance.

Returns

  • DeepEval EvaluationDataset instance.

to_deepeval_test_cases

def to_deepeval_test_cases(self) → List[Any]:

Convert dataset rows back to DeepEval test cases.

Returns

  • List of DeepEval LLMTestCase objects.
© Copyright 2025 ValidMind Inc. All Rights Reserved.