validmind.llm
Entrypoint for LLM datasets.
LLMAgentDataset
class LLMAgentDataset(VMDataset):
LLM Agent Dataset for DeepEval integration with ValidMind.
This dataset class allows you to use all DeepEval tests and metrics within the ValidMind evaluation framework. It stores LLM interaction data in a format compatible with both frameworks.
Arguments
- test_cases (List[LLMTestCase]): List of DeepEval test cases
- goldens (List[Golden]): List of DeepEval golden templates
- deepeval_dataset (EvaluationDataset): DeepEval dataset instance
LLMAgentDataset
LLMAgentDataset(input_id: Optional[str] = None, test_cases: Optional[List[Any]] = None, goldens: Optional[List[Any]] = None, deepeval_dataset: Optional[Any] = None, **kwargs: Any)
Initialize LLMAgentDataset.
Arguments
- input_id (Optional[str]): Identifier for the dataset.
- test_cases (Optional[List[LLMTestCase]]): List of DeepEval LLMTestCase objects.
- goldens (Optional[List[Golden]]): List of DeepEval Golden objects.
- deepeval_dataset (Optional[EvaluationDataset]): DeepEval EvaluationDataset instance.
- **kwargs (Any): Additional arguments passed to VMDataset.
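A minimal sketch of constructing the dataset from test cases. The `LLMTestCase` below is a hypothetical stand-in carrying the same core fields as DeepEval's class; in real usage you would import it from `deepeval.test_case` and pass the list to `LLMAgentDataset(...)`:

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-in for deepeval.test_case.LLMTestCase.
@dataclass
class LLMTestCase:
    input: str
    actual_output: str
    expected_output: Optional[str] = None

test_cases: List[LLMTestCase] = [
    LLMTestCase(
        input="What is the capital of France?",
        actual_output="Paris is the capital of France.",
        expected_output="Paris",
    ),
]

# In real usage:
#   from validmind.llm import LLMAgentDataset
#   dataset = LLMAgentDataset(input_id="qa_eval", test_cases=test_cases)
print(len(test_cases))
```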
add_golden
def add_golden(self, golden: Any):
Add a DeepEval golden to the dataset.
Arguments
golden (Golden): DeepEval Golden instance.
add_test_case
def add_test_case(self, test_case: Any):
Add a DeepEval test case to the dataset.
Arguments
test_case (LLMTestCase): DeepEval LLMTestCase instance.
convert_goldens_to_test_cases
def convert_goldens_to_test_cases(self, llm_app_function: Callable[[str], Any]):
Convert goldens to test cases by generating actual outputs.
Arguments
llm_app_function (Callable[[str], Any]): Function that takes input and returns LLM output.
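A golden holds an input (and optionally an expected output) but no actual output; conversion runs the application on each input to fill that in. A self-contained sketch of this pattern, using hypothetical stand-ins for DeepEval's `Golden` and `LLMTestCase`:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical stand-ins for DeepEval's Golden and LLMTestCase.
@dataclass
class Golden:
    input: str
    expected_output: Optional[str] = None

@dataclass
class LLMTestCase:
    input: str
    actual_output: str
    expected_output: Optional[str] = None

def convert_goldens(goldens: List[Golden],
                    llm_app_function: Callable[[str], str]) -> List[LLMTestCase]:
    # Run the application on each golden's input to obtain the actual output.
    return [
        LLMTestCase(
            input=g.input,
            actual_output=llm_app_function(g.input),
            expected_output=g.expected_output,
        )
        for g in goldens
    ]

goldens = [Golden(input="2 + 2?", expected_output="4")]
cases = convert_goldens(goldens, lambda prompt: "4")
print(cases[0].actual_output)  # "4"
```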
evaluate_with_deepeval
def evaluate_with_deepeval(self, metrics: List[Any], **kwargs: Any) → Dict[str, Any]:
Evaluate the dataset using DeepEval metrics.
Arguments
- metrics (List[Any]): List of DeepEval metric instances.
- **kwargs (Any): Additional arguments passed to deepeval.evaluate().
Returns
- Evaluation results dictionary.
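The shape of metric-based evaluation can be sketched as follows. `ExactMatchMetric` is a toy metric standing in for a real DeepEval metric (which would expose a `measure` method and be passed via the `metrics` argument); the aggregation into a results dictionary is an illustrative simplification, not the exact structure DeepEval returns:

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

@dataclass
class LLMTestCase:  # hypothetical stand-in for DeepEval's LLMTestCase
    input: str
    actual_output: str
    expected_output: Optional[str] = None

class ExactMatchMetric:
    """Toy metric standing in for a DeepEval metric instance."""
    name = "exact_match"

    def measure(self, test_case: LLMTestCase) -> float:
        return 1.0 if test_case.actual_output == test_case.expected_output else 0.0

def evaluate(test_cases: List[LLMTestCase], metrics: List[Any]) -> Dict[str, Any]:
    # Score every test case with every metric; aggregate a mean per metric.
    results: Dict[str, Any] = {}
    for metric in metrics:
        scores = [metric.measure(tc) for tc in test_cases]
        results[metric.name] = sum(scores) / len(scores)
    return results

cases = [
    LLMTestCase("Q1", "Paris", "Paris"),
    LLMTestCase("Q2", "Lyon", "Paris"),
]
print(evaluate(cases, [ExactMatchMetric()]))  # {'exact_match': 0.5}
```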
from_deepeval_dataset
@classmethod
def from_deepeval_dataset(cls, deepeval_dataset: Any, input_id: str = 'llm_agent_dataset', **kwargs: Any) → validmind.vm_models.LLMAgentDataset:
Create LLMAgentDataset from DeepEval EvaluationDataset.
Arguments
- deepeval_dataset (EvaluationDataset): DeepEval EvaluationDataset instance.
- input_id (str): Dataset identifier.
- **kwargs (Any): Additional arguments passed through to constructor.
Returns
- New dataset instance.
from_goldens
@classmethod
def from_goldens(cls, goldens: List[Any], input_id: str = 'llm_agent_dataset', **kwargs: Any) → validmind.vm_models.LLMAgentDataset:
Create LLMAgentDataset from DeepEval goldens.
Arguments
- goldens (List[Golden]): List of DeepEval Golden objects.
- input_id (str): Dataset identifier.
- **kwargs (Any): Additional arguments passed through to constructor.
Returns
- New dataset instance.
from_test_cases
@classmethod
def from_test_cases(cls, test_cases: List[Any], input_id: str = 'llm_agent_dataset', **kwargs: Any) → validmind.vm_models.LLMAgentDataset:
Create LLMAgentDataset from DeepEval test cases.
Arguments
- test_cases (List[LLMTestCase]): List of DeepEval LLMTestCase objects.
- input_id (str): Dataset identifier.
- **kwargs (Any): Additional arguments passed through to constructor.
Returns
- New dataset instance.
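Each `from_*` classmethod follows the same alternative-constructor pattern: validate or adapt the input, then delegate to `__init__`. A hypothetical simplification of that pattern (not the real class, which also sets up the VMDataset tabular backing):

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class LLMAgentDatasetSketch:
    """Hypothetical simplification of LLMAgentDataset's factory classmethods."""
    input_id: str = "llm_agent_dataset"
    test_cases: List[Any] = field(default_factory=list)

    @classmethod
    def from_test_cases(cls, test_cases: List[Any],
                        input_id: str = "llm_agent_dataset",
                        **kwargs: Any) -> "LLMAgentDatasetSketch":
        # A thin wrapper: forward everything to the constructor.
        return cls(input_id=input_id, test_cases=test_cases, **kwargs)

ds = LLMAgentDatasetSketch.from_test_cases(["case_1", "case_2"], input_id="my_eval")
print(ds.input_id, len(ds.test_cases))  # my_eval 2
```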
get_deepeval_dataset
def get_deepeval_dataset(self) → Any:
Get or create a DeepEval EvaluationDataset instance.
Returns
- DeepEval EvaluationDataset instance.
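"Get or create" here means a lazy, cached build: return the stored EvaluationDataset if one exists, otherwise construct it from the stored test cases. A self-contained sketch of that pattern, using a plain dict as a hypothetical stand-in for the EvaluationDataset object:

```python
from typing import Any, List, Optional

class DatasetHolder:
    """Hypothetical sketch of the get-or-create pattern behind get_deepeval_dataset."""

    def __init__(self, test_cases: Optional[List[Any]] = None):
        self.test_cases = test_cases or []
        self._deepeval_dataset: Optional[Any] = None

    def get_deepeval_dataset(self) -> Any:
        # Reuse the cached instance; otherwise build one from the stored test cases.
        if self._deepeval_dataset is None:
            # In real usage this would be: EvaluationDataset(test_cases=self.test_cases)
            self._deepeval_dataset = {"test_cases": list(self.test_cases)}
        return self._deepeval_dataset

holder = DatasetHolder(test_cases=["tc"])
first = holder.get_deepeval_dataset()
second = holder.get_deepeval_dataset()
print(first is second)  # True
```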
to_deepeval_test_cases
def to_deepeval_test_cases(self) → List[Any]:
Convert dataset rows back to DeepEval test cases.
Returns
- List of DeepEval LLMTestCase objects.