validmind.NormalityTests

NormalityTests

@tags('tabular_data', 'statistics', 'normality')

@tasks('classification', 'regression', 'clustering')

def NormalityTests(dataset: validmind.vm_models.VMDataset, columns: Optional[List[str]] = None, alpha: float = 0.05, tests: List[str] = ['shapiro', 'anderson', 'kstest']) → Dict[str, Any]:

Performs multiple normality tests on numerical features to assess distribution normality.

Purpose

This test evaluates whether numerical features follow a normal distribution using several statistical tests. Knowing whether a feature is approximately normal matters when choosing statistical methods and when checking the distributional assumptions a model relies on.

Test Mechanism

The test applies multiple normality tests:

Shapiro-Wilk test: Best for small to medium samples
Anderson-Darling test: More sensitive to deviations in tails
Kolmogorov-Smirnov test: General goodness-of-fit test
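
The three tests listed above can be sketched with SciPy for a single feature. This is a minimal illustration, not the ValidMind implementation: the helper name `run_normality_tests`, the result layout, and the choice to fit the normal's mean and standard deviation from the data for the Kolmogorov-Smirnov test are all assumptions. It also assumes a SciPy version in which `scipy.stats.anderson` returns `critical_values` and `significance_level`.

```python
import numpy as np
from scipy import stats


def run_normality_tests(values, alpha=0.05):
    """Hypothetical helper: run three normality tests on one numeric feature."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]  # drop missing values before testing

    results = {}

    # Shapiro-Wilk: pass when the p-value exceeds alpha.
    sw_stat, sw_p = stats.shapiro(x)
    results["shapiro"] = {
        "statistic": float(sw_stat),
        "p_value": float(sw_p),
        "passed": bool(sw_p > alpha),
    }

    # Anderson-Darling: compare the statistic with the critical value at the
    # tabulated significance level (given in percent) closest to alpha.
    ad = stats.anderson(x, dist="norm")
    levels = np.asarray(ad.significance_level, dtype=float)
    idx = int(np.argmin(np.abs(levels - alpha * 100)))
    critical = float(ad.critical_values[idx])
    results["anderson"] = {
        "statistic": float(ad.statistic),
        "critical_value": critical,
        "passed": bool(ad.statistic < critical),
    }

    # Kolmogorov-Smirnov against a normal distribution whose parameters are
    # estimated from the sample (assumption). Estimating the parameters from
    # the same data makes the reported p-value conservative.
    ks_stat, ks_p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
    results["kstest"] = {
        "statistic": float(ks_stat),
        "p_value": float(ks_p),
        "passed": bool(ks_p > alpha),
    }

    return results
```

Calling `run_normality_tests(df["income"])` on a strongly skewed column would be expected to return `passed: False` for all three tests.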

Signs of High Risk

Multiple normality tests failing consistently
Very low p-values indicating strong evidence against normality
Conflicting results between different normality tests
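
One way to turn these signs into a single label is sketched below. The function `summarize_risk`, its input format (a mapping from test name to a pass flag), and the status strings are hypothetical and only illustrate the reasoning; the test's actual output may be structured differently.

```python
def summarize_risk(passed_by_test):
    """Hypothetical helper: classify pass/fail flags from several normality tests."""
    if not passed_by_test:
        raise ValueError("at least one test result is required")

    failed = sorted(name for name, ok in passed_by_test.items() if not ok)

    if len(failed) == len(passed_by_test):
        status = "consistent_failure"  # every test rejects normality
    elif failed:
        status = "conflicting"  # the tests disagree with each other
    else:
        status = "no_evidence_against_normality"

    return {"status": status, "failed_tests": failed}
```

A "conflicting" status is a cue to inspect the feature directly, for example with a histogram or Q-Q plot, rather than to trust any single test.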

Strengths

Multiple statistical tests for robust assessment
Clear pass/fail indicators for each test
Suitable for different sample sizes

Limitations

Limited to numerical features only
Some tests are sensitive to sample size: with large samples, even minor deviations from normality can produce very low p-values
Perfect normality is rare in real data
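
The first two limitations can be handled before testing, as in the sketch below. Both helpers (`numeric_columns`, `cap_sample`) are hypothetical and are not part of ValidMind. The default cap of 5000 is an assumption, chosen because the SciPy documentation notes that the Shapiro-Wilk p-value may be inaccurate for samples larger than 5000.

```python
import numpy as np
import pandas as pd


def numeric_columns(df, columns=None):
    """Hypothetical helper: keep only numeric columns, optionally from a requested subset."""
    numeric = df.select_dtypes(include="number").columns.tolist()
    if columns is None:
        return numeric
    return [c for c in columns if c in numeric]


def cap_sample(values, max_n=5000, seed=0):
    """Hypothetical helper: drop NaNs and randomly subsample when the sample is very large."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]
    if x.size <= max_n:
        return x
    rng = np.random.default_rng(seed)
    return rng.choice(x, size=max_n, replace=False)
```

Subsampling trades statistical power for p-values that are easier to interpret, so it is worth recording the cap alongside the results.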