DescriptiveStats

Provides comprehensive descriptive statistics for numerical features in a dataset.

Purpose

This test generates detailed descriptive statistics for numerical features, including basic statistics, distribution measures, confidence intervals, and normality tests. The result is an overview of the data characteristics needed to assess data quality and distribution properties.

Test Mechanism

The test computes various statistical measures for each numerical column:
- Basic statistics: count, mean, median, std, min, max, quartiles
- Distribution measures: skewness, kurtosis, coefficient of variation
- Confidence intervals for the mean
- Normality tests (Shapiro-Wilk for small samples, Anderson-Darling for larger)
- Missing value analysis
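
The measures listed above can be sketched for a single column as follows. This is a minimal illustration using NumPy and SciPy, not the test's actual implementation: the function name `describe_column`, the 95% confidence level, the t-based interval, and the 5000-observation cutoff between Shapiro-Wilk and Anderson-Darling are all assumptions made for the example.

```python
import numpy as np
from scipy import stats


def describe_column(values, confidence=0.95, shapiro_max_n=5000):
    """Summarize one numerical column; NaN entries are counted as missing.

    Hypothetical helper for illustration. The defaults are assumptions.
    """
    x = np.asarray(values, dtype=float)
    total = x.size
    missing = int(np.isnan(x).sum())
    x = x[~np.isnan(x)]  # drop missing values before computing statistics
    n = x.size
    if n < 3:
        raise ValueError("need at least 3 non-missing values")

    mean = float(x.mean())
    std = float(x.std(ddof=1))  # sample standard deviation
    q1, median, q3 = (float(q) for q in np.percentile(x, [25, 50, 75]))

    # Two-sided t-based confidence interval for the mean.
    t_crit = float(stats.t.ppf(0.5 + confidence / 2, df=n - 1))
    half_width = t_crit * std / np.sqrt(n)

    # Pick the normality test by sample size (the cutoff is an assumption).
    if n <= shapiro_max_n:
        stat, p_value = stats.shapiro(x)
        normality = {
            "test": "shapiro-wilk",
            "statistic": float(stat),
            "p_value": float(p_value),
        }
    else:
        result = stats.anderson(x, dist="norm")
        normality = {
            "test": "anderson-darling",
            "statistic": float(result.statistic),
            # For dist="norm", index 2 is the 5% significance level.
            "critical_value_5pct": float(result.critical_values[2]),
        }

    return {
        "count": n,
        "missing": missing,
        "missing_pct": 100.0 * missing / total,
        "mean": mean,
        "median": median,
        "std": std,
        "min": float(x.min()),
        "max": float(x.max()),
        "q1": q1,
        "q3": q3,
        "skewness": float(stats.skew(x)),
        "kurtosis": float(stats.kurtosis(x)),  # excess (Fisher) kurtosis
        "cv": std / abs(mean) if mean != 0 else float("nan"),
        "ci_low": mean - half_width,
        "ci_high": mean + half_width,
        "normality": normality,
    }
```

Calling `describe_column(column_values)` once per numerical column and collecting the returned dictionaries gives one row of statistics per feature.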

Signs of High Risk

High skewness or kurtosis indicating non-normal distributions
Large coefficients of variation suggesting high data variability
Significant results in normality tests when normality is expected
High percentage of missing values
Extreme outliers based on IQR analysis
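
Two of the signs above, missing values and IQR-based outliers, can be checked with a short sketch like the one below. The text does not state any cutoffs, so the function name `missing_and_outlier_flags`, the 3 x IQR fence for "extreme" outliers, and the 20% missing-value threshold are illustrative assumptions rather than the test's own settings.

```python
import numpy as np


def missing_and_outlier_flags(values, iqr_multiplier=3.0, max_missing_pct=20.0):
    """Flag a high missing-value share and values outside the IQR fences.

    Hypothetical helper; both thresholds are assumptions for illustration.
    """
    x = np.asarray(values, dtype=float)
    is_missing = np.isnan(x)
    missing_pct = 100.0 * float(is_missing.mean()) if x.size else 0.0

    observed = x[~is_missing]
    if observed.size == 0:
        return {
            "missing_pct": missing_pct,
            "high_missing": missing_pct > max_missing_pct,
            "lower_fence": None,
            "upper_fence": None,
            "outliers": [],
        }

    # Fences sit iqr_multiplier * IQR below Q1 and above Q3.
    q1, q3 = np.percentile(observed, [25, 75])
    iqr = q3 - q1
    lower = float(q1 - iqr_multiplier * iqr)
    upper = float(q3 + iqr_multiplier * iqr)
    outliers = observed[(observed < lower) | (observed > upper)]

    return {
        "missing_pct": missing_pct,
        "high_missing": missing_pct > max_missing_pct,
        "lower_fence": lower,
        "upper_fence": upper,
        "outliers": outliers.tolist(),
    }
```

A smaller multiplier such as 1.5 flags milder outliers as well. Which fence is appropriate depends on the feature and on how heavy-tailed its distribution is expected to be.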

Strengths

Comprehensive statistical analysis in a single test
Includes advanced statistical measures beyond basic descriptives
Provides confidence intervals for uncertainty quantification
Handles missing values appropriately
Suitable for both exploratory and confirmatory analysis

Limitations

Limited to numerical features only
Normality tests may not be meaningful for all data types, for example discrete, bounded, or heavily tied values
Large datasets may make some tests computationally expensive
Interpretation requires statistical knowledge