TimeSeriesFrequency
Evaluates consistency of time series data frequency and generates a frequency plot.
Purpose
The purpose of the TimeSeriesFrequency test is to evaluate the consistency in the frequency of data points in a time-series dataset. This test inspects the intervals or duration between each data point to determine if a fixed pattern (such as daily, weekly, or monthly) exists. The identification of such patterns is crucial to time-series analysis as any irregularities could lead to erroneous results and hinder the model’s capacity for identifying trends and patterns.
Test Mechanism
Initially, the test checks if the dataframe index is in datetime format. Subsequently, it utilizes pandas infer_freq
method to identify the frequency of each data series within the dataframe. The infer_freq
method attempts to establish the frequency of a time series and returns both the frequency string and a dictionary relating these strings to their respective labels. The test compares the frequencies of all datasets. If they share a common frequency, the test passes, but it fails if they do not. Additionally, Plotly is used to create a frequency plot, offering a visual depiction of the time differences between consecutive entries in the dataframe index.
Signs of High Risk
- The test fails, indicating multiple unique frequencies within the dataset. This failure could suggest irregular intervals between observations, potentially interrupting pattern recognition or trend analysis.
- The presence of missing or null frequencies could be an indication of inconsistencies in data or gaps within the data collection process.
Strengths
- This test uses a systematic approach to checking the consistency of data frequency within a time-series dataset.
- It increases the model’s reliability by asserting the consistency of observations over time, an essential factor in time-series analysis.
- The test generates a visual plot, providing an intuitive representation of the dataset’s frequency distribution, which caters to visual learners and aids in interpretation and explanation.
Limitations
- This test is only applicable to time-series datasets and hence not suitable for other types of datasets.
- The
infer_freq
method might not always correctly infer frequency when faced with missing or irregular data points. - Depending on context or the model under development, mixed frequencies might sometimes be acceptable, but this test considers them a failing condition.