Fix pandas DataFrame dtype preservation in VMDataset initialization

validmind-library

2.8.22

bug

enhancement

Published

April 25, 2025

You can now reduce memory usage when initializing VMDataset objects with vm.init_dataset(). We’ve introduced a copy_data option that lets you skip copying the input DataFrame, which helps when you work with large datasets in memory-constrained environments. By default, copy_data is set to True. To skip the copy, pass copy_data=False:

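import validmind as vm

# df is an existing pandas DataFrame that contains a "target" column
vm_ds = vm.init_dataset(
    dataset=df,
    input_id="demo",
    target_column="target",
    copy_data=False,  # skip copying the input DataFrame to save memory
)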
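To get a rough sense of what a copy costs before deciding whether to set copy_data=False, you can measure the DataFrame with plain pandas. The sketch below is an illustration only: it uses pandas and NumPy directly rather than ValidMind, the column names and sizes are made up, and it does not claim to reproduce what vm.init_dataset() does internally.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; column names and sizes are illustrative only.
df = pd.DataFrame({
    "feature": np.arange(1_000_000, dtype="float64"),
    "target": np.zeros(1_000_000, dtype="int8"),
})

# Approximate memory held by the original frame, in bytes.
original_bytes = int(df.memory_usage(deep=True).sum())

# A deep copy allocates a second set of buffers of roughly the same size.
copied = df.copy(deep=True)
copied_bytes = int(copied.memory_usage(deep=True).sum())

# Column dtypes are the same in the copy as in the original.
dtypes_preserved = list(copied.dtypes) == list(df.dtypes)

# The copy owns its own buffers, so it does not share memory with the original.
shares_memory = np.shares_memory(
    df["feature"].to_numpy(), copied["feature"].to_numpy()
)

print(f"original: {original_bytes / 1e6:.1f} MB, copy: {copied_bytes / 1e6:.1f} MB")
print(f"dtypes preserved: {dtypes_preserved}, shares memory: {shares_memory}")
```

If the extra allocation reported here is close to your available memory, skipping the copy is likely worth considering.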