Data handling and privacy

Published

November 20, 2024

How are users added to ValidMind?

ValidMind provides a built-in user management interface that allows new users to be registered on the ValidMind Platform and assigned user roles. User roles and access permissions are configured during initial onboarding. ValidMind also supports Single Sign-On (SSO) integration as part of our Enterprise and Virtual Private ValidMind (VPV) editions.

How does ValidMind handle end-user computing and spreadsheet models?

Customers can register spreadsheet models in the model inventory and centrally track their associated documentation files alongside the inventory metadata (roadmap item – Q3’2023). However, ValidMind cannot automate documentation generation for spreadsheet models.

What model artifacts are automatically imported into documentation and how are they retained?

ValidMind stores the following artifacts in the documentation via our Python Library API:

  • Dataset and model metadata, which allow documentation snippets to be generated programmatically (for example, a stored definition of “common logistic regression limitations” when a logistic regression model is passed to a ValidMind test suite run)
  • Quality and performance metrics collected from the dataset and model
  • Outputs from executed test suites
  • Images, plots, and visuals generated as part of extracting metrics and running tests
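
As a rough illustration of how these artifacts are captured, the sketch below follows a typical Library workflow: connect to the ValidMind Platform, register a dataset and a model, and run the documentation tests that log metadata, metrics, test outputs, and figures. Exact function signatures and parameter names (for example input_id or target_column) vary between library versions, and all file names and identifiers here are placeholders.

```python
import pandas as pd
import validmind as vm
from sklearn.linear_model import LogisticRegression

# Connect the library to the ValidMind Platform. Credentials and the model
# identifier are placeholders; real values come from your model's page in the
# ValidMind Platform.
vm.init(
    api_host="https://api.prod.validmind.ai/api/v1/tracking",
    api_key="YOUR_API_KEY",
    api_secret="YOUR_API_SECRET",
    model="YOUR_MODEL_IDENTIFIER",
)

# Register the (local, hypothetical) dataset so its metadata and data quality
# metrics can be logged to the documentation.
df = pd.read_csv("train.csv")
vm_dataset = vm.init_dataset(dataset=df, input_id="train_dataset", target_column="default")

# Register the trained model so model metadata and performance metrics can be logged.
model = LogisticRegression().fit(df.drop(columns="default"), df["default"])
vm_model = vm.init_model(model, input_id="logistic_regression")

# Run the documentation tests: test suite outputs, metrics, and generated plots
# are sent to the documentation; the raw dataset itself is not uploaded.
vm.run_documentation_tests(inputs={"dataset": vm_dataset, "model": vm_model})
```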

ValidMind is available as a multi-tenant or single-tenant solution hosted on cloud providers. With multi-tenant deployments, infrastructure is shared, but strict data isolation protocols ensure that no tenant can access another’s sensitive information. For organizations requiring the highest degree of data security, ValidMind offers a Virtual Private ValidMind (VPV) option that hosts the solution in a dedicated single-tenant cloud instance on the ValidMind cloud account. Furthermore, ValidMind’s data retention policy complies with the SOC 2 security standard.

How does ValidMind handle large datasets? What about the confidentiality of data sent to ValidMind?

ValidMind does not send datasets outside your environment. The ValidMind Library executes test suites and functions locally in your environment and is not limited by dataset size.

For more information, refer to our data privacy policy.

What solutions do you offer and how do you handle privacy?

ValidMind is a library and cloud platform available in multiple editions catering to different organizational needs:

  • Standard Edition: Our introductory offering, providing essential features and services.
  • Enterprise Edition: Builds upon the Standard Edition by adding features tailored for large-scale organizations.
  • Virtual Private ValidMind (VPV): Our most secure offering for organizations requiring a higher level of privacy, such as financial services firms handling sensitive data. Includes all Enterprise Edition features, but in a separate, isolated ValidMind environment. VPV accounts do not share resources with accounts outside the VPV.

Access to any edition is facilitated through AWS PrivateLink, which provides private connectivity between ValidMind and your on-premises networks without exposing your traffic to the public internet. To learn more, see Configure AWS PrivateLink. ValidMind does not send any personally identifiable information (PII) through our Python API.
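
As a hedged sketch of what the customer-side setup can involve, the boto3 call below creates an interface VPC endpoint that connects a VPC to a PrivateLink endpoint service. The service name, VPC, subnet, and security group IDs are placeholders; the actual ValidMind endpoint service name is shared during PrivateLink onboarding, and the full procedure is covered in Configure AWS PrivateLink.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create an interface VPC endpoint in your VPC that privately routes traffic to
# the ValidMind endpoint service. All identifiers below are placeholders.
response = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.vpce.us-east-1.vpce-svc-EXAMPLE",  # placeholder service name
    SubnetIds=["subnet-0123456789abcdef0"],        # subnets that host the endpoint network interfaces
    SecurityGroupIds=["sg-0123456789abcdef0"],     # must allow HTTPS (443) to the endpoint
    PrivateDnsEnabled=True,
)
print(response["VpcEndpoint"]["VpcEndpointId"])
```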

Can the tool automatically document other non-standard ETL steps or performance metrics from notebooks?

Support for more complex data processing pipelines is on our roadmap, currently scheduled for Q4’2023. We are implementing connector interfaces that will allow us to extract relationships between raw data sources and final post-processed datasets for Spark and Snowflake.

How does the tool manage model changes?

ValidMind allows model developers to re-run documentation functions with the library to capture changes in the model, such as changes in the number of features or hyperparameters.

After a model developer has made a change in their development environment, such as to a Jupyter Notebook, they can execute the relevant ValidMind documentation function to update the corresponding documentation section. ValidMind will then automatically recreate the relevant figures and tables and update them in the online documentation.
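
For example, after retraining with new hyperparameters in a notebook, the developer can re-register the model and re-run the relevant documentation tests. The sketch below assumes the same workflow as the earlier example (vm.init already called); the section name, column names, and parameters are illustrative and may differ between library versions.

```python
import pandas as pd
import validmind as vm
from sklearn.ensemble import RandomForestClassifier

# vm.init(...) is assumed to have been called earlier in the notebook.
df = pd.read_csv("train.csv")  # hypothetical training data
X, y = df.drop(columns="default"), df["default"]

# The developer changes the model, e.g. new hyperparameters or features...
model = RandomForestClassifier(n_estimators=300, max_depth=8).fit(X, y)

# ...then re-registers the inputs and re-runs the affected documentation section.
# ValidMind recreates the relevant figures and tables and updates them online.
vm_dataset = vm.init_dataset(dataset=df, input_id="train_dataset", target_column="default")
vm_model = vm.init_model(model, input_id="credit_risk_model")
vm.run_documentation_tests(
    section="model_evaluation",  # assumed section name; re-runs only this part
    inputs={"dataset": vm_dataset, "model": vm_model},
)
```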

ValidMind is currently working on a version history function, which will allow users to see the history of changes made to the documentation.

Can you accommodate Spark DataFrames?

Our ValidMind Library can extract dataset quality metrics from Pandas DataFrames, NumPy arrays, or Spark DataFrames using standard metrics provided by popular open-source frameworks such as scikit-learn, statsmodels, and more.

Each test defines a mapping to the supported dataset and model interfaces: when you pass a Spark DataFrame, our library directly calls native evaluation metrics provided by the Spark ML API or custom ones built by the developer (for example, via UDFs).
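
As a hedged sketch, a Spark DataFrame can be registered with the same init_dataset call used for Pandas inputs; the identifiers, column names, and exact parameters accepted for Spark inputs are assumptions that may differ by library version.

```python
import validmind as vm
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("validmind-spark-example").getOrCreate()

# Load a (hypothetical) large dataset as a Spark DataFrame. It is processed by
# Spark inside your environment and is not uploaded to ValidMind.
spark_df = spark.read.parquet("s3://your-bucket/loans.parquet")

# vm.init(...) is assumed to have been called earlier. Register the Spark
# DataFrame just like a Pandas DataFrame or NumPy array.
vm_dataset = vm.init_dataset(
    dataset=spark_df,
    input_id="loans_spark",    # assumed identifier
    target_column="default",   # assumed target column
)

# Data quality tests that map to the Spark interface then resolve to native
# Spark evaluators, or to developer-supplied metrics (for example UDF-based ones).
```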