CISS-VAE documentation
The Clustering-Informed Shared-Structure Variational Autoencoder (CISS-VAE) is a flexible deep learning model for missing data imputation that is particularly well-suited to MNAR (Missing Not at Random) scenarios where missingness patterns are informative. It also functions effectively under MAR (Missing at Random) assumptions. Please see our publication for more details.
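To make the MNAR idea concrete, the following is a minimal, standard-library sketch that does not use the CISS-VAE package. It assumes a single synthetic Gaussian variable and two hypothetical mechanisms: `"uniform"`, where every value is dropped with the same probability, and `"MNAR"`, where larger values are dropped more often. The function names and drop probabilities are illustrative choices, not part of the package.

```python
import random

def simulate_missingness(values, mechanism, seed=0):
    """Return a copy of `values` with some entries replaced by None.

    mechanism="uniform": every value is dropped with probability 0.4.
    mechanism="MNAR":    larger values are dropped more often, so the
                         missingness depends on the unobserved value itself.
    """
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    out = []
    for v in values:
        if mechanism == "MNAR":
            # Drop probability rises linearly from 0.0 (smallest) to 0.8 (largest).
            p_drop = 0.8 * (v - lo) / (hi - lo)
        else:
            p_drop = 0.4
        out.append(None if rng.random() < p_drop else v)
    return out

def observed_mean(values):
    """Mean of the entries that are still observed (not None)."""
    seen = [v for v in values if v is not None]
    return sum(seen) / len(seen)

# Synthetic "true" data; the distribution parameters are arbitrary.
data_rng = random.Random(42)
truth = [data_rng.gauss(50.0, 10.0) for _ in range(5000)]

true_mean = sum(truth) / len(truth)
uniform_mean = observed_mean(simulate_missingness(truth, "uniform"))
mnar_mean = observed_mean(simulate_missingness(truth, "MNAR"))
print(f"true={true_mean:.2f}  uniform={uniform_mean:.2f}  MNAR={mnar_mean:.2f}")
```

Under the MNAR mechanism the observed mean falls below the true mean, because the missingness pattern itself carries information about the hidden values. This is the sense in which missingness is "informative".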
The model uses unsupervised clustering to capture distinct patterns of missingness and leverages a mix of shared and unshared encoder and decoder layers, allowing knowledge transfer across clusters and enhancing parameter stability. Its iterative learning procedure improves imputation accuracy compared to traditional training approaches.
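As a rough illustration of what "patterns of missingness" means, the sketch below groups rows that share an identical missingness mask. This is a deliberately simplified stand-in for the unsupervised clustering the model uses, not the package's implementation; the helper names and the toy data are hypothetical.

```python
from collections import defaultdict

def missingness_mask(row):
    """Encode a row as a tuple of 0/1 flags, with 1 where the value is missing (None)."""
    return tuple(1 if v is None else 0 for v in row)

def group_rows_by_mask(rows):
    """Map each distinct missingness mask to the indices of the rows that share it."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[missingness_mask(row)].append(i)
    return dict(groups)

# Toy dataset: three columns, None marks a missing entry.
rows = [
    [1.2, None, 3.1],
    [0.7, None, 2.9],
    [None, 4.0, 1.5],
    [2.2, 3.3, 0.4],
    [None, 5.1, 1.1],
]

clusters = group_rows_by_mask(rows)
for mask, members in clusters.items():
    print(mask, members)
```

Exact-match grouping only works when there are few distinct patterns. With many columns, a clustering algorithm applied to the masks would merge similar patterns instead of requiring identical ones.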
The CISS-VAE package also offers the ciss_vae.training.autotune.autotune() function, which can help select the best hyperparameters for your model within a user-defined search space.
The autotune function is compatible with Optuna Dashboard for viewing hyperparameter importance trends.
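The idea of searching a user-defined space can be sketched without the package. The example below uses plain random search from the standard library, which is not Optuna's algorithm and not the `autotune()` implementation. The hyperparameter names, ranges, and the toy objective are all assumptions made for illustration.

```python
import math
import random

# Hypothetical search space: names and ranges are illustrative only.
SEARCH_SPACE = {
    "latent_dim": [8, 16, 32],
    "learning_rate": (1e-4, 1e-2),  # sampled log-uniformly between the bounds
    "num_shared_layers": [0, 1, 2],
}

def sample_config(rng):
    """Draw one configuration from SEARCH_SPACE."""
    lo, hi = SEARCH_SPACE["learning_rate"]
    return {
        "latent_dim": rng.choice(SEARCH_SPACE["latent_dim"]),
        "learning_rate": math.exp(rng.uniform(math.log(lo), math.log(hi))),
        "num_shared_layers": rng.choice(SEARCH_SPACE["num_shared_layers"]),
    }

def toy_validation_loss(cfg):
    """Stand-in objective; a real run would train a model and score held-out data."""
    return (
        abs(cfg["latent_dim"] - 16) / 16
        + abs(math.log10(cfg["learning_rate"]) + 3)
        + abs(cfg["num_shared_layers"] - 1)
    )

def random_search(n_trials, seed=0):
    """Sample n_trials configurations and return the one with the lowest loss."""
    rng = random.Random(seed)
    trials = [sample_config(rng) for _ in range(n_trials)]
    return min(trials, key=toy_validation_loss)

best = random_search(n_trials=50)
print(best)
```

A tuner such as Optuna replaces the uniform sampling step with a strategy that learns from earlier trials, but the contract is the same: a search space goes in, the best configuration found comes out.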
The R package associated with this model can be found at rCISS-VAE.
Contents:
- CISS-VAE Quickstart
- How to use CISS-VAE
- Running the CISS-VAE Model
- Hyperparameter Tuning with Optuna
- Saving and loading models
- Handling binary data columns
- Handling Categorical Data Columns
- Using create_missingness_prop_matrix: A Complete Guide
- Integration with sample clustering
- Integration w/ CISS-VAE pipeline
- Full workflow
- Avoiding undesired imputation of certain missing data
- API Reference
Installation
To install via PyPI:
pip install ciss-vae
To install via GitHub:
pip install git+https://github.com/CISS-VAE/CISS-VAE-python.git
The GitHub repo can be found at https://github.com/CISS-VAE/CISS-VAE-python.
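After installing, one way to confirm which version is present is to query the package metadata. This is a small sketch using only the standard library; it assumes the distribution name matches the `ciss-vae` name used in the pip command, and the helper `installed_version` is a hypothetical name introduced here.

```python
from importlib import metadata

def installed_version(dist_name):
    """Return the installed version string for a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

version = installed_version("ciss-vae")
if version is None:
    print("ciss-vae is not installed in this environment")
else:
    print(f"ciss-vae {version} is installed")
```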
Features
- Cluster-specific VAE architecture
- Compatible with real-world missing data (MAR, MNAR)
- Optuna-based hyperparameter tuning
Need help? See the vignette or the full API reference.