Post-clustering Inference under Dependency

The recent work by Gao et al. (2022) laid the foundations of post-clustering inference. The authors established, for the first time, a theoretical framework that allows testing for differences between the means of estimated clusters. Additionally, they studied the estimation of unknown parameters while controlling the selective type I error. However, their theory was developed for independent observations, identically distributed as $p$-dimensional Gaussian variables with a spherical covariance matrix.
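To see why a dedicated selective framework is needed when testing $H_0: \mu_{\hat{C}_1} = \mu_{\hat{C}_2}$ for two *estimated* clusters $\hat{C}_1, \hat{C}_2$, consider the minimal simulation below. It is illustrative only (it is not the test of Gao et al.): data are drawn from a single spherical Gaussian with no true group structure, a hierarchical clustering is cut into two clusters, and a classical Wald-type test is applied to the difference of the cluster means. Because the same data are used both to form and to compare the clusters, the naive test rejects far more often than its nominal level.

```python
import numpy as np
from scipy import stats
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
n, p, reps = 40, 5, 2000
naive_pvals = []
for _ in range(reps):
    # one spherical Gaussian: the global null, no true clusters
    X = rng.standard_normal((n, p))
    # hierarchical clustering (average linkage), cut into two clusters
    labels = fcluster(linkage(X, method="average"), t=2, criterion="maxclust")
    g1, g2 = X[labels == 1], X[labels == 2]
    # classical Wald-type statistic on the difference of cluster means;
    # for groups fixed a priori and unit variance it would follow chi^2_p
    diff = g1.mean(axis=0) - g2.mean(axis=0)
    stat = np.sum(diff**2) / (1.0 / len(g1) + 1.0 / len(g2))
    naive_pvals.append(stats.chi2.sf(stat, df=p))

# nominal level 5%, but the empirical rejection rate is far larger,
# because clustering and testing reuse the same data ("double dipping")
print("naive rejection rate:", np.mean(np.array(naive_pvals) < 0.05))
```

Gao et al. address this by computing a $p$-value conditionally on the clustering event, which restores control of the selective type I error.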

Here, we aim to extend this framework to a setting more relevant to practical applications, in which arbitrary dependence structures between observations and features are allowed. We show that a $p$-value for post-clustering inference under general dependency can be defined, and we assess the theoretical conditions allowing a compatible estimation of the covariance matrix. The theory is developed for hierarchical clustering algorithms with several types of linkage, and for the $k$-means algorithm. We illustrate our method on synthetic data and on real protein structure data.
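As a rough illustration of this setting, the sketch below generates data whose rows (observations) and columns (features) are both correlated, and runs the clustering procedures covered by the theory: hierarchical clustering with several linkages, and $k$-means. This is a sketch only; the Kronecker-structured covariance used here is one convenient way to simulate joint row/column dependence, not necessarily the exact model analyzed in the paper, and the names `U` and `Sigma` are illustrative.

```python
import numpy as np
from scipy.linalg import cholesky, toeplitz
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
n, p = 30, 8

# Dependence across observations (U) and across features (Sigma);
# X = L_U @ Z @ L_Sigma.T gives Cov(vec(X)) = kron(Sigma, U),
# i.e. a matrix-normal draw with row and column covariances.
U = toeplitz(0.6 ** np.arange(n))       # AR(1)-type correlation between observations
Sigma = toeplitz(0.3 ** np.arange(p))   # correlation between features
Z = rng.standard_normal((n, p))
X = cholesky(U, lower=True) @ Z @ cholesky(Sigma, lower=True).T

# clustering algorithms covered by the theory
for method in ("single", "average", "complete", "ward"):
    hc_labels = fcluster(linkage(X, method=method), t=2, criterion="maxclust")
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
```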