Primary Wellesley Thesis Advisor

Qing Wang


Many research questions can be framed as binary classification tasks with simple yes-or-no answers. For example, does a patient have a tumor? Should this email be classified as spam? The area under the receiver operating characteristic (ROC) curve (AUC) is a popular metric for assessing the performance of such classifiers, where a larger AUC score indicates a better overall classifier. However, due to sampling variation, the model with the largest AUC score for a given data set is not necessarily the optimal model. Thus, it is important to evaluate the variance of AUC. We first recognize that AUC can be estimated without bias in the form of a two-sample U-statistic. We then propose a new method, an unbiased variance estimator of a general $K$-sample U-statistic, and apply it to evaluate the variance of AUC. We suggest choosing the most parsimonious model whose AUC score is within one standard error of the maximum AUC. The developed procedure improves model selection algorithms that weigh complexity and performance.
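As an illustration of the two ideas above, the following minimal Python sketch computes AUC as a two-sample U-statistic and applies the one-standard-error selection rule. The function names `auc_u_statistic` and `one_se_choice` are hypothetical, the half-weight given to tied scores is an assumed convention, and the standard errors are taken as given inputs rather than computed by the proposed estimator. The sketch also assumes that "within one standard error" refers to the standard error of the best-scoring model.

```python
import numpy as np


def auc_u_statistic(pos_scores, neg_scores):
    """Estimate AUC as a two-sample U-statistic.

    The kernel is 1{x > y} averaged over all (positive, negative) pairs.
    Ties are given weight 0.5 (an assumed convention; with continuous
    scores ties have probability zero).
    """
    pos = np.asarray(pos_scores, dtype=float)[:, None]  # shape (m, 1)
    neg = np.asarray(neg_scores, dtype=float)[None, :]  # shape (1, n)
    kernel = (pos > neg) + 0.5 * (pos == neg)           # shape (m, n)
    return float(kernel.mean())


def one_se_choice(aucs, std_errors, complexities):
    """Return the index of the simplest model within one SE of the best AUC.

    Assumption: the threshold uses the standard error of the
    best-scoring model.
    """
    aucs = np.asarray(aucs, dtype=float)
    best = int(np.argmax(aucs))
    threshold = aucs[best] - std_errors[best]
    eligible = [i for i in range(len(aucs)) if aucs[i] >= threshold]
    # Among eligible models, prefer the least complex one.
    return min(eligible, key=lambda i: complexities[i])
```

For example, with candidate AUC scores `[0.90, 0.91, 0.85]`, standard errors all equal to `0.02`, and complexities `[1, 5, 2]`, the rule selects the first model: its score is within one standard error of the maximum and it is the least complex of the eligible candidates.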

To realize the proposed unbiased variance estimator of AUC, we use a partition resampling scheme that yields high computational efficiency. We conduct simulation studies to investigate the performance of the developed method in comparison to bootstrap and jackknife variance estimators. The simulations suggest that the proposed estimator yields comparable or even better results in terms of bias and mean squared error. In addition, it is significantly more computationally efficient than its resampling-based counterparts. Finally, we discuss the generalization of the devised method to estimating the variance of a general $K$-sample U-statistic ($K \geq 2$), which has broad applications in practice.
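For context on the comparison above, here is a minimal sketch of a bootstrap variance estimate for AUC, one of the baseline methods mentioned. It is not the proposed partition-resampling estimator, whose details are not given in this abstract. The function names, the default of 500 replicates, and the choice to resample positives and negatives independently are all assumptions made for illustration.

```python
import numpy as np


def auc(pos, neg):
    """AUC as the mean of 1{x > y} over all (positive, negative) pairs."""
    pos = np.asarray(pos, dtype=float)[:, None]
    neg = np.asarray(neg, dtype=float)[None, :]
    # Ties receive weight 0.5 (an assumed convention).
    return float(((pos > neg) + 0.5 * (pos == neg)).mean())


def bootstrap_auc_variance(pos, neg, n_boot=500, seed=0):
    """Bootstrap estimate of Var(AUC).

    Each replicate resamples the positive and negative scores
    independently, with replacement, and recomputes AUC; the sample
    variance of the replicates is returned.
    """
    rng = np.random.default_rng(seed)
    pos = np.asarray(pos, dtype=float)
    neg = np.asarray(neg, dtype=float)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        p = rng.choice(pos, size=pos.size, replace=True)
        q = rng.choice(neg, size=neg.size, replace=True)
        reps[b] = auc(p, q)
    return float(reps.var(ddof=1))
```

Each replicate costs a full pass over all positive-negative pairs, so the total cost grows linearly with `n_boot`. This is the kind of computational burden against which the efficiency of a variance estimator can be compared.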