Department or Program


Primary Wellesley Thesis Advisor

Professor Qing Wang


Classification is the task of predicting the label(s) of future instances by learning and inferring from the patterns of instances with known labels. Traditional classification methods focus on single-label classification; however, many real-life problems require multi-label classification, which assigns each instance to multiple categories. For example, in sentiment analysis, a person may feel multiple emotions at the same time; in bioinformatics, a gene or protein may have a number of functional expressions; in text categorization, an email, medical record, or social media posting can be identified by various tags simultaneously. As a result of such a wide range of applications, multi-label classification has emerged as an active research area in recent years.

There are two general approaches to multi-label classification: problem transformation and algorithm adaptation. The problem transformation methodology, at its core, converts a multi-label dataset into several single-label datasets, thereby allowing the transformed datasets to be modeled using existing binary or multi-class classification methods. The algorithm adaptation methodology, on the other hand, modifies single-label classification algorithms so that they can be applied directly to the original multi-label datasets.
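As a rough illustration of problem transformation, the sketch below applies binary relevance, one well-known transformation that builds a separate binary dataset for each label. It is not necessarily the transformation used in this thesis; the function name `binary_relevance_transform` and the toy email data are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Sequence, Set, Tuple


def binary_relevance_transform(
    features: Sequence[Sequence[float]],
    label_sets: Sequence[Set[str]],
) -> Dict[str, List[Tuple[Sequence[float], int]]]:
    """Split one multi-label dataset into one binary dataset per label.

    Hypothetical helper for illustration only: each instance keeps its
    features, and its target becomes 1 if it carries the label, else 0.
    """
    # Collect every label that appears anywhere in the dataset.
    all_labels = sorted(set().union(*label_sets))
    return {
        label: [
            (x, int(label in labels))
            for x, labels in zip(features, label_sets)
        ]
        for label in all_labels
    }


# Toy data (made up): three emails, each tagged with a set of labels.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [{"work"}, {"family", "urgent"}, {"work", "urgent"}]

datasets = binary_relevance_transform(X, Y)
for label, rows in datasets.items():
    # Show the binary target column produced for each label.
    print(label, [target for _, target in rows])
```

Each resulting dataset can then be handed to any ordinary binary classifier, which is what makes existing single-label methods reusable under this approach.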

This thesis proposes a new method, called Multi-Label Super Learner (MLSL), which is a stacking-based heterogeneous ensemble method. An improved multi-label classification algorithm following the problem transformation approach, MLSL combines the predictive power of several multi-label classification methods through an ensemble algorithm, the super learner. The performance of this new method is compared to that of existing problem transformation algorithms, and our numerical results show that MLSL outperforms existing algorithms on almost all of the performance metrics.