Department or Program

Computer Science

Primary Wellesley Thesis Advisor

Eni Mustafaraj


In the era of misinformation and machine learning, the fact-checking community is eager to develop semi-automated fact-checking techniques that can detect misinformation and present fact-checks alongside problematic content. This thesis explores the technical elements and social context of one "claim matching" system, Google's Reviewed Claims. The Reviewed Claims feature was one of the few user-facing interfaces in the complex socio-technical system connecting fact-checking organizations, news publishers, Google, and online information seekers. This thesis addresses the following research questions:

RQ1: How accurate was Google's Reviewed Claims feature?

RQ2: Is it possible to create a consensus definition for "relevant fact-checks" to enable the development of more successful automated fact-checking systems?

RQ3: How do different actors in the fact-checking ecosystem define relevance?

I pursue these research questions through a series of methods including qualitative coding, qualitative content analysis, quantitative data analysis, and user studies.

To answer RQ1, I qualitatively label the relevance of 118 algorithmically assigned fact-checks and find that 21% are not relevant to the articles they were assigned to.

To address RQ2, I find that three independent raters using a survey reach only "fair-moderate agreement" about whether the algorithmically assigned fact-checks are relevant to the matched articles; a reconciliation process substantially raises their agreement. This indicates that further discussion may create a common understanding of relevance among information seekers. From the raters' open-ended responses, I derive six categories of justifications. To further evaluate whether information seekers share a common definition of relevance, I ask Amazon Mechanical Turk workers to classify six different algorithmically assigned fact-checks and find that crowd workers are more likely to judge the matched content relevant and are unable to agree on the justifications.
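The phrase "fair-moderate agreement" follows the Landis and Koch interpretation scale, which suggests a chance-corrected agreement statistic was computed; the exact statistic the thesis used is not stated in this abstract. As an illustration only, a minimal sketch of Fleiss' kappa for three raters labeling fact-check relevance might look like this (the labels and data here are hypothetical):

```python
from collections import Counter

def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items each rated by the same number of raters.

    ratings: list of per-item label lists, e.g. [["rel", "rel", "not"], ...]
    categories: list of all possible labels
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])

    # counts[i][j]: number of raters assigning category j to item i
    counts = [[Counter(item)[c] for c in categories] for item in ratings]

    # P_i: observed agreement on item i (pairs of raters that agree)
    P_i = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    P_bar = sum(P_i) / n_items  # mean observed agreement

    # p_j: overall proportion of ratings falling in category j
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    P_e = sum(p * p for p in p_j)  # expected agreement by chance

    return (P_bar - P_e) / (1 - P_e)

# Hypothetical relevance judgments from three raters on four fact-checks:
example = [
    ["rel", "rel", "not"],
    ["rel", "not", "not"],
    ["rel", "rel", "rel"],
    ["not", "not", "not"],
]
kappa = fleiss_kappa(example, ["rel", "not"])  # ≈ 0.33, "fair" on Landis & Koch
```

On the Landis and Koch scale, values in 0.21-0.40 are "fair" and 0.41-0.60 "moderate", which is the range the abstract reports before reconciliation.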

With regard to RQ3, a socio-technical analysis finds that the fact-checking ecosystem is fraught with distrust and conflicting incentives between individual actors (news publishers distrust fact-checking organizations and platforms, fact-checking organizations distrust platforms, etc.). Given this distrust, future systems need to be interpretable and transparent about what counts as "relevance" and about how claim matching is performed.

Fact-checking depends on nuance and context, and AI is not yet technically sophisticated enough to account for these variables. As such, human-in-the-loop models seem essential to future semi-automated fact-checking approaches. However, my results indicate that untrained crowd workers may not be the ideal candidates for modeling complex values in sociotechnical systems.