If you are interested, email me to meet and discuss before submitting your preferences!

Capturing inherent disagreement in textual inference (co-supervised with James Thorne)

Natural language inference (NLI) is a well-studied task in which models must predict whether a textual hypothesis is supported or refuted by a given premise. Recent research has demonstrated that while inherent disagreement among humans on this task is frequent, state-of-the-art models cannot predict it (Pavlick and Kwiatkowski, 2019; Nie et al., 2020). In this project we will explore modelling approaches beyond the commonly employed cross-entropy objective to ameliorate this issue.
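One possible direction, shown as a minimal sketch below, is to train against the full distribution of annotator labels rather than the majority-vote label, using a KL-divergence objective. The encoder and the per-example vote counts are illustrative assumptions (datasets such as ChaosNLI provide such counts); this is a starting point, not a prescribed design.

```python
# Minimal sketch: train an NLI model against the annotator label distribution
# (soft labels) instead of the majority-vote one-hot label. Assumes per-example
# annotator vote counts, as provided by e.g. ChaosNLI; the encoder and the
# 3-way label order (entailment/neutral/contradiction) are illustrative.
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=3)

def soft_label_loss(premises, hypotheses, label_counts):
    """label_counts: (batch, 3) tensor of raw annotator votes per class."""
    batch = tokenizer(premises, hypotheses, padding=True, truncation=True,
                      return_tensors="pt")
    logits = model(**batch).logits
    # Normalise the votes into a target distribution over the three labels.
    target = label_counts.float() / label_counts.sum(dim=1, keepdim=True)
    # KL(target || model) replaces the usual one-hot cross-entropy.
    return F.kl_div(F.log_softmax(logits, dim=-1), target, reduction="batchmean")

# Hypothetical votes: 100 annotators split 60/30/10 on the first pair.
loss = soft_label_loss(
    ["A man is playing a guitar.", "Two dogs run in a field."],
    ["Someone is making music.", "Animals are outdoors."],
    torch.tensor([[60, 30, 10], [95, 4, 1]]),
)
loss.backward()
```

The design choice here is that the target distribution carries the disagreement signal directly, so the model is rewarded for spreading probability mass the way annotators do rather than for committing to a single label.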

User simulator for engaging conversations on Wikipedia (co-supervised with Youmna Farag)

Modern neural dialogue agents are trained on large amounts of conversational data for their intended task/domain, often collected via Wizard-of-Oz experiments in which a human plays the role of the agent interacting with a user. However, such experiments are often expensive and thus the data obtained is limited. A commonly employed solution to this problem is the development of a user simulator that can be used to train an initial dialogue agent further by generating more conversations. This kind of approach has been employed successfully in the context of task-oriented dialogue agents. In this project we will explore the development of user simulators on the Wizard of Wikipedia dataset and task, where the goal is to develop an agent that can hold an engaging conversation on a topic from Wikipedia. For this, apart from developing the user simulator, we will also need to develop a predictor of the engagingness rating as a measure of task success.
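As a rough illustration of the engagingness predictor mentioned above, the sketch below scores a whole conversation with a single regression head. The encoder, the separator-joining of turns, and the rating scale are all assumptions to be revisited once the Wizard of Wikipedia ratings are in hand.

```python
# Rough sketch of an engagingness predictor: a single regression head over the
# flattened conversation. Encoder, turn separator, and rating scale are
# illustrative assumptions, not a prescribed design.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# num_labels=1 yields one regression output (MSE loss when labels are floats).
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=1)

def engagingness(turns):
    """Score a whole conversation; turns alternate between user and agent."""
    text = f" {tokenizer.sep_token} ".join(turns)
    batch = tokenizer(text, truncation=True, return_tensors="pt")
    with torch.no_grad():
        return model(**batch).logits.item()

dialogue = [
    "I'd love to learn about jazz.",
    "Jazz originated in New Orleans in the late 19th century.",
    "Interesting! Who were the early pioneers?",
]
print(engagingness(dialogue))  # meaningless until fine-tuned on rated dialogues
```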

Fact-Checking in Dialogues (co-supervised with Marzieh Saeidi from Facebook)

Fact-checking is often mentioned in the context of social media conversations; however, most datasets consider claims in isolation. Recently, the DialFact dataset was proposed, with utterances from dialogues verified against Wikipedia. In this project we will use this dataset to explore how much the dialogue context matters, and how we can better exploit and adapt recent work on automated fact-checking, both in terms of datasets and modelling advances (see here for a recent survey).
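A minimal sketch of the context ablation this project proposes: verify each claim once in isolation and once prefixed with its preceding dialogue turns, and compare the verdicts. The off-the-shelf NLI model below is a stand-in for a proper DialFact verifier, and treating (evidence, claim) as (premise, hypothesis) is an illustrative simplification.

```python
# Minimal sketch of the ablation: verify a claim with and without its dialogue
# context and compare verdicts. The NLI model is a stand-in, not the project's
# intended verifier.
from typing import Optional
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

def verdict(evidence, claim, context: Optional[list] = None):
    # Prepend the dialogue context to the claim when it is provided.
    hypothesis = (" ".join(context) + " " if context else "") + claim
    batch = tokenizer(evidence, hypothesis, truncation=True, return_tensors="pt")
    with torch.no_grad():
        label_id = model(**batch).logits.argmax(dim=-1).item()
    return model.config.id2label[label_id]

evidence = "The Eiffel Tower was completed in 1889."
claim = "It opened more than a century ago."
context = ["Have you been to Paris?", "Yes, I saw the Eiffel Tower last year."]
print(verdict(evidence, claim))           # in isolation, "it" is unresolved
print(verdict(evidence, claim, context))  # context supplies the referent
```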

Identifying argument structure in dialogue data (co-supervised with Georgi Karadzhov)

Argument mining is an area of NLP concerned with identifying argument structure in text. It is traditionally applied to essays and debates, but in this project we aim to explore how people argue their ideas in natural dialogues. We will investigate both collaborative and negotiation dialogues, examining the potentially different argumentation strategies used depending on the conversation objective. The project will first identify the type of argument used within a specific dialogue utterance (for example, “Argument from cause to effect” or “Generic ad hominem”). Then, based on the type of the argument, we will aim to extract the key parts of the argument structure, including the premises and the conclusions. Ultimately, the goals of the project are to (i) understand how people argue in different conversational settings, and (ii) evaluate how identifying the argument structure can be leveraged in dialogue systems research. The starting point for this project will be the dataset proposed in this paper by Feng and Hirst.
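As a starting point for the first step, the sketch below frames scheme identification as utterance-level classification. The five schemes listed are those studied by Feng and Hirst; the encoder is an illustrative choice and would first need fine-tuning on scheme-annotated utterances before its predictions mean anything.

```python
# Sketch of step one: classify the argumentation scheme of a single utterance.
# The scheme inventory follows Feng and Hirst; the encoder is an illustrative
# choice and is untrained for this task as written.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

SCHEMES = [
    "argument from example",
    "argument from cause to effect",
    "practical reasoning",
    "argument from consequences",
    "argument from verbal classification",
]

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=len(SCHEMES))

def classify_scheme(utterance):
    batch = tokenizer(utterance, truncation=True, return_tensors="pt")
    with torch.no_grad():
        return SCHEMES[model(**batch).logits.argmax(dim=-1).item()]

print(classify_scheme("If we ship late, customers will cancel, so we must hire now."))
```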

Mitigating biases in fact-checking models

Recent work has proposed training fact-checking models against contrastive evidence (Schuster et al., 2021) in order to enhance their ability to process evidence and mitigate hypothesis-only biases. In this project we will assess the ability of models trained on this data to counter other types of biases and the corresponding adversarial attacks (e.g. Thorne et al., 2019), as well as how these can be mitigated by bias-agnostic methods (e.g. Utama et al., 2020) as opposed to methods targeting specific biases.
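One bias-agnostic recipe in the spirit of Utama et al. (2020) is a product of experts that combines the main claim+evidence model with a frozen claim-only model, so the main model gets no credit for what the claim alone already predicts. In the sketch below, the architectures, the bias model's pre-training, and the label scheme are illustrative assumptions.

```python
# Sketch of a product-of-experts debiasing setup: a frozen claim-only "bias"
# model is combined with the main claim+evidence model at training time.
# Architectures and the label id for REFUTES are hypothetical choices.
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
main_model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)
bias_model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)
bias_model.eval()  # assumed already trained on claims only, then frozen

def poe_loss(claims, evidence, labels):
    full = tokenizer(claims, evidence, padding=True, truncation=True, return_tensors="pt")
    claim_only = tokenizer(claims, padding=True, truncation=True, return_tensors="pt")
    main_logits = main_model(**full).logits
    with torch.no_grad():
        bias_logits = bias_model(**claim_only).logits
    # Product of experts: add log-probabilities; cross_entropy renormalises,
    # so the main model is not rewarded for what the bias model already knows.
    combined = F.log_softmax(main_logits, dim=-1) + F.log_softmax(bias_logits, dim=-1)
    return F.cross_entropy(combined, labels)

loss = poe_loss(
    ["The Eiffel Tower is in Berlin."],
    ["The Eiffel Tower is a wrought-iron tower in Paris, France."],
    torch.tensor([2]),  # hypothetical label id for REFUTES
)
loss.backward()
```

Because only the combined distribution is trained against the gold labels, gradients steer the main model toward the evidence-dependent signal the claim-only expert cannot capture.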