projects
If you are interested in a PhD in any of the topics below, contact me!
Automated Fact-Checking
Prime minister said: “Our government has halved youth unemployment!” True or false? Fact-checking is one of the main tasks performed by journalists, especially in an era in which information sources abound. In our first paper in 2014 we discussed the main challenges, namely the open-domain nature of the task and the importance of context. We followed this up with papers on a distantly supervised approach for fact-checking simple claims about statistical properties (EMNLP2015, EACL2017) and on automating the debunking process of journalists (NAACL2016). We have also been co-organizing the Fact Extraction and Verification workshop series, developing systems using the 200K claims dataset we proposed in this NAACL2018 paper, as well as adversarial attacks against such systems (EMNLP 2019). We are now working on the 5-year ERC consolidator grant AVERiTeC to advance the state of the art towards more complex claims checked against a greater variety of sources, and we made available a new dataset, FEVEROUS, with claims verified against both sentences and tables from Wikipedia. Finally, we collaborated with project partners in the EU-funded SUMMA project developing fact-checking platforms for journalists (WebConf2019), which we will advance further in the follow-up MONITIO project. See this recent presentation, as well as some media coverage.
Dialogue agents for online deliberation
Navajas et al. (2018) demonstrated that small groups outperform larger ones on common sense/knowledge questions, despite the latter having access to a larger pool of expertise. And even though small groups outperform individuals on simple logic tasks, they still fail at a substantial rate (Mercier and Sperber, 2011), suggesting that while group size matters, the communication among its members is likely to matter too. We have been investigating what makes dispute resolution work on Wikipedia (EACL 2021); see here for an interview with Christine, who led the work. We are also collaborating with our project partners on the EPSRC research grant Opening Up Minds: engaging dialogue generated from argument maps, where we will explore developing dialogue agents inspired by the BBC’s program Moral Maze. See this video presented at British Science Week 2022 for an overview. We are also exploring how people converse when solving logic problems, in particular the Wason card selection task, and have collected the first publicly available dataset of such conversations (see here for a blog post on this by Tom Stafford, our collaborator from psychology, as well as an interview with him for the You Are Not So Smart podcast). As argued in this talk we gave at the Truth and Trust Online Conference 2020, dialogue agents enhancing online deliberation can help improve the effectiveness of fact-checking (automated or not!). In October 2022 we organized a workshop titled Deliberation4Good with speakers from psychology, social sciences and computer science on this topic; you can read more about it here.
Imitation Learning for Structured Prediction
Imitation learning is a paradigm originally developed in robotics that has been applied successfully to a variety of structured prediction tasks in NLP. Intuitively, it decomposes the usually complex output (e.g. a graph) into a sequence of actions that construct it. These actions are predicted by a suitably trained policy. This framework has the advantage of being able to learn policies with non-decomposable loss functions without explicit enumeration of the output search space, and it has been used in applications including information extraction (BMC Bioinformatics, EMNLP2015), semantic parsing (TACL2014, ACL2016) and natural language generation (Coling2016). We are collaborating with the Heriot-Watt University NLP lab and the UCL Machine Reading Group in the EPSRC-funded project Diligent. For an overview of imitation learning for structured prediction in NLP see our EACL2017 tutorial. Also look at our implementation of imitation learning built around the excellent scikit-learn. More recently, we demonstrated that scheduled sampling, one of the most widely used imitation learning algorithms for RNN training, suffers from catastrophic forgetting, and we improved it using elastic weight consolidation.
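To make the idea of building an output through a sequence of actions more concrete, here is a minimal sketch in the spirit of scheduled sampling and DAgger-style algorithms. It is not the implementation referred to above. The toy tagging task, the function names (`expert_policy`, `roll_in`, `train`), the lookup-table "policy" and the `beta` schedule are all assumptions made for illustration. At each step the output is extended with either the expert's action (with probability `beta`) or the learned policy's action, and the expert's action is recorded as the training target for the state actually visited.

```python
import random
from collections import Counter, defaultdict


def expert_policy(gold_tags, t):
    """Toy expert (an assumption): the correct action at step t is the gold tag."""
    return gold_tags[t]


def roll_in(words, gold_tags, learned_policy, beta, rng):
    """Construct the tag sequence one action at a time.

    With probability beta the expert's action is executed, otherwise the
    learned policy's action. The expert's action is always stored as the
    training target for the state that was actually visited.
    """
    predicted, examples = [], []
    for t, word in enumerate(words):
        prev_tag = predicted[-1] if predicted else "<s>"
        state = (word, prev_tag)  # hypothetical, deliberately tiny feature set
        expert_action = expert_policy(gold_tags, t)
        examples.append((state, expert_action))
        if rng.random() < beta:
            action = expert_action
        else:
            action = learned_policy(state)
        predicted.append(action)
    return predicted, examples


def train(examples):
    """Fit a lookup-table policy: most frequent expert action per state."""
    counts = defaultdict(Counter)
    for state, action in examples:
        counts[state][action] += 1
    table = {s: c.most_common(1)[0][0] for s, c in counts.items()}
    return lambda state: table.get(state, "O")  # back off to "O" for unseen states


def imitation_learning(data, iterations=3, seed=0):
    """Aggregate examples over iterations while decaying beta (assumed schedule)."""
    rng = random.Random(seed)
    aggregated = []
    policy = lambda state: "O"  # initial policy, never used while beta is 1.0
    for i in range(iterations):
        beta = 1.0 if i == 0 else 0.5 ** i
        for words, gold_tags in data:
            _, examples = roll_in(words, gold_tags, policy, beta, rng)
            aggregated.extend(examples)
        policy = train(aggregated)
    return policy


if __name__ == "__main__":
    data = [(["Alice", "lives", "in", "Paris"], ["PER", "O", "O", "LOC"])]
    policy = imitation_learning(data)
    tags, _ = roll_in(data[0][0], data[0][1], policy, 0.0, random.Random(1))
    print(tags)
```

Because training targets are collected on states reached by the learned policy as well as by the expert, the policy sees some of its own mistakes during training. A real system would replace the lookup table with a proper classifier and could use a task loss to define the expert.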