Publications
* denotes equal contribution and joint lead authorship. See Scholar for latest publications.
2024
Human Alignment of Large Language Models through Online Preference Optimisation.
2024.
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context.
2024.
LiPO: Listwise Preference Optimization through Learning-to-Rank.
2024.
Gemini: A Family of Highly Capable Multimodal Models.
2024.
Statistical Rejection Sampling Improves Preference Optimization.
In The Twelfth International Conference on Learning Representations 2024.
Offline Regularised Reinforcement Learning for Large Language Models Alignment.
2024.
2023
SLiC-HF: Sequence Likelihood Calibration with Human Feedback.
2023.
Calibrating Sequence likelihood Improves Conditional Language Generation.
In The Eleventh International Conference on Learning Representations 2023.
- EACL23
Unsupervised Keyphrase Extraction via Interpretable Neural Networks.
In Findings of the Association for Computational Linguistics: EACL 2023 2023.
Keyphrase extraction aims at automatically extracting a list of “important” phrases representing the key concepts in a document. Prior approaches for unsupervised keyphrase extraction resorted to heuristic notions of phrase importance via embedding clustering or graph centrality, requiring extensive domain expertise. Our work presents a simple alternative approach which defines keyphrases as document phrases that are salient for predicting the topic of the document. To this end, we propose INSPECT—an approach that uses self-explaining models for identifying influential keyphrases in a document by measuring the predictive impact of input phrases on the downstream task of the document topic classification. We show that this novel method not only alleviates the need for ad-hoc heuristics but also achieves state-of-the-art results in unsupervised keyphrase extraction in four datasets across two domains: scientific publications and news articles. Calibrating Likelihoods towards Consistency in Summarization Models.
2023.
2021
DialoGraph: Incorporating Interpretable Strategy-Graph Networks into Negotiation Dialogues.
In International Conference on Learning Representations 2021.
To successfully negotiate a deal, it is not enough to communicate fluently: pragmatic planning of persuasive negotiation strategies is essential. While modern dialogue agents excel at generating fluent sentences, they still lack pragmatic grounding and cannot reason strategically. We present DialoGraph, a negotiation system that incorporates pragmatic strategies in a negotiation dialogue using graph neural networks. DialoGraph explicitly incorporates dependencies between sequences of strategies to enable improved and interpretable prediction of next optimal strategies, given the dialogue context. Our graph-based method outperforms prior state-of-the-art negotiation models both in the accuracy of strategy/dialogue act prediction and in the quality of downstream dialogue response generation. We qualitatively show further benefits of learned strategy-graphs in providing explicit associations between effective negotiation strategies over the course of the dialogue, leading to interpretable and strategic dialogues.ResPer: Computationally Modelling Resisting Strategies in Persuasive Conversations..
In Proceedings of the 2021 Conference on European Association for Computational Linguistics 2021.
- JBI21
Improving broad-coverage medical entity linking with semantic type prediction and large-scale datasets.
In Journal of Biomedical Informatics 2021.
Objectives Biomedical natural language processing tools are increasingly being applied for broad-coverage information extraction—extracting medical information of all types in a scientific document or a clinical note. In such broad-coverage settings, linking mentions of medical concepts to standardized vocabularies requires choosing the best candidate concepts from large inventories covering dozens of types. This study presents a novel semantic type prediction module for biomedical NLP pipelines and two automatically-constructed, large-scale datasets with broad coverage of semantic types. Methods We experiment with five off-the-shelf biomedical NLP toolkits on four benchmark datasets for medical information extraction from scientific literature and clinical notes. All toolkits adopt a staged approach of mention detection followed by two stages of medical entity linking: (1) generating a list of candidate concepts, and (2) picking the best concept among them. We introduce a semantic type prediction module to alleviate the problem of overgeneration of candidate concepts by filtering out irrelevant candidate concepts based on the predicted semantic type of a mention. We present MedType, a fully modular semantic type prediction model which we integrate into the existing NLP toolkits. To address the dearth of broad-coverage training data for medical information extraction, we further present WikiMed and PubMedDS, two large-scale datasets for medical entity linking. Results Semantic type filtering improves medical entity linking performance across all toolkits and datasets, often by several percentage points of F-1. Further, pretraining MedType on our novel datasets achieves state-of-the-art performance for semantic type prediction in biomedical text. Conclusions Semantic type prediction is a key part of building accurate NLP pipelines for broad-coverage information extraction from biomedical text. We make our source code and novel datasets publicly available to foster reproducible research.
2020
- ICWSM20
Analysing the Extent of Misinformation in Cancer Related Tweets.
In Proceedings of the International AAAI Conference on Web and Social Media 2020.
Keeping Up Appearances: Computational Modeling of Face Acts in Persuasion Oriented Discussions.
In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2020.
The notion of face refers to the public self-image of an individual that emerges both from the individual’s own actions as well as from the interaction with others. Modeling face and understanding its state changes throughout a conversation is critical to the study of maintenance of basic human needs in and through interaction. Grounded in the politeness theory of Brown and Levinson (1978), we propose a generalized framework for modeling face acts in persuasion conversations, resulting in a reliable coding manual, an annotated corpus, and computational models. The framework reveals insights about differences in face act utilization between asymmetric roles in persuasion conversations. Using computational models, we are able to successfully identify face acts as well as predict a key conversational outcome (e.g. donation success). Finally, we model a latent representation of the conversational state to analyze the impact of predicted face acts on the probability of a positive conversational outcome and observe several correlations that corroborate previous findings.Tartan: A Two-Tiered Dialog Framework for Multi-Domain Social Chitchat.
In Proceedings of the 3rd Alexa Prize 2019 2020.
MedType: Improving Medical Entity Linking with Semantic Type Prediction.
In arXiv e-prints 2020.
LTIatCMU at SemEval-2020 Task 11: Incorporating Multi-Level Features for Multi-Granular Propaganda Span Identification.
In Proceedings of the 14th Internation Workshop on Semantic Evaluation 2020.
AMUSED: A Multi-Stream Vector Representation Method for Use in Natural Dialogue.
In Proceedings of the 2020 International Conference on Language Resources and Evaluation 2020.
The problem of building a coherent and non-monotonous conversational agent with proper discourse and coverage is still an area of open research. Current architectures only take care of semantic and contextual information for a given query and fail to completely account for syntactic and external knowledge which are crucial for generating responses in a chit-chat system. To overcome this problem, we propose an end to end multi-stream deep learning architecture which learns unified embeddings for query-response pairs by leveraging contextual information from memory networks and syntactic information by incorporating Graph Convolution Networks (GCN) over their dependency parse. A stream of this network also utilizes transfer learning by pre-training a bidirectional transformer to extract semantic representation for each input sentence and incorporates external knowledge through the the neighborhood of the entities from a Knowledge Base (KB). We benchmark these embeddings on next sentence prediction task and significantly improve upon the existing techniques. Furthermore, we use AMUSED to represent query and responses along with its context to develop a retrieval based conversational agent which has been validated by expert linguists to have comprehensive engagement with humans.- ICWSM20
Analysing the Extent of Misinformation in Cancer Related Tweets.
In 14th International Conference on Web and Social Media, 2020 2020.
Twitter has become one of the most sought after places to discuss a wide variety of topics, including medically relevant issues such as cancer. This helps spread awareness regarding the various causes, cures and prevention methods of cancer. However, no proper analysis has been performed, which discusses the validity of such claims. In this work, we aim to tackle the misinformation spread in such platforms. We collect and present a dataset regarding tweets which talk specifically about cancer and propose an attention-based deep learning model for automated detection of misinformation along with its spread. We then do a comparative analysis of the linguistic variation in the text corresponding to misinformation and truth. This analysis helps us gather relevant insights on various social aspects related to misinformed tweets. LTIatCMU at SemEval-2020 Task 11: Incorporating Multi-Level Features for Multi-Granular Propaganda Span Identification.
In Proceedings of the Fourteenth Workshop on Semantic Evaluation 2020.
In this paper we describe our submission for the task of Propaganda Span Identification in news articles. We introduce a BERT-BiLSTM based span-level propaganda classification model that identifies which token spans within the sentence are indicative of propaganda. The ”multi-granular” model incorporates linguistic knowledge at various levels of text granularity, including word, sentence and document level syntactic, semantic and pragmatic affect features, which significantly improve model performance, compared to its language-agnostic variant. To facilitate better representation learning, we also collect a corpus of 10k news articles, and use it for fine-tuning the model. The final model is a majority-voting ensemble which learns different propaganda class boundaries by leveraging different subsets of incorporated knowledge.
2018
RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information.
In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018.
Distantly-supervised Relation Extraction (RE) methods train an extractor by automatically aligning relation instances in a Knowledge Base (KB) with unstructured text. In addition to relation instances, KBs often contain other relevant side information, such as aliases of relations (e.g., founded and co-founded are aliases for the relation founderOfCompany). RE models usually ignore such readily available side information. In this paper, we propose RESIDE, a distantly-supervised neural relation extraction method which utilizes additional side information from KBs for improved relation extraction. It uses entity type and relation alias information for imposing soft constraints while predicting relations. RESIDE employs Graph Convolution Networks (GCN) to encode syntactic information from text and improves performance even when limited side information is available. Through extensive experiments on benchmark datasets, we demonstrate RESIDE’s effectiveness. We have made RESIDE’s source code available to encourage reproducible research.