Rezapour, R., Reddy, S., Jones, R., & Soboroff, I. (2022). What Makes a Good Podcast Summary? In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval.


Abstractive summarization of podcasts is motivated by the growing popularity of podcasts and the needs of their listeners. Podcasting is a markedly different domain from news and other media that are commonly studied in the context of automatic summarization. As such, the qualities of a good podcast summary are yet unknown. Using a collection of podcast summaries produced by different algorithms alongside human judgments of summary quality obtained from the TREC 2020 Podcasts Track, we study the correlations between various automatic evaluation metrics and human judgments, as well as the linguistic aspects of summaries that result in strong evaluations.


Rezapour, R., Dinh, L., & Diesner, J. (2021). Incorporating the Measurement of Moral Foundations Theory into Analyzing Stances on Controversial Topics. In Proceedings of the 32nd ACM Conference on Hypertext and Social Media (HT ’21), Virtual Event, ACM.


This paper investigates the correlation between moral foundations and the expression of opinions in the form of stance on different issues of public interest. This work is based on the assumption that the formation of values (personal and societal) and language are interrelated, and that we can observe differences in points of view in user-generated text data. We leverage the Moral Foundations Theory to expand the scope of stance analysis by examining the narratives in favor of or against several topics. Applying an expanded version of the Moral Foundations Dictionary to a benchmark dataset for stance analysis, we capture and analyze the relationships between moral values and polarized online discussions. Using this enhanced methodology, we find that each social issue has a distinct "moral and lexical profile." While some social issues feature more authority-related words (Donald Trump), others feature words related to care and purity (abortion and feminism). Our correlation analysis of stance and morality revealed notable associations between stances on social issues and various types of morality, such as care, fairness, and loyalty, demonstrating that certain morality types are more strongly associated with stance than others. Overall, our analysis highlights the usefulness of considering morality when studying stance. The differences observed across viewpoints and stances highlight linguistic variation in discourse, which may assist in analyzing cultural values and biases in society.

Rezapour, R., Reddy, S., Clifton, A., & Jones, R. (2021). Spotify at TREC 2020: Genre-aware abstractive podcast summarization. In the Text REtrieval Conference (TREC).


This paper describes our submissions to the summarization task of the Podcast Track in TREC (the Text REtrieval Conference) 2020. The goal of this challenge was to generate short, informative summaries that contain the key information present in a podcast episode, using automatically generated transcripts of the podcast audio. Since podcasts vary with respect to their genre, topic, and granularity of information, we propose two summarization models that explicitly take genre and named entities into consideration in order to generate summaries appropriate to the style of the podcasts. Our models are abstractive and supervised, using creator-provided descriptions as ground truth summaries. As assessed by human evaluators, our best model achieves an aggregate quality score of 1.58, compared to 1.49 for both the creator descriptions and a baseline abstractive system (an improvement of 9%).

Reddy, S., Yu, Y., Pappu, A., Sivaraman, A., Rezapour, R., & Jones, R. (2021). Detecting extraneous content in podcasts. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL).


Podcast episodes often contain material extraneous to the main content, such as advertisements, interleaved within the audio and the written descriptions. We present classifiers that leverage both textual and listening patterns in order to detect such content in podcast descriptions and audio transcripts. We demonstrate that our models are effective by evaluating them on the downstream task of podcast summarization and show that we can substantively improve ROUGE scores and reduce the extraneous content generated in the summaries.


Clifton, A., Reddy, S., Yu, Y., Pappu, A., Rezapour, R., Bonab, H., Karlgren, J., Carterette, B., & Jones, R. (2020). 100,000 Podcasts: A Large-Scale Spoken English Document Corpus. In Proceedings of the 28th International Conference on Computational Linguistics (COLING). [pdf][bib]


As an audio format, podcasts are more varied in style and production type than broadcast news, contain more genres than typically studied in video data, and are more varied in style and format than previous corpora of conversations. When transcribed with Automatic Speech Recognition (ASR) they represent a noisy but fascinating collection of text which can be studied through the lens of NLP, IR, and linguistics. Paired with the audio files, they are also a resource for speech processing and the study of paralinguistic, sociolinguistic, and acoustic aspects of the domain. We introduce a new corpus of 100,000 podcasts, and demonstrate the complexity of the domain with a case study of two tasks: (1) passage search and (2) summarization. This is orders of magnitude larger than previous speech corpora used for search and summarization. Our results show that the size and variability of this corpus opens up new avenues for research.

Sarol, J., Dinh, L., Rezapour, R., Chin, C., Yang, P., & Diesner, J. (2020). An empirical methodology for detecting and prioritizing needs during crisis events. In Findings of Empirical Methods in Natural Language Processing (EMNLP 2020). [pdf][bib]


In times of crisis, identifying essential needs is crucial to providing appropriate resources and services to affected entities. Social media platforms such as Twitter contain a vast amount of information about the general public's needs. However, the sparsity of information and the amount of noisy content present a challenge for practitioners to effectively identify relevant information on these platforms. This study proposes two novel methods for two needs detection tasks: 1) extracting a list of needed resources, such as masks and ventilators, and 2) detecting sentences that specify who-needs-what resources (e.g., we need testing). We evaluate our methods on a set of tweets about the COVID-19 crisis. For extracting a list of needs, we compare our results against two official lists of resources, achieving 0.64 precision. For detecting who-needs-what sentences, we compared our results against a set of 1,000 annotated tweets and achieved a 0.68 F1-score.
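As an illustration of the second task, a who-needs-what sentence can be matched with a simple lexical pattern. The sketch below is hypothetical (the entity list and regular expression are ours for illustration, not the paper's method, which is more sophisticated):

```python
import re

# Hypothetical pattern: a small set of "who" entities followed by a need
# verb and the needed resource. Illustrative only.
NEED_PATTERN = re.compile(
    r"\b(we|they|i|hospitals?|nurses?)\s+(?:need|require)s?\s+(\w+)",
    re.IGNORECASE,
)

def detect_needs(sentence):
    """Return a (who, what) pair if the sentence states a need, else None."""
    match = NEED_PATTERN.search(sentence)
    return (match.group(1), match.group(2)) if match else None
```

A sentence such as "We need testing" would yield the pair ("We", "testing"), while off-topic text yields no match.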

Jiang, L., Dinh, L., Rezapour, R., & Diesner, J. (2020). Which group do you belong to? Sentiment-based PageRank to measure formal and informal influence of nodes in networks. In Proceedings of the International Conference on Complex Networks and Their Applications. [pdf][bib]


Organizational networks are often hierarchical by nature as individuals take on roles or functions at various job levels. Prior studies have used either text-level (e.g., sentiment, affect) or structural-level features (e.g., PageRank, various centrality metrics) to identify influential nodes in networks. In this study, we use a combination of these two levels of information to develop a novel ranking method that combines sentiment analysis and PageRank to infer node-level influence in a real-world organizational network. We detect sentiment scores for all actor pairs based on the content of their email-based communication, and calculate their influence index using an enhanced PageRank method. Finally, we group individual nodes into distinct clusters according to their influence index. Compared to established network metrics designed or used to infer formal and informal influence and ground truth data on job levels, our metric achieves the highest accuracy for inferring formal influence (60.7%) and second highest for inferring informal influence (69.0%). Our approach shows that combining text-level and structural-level information is effective for identifying the job level of nodes in an organizational network.
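The combination of sentiment and PageRank described above can be sketched as follows. This is a simplified illustration under our own assumptions, not the paper's exact enhanced PageRank: pairwise sentiment scores in [-1, 1] are shifted into positive edge weights, and influence is computed with a weighted power iteration so that nodes receiving more, and more positive, communication rank higher.

```python
# Simplified sketch (not the paper's exact method): sentiment scores for
# each directed communication pair become positive edge weights for a
# weighted PageRank power iteration.

def sentiment_pagerank(edges, damping=0.85, iterations=100):
    """edges: dict mapping (sender, receiver) -> sentiment score in [-1, 1]."""
    nodes = {n for pair in edges for n in pair}
    # Shift sentiment into (0, 1] so that all edge weights stay positive.
    weights = {pair: (score + 1) / 2 + 1e-9 for pair, score in edges.items()}
    out_weight = {n: 0.0 for n in nodes}
    for (sender, _), w in weights.items():
        out_weight[sender] += w
    rank = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iterations):
        nxt = {n: (1 - damping) / len(nodes) for n in nodes}
        for (sender, receiver), w in weights.items():
            nxt[receiver] += damping * rank[sender] * w / out_weight[sender]
        # Nodes with no outgoing communication redistribute rank uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        for n in nodes:
            nxt[n] += damping * dangling / len(nodes)
        rank = nxt
    return rank
```

In this sketch, a node that receives mostly positive email from several colleagues accumulates a higher influence index than one that receives negative email; clustering the resulting indices into influence groups would be a separate step.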

Dinh, L.*, Rezapour, R.*, Jiang, L., & Diesner, J. (under review). Structural balance in signed digraphs: considering transitivity to measure balance in graphs constructed by using different link signing methods. Scientific Reports. (*equal contribution) [pdf][bib]


A wide range of network research has incorporated structural balance as both a theoretical and empirical foundation to explain various cognitive and social processes, including attitudes, beliefs, sentiment, and trust. Structural balance originates from Heider’s cognitive consistency theory, which postulates that people seek configurations of positive and negative social relationships that lessen tension and maintain a balanced state. Since then, structural balance has been empirically tested for undirected triads; however, many real-world networks are directed. To address this gap, we incorporate directionality into the evaluation of structural balance by establishing transitivity as an additional condition for balance. We evaluate balance in three empirical networks: the Enron Email Dataset, the Avocado IT Email Collection, and a network survey of 31 decision-making teams. Using natural language processing, we label the signs of the email networks in terms of sentiment and morality scores. For the decision-making dataset, we define signs in terms of the perceived trust scores between team members. Our results show that the balance ratios with transitivity considered are approximately 81.7% for morality, 69.5% for sentiment, and 72.7% for perceived trust. In sum, we demonstrate how balance can be computed for signed directed networks using the theoretical constructs of structural balance and transitivity together with empirical methods from natural language processing and network analysis.
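The transitivity condition can be illustrated on a single signed directed triad. The sketch below is our simplification of the idea, not the paper's semicycle-based procedure: a triad is evaluated for balance only if every two-step path is closed by a direct edge, and a transitive triad counts as balanced when the product of its edge signs is positive.

```python
# Simplified sketch of balance with transitivity for one signed directed
# triad (the paper's semicycle-based evaluation is more general).

def classify_triad(edges):
    """edges: dict mapping a directed pair (u, v) -> sign (+1 or -1)."""
    # Transitivity: i -> j and j -> k (distinct nodes) require an edge i -> k.
    for (i, j) in edges:
        for (j2, k) in edges:
            if j == j2 and i != k and (i, k) not in edges:
                return "intransitive"
    # A transitive triad is balanced when its sign product is positive.
    product = 1
    for sign in edges.values():
        product *= sign
    return "balanced" if product > 0 else "imbalanced"
```

For example, a friend-of-a-friend triad whose three edges are all positive is balanced, while flipping the sign of the closing edge makes it imbalanced, and omitting the closing edge makes it intransitive.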

Aref, S.*, Dinh, L.*, Rezapour, R.*, & Diesner, J. (2020). Multilevel structural evaluation of signed directed social networks based on balance theory. Scientific Reports, 10, 15228. https://doi.org/10.1038/s41598-020-71838-6 (*equal contribution) [pdf] [bib] [code]


Balance theory explains the forces behind the structure of social systems commonly modeled as static undirected signed networks. We expand the modeling to incorporate directionality of the edges and consider three levels of analysis: triads, subgroups, and the whole network. For triad-level balance, we utilize semicycles that satisfy the condition of transitivity. For subgroup-level balance, we derive measures of cohesiveness (internal solidarity) and divisiveness (external antagonism) to capture balance in subgroups using the most fitting partition of nodes into two groups. For network-level balance, we use the normalized line index which relies on the proportion of edges whose position suits balance. Through extensive computational analysis, we document frequently repeated patterns of social structure in triads, subgroups, and the whole network across a range of social settings from college students and Wikipedia users to philosophers and Bitcoin traders. We then apply our multilevel framework to examine balance in temporal and multilayer networks which demonstrates the generalizability of our approach and leads to new observations on balance with respect to time and layer dimensions. Our complementary findings on a variety of social networks highlight the need to evaluate balance at different levels for which we propose a comprehensive yet parsimonious approach.

Rezapour, R., Bopp, J., Fiedler, N., Steffen, D., Witt, A., & Diesner, J. (2020). Beyond citation: Corpus-based methods for assessing the impact of research outcomes on society. In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC). (pp. 6779-6787). [pdf] [bib]


This paper proposes, implements and evaluates a novel, corpus-based approach for identifying categories indicative of the impact of research via a deductive (top-down, from theory to data) and an inductive (bottom-up, from data to theory) approach. The resulting categorization schemes differ in substance. Research outcomes are typically assessed by using bibliometric methods, such as citation counts and patterns, or alternative metrics, such as references to research in the media. Shortcomings with these methods are their inability to identify impact of research beyond academia (bibliometrics) and considering text-based impact indicators beyond those that capture attention (altmetrics). We address these limitations by leveraging a mixed-methods approach for eliciting impact categories from experts, project personnel (deductive) and texts (inductive). Using these categories, we label a corpus of project reports per category schema, and apply supervised machine learning to infer these categories from project reports. The classification results show that we can predict deductively and inductively derived impact categories with 76.39% and 78.81% accuracy (F1-score), respectively. Our approach can complement solutions from bibliometrics and scientometrics for assessing the impact of research and studying the scope and types of advancements transferred from academia to society.


Rezapour, R., Ferronato, P., & Diesner, J. (2019). How do moral values differ in tweets on social movements? In Companion Publication of the 2019 ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW ’19 Companion). (pp. 347-351). [pdf] [bib] [Data]


In this paper, we analyze and compare the representation of social movements on social media from the perspective of morality. Following previous research, which found associations between morality, collective action, and social decision-making, we postulate that moral values are distinct across different movements, since these movements represent the moral values of those who support or oppose them. The results of our analysis of four movements as represented on Twitter (#BlackLivesMatter, #WhiteLivesMatter, #AllLivesMatter, and #BlueLivesMatter) reveal that #BlueLivesMatter represents values such as Care, Harm, Loyalty, and Authority, while #WhiteLivesMatter features Harm and Fairness. Moreover, we find that Harm is the most prominent moral value in all of our datasets. Our analysis provides a robust understanding of authors’ moral stances, which contextualizes the influence of movements on people and how these movements are perceived in society.

Rezapour, R., Shah, S. H., & Diesner, J. (2019). Enhancing the measurement of social effects by capturing morality. In Proceedings of the Tenth Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA). Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). (pp. 35-45). [pdf] [bib] [Expanded Morality Lexicon]


We investigate the relationship between basic principles of human morality and the expression of opinions in user-generated text data. We assume that people’s backgrounds, culture, and values are associated with their perceptions and expressions of everyday topics, and that people’s language use reflects these perceptions. While personal values and social effects are abstract and complex concepts, they have practical implications and are relevant for a wide range of NLP applications. To extract human values (in this paper, morality) and measure social effects (morality and stance), we empirically evaluate the usage of a morality lexicon that we expanded via a quality-controlled, human-in-the-loop process. As a result, we enhanced the Moral Foundations Dictionary in size (from 324 to 4,636 syntactically disambiguated entries) and scope. We used both feature-based (SVM, RF) and deep learning (LSTM) classification to test the lexicon's usefulness for measuring social effects. We find that the enhancement of the original lexicon led to measurable improvements in prediction accuracy for the selected NLP tasks.
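To illustrate what "syntactically disambiguated" entries enable, the sketch below keys a lexicon on (lemma, part-of-speech) pairs, so the same surface form can map to different foundations (or none) depending on its syntactic role. The entries shown are hypothetical examples, not actual entries from the Expanded Morality Lexicon:

```python
# Hypothetical (lemma, POS) entries; not the actual lexicon from the paper.
MORALITY_LEXICON = {
    ("care", "NOUN"): "care",
    ("harm", "VERB"): "harm",
    ("loyal", "ADJ"): "loyalty",
}

def moral_foundations(tagged_tokens):
    """tagged_tokens: list of (lemma, pos) pairs from any POS tagger."""
    counts = {}
    for lemma, pos in tagged_tokens:
        foundation = MORALITY_LEXICON.get((lemma.lower(), pos))
        if foundation is not None:
            counts[foundation] = counts.get(foundation, 0) + 1
    return counts
```

With such a lexicon, "care" used as a verb would not fire the noun entry, which is the kind of disambiguation that distinguishes the expanded dictionary from a plain word list.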


Rezapour, R. (2018). Using Linguistic Cues for Analyzing Social Movements. arXiv preprint arXiv:1808.01742. [pdf][bib]


With the growth of social media usage, social activists leverage these platforms to raise awareness of social issues and engage the public worldwide. The broad use of social media platforms in recent years has made it easier for people to stay up to date on news related to regional and worldwide events. While social media, namely Twitter, helps social movements connect with more people and mobilize, traditional media such as news articles spread news about events to a broader audience. In this study, we analyze linguistic features and cues, such as individualism vs. pluralism, sentiment, and emotion, to examine the relationship between the medium and the discourse over time. We conduct this work in a specific application context, the “Black Lives Matter” (BLM) movement, and compare discussions of this movement on social media vs. in news articles.

Witt, A., Diesner, J., Steffen, D., Rezapour, R., Bopp, J., Fiedler, N., Köller, C., Raster, M., & Wockenfuß, J. (2018). Impact of scientific research beyond academia: An alternative classification schema. In Proceedings of the 1st Workshop on Computational Impact Detection from Text Data, 11th Conference on Language Resources and Evaluation (LREC), Miyazaki, Japan. [pdf][bib]


The actual or anticipated impact of research projects can be documented in scientific publications and project reports. While project reports are available at varying levels of accessibility, they may be rarely used or shared outside of academia. Moreover, the connection between a research project's actual outcomes and their potential secondary use might not be made explicit in a project report. This paper outlines two methods for classifying and extracting the impact of publicly funded research projects. The first method identifies impact categories and assigns them to research projects and, by extension, their reports by consulting subject matter experts, without considering the content of the research reports; this process resulted in the classification schema we describe in this paper. With the second method, which is still work in progress, impact categories are extracted from the actual text data.


Rezapour, R., & Diesner, J. (2017). Classification and detection of micro-level impact of issue-focused documentary films based on reviews. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. (pp. 1419-1431). [pdf] [bib]


We present novel research at the intersection of review mining and impact assessment of issue-focused information products, namely documentary films. We develop and evaluate a theoretically grounded classification schema, related codebook, corpus annotation, and prediction model for detecting multiple types of impact that documentaries can have on individuals, such as change versus reaffirmation of behavior, cognition, and emotions, based on user-generated content, i.e., reviews. This work broadens the scope of review mining tasks, which typically comprise the prediction of ratings, helpfulness, and opinions. Our results suggest that documentaries can change or reinforce people’s conceptions of an issue. We perform supervised learning to predict impact on the sentence level by using data-driven as well as predefined linguistic, lexical, and psychological features, achieving an accuracy rate of 81% (F1) when using a Random Forest classifier, and 73% with a Support Vector Machine.

Rezapour, R., Wang, L., Abdar, O., & Diesner, J. (2017). Identifying the overlap between election result and candidates’ ranking based on hashtag-enhanced, lexicon-based sentiment analysis. In 2017 IEEE 11th International Conference on Semantic Computing (ICSC). (pp. 93-96). [pdf] [bib]

The popularity and availability of Twitter as a service and a data source have fueled the interest in sentiment analysis. Previous research has shed light on the challenges that contextualizing effects and linguistic complexities pose for the accurate sentiment classification of tweets. We test the effect of adding manually annotated, corpus-based hashtags to a sentiment lexicon, finding that this step in combination with negation detection increases prediction accuracy by about 7%. We then use our enhanced model to identify and rank the candidates of the Republican and Democratic Parties in the 2016 New York primary election by the decreasing ratio of tweets that mentioned these individuals and had positive valence, and compare our results to the election outcome.
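The hashtag enhancement can be sketched as follows. The word entries, hashtag scores, and scoring scheme here are hypothetical placeholders, not the paper's annotated lexicon: corpus-specific hashtags are scored like ordinary lexicon words, and a negator flips the polarity of the next sentiment-bearing token.

```python
# Hypothetical lexicon and hashtag entries (illustrative only).
LEXICON = {"great": 1, "terrible": -1, "win": 1, "lose": -1}
HASHTAG_LEXICON = {"#imwithher": 1, "#nevertrump": -1}
NEGATORS = {"not", "no", "never"}

def tweet_sentiment(tweet):
    """Sum lexicon scores over tokens, flipping polarity after a negator."""
    score, negate = 0, False
    for token in tweet.lower().split():
        if token in NEGATORS:
            negate = True  # flip the next sentiment-bearing token
            continue
        value = HASHTAG_LEXICON.get(token, LEXICON.get(token, 0))
        if value:
            score += -value if negate else value
            negate = False
    return score
```

A tweet like "not great" then scores negatively despite containing a positive lexicon word, which is the effect the negation detection contributes.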

Addawood, A., Rezapour, R., Abdar, O., & Diesner, J. (2017). Telling apart tweets associated with controversial versus non-controversial topics. In Proceedings of the Second Workshop on Natural Language Processing and Computational Social Science (NLP+CSS) at the 55th Annual Meeting of the Association for Computational Linguistics (ACL). (pp. 32-41). [pdf] [bib]


In this paper, we evaluate the predictability of tweets associated with controversial versus non-controversial topics. As a first step, we crowd-sourced the scoring of a predefined set of topics on a Likert scale from non-controversial to controversial. Our feature set includes and goes beyond sentiment features, e.g., by leveraging empathic language and other features that have been previously used, but are new for this particular study. We find focusing on the structural characteristics of tweets to be beneficial for this task. Using a combination of empathic, language-specific, and Twitter-specific features for supervised learning resulted in 87% accuracy (F1) for cross-validation of the training set and 63.4% accuracy when using the test set. Our analysis shows that features specific to Twitter or social media in general are more prevalent in tweets on controversial topics than in non-controversial ones. To test the premise of the paper, we conducted two additional sets of experiments, which led to mixed results. This finding will inform our future investigations into the relationship between language use on social media and the perceived controversiality of topics.


Diesner, J., Rezapour, R., & Jiang, M. (2016). Assessing public awareness of social justice documentary films based on news coverage versus social media. iConference 2016 Proceedings. [pdf] [bib]


The comprehensive measurement of the impact that information products have on individuals, groups and society is of practical relevance to many actors, including philanthropic funding organizations. In this paper we focus on assessing one dimension of impact, namely public awareness, which we conceptualize as the amount and substance of attention that information products gain from the press and social media. We are looking at a type of products that philanthropic organizations fund, namely social justice documentaries. Using topic modeling as a text summarization technique, we find that films from certain domains, such as “Politics and Government” and “Environment and Nature,” attract more attention than productions on others, such as “Gender and Ethnicity”. We also observe that film-related public discourse on social media (Facebook and non-expert reviews) has a higher overlap with the content of a film than press coverage of films does. This is partially due to the fact that social media users focus more on the topics of a production whereas the press pays strong attention to cinematographic and related features.


Diesner, J., & Rezapour, R. (2015). Social computing for impact assessment of social change projects. In International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction (SBP). (pp. 34-43). [pdf] [bib]


One problem that both philanthropic foundations and scientific organizations have recently started to tackle more seriously is assessing the societal impact of the work they are funding by going beyond traditional methods and metrics. In collaboration with makers and funders of social justice information products, we have been leveraging social computing techniques for practical impact assessment. In this paper, we identify which of the main impact goals as defined in the social change domain can be assessed by using our computational solution, illustrate our approach with an empirical case study, and compare our findings to those that can be obtained with traditional methods. We find that our solution can complement and enhance the findings and interpretations that can be obtained with standard techniques used in the given application domain, especially when applying data mining techniques to natural language text data, such as representations of public awareness, dialogue and engagement around various issues in their cultural contexts.