Are Information Retrieval Approaches Good at Harmonising Longitudinal Survey Questions in Social Science?

Li, Wing Yan; Wang, Zeqiang; Johnson, Jon; De, Suparna

Abstract:Automated detection of semantically equivalent questions in longitudinal social science surveys is crucial for long-term studies informing empirical research in the social, economic, and health sciences. Retrieving equivalent questions faces dual challenges: inconsistent representation of theoretical constructs (i.e. concept/sub-concept) across studies as well as between question and response options, and the evolution of vocabulary and structure in longitudinal text. To address these challenges, our multi-disciplinary collaboration of computer scientists and survey specialists presents a new information retrieval (IR) task of identifying concept (e.g. Housing, Job, etc.) equivalence across question and response options to harmonise longitudinal population studies. This paper investigates multiple unsupervised approaches on a survey dataset spanning 1946-2020, including probabilistic models, linear probing of language models, and pre-trained neural networks specialised for IR. We show that IR-specialised neural models achieve the highest overall performance with other approaches performing comparably. Additionally, the re-ranking of the probabilistic model's results with neural models only introduces modest improvements of 0.07 at most in F1-score. Qualitative post-hoc evaluation by survey specialists shows that models generally have a low sensitivity to questions with high lexical overlap, particularly in cases where sub-concepts are mismatched. Altogether, our analysis serves to further research on harmonising longitudinal studies in social science.

Subjects:	Computation and Language (cs.CL); Information Retrieval (cs.IR)
Cite as:	arXiv:2504.20679 [cs.CL]
	(or arXiv:2504.20679v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2504.20679

Computer Science > Computation and Language

Title:Are Information Retrieval Approaches Good at Harmonising Longitudinal Survey Questions in Social Science?

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators