The candidate will work on high-visibility projects as a Data Scientist, bringing Data Science and NLP expertise to them. The candidate will join the RD Data Science team and collaborate with product managers, domain experts, and knowledge representation experts to build high-value outcomes from Elsevier content. The candidate will have an opportunity to impact virtually all Elsevier applications related to Research and Operations.
In an age where scientific discovery is increasingly dynamic and prolific, making complex
research accessible to the general public is crucial for fostering informed engagement with science.
Scientific articles are often written in highly specialized language that can be difficult for non-experts
to understand. To bridge this communication gap, there is a growing need for tools that
can distill and present scientific findings in a more digestible and engaging format. While existing
models have achieved success in generating news-like summaries from single articles [2][3], these
approaches often fall short when dealing with comprehensive topics that span multiple sources.
Scientific research is rarely confined to a single study; instead, it typically builds on a body of
work that spans various articles and publications. Thus, summarizing multiple relevant articles
into a cohesive, accessible news or blog post presents a unique challenge and opportunity. Our
project aims to tackle this challenge by developing a system that generates news-like summaries
based on multiple full-text scientific articles. This approach will allow us to synthesize and
integrate insights from various studies, offering a more realistic perspective on scientific topics.
By leveraging Large Language Models (LLMs) and fine-tuning them for this specific task, we
intend to create content that not only captures the essence of complex research but also presents it
in a manner that is engaging and understandable to a broader audience. To achieve this, we will
adapt and enhance existing LLMs, originally designed for single-article summaries, to handle
multiple sources. This will involve fine-tuning these models using publicly available datasets and
developing methods to evaluate their performance. We will experiment with a multi-agent setting
in which several LLMs (a planner agent, a summarizer agent, and a reflection agent) interact
to improve the readability and completeness of the generated simplified documents. The
evaluation will be done based on the quality of generated summaries as judged by an LLM on
readability, accuracy, and engagement. Furthermore, we will create a benchmark set to assess the
quality of the generated posts. We will measure the performance of fine-tuned models by standard
evaluation metrics (ROUGE, BLEU) and human assessments. This project aims to make
scientific knowledge more accessible, contributing to better public understanding of scientific
advancements and fostering a more informed society. The findings of this research will be
summarized as a research article and submitted to a high-quality peer-reviewed conference or
journal.
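
As a rough illustration of the multi-agent setting described above, the sketch below wires a planner, a summarizer, and a reflection agent into a simple draft-and-revise loop. Everything in it is an assumption made for illustration: the function name `multi_agent_summary`, the prompt wording, the "OK" stopping convention, and the `max_rounds` parameter are hypothetical, and each agent is modelled as a plain callable that maps a prompt string to a text reply, standing in for whichever LLM is eventually chosen.

```python
from typing import Callable, List

# Hypothetical agent interface: any function that takes a prompt and returns text.
LLM = Callable[[str], str]


def multi_agent_summary(
    articles: List[str],
    planner: LLM,
    summarizer: LLM,
    reflector: LLM,
    max_rounds: int = 2,
) -> str:
    """Draft a news-like post from several articles, then revise it on feedback."""
    # Concatenate the full texts, labelling each source so agents can refer to it.
    sources = "\n\n".join(
        f"[Article {i}]\n{text}" for i, text in enumerate(articles, start=1)
    )

    # Planner agent: propose an outline that covers all sources.
    plan = planner(f"Outline a news-style post covering these articles:\n{sources}")

    # Summarizer agent: write a first draft that follows the outline.
    draft = summarizer(
        f"Write the post following this outline:\n{plan}\n\nSources:\n{sources}"
    )

    # Reflection agent: critique the draft; stop early once it replies "OK"
    # (an assumed convention, not a property of any particular model).
    for _ in range(max_rounds):
        feedback = reflector(
            "Critique this draft for readability and completeness. "
            "Reply with OK if no changes are needed.\n\n"
            f"Draft:\n{draft}\n\nSources:\n{sources}"
        )
        if feedback.strip().upper() == "OK":
            break
        draft = summarizer(
            f"Revise the draft using this feedback:\n{feedback}\n\n"
            f"Draft:\n{draft}\n\nSources:\n{sources}"
        )

    return draft
```

Keeping the agents as interchangeable callables means the same loop could be run with different base or fine-tuned models in each role, which is one possible way to compare configurations during the experiments.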
-----------------------------------------------------------------------
Elsevier is an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law. We are committed to providing a fair and accessible hiring process. If you have a disability or other need that requires accommodation or adjustment, please let us know by completing our Applicant Request Support Form: https://forms.office.com/r/eVgFxjLmAK, or please contact 1-855-833-5120.
Please read our Candidate Privacy Policy.