The candidate will work on high-visibility projects as a Data Scientist, bringing Data Science and NLP expertise to them. The candidate will join the RD Data Science team and collaborate with product managers, domain experts, and knowledge representation experts to build high-value outcomes from Elsevier content. The candidate will have the opportunity to impact virtually all Elsevier applications related to Research and Operations.

In an age where scientific discovery is increasingly dynamic and prolific, making complex research accessible to the general public is crucial for fostering informed engagement with science. Scientific articles are often written in highly specialized language that non-experts find difficult to understand. To bridge this communication gap, there is a growing need for tools that can distill and present scientific findings in a more digestible and engaging format. While existing models have succeeded in generating news-like summaries from single articles [2][3], these approaches often fall short on comprehensive topics that span multiple sources. Scientific research is rarely confined to a single study; it typically builds on a body of work spread across many articles and publications. Summarizing multiple relevant articles into a cohesive, accessible news or blog post therefore presents both a unique challenge and an opportunity.

Our project aims to tackle this challenge by developing a system that generates news-like summaries based on multiple full-text scientific articles. This approach will allow us to synthesize and integrate insights from various studies, offering a more realistic perspective on scientific topics. By leveraging Large Language Models (LLMs) and fine-tuning them for this specific task, we intend to create content that captures the essence of complex research and presents it in a manner that is engaging and understandable to a broader audience. To achieve this, we will adapt and enhance existing LLMs, originally designed for single-article summaries, to handle multiple sources. This will involve fine-tuning these models on publicly available datasets and developing methods to evaluate their performance. We will also experiment with a multi-agent setting in which several LLMs (a planner agent, a summarizer agent, and a reflection agent) interact to increase the readability and completeness of the generated simplified documents.

The generated summaries will be evaluated by an LLM judge on readability, accuracy, and engagement. Furthermore, we will create a benchmark set to assess the quality of the generated posts. We will measure the performance of the fine-tuned models with standard evaluation metrics (ROUGE, BLEU) and human assessments. This project aims to make scientific knowledge more accessible, contributing to better public understanding of scientific advancements and fostering a more informed society. The findings of this research will be summarized in a research article and submitted to a high-quality peer-reviewed conference or journal.
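
As an illustration of the kind of overlap metric mentioned above, the following is a minimal sketch of a ROUGE-1 style score in Python. It is not the project's evaluation code: the function name `rouge1`, the lowercase whitespace tokenization, and the example strings are assumptions made for illustration, and a real evaluation would more likely rely on an established ROUGE implementation with stemming and proper tokenization.

```python
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict:
    """Unigram-overlap precision, recall, and F1 (a simplified ROUGE-1).

    Assumption: tokens are lowercase whitespace-separated words; no
    stemming or punctuation handling is applied.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each word counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand_counts & ref_counts).values())

    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    if overlap == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical generated summary and reference text.
    scores = rouge1("the cat sat", "the cat sat on the mat")
    print(scores)
```

In this sketch, recall measures how much of the reference text the generated summary covers, while precision penalizes words in the summary that the reference does not contain. Overlap metrics of this kind do not capture readability or engagement, which is why the project pairs them with LLM-based and human assessments.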

-----------------------------------------------------------------------

Elsevier is an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law. We are committed to providing a fair and accessible hiring process. If you have a disability or other need that requires accommodation or adjustment, please let us know by completing our Applicant Request Support Form (https://forms.office.com/r/eVgFxjLmAK) or by calling 1-855-833-5120.

Please read our Candidate Privacy Policy.

Location

NLD Amsterdam (Radarweg)

Job Overview
Job Posted: 5 days ago
Job Expires:
Job Type: Full Time Intern
