Data Scientist, Research Data Insights
Department: Technical
Employment Type: Full Time
Location: USA, East Coast (Home based)
Description
About us
We are Digital Science and we are advancing the research ecosystem. We are a pioneering technology company, and our vision is of a future where a trusted and collaborative research ecosystem drives progress for all. We believe in better, open, collaborative and inclusive research. In creating the next generation of tools and working in partnership with the community we tackle some of the biggest challenges to research. In order to achieve our vision, we need innovative, inspiring and dynamic people to join our team. Want to join us?
Dimensions, part of the Digital Science family, is the world’s largest linked research information dataset, covering millions of research publications and connected by more than 1.3 billion citations. We are shaping the future of research and are looking for a Data Scientist to join the team.
Your new role
As part of a dynamic team environment, you will support our global customers through the development of new analytic approaches and capabilities leveraging our scientometric data sets and emerging knowledge graph ecosystem. You will help our customers, including the largest funding and research organizations in the U.S. Federal government and beyond, to manage their multi-billion dollar research portfolios more effectively by providing delivery excellence that delights our customers, fuels word-of-mouth growth, and drives very high renewal rates. You will leverage our data and platforms, including Dimensions and the rest of Digital Science's portfolio, to support research assessment, portfolio management/analysis, strategic planning and more.
The role will touch all aspects of data analysis & delivery, from expanding and leveraging the Dimensions Knowledge Graph, to managing specialised analytic infrastructure resources in secure environments in support of specialised data indexing and analytic workloads, to data collection/wrangling, visualization, and the development & delivery of interactive dashboards and other applications. You will work closely with team members with a diversity of intellectual and professional backgrounds to harness our unique data and product capabilities to address our customers' critical needs.
What you’ll be doing
- Conduct large-scale, quantitative data analysis (millions of records) potentially including custom indexing, data linking, data collection and other data wrangling using Dimensions in-house data assets and external or customer data sets as required.
- Leverage Large Language Models and other AI technologies to address customer analytical needs, identifying opportunities to incorporate these tools into analytic workflows and customer facing applications.
- Plan, design, maintain and document data integrations, pipelines, internal use utilities, tools and software packages to support our advanced analytic capabilities.
- Build machine-learning models that operate on large, text-based documents (10s - 100s of millions of documents), for a variety of applications including named entity resolution, relationship extraction, document clustering and topic modeling.
- Create and deploy visualizations and interactive web-based dashboards, using tools such as Plotly, Dash, and React.
What you’ll bring to the role
- You will have a good understanding of the S&T ecosystem - funders, research organizations, scientific publishing and related experience working with bibliometric/scientometric datasets such as scientific publications, grants, and patents.
- You will have familiarity with knowledge graphs (including technologies such as RDF and SPARQL). Ideally, you will have experience building and querying knowledge graphs in support of analytic workloads leveraging bibliometric/scientometric data sets.
- You will have experience in Python, including relevant Python libraries and modules such as pandas, scikit-learn, gensim, transformers, PyTorch and Dash.
- You will have familiarity with commercial AI models like GPT, Bard or PaLM and ideally experience working with LLM support toolkits such as LangChain, Guidance, and Haystack.
- You’ll be experienced in Natural Language Processing and machine learning methods with bibliometric/scientometric datasets.
- You will have experience with data visualization tools (Plotly, D3, matplotlib, etc.).
- You will thrive in an environment where you can work independently and remotely.
- You will have previous experience of working globally and across multiple teams.
- You will be a strong communicator and able to communicate your findings to a varied audience through written and verbal presentation.
- You will have 3-5 years of experience delivering customer solutions.
Not sure you meet all qualifications? Let us decide! Research shows that women and members of other under-represented groups tend to not apply to jobs when they think they may not meet every qualification, when in fact, they often do! We are committed to creating a diverse and inclusive environment and strongly encourage you to apply.
Additional Information
Current US Public Trust clearance preferred, as applicants will be subject to a security investigation and will need to meet eligibility requirements for access to sensitive information.
Living our Values
We invest in, nurture and support innovative businesses and technologies that make all parts of the research process more open, efficient and effective.
The talent we secure is fundamental to us achieving our vision and our growth plans. The values we live by are:
- We are brave in the pursuit of better
- We are collaborative and inclusive
- We are always open-minded
- We are from and for the community
We're an equal opportunity employer. All applicants will be considered for employment without attention to race, colour, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.