Data Engineer for AI

at Offshorly Ltd.

Full Time Remote

This is a remote position.

What the engineer will actually do:

P1 | Build and schedule Python parsers that extract structured JSON from PowerPoint, PDF, and Excel documents, then land the data in Databricks Bronze → Silver tables.
P1 | Develop and maintain simple Auto Loader or Fivetran pipelines for ERP and ticketing systems.
P2 | Add basic text‑embedding or LLM‑based entity extraction (LangChain or open‑source transformers) to enrich the document feed.
P3 | Write unit tests and lightweight data‑quality checks (Great Expectations) so parsing errors do not break the pipeline.
P3 | Produce concise handover docs for our future data architect.

Skill Set:
Must‑have (core):

2‑4 years building ETL or ELT pipelines with Databricks or Snowflake (Delta/Parquet, Spark SQL, Airflow or similar).
Solid Python (pandas, PySpark) and experience parsing Office and PDF files with libraries such as python‑pptx, openpyxl, pdfplumber, or PyPDF.
Basic SQL tuning and ability to work with structured schemas.
Git and CI/CD familiarity.

Nice‑to‑have (bonus):

Exposure to LangChain, Hugging Face Transformers, or any LLM inference workflow.
Experience adding embeddings to tables for downstream ML or search.
Great Expectations or similar data‑quality tooling.
Familiarity with Unity Catalog or Snowflake RBAC concepts.

Location

Philippines (Remote)

Job Overview

Job Posted: 3 months ago

Job Type: Full Time
