This is a remote position.

What the engineer will actually do:
  • P1 | Build and schedule Python parsers that extract structured JSON from PowerPoint, PDF, and Excel documents, then land the data in Databricks Bronze → Silver tables.
  • P1 | Develop and maintain simple Auto Loader or Fivetran pipelines for ERP and ticketing systems.
  • P2 | Add basic text‑embedding or LLM‑based entity extraction (LangChain or open‑source transformers) to enrich the document feed.
  • P3 | Write unit tests and lightweight data‑quality checks (Great Expectations) so parsing errors do not break the pipeline.
  • P3 | Produce concise handover docs for our future data architect.
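
For illustration only, the parsing and data-quality duties above might look like the following minimal sketch. It assumes the parsers have already produced one dictionary per extracted page or slide. The field names (source_file, page, text) and the helper functions are hypothetical, and plain Python checks stand in for Great Expectations. Failing records are set aside rather than raising an error, so one bad document does not stop the load.

```python
import json

# Hypothetical minimal schema for one extracted page or slide.
REQUIRED_FIELDS = ("source_file", "page", "text")


def validate_record(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in record or record[field] in (None, ""):
            problems.append(f"missing {field}")
    if record.get("page") is not None and not isinstance(record["page"], int):
        problems.append("page is not an integer")
    return problems


def split_records(records):
    """Separate clean records from ones that should be quarantined."""
    good, quarantined = [], []
    for record in records:
        problems = validate_record(record)
        if problems:
            quarantined.append({"record": record, "problems": problems})
        else:
            good.append(record)
    return good, quarantined


def to_json_lines(records):
    """Serialize records as newline-delimited JSON, one record per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


sample = [
    {"source_file": "deck.pptx", "page": 1, "text": "Q3 revenue summary"},
    {"source_file": "report.pdf", "page": "two", "text": "Risks"},
    {"source_file": "sheet.xlsx", "page": 3, "text": ""},
]
good, quarantined = split_records(sample)
print(to_json_lines(good))
print(f"{len(quarantined)} record(s) quarantined")
```

In a real pipeline the clean records would be written to a Bronze table and the quarantined ones kept in a separate location for review. Those steps are omitted here.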

Skill Set:
Must‑have (core):
  • 2‑4 years building ETL or ELT pipelines with Databricks or Snowflake (Delta/Parquet, Spark SQL, Airflow or similar).
  • Solid Python (pandas, PySpark) and experience parsing Office and PDF files with libraries such as python‑pptx, openpyxl, pdfplumber, or PyPDF.
  • Basic SQL tuning and ability to work with structured schemas.
  • Git and CI/CD familiarity.
Nice‑to‑have (bonus):
  • Exposure to LangChain, Hugging Face Transformers, or any LLM inference workflow.
  • Experience adding embeddings to tables for downstream ML or search.
  • Great Expectations or similar data‑quality tooling.
  • Familiarity with Unity Catalog or Snowflake RBAC concepts.


Location

Philippines (Remote)

Job Overview
Job Posted:
1 month ago
Job Type
Full Time
