AI/NLP Engineer – Extract Meaning from Complex Documents (Not a Simple Parsing Task) - Contract to Hire

Remote, USA Full-time Posted 2026-05-31
We are building a system that processes real-world documents (PDFs, emails, reports, and other text documents) and extracts meaningful structured signals from unstructured text.

This is not a simple parsing or summarization task.

The Problem

The documents we work with are:

inconsistent in format

ambiguous in language

written by different authors with different styles

often indirect, with recommendations implied rather than stated

We need to extract signals such as:

findings

recommendations

actions

key clinical or operational statements

Several approaches have already been attempted:

rule-based extraction (regex, YAML rules) → too brittle

strict deterministic pipelines → fail on variability

basic LLM extraction → inconsistent and not reliable enough

We are looking for someone who can design and implement a robust signal extraction approach that can:

handle messy, real-world text

extract relevant signals with high recall

link extracted signals back to source text

produce structured outputs that can be used downstream
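
To make the "link extracted signals back to source text" requirement concrete, here is a minimal Python sketch of one possible grounding step. It is an illustration under stated assumptions, not a description of our existing system: the names `Signal`, `locate_quote`, and `ground_signals` are hypothetical, and the candidate (kind, quote) pairs are assumed to come from some upstream extractor such as an LLM. Any candidate whose quote cannot be found in the document is dropped rather than trusted.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Signal:
    """A structured signal tied to a character span in the source text."""
    kind: str   # e.g. "finding", "recommendation", "action" (hypothetical labels)
    quote: str  # evidence text claimed by the upstream extractor
    start: int  # character offset where the evidence begins in the source
    end: int    # character offset just past the end of the evidence


def locate_quote(source: str, quote: str) -> Optional[Tuple[int, int]]:
    """Find the quote in the source, ignoring case and whitespace differences."""
    tokens = quote.split()
    if not tokens:
        return None
    # Let any run of whitespace (including line breaks) separate the tokens.
    pattern = r"\s+".join(re.escape(token) for token in tokens)
    match = re.search(pattern, source, flags=re.IGNORECASE)
    return (match.start(), match.end()) if match else None


def ground_signals(source: str, candidates: List[Tuple[str, str]]) -> List[Signal]:
    """Keep only candidates whose quote can be located in the source text."""
    grounded = []
    for kind, quote in candidates:
        span = locate_quote(source, quote)
        if span is not None:
            grounded.append(Signal(kind, quote, span[0], span[1]))
    return grounded
```

Calling `ground_signals(document_text, candidates)` returns a list of `Signal` records whose `start` and `end` offsets can be used downstream to show the supporting passage. Dropping ungrounded candidates trades some recall for traceability, which is one instance of the flexibility versus control tradeoff this role involves.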

We are not looking for someone to just wire APIs together.

We are looking for someone who can:

think through ambiguity

design an approach that works in practice

understand tradeoffs between flexibility and control

Required in Your Proposal

Please answer the following:

How would you approach extracting meaningful signals from documents with inconsistent formatting and ambiguous language?

What would your pipeline look like at a high level?

What are the biggest failure points in this type of system?
