AI/NLP Engineer – Extract Meaning from Complex Documents (Not a Simple Parsing Task) - Contract to Hire
We are building a system that processes real-world documents (PDFs, emails, reports, and all types of text documents) and extracts meaningful structured signals from unstructured text.
This is not a simple parsing or summarization task.
The Problem
The documents we work with are:
inconsistent in format
ambiguous in language
written by different authors with different styles
often full of indirect or implied recommendations
We need to extract signals such as:
findings
recommendations
actions
key clinical or operational statements
Several approaches have already been attempted:
rule-based extraction (regex, YAML rules) → too brittle
strict deterministic pipelines → fail on variability
basic LLM extraction → inconsistent and not reliable enough
We are looking for someone who can design and implement a robust signal extraction approach that can:
handle messy, real-world text
extract relevant signals with high recall
link extracted signals back to source text
produce structured outputs that can be used downstream
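To make the last two points concrete, here is a minimal sketch of what a source-linked, structured output could look like. It is illustrative only: the `Signal` fields and the `link_signal` helper are hypothetical names, not an existing schema, and exact substring matching stands in for whatever alignment method a real solution would use.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Signal:
    """One extracted signal plus the character span it came from (hypothetical schema)."""
    kind: str   # e.g. "finding", "recommendation", "action"
    text: str   # the supporting text, copied verbatim from the document
    start: int  # inclusive character offset into the source document
    end: int    # exclusive character offset into the source document


def link_signal(document: str, kind: str, quote: str) -> Optional[Signal]:
    """Attach a source span to an extracted quote.

    Returns None when the quote cannot be found verbatim, so unsupported
    (possibly hallucinated) extractions can be flagged rather than trusted.
    """
    start = document.find(quote)
    if start == -1:
        return None
    return Signal(kind=kind, text=quote, start=start, end=start + len(quote))


if __name__ == "__main__":
    doc = "Patient stable. Recommend follow-up imaging in 6 weeks."
    signal = link_signal(doc, "recommendation", "Recommend follow-up imaging in 6 weeks.")
    # Serialize to JSON so downstream systems can consume the result.
    print(json.dumps(asdict(signal), indent=2))
```

Exact matching is deliberately strict here: a quote that does not appear verbatim is rejected. A practical system would likely need fuzzier alignment to cope with OCR noise or paraphrased extractions, which is part of the flexibility-versus-control tradeoff we want candidates to reason about.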
We are not looking for someone to just wire APIs together.
We are looking for someone who can:
think through ambiguity
design an approach that works in practice
understand tradeoffs between flexibility and control
Required in Your Proposal
Please answer the following:
How would you approach extracting meaningful signals from documents with inconsistent formatting and ambiguous language?
What would your pipeline look like at a high level?
What are the biggest failure points in this type of system?