
Pape Ibrahima Thiam, Yohann Chasseray, Josiane Mothe, Mathieu Roche, and Maguelonne Teisseire
November 2025
Territorial food systems, captured here through French-language documents, demand robust Named Entity Recognition (NER) despite scarce annotations and long, heterogeneous texts. In this work, we tackle two challenges: (1) segmenting long documents and (2) adapting open-schema NER to a specialized, low-resource domain. First, we compare four segmentation strategies in zero-shot settings to quantify precision–recall trade-offs. Then, we propose a semi-supervised pipeline that fine-tunes GLiNER and NuNER using a small manually annotated seed set followed by a large pseudo-labeled corpus built via cross-model agreement. Evaluations on a test set and the full annotated corpus show that pseudo-label fine-tuning consistently outperforms training on human-labeled data alone. The study also exposes strategy-specific strengths and weaknesses, underscoring that optimizing segmentation materially affects NER in domain-specific, low-resource scenarios. Our results provide practical guidance for deploying NER in territorial food systems and comparable specialized domains.
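To make the idea of building pseudo-labels through cross-model agreement concrete, here is a minimal sketch. It is not the authors' implementation: the span format `(start, end, label)`, the exact-match agreement criterion, and the function names `agreed_spans` and `build_pseudo_labels` are all assumptions chosen for illustration. The predictions would come from two separate models (for instance GLiNER and NuNER), which are represented here as plain lists so that no library API is assumed.

```python
from typing import List, Tuple

# (start_char, end_char, label): a hypothetical span format, not taken from the paper.
Span = Tuple[int, int, str]


def agreed_spans(preds_a: List[Span], preds_b: List[Span]) -> List[Span]:
    """Keep only spans that both models predict with identical offsets and label.

    Exact matching is an assumption; a real pipeline might instead allow
    partial overlap or apply confidence thresholds.
    """
    return sorted(set(preds_a) & set(preds_b))


def build_pseudo_labels(
    texts: List[str],
    preds_a: List[List[Span]],
    preds_b: List[List[Span]],
) -> List[Tuple[str, List[Span]]]:
    """Pair each text with its agreed spans, dropping texts with no agreement."""
    corpus = []
    for text, a, b in zip(texts, preds_a, preds_b):
        spans = agreed_spans(a, b)
        if spans:  # skip segments where the two models share no span
            corpus.append((text, spans))
    return corpus


# Illustrative inputs: the two models agree on the location, not on the second span.
texts = ["Le marché de Montpellier vend du fromage."]
model_a = [[(13, 24, "LOC"), (33, 40, "PRODUCT")]]
model_b = [[(13, 24, "LOC"), (33, 40, "ORG")]]
pseudo_corpus = build_pseudo_labels(texts, model_a, model_b)
```

Requiring exact agreement favors precision over recall in the pseudo-labeled corpus. Whether that trade-off matches the one made in the paper is not stated in the abstract.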