Spaces: cmboulanger/tei-annotator

cmboulanger committed 15 days ago

Commit 89b03df (verified) · 1 parent: 48e9493

Upload folder using huggingface_hub

Files changed (36)

.claude/settings.local.json +14 -0
.claude/skills/optimize-element-descriptions/SKILL.md +124 -0
.github/workflows/ci.yml +28 -0
.github/workflows/release.yml +81 -0
.local/eval-baseline.log +41 -0
.local/eval-batch10.log +41 -0
.local/evaluate-llm.log +112 -0
.local/gemini-batch10-full.log +43 -0
.local/gemini-batch10-full.progress +6 -6
.local/kisski-batch1.log +43 -0
.local/kisski-batch1.progress +4 -12
.local/kisski-batch10-t600.log +43 -0
.local/kisski-batch10-t600.progress +6 -6
.local/kisski-batch10.log +68 -0
.local/kisski-batch10.progress +2 -5
.local/kisski-batch162.log +11 -0
.local/kisski-batch162.progress +2 -0
.local/kisski-batch50.log +17 -0
.local/kisski-batch50.progress +2 -0
.pytest_cache/.gitignore +2 -0
.pytest_cache/CACHEDIR.TAG +4 -0
.pytest_cache/README.md +8 -0
.pytest_cache/v/cache/lastfailed +1 -0
.pytest_cache/v/cache/nodeids +162 -0
.python-version +1 -0
.releaserc.json +34 -0
CHANGELOG.md +93 -0
CLAUDE.md +149 -0
README.md +1 -1
package-lock.json +0 -0
package.json +32 -0
pyproject.toml +0 -2
requirements.txt +7 -0
schema/tei-bib.rng +0 -0
tei_annotator/providers/README.md +1 -1
webservice/nginx.conf +86 -0

.claude/settings.local.json ADDED

	@@ -0,0 +1,14 @@

+{
+  "permissions": {
+    "allow": [
+      "Bash(xargs grep -l \"webservice\\\\|fastapi\\\\|flask\")",
+      "WebSearch",
+      "WebFetch(domain:router.huggingface.co)",
+      "Bash(/Users/cboulanger/.local/bin/hf auth:*)",
+      "Bash(uv sync:*)"
+    ]
+  },
+  "enabledPlugins": {
+    "hf-cli@huggingface-skills": true
+  }
+}

.claude/skills/optimize-element-descriptions/SKILL.md ADDED

	@@ -0,0 +1,124 @@

+---
+name: optimize-element-descriptions
+description: Iteratively improve TEIElement descriptions and schema rules to maximise F1 against the gold standard. Use when annotation quality is low or when evaluation shows missed or spurious spans.
+disable-model-invocation: true
+argument-hint: "--max-items N --provider gemini|kisski|all"
+---
+# optimize-element-descriptions
+Iteratively improve the `TEIElement` descriptions and `TEISchema.rules` in the relevant schema file under `tei_annotator/schemas/` to maximise F1 score against the gold standard.
+Schema files:
+- `tei_annotator/schemas/bibl.py` — `build_bibl_schema()`
+- `tei_annotator/schemas/bibl_reference_segmenter.py` — `build_bibl_reference_segmenter_schema()`
+Before writing any descriptions, read the guidelines in [docs/tei-element-descriptions.md](../../../docs/tei-element-descriptions.md).
+Extra arguments passed to this skill (e.g. `--max-items 10 --provider gemini`) are forwarded to `evaluate_llm.py` where applicable.
+---
+## Workflow
+### Step 1 — Baseline evaluation
+Run a full evaluation with `--verbose` and `--match-mode overlap` to capture missed and spurious spans for every failing record:
+```bash
+uv run scripts/evaluate_llm.py --verbose --match-mode overlap $ARGUMENTS
+```
+Record the overall Micro F1, per-element F1, and the text of the lowest-scoring records.
+---
+### Step 2 — Diagnose failure patterns
+For each record where F1 < 1.0, analyse the `missed=` and `spurious=` lists alongside the Gold and Annotation lines shown by `--verbose`.
+Group failures into patterns such as:
+| Pattern | Typical cause |
+|---|---|
+| Span emitted as wrong element (spurious + missed same text) | Conflicting or missing negative constraint in description |
+| Required parent span missing (e.g. `author` around `orgName`) | Parent–child relationship not described from both sides |
+| Multiple instances merged into one span | No explicit "one span per …" instruction |
+| Span boundary includes surrounding punctuation | Span boundary not specified in description |
+| Positional trigger missed (e.g. editor after "in") | Contextual keyword triggers absent from description |
+Focus on patterns that affect **multiple records or both models**: single-record anomalies may be gold-standard issues, not description issues.
+---
+### Step 3 — Improve descriptions
+Read the relevant schema file under `tei_annotator/schemas/` to see the current descriptions, then edit the builder function following the guidelines in [docs/tei-element-descriptions.md](../../../docs/tei-element-descriptions.md).
+Key principles (summary):
+- Phrase everything as "emit a span", not "wrap in a tag"
+- State multiplicity explicitly: "a separate span for each distinct …"
+- Describe parent–child direction from both sides with a concrete example
+- Add negative constraints: "never tag X as Y"
+- Include textual triggers (keywords, position) and inline surface-form examples
+- Prefix critical constraints with `CRITICAL:`
+- If a failure pattern affects **multiple element types**, add the constraint to `TEISchema.rules` instead of duplicating it in each element description — the prompt renders `rules` as a numbered "General Rules" section before all element descriptions.
+Only edit descriptions for elements where you identified a clear failure pattern.
+---
+### Step 4 — Targeted re-evaluation with `--grep`
+Build a grep pattern from the text of the failing records identified in Step 1, then re-run only those records:
+```bash
+uv run scripts/evaluate_llm.py --verbose --match-mode overlap \
+    --grep "pattern1|pattern2|..." $ARGUMENTS
+```
+Compare the new F1 values against the Step 1 baseline for each affected record.
+---
+### Step 5 — Decide: iterate or stop
+**Iterate (go to Step 2)** if:
+- At least one record improved and no regressions were introduced, AND
+- Remaining failures still show patterns addressable by description changes
+**Stop** if any of the following apply:
+- No improvement across two consecutive rounds
+- Remaining failures appear to be gold-standard annotation issues (flag these for human review; see Step 5a)
+- Failures are caused by model-level reasoning limits that description changes cannot fix (e.g. a model consistently ignoring a rule that is already clearly stated)
+---
+### Step 5a — Handle editorial ambiguities with `cert="low"`
+If a failure pattern **persists across model families** after two or more rule iterations and the boundary in question reflects a genuine editorial choice (either split or merged would be defensible), do **not** continue iterating on the prompt. Instead, update the gold file:
+1. Split the merged gold span into two adjacent spans with **no tail text** between them.
+2. Set `cert="low"` on the **second** span.
+```xml
+<!-- before -->
+<bibl><label>5</label> Commentary mentioning Althusser; see Bunn (2015).<lb/> </bibl>
+<!-- after -->
+<bibl><label>5</label> Commentary mentioning Althusser;</bibl><bibl cert="low">see Bunn (2015).<lb/> </bibl>
+```
+The evaluator's union-match pass then accepts either model behaviour (split or merged) as correct. See [tei_annotator/evaluation/README.md](../../../tei_annotator/evaluation/README.md#uncertain-boundary-gold-spans-certlow) for the full specification.
+---
+### Step 6 — Full re-evaluation (final)
+Once iterations are complete, run a full evaluation without `--grep` to confirm that overall F1 has not regressed on records that were previously correct:
+```bash
+uv run scripts/evaluate_llm.py --verbose --match-mode overlap $ARGUMENTS
+```
+Report the final Micro F1 and per-element breakdown, noting which elements improved and which remain problematic.
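Step 5a leaves the split-or-merged decision to the evaluator's union-match pass, whose actual rules are specified in `tei_annotator/evaluation/README.md`. As a hedged illustration only, the sketch below assumes that gold and predicted spans are reduced to `(element, start, end)` character offsets, reads "no tail text" as `first.end == second.start`, and collapses a split gold pair into one span when a prediction covers exactly both. `Span`, `cert_low` and `apply_union_match` are hypothetical names, not the package's API.

```python
# Hypothetical sketch of a cert="low" union-match pass; not the tei_annotator evaluator.
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    element: str
    start: int              # inclusive character offset into the record's plain text
    end: int                # exclusive character offset
    cert_low: bool = False  # hypothetical flag mirroring cert="low" on the gold span


def apply_union_match(gold: list[Span], predicted: list[Span]) -> list[Span]:
    """Replace each split gold pair (X, X[cert=low]) by one merged span
    when the prediction kept the two parts merged; otherwise keep gold as is."""
    predicted_keys = {(p.element, p.start, p.end) for p in predicted}
    result: list[Span] = []
    i = 0
    while i < len(gold):
        first = gold[i]
        if i + 1 < len(gold):
            second = gold[i + 1]
            adjacent = first.end == second.start  # no tail text between the spans
            merged_key = (first.element, first.start, second.end)
            if (second.cert_low and second.element == first.element
                    and adjacent and merged_key in predicted_keys):
                result.append(Span(first.element, first.start, second.end))
                i += 2  # both gold spans are consumed by the merged span
                continue
        result.append(first)  # keep the gold span unchanged
        i += 1
    return result
```

Running this before ordinary span matching means a merged prediction meets a single merged gold span, while a split prediction meets the original two spans, so either behaviour can score as fully correct.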

.github/workflows/ci.yml ADDED

	@@ -0,0 +1,28 @@

+name: CI
+on:
+  push:
+    branches: [main, develop]
+  pull_request:
+    branches: [main, develop]
+jobs:
+  test:
+    name: Run Tests
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+      - name: Set up uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          python-version: "3.12"
+          enable-cache: true
+      - name: Install dependencies
+        run: uv sync
+      - name: Run tests
+        run: uv run pytest

.github/workflows/release.yml ADDED

	@@ -0,0 +1,81 @@

+name: Release
+on:
+  workflow_run:
+    workflows: ["CI"]
+    types:
+      - completed
+    branches:
+      - main
+permissions:
+  contents: write
+  issues: write
+  pull-requests: write
+jobs:
+  release:
+    name: Semantic Release
+    runs-on: ubuntu-latest
+    if: ${{ github.event.workflow_run.conclusion == 'success' }}
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+          persist-credentials: false
+      - name: Set up Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: "22"
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+      - name: Install Node dependencies
+        run: npm ci
+      - name: Run semantic-release
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GIT_AUTHOR_NAME: ${{ github.actor }}
+          GIT_AUTHOR_EMAIL: ${{ github.actor }}@users.noreply.github.com
+          GIT_COMMITTER_NAME: ${{ github.actor }}
+          GIT_COMMITTER_EMAIL: ${{ github.actor }}@users.noreply.github.com
+        run: npx semantic-release
+  update-tags:
+    name: Update Dynamic Tags
+    runs-on: ubuntu-latest
+    needs: release
+    if: success()
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+      - name: Update latest and stable tags
+        run: |
+          git config user.name "GitHub Actions"
+          git config user.email "actions@github.com"
+          LATEST_TAG=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
+          if [[ -n "$LATEST_TAG" && ! "$LATEST_TAG" =~ - ]]; then
+            echo "Updating latest and stable tags to $LATEST_TAG"
+            git tag -f latest "${LATEST_TAG}^{commit}"
+            git tag -f stable "${LATEST_TAG}^{commit}"
+            git push origin latest --force
+            git push origin stable --force
+          else
+            echo "No version tags found or latest tag is a pre-release. Skipping tag update."
+          fi

.local/eval-baseline.log ADDED

	@@ -0,0 +1,41 @@

+────────────────────────────────────────────────────────────────
+  Provider  : Gemini 2.0 Flash
+  Gold file : tests/fixtures/blbl-examples.tei.xml
+  Records   : 30   match-mode: text
+  Batch size: 1
+  GLiNER    : disabled
+────────────────────────────────────────────────────────────────
+  Completed: 30/30 records
+=== Overall — Gemini 2.0 Flash ===
+Micro  P=0.807  R=0.843  F1=0.825  (TP=247  FP=59  FN=46)
+Macro  P=0.757  R=0.767  F1=0.754
+Per-element breakdown:
+  author                P=0.488  R=0.700  F1=0.575  (TP=21  FP=22  FN=9)
+  biblScope             P=0.944  R=0.895  F1=0.919  (TP=34  FP=2  FN=4)
+  date                  P=0.909  R=0.882  F1=0.896  (TP=30  FP=3  FN=4)
+  editor                P=0.800  R=0.800  F1=0.800  (TP=4  FP=1  FN=1)
+  forename              P=0.902  R=0.920  F1=0.911  (TP=46  FP=5  FN=4)
+  idno                  P=1.000  R=1.000  F1=1.000  (TP=1  FP=0  FN=0)
+  label                 P=1.000  R=0.667  F1=0.800  (TP=2  FP=0  FN=1)
+  note                  P=0.200  R=0.333  F1=0.250  (TP=1  FP=4  FN=2)
+  orgName               P=0.200  R=0.333  F1=0.250  (TP=1  FP=4  FN=2)
+  pubPlace              P=0.765  R=0.929  F1=0.839  (TP=13  FP=4  FN=1)
+  publisher             P=0.909  R=0.833  F1=0.870  (TP=10  FP=1  FN=2)
+  surname               P=1.000  R=1.000  F1=1.000  (TP=51  FP=0  FN=0)
+  title                 P=0.717  R=0.673  F1=0.695  (TP=33  FP=13  FN=16)
+  Lowest-F1 records (top 5):
+    #  2  F1=0.615  missed=['orgName', 'orgName', 'title']  spurious=['orgName', 'note']
+         "Commission Inter-IREM Collège & Commission Inter-IREM S..."
+    # 29  F1=0.615  missed=['forename', 'author']  spurious=['forename', 'forename', 'author']
+         "Cohen, Gary B. Education and Middle Class Society in Im..."
+    #  3  F1=0.625  missed=['date', 'title', 'biblScope']  spurious=['date', 'title', 'biblScope']
+         "BARIL, Jean (2013). Droit d’accès à l’information envir..."
+    #  5  F1=0.625  missed=['forename', 'author', 'title']  spurious=['forename', 'author', 'title']
+         "Doyle JJ. 1998. Phylogenetic perspectives on nodulation..."
+    # 19  F1=0.640  missed=['author', 'title', 'title', 'forename', 'editor']  spurious=['author', 'title', 'forename', 'editor']
+         "Taitt, David. 1916. "Journal of David Taitt's Travels f..."
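The `Micro` and `Macro` lines follow from the per-element counts. The sketch below is an illustration, not the evaluator's code. It assumes that micro scores pool TP, FP and FN over all elements, that macro scores are unweighted means of the per-element P, R and F1, and that an empty denominator yields 0.0. With the counts from this baseline run, it reproduces the logged overall figures.

```python
# Illustrative recomputation of the overall scores; not the tei_annotator evaluator.
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int
    fp: int
    fn: int


def prf(c: Counts) -> tuple[float, float, float]:
    """Precision, recall and F1 for one set of counts (0.0 on empty denominators)."""
    p = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    r = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def micro_macro(per_element: dict[str, Counts]):
    """Micro: pool counts over all elements. Macro: average the per-element scores."""
    pooled = Counts(
        tp=sum(c.tp for c in per_element.values()),
        fp=sum(c.fp for c in per_element.values()),
        fn=sum(c.fn for c in per_element.values()),
    )
    scores = [prf(c) for c in per_element.values()]
    macro = tuple(sum(s[k] for s in scores) / len(scores) for k in range(3))
    return prf(pooled), macro


# Per-element (TP, FP, FN) counts copied from the baseline log above.
BASELINE = {
    "author": Counts(21, 22, 9),
    "biblScope": Counts(34, 2, 4),
    "date": Counts(30, 3, 4),
    "editor": Counts(4, 1, 1),
    "forename": Counts(46, 5, 4),
    "idno": Counts(1, 0, 0),
    "label": Counts(2, 0, 1),
    "note": Counts(1, 4, 2),
    "orgName": Counts(1, 4, 2),
    "pubPlace": Counts(13, 4, 1),
    "publisher": Counts(10, 1, 2),
    "surname": Counts(51, 0, 0),
    "title": Counts(33, 13, 16),
}

micro, macro = micro_macro(BASELINE)
print("Micro  P=%.3f  R=%.3f  F1=%.3f" % micro)
print("Macro  P=%.3f  R=%.3f  F1=%.3f" % macro)
```

This prints `Micro  P=0.807  R=0.843  F1=0.825` and `Macro  P=0.757  R=0.767  F1=0.754`, matching the log. The macro F1 is the mean of the thirteen per-element F1 values. Taking the harmonic mean of macro P and macro R instead would give about 0.762.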

.local/eval-batch10.log ADDED

	@@ -0,0 +1,41 @@

+────────────────────────────────────────────────────────────────
+  Provider  : Gemini 2.0 Flash
+  Gold file : tests/fixtures/blbl-examples.tei.xml
+  Records   : 30   match-mode: text
+  Batch size: 10
+  GLiNER    : disabled
+────────────────────────────────────────────────────────────────
+  Completed: 30/30 records
+=== Overall — Gemini 2.0 Flash ===
+Micro  P=0.825  R=0.870  F1=0.847  (TP=255  FP=54  FN=38)
+Macro  P=0.765  R=0.808  F1=0.779
+Per-element breakdown:
+  author                P=0.676  R=0.767  F1=0.719  (TP=23  FP=11  FN=7)
+  biblScope             P=0.971  R=0.895  F1=0.932  (TP=34  FP=1  FN=4)
+  date                  P=1.000  R=1.000  F1=1.000  (TP=34  FP=0  FN=0)
+  editor                P=0.800  R=0.800  F1=0.800  (TP=4  FP=1  FN=1)
+  forename              P=0.833  R=0.900  F1=0.865  (TP=45  FP=9  FN=5)
+  idno                  P=1.000  R=1.000  F1=1.000  (TP=1  FP=0  FN=0)
+  label                 P=1.000  R=1.000  F1=1.000  (TP=3  FP=0  FN=0)
+  note                  P=0.200  R=0.333  F1=0.250  (TP=1  FP=4  FN=2)
+  orgName               P=0.100  R=0.333  F1=0.154  (TP=1  FP=9  FN=2)
+  pubPlace              P=0.722  R=0.929  F1=0.813  (TP=13  FP=5  FN=1)
+  publisher             P=0.909  R=0.833  F1=0.870  (TP=10  FP=1  FN=2)
+  surname               P=0.980  R=0.980  F1=0.980  (TP=50  FP=1  FN=1)
+  title                 P=0.750  R=0.735  F1=0.742  (TP=36  FP=12  FN=13)
+  Lowest-F1 records (top 5):
+    # 29  F1=0.571  missed=['forename', 'author']  spurious=['forename', 'forename', 'forename', 'author']
+         "Cohen, Gary B. Education and Middle Class Society in Im..."
+    #  2  F1=0.615  missed=['orgName', 'orgName', 'title']  spurious=['orgName', 'note']
+         "Commission Inter-IREM Collège & Commission Inter-IREM S..."
+    #  5  F1=0.625  missed=['forename', 'author', 'title']  spurious=['forename', 'author', 'title']
+         "Doyle JJ. 1998. Phylogenetic perspectives on nodulation..."
+    # 17  F1=0.667  missed=['publisher']  spurious=['orgName', 'orgName', 'publisher', 'orgName', 'pubPlace']
+         "McGrath, P. 2005 Toronto in the 1850s: A Transcription ..."
+    # 28  F1=0.667  missed=['title', 'publisher']  spurious=['note', 'note', 'orgName', 'orgName', 'orgName']
+         "Oxenford, J.L. & Williams, S.I., 2009. Failure and Root..."
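Step 4 of the skill asks for a `--grep` alternation built from the failing records. The helper below is a hypothetical sketch that turns the truncated excerpts printed under "Lowest-F1 records" into such a pattern. It assumes, without confirmation from the source, that `evaluate_llm.py` treats `--grep` as a Python regular expression searched against each record's plain text. Under that assumption, `re.escape` keeps characters such as `.`, `(` and `&` in the excerpts from being read as regex syntax. `build_grep_pattern` and `prefix_len` are illustrative names.

```python
# Hypothetical helper for Step 4; assumes --grep is applied as a Python regex.
import re


def build_grep_pattern(excerpts: list[str], prefix_len: int = 25) -> str:
    """Join escaped prefixes of log excerpts into one regex alternation."""
    parts = []
    for excerpt in excerpts:
        # The log truncates excerpts with "..."; drop it and surrounding spaces.
        text = excerpt.rstrip(". ").strip()
        parts.append(re.escape(text[:prefix_len]))
    return "|".join(parts)


# Excerpts copied from the "Lowest-F1 records" list above.
excerpts = [
    "Cohen, Gary B. Education and Middle Class Society in Im...",
    "Commission Inter-IREM Collège & Commission Inter-IREM S...",
    "Oxenford, J.L. & Williams, S.I., 2009. Failure and Root...",
]
print(build_grep_pattern(excerpts))
```

The printed pattern contains backslashes, so pass it to `--grep` in single quotes to keep the shell from interpreting them.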

.local/evaluate-llm.log ADDED

	@@ -0,0 +1,112 @@

+────────────────────────────────────────────────────────────────
+  Provider  : Gemini 2.0 Flash
+  Gold file : tests/fixtures/blbl-examples.tei.xml
+  Records   : 10   match-mode: overlap
+  GLiNER    : disabled
+────────────────────────────────────────────────────────────────
+  ────────────────────────────────────────────────────────────
+  Gold:       <author><orgName>Commission Inter-IREM Collège</orgName></author> &amp; <author><orgName>Commission Inter-IREM Statistiques et Probabilités</orgName></author>, (<date>2012</date>). <title level="a">Probabilités au collège : ne pas laisser l’enseignement des probabilités au hasard…</title>. Dans <title level="j">Brochure APMEP</title> n°<biblScope unit="volume">198</biblScope>.
+  Annotation: <orgName><author>Commission Inter-IREM Collège & Commission Inter-IREM Statistiques et Probabilités</author></orgName>, <date>(2012)</date>. <title>Probabilités au collège : ne pas laisser l’enseignement des probabilités au hasard….</title> Dans <title level="s">Brochure APMEP</title> <biblScope>n°198</biblScope>.
+  F1=0.857  missed=['orgName', 'author']  spurious=[]
+  ────────────────────────────────────────────────────────────
+  Gold:       <author><surname>Russell</surname>, <forename>D.A.</forename> and <forename>Michael</forename> <surname>Winterbottom</surname></author> <date>1989</date> [<date>1972</date>]. <title level="m">Classical Literary Criticism. Oxford World Classics</title>. <pubPlace>Oxford</pubPlace>: <publisher>Oxford UP</publisher>.
+  Annotation: <author><surname>Russell</surname>, <forename>D.A.</forename></author> and <author><forename>Michael</forename> <surname>Winterbottom</surname></author> <date>1989</date> <date>[1972]</date>. <title level="m">Classical Literary Criticism</title>. <title level="s"><pubPlace>Oxford</pubPlace> World Classics</title>. Oxford: <publisher><orgName>Oxford UP</orgName></publisher>.
+  F1=0.783  missed=['pubPlace']  spurious=['author', 'pubPlace', 'title', 'orgName']
+  ────────────────────────────────────────────────────────────
+  Gold:       <label>17.</label><author><surname>Creed</surname> <forename>PA</forename>, <surname>Hicks</surname> <forename>RE</forename>, <surname>Machin</surname> <forename>MA</forename></author>. <title level="a">Behavioural plasticity and mental health outcomes for long-term unemployed attending occupational training programmes</title>. <title level="j">J Occup Org Psychol</title>. <date>1998</date>;<biblScope unit="volume">71</biblScope>: <biblScope unit="page">171-91</biblScope>.
+  Annotation: <label>17.</label><author><surname>Creed</surname> <forename>PA</forename></author>, <author><surname>Hicks</surname> <forename>RE</forename></author>, <author><surname>Machin</surname> <forename>MA</forename></author>. <title level="a">Behavioural plasticity and mental health outcomes for long-term unemployed attending occupational training programmes</title>. <title level="j">J Occup Org Psychol</title>. <date>1998</date>;<biblScope unit="volume">71</biblScope>: <biblScope unit="page">171-91</biblScope>.
+  F1=0.857  missed=['author']  spurious=['author', 'author', 'author']
+  ────────────────────────────────────────────────────────────
+  Gold:       <label>25.</label> <author><surname>Spickett-Jones</surname>, <forename>J. G.</forename> &amp; <forename>T.-Y.</forename> <surname>Eng</surname></author> (<date>2006</date>). “<title level="a">SMEs and the Strategic Context for Communication</title>”’, <title level="j">Journal of Marketing Communications</title>, Vol. <biblScope unit="volume">12</biblScope>(<biblScope unit="issue">3</biblScope>), <biblScope unit="page">225 - 243</biblScope>.
+  Annotation: <label>25.</label> <author><surname>Spickett-Jones</surname>, <forename>J. G.</forename></author> & <author><forename>T.-Y.</forename> <surname>Eng</surname></author> <date>(2006)</date>. <title level="a">“SMEs and the Strategic Context for Communication”’</title>, <title level="j">Journal of Marketing Communications</title>, Vol. <biblScope unit="volume">12</biblScope>(<biblScope unit="issue">3</biblScope>), <biblScope unit="page">225 - 243</biblScope>.
+  F1=0.960  missed=[]  spurious=['author']
+  ────────────────────────────────────────────────────────────
+  Gold:       <author><surname>Lillié</surname>, <forename>F.</forename></author>, <title level="m">Analyse tectonique de Gisement Claude</title> (<pubPlace>Cluff Lake, Saskatchewan</pubPlace>). <note type="report">Amok Internal Report</note>. <date>1982</date>.
+  Annotation: <author><surname>Lillié</surname>, <forename>F.</forename></author>, <title>Analyse tectonique de Gisement Claude (Cluff Lake, Saskatchewan)</title>. <note type="report">Amok Internal Report</note>. <date>1982</date>.
+  F1=0.923  missed=['pubPlace']  spurious=[]
+  Completed: 10/10 records
+=== Overall — Gemini 2.0 Flash ===
+Micro  P=0.914  R=0.944  F1=0.929  (TP=85  FP=8  FN=5)
+Macro  P=0.882  R=0.888  F1=0.882
+Per-element breakdown:
+  author                P=0.643  R=0.818  F1=0.720  (TP=9  FP=5  FN=2)
+  biblScope             P=1.000  R=1.000  F1=1.000  (TP=14  FP=0  FN=0)
+  date                  P=1.000  R=1.000  F1=1.000  (TP=11  FP=0  FN=0)
+  editor                P=1.000  R=1.000  F1=1.000  (TP=1  FP=0  FN=0)
+  forename              P=1.000  R=1.000  F1=1.000  (TP=13  FP=0  FN=0)
+  label                 P=1.000  R=1.000  F1=1.000  (TP=2  FP=0  FN=0)
+  note                  P=1.000  R=1.000  F1=1.000  (TP=1  FP=0  FN=0)
+  orgName               P=0.500  R=0.500  F1=0.500  (TP=1  FP=1  FN=1)
+  pubPlace              P=0.500  R=0.333  F1=0.400  (TP=1  FP=1  FN=2)
+  publisher             P=1.000  R=1.000  F1=1.000  (TP=2  FP=0  FN=0)
+  surname               P=1.000  R=1.000  F1=1.000  (TP=14  FP=0  FN=0)
+  title                 P=0.941  R=1.000  F1=0.970  (TP=16  FP=1  FN=0)
+  Lowest-F1 records (top 5):
+    #  4  F1=0.783  missed=['pubPlace']  spurious=['author', 'pubPlace', 'title', 'orgName']
+         "Russell, D.A. and Michael Winterbottom 1989 [1972]. Cla..."
+    #  2  F1=0.857  missed=['orgName', 'author']  spurious=[]
+         "Commission Inter-IREM Collège & Commission Inter-IREM S..."
+    #  7  F1=0.857  missed=['author']  spurious=['author', 'author', 'author']
+         "17.Creed PA, Hicks RE, Machin MA. Behavioural plasticit..."
+    #  9  F1=0.923  missed=['pubPlace']  spurious=[]
+         "Lillié, F., Analyse tectonique de Gisement Claude (Cluf..."
+    #  8  F1=0.960  missed=[]  spurious=['author']
+         "25. Spickett-Jones, J. G. & T.-Y. Eng (2006). “SMEs and..."
+────────────────────────────────────────────────────────────────
+  Provider  : KISSKI / llama-3.3-70b-instruct
+  Gold file : tests/fixtures/blbl-examples.tei.xml
+  Records   : 10   match-mode: overlap
+  GLiNER    : disabled
+────────────────────────────────────────────────────────────────
+  ────────────────────────────────────────────────────────────
+  Gold:       <author><orgName>Commission Inter-IREM Collège</orgName></author> &amp; <author><orgName>Commission Inter-IREM Statistiques et Probabilités</orgName></author>, (<date>2012</date>). <title level="a">Probabilités au collège : ne pas laisser l’enseignement des probabilités au hasard…</title>. Dans <title level="j">Brochure APMEP</title> n°<biblScope unit="volume">198</biblScope>.
+  Annotation: <author><orgName>Commission Inter-IREM Collège</orgName> & <orgName>Commission Inter-IREM Statistiques et Probabilités</orgName></author>, (<date>2012</date>). <title level="a">Probabilités au collège : ne pas laisser l’enseignement des probabilités au hasard….</title> Dans <title level="m">Brochure APMEP n°198</title>.
+  F1=0.857  missed=['author', 'biblScope']  spurious=[]
+  ────────────────────────────────────────────────────────────
+  Gold:       <author><surname>Russell</surname>, <forename>D.A.</forename> and <forename>Michael</forename> <surname>Winterbottom</surname></author> <date>1989</date> [<date>1972</date>]. <title level="m">Classical Literary Criticism. Oxford World Classics</title>. <pubPlace>Oxford</pubPlace>: <publisher>Oxford UP</publisher>.
+  Annotation: <author><surname>Russell</surname>, <forename>D.A.</forename> and <forename>Michael</forename> <surname>Winterbottom</surname></author> <date>1989 [1972]</date>. <title level="m">Classical Literary Criticism</title>. <title level="s">Oxford World Classics</title>. <pubPlace>Oxford</pubPlace>: <publisher>Oxford UP</publisher>.
+  F1=0.800  missed=['date', 'date']  spurious=['date', 'title']
+  ────────────────────────────────────────────────────────────
+  Gold:       <label>17.</label><author><surname>Creed</surname> <forename>PA</forename>, <surname>Hicks</surname> <forename>RE</forename>, <surname>Machin</surname> <forename>MA</forename></author>. <title level="a">Behavioural plasticity and mental health outcomes for long-term unemployed attending occupational training programmes</title>. <title level="j">J Occup Org Psychol</title>. <date>1998</date>;<biblScope unit="volume">71</biblScope>: <biblScope unit="page">171-91</biblScope>.
+  Annotation: <label>17</label>.<author><surname>Creed</surname> <forename>PA</forename></author>, <author><surname>Hicks</surname> <forename>RE</forename></author>, <author><surname>Machin</surname> <forename>MA</forename></author>. <title level="a">Behavioural plasticity and mental health outcomes for long-term unemployed attending occupational training programmes</title>. <title level="j">J Occup Org Psychol</title>. <date>1998</date>;<biblScope unit="volume">71</biblScope>: <biblScope unit="page">171-91</biblScope>.
+  F1=0.857  missed=['author']  spurious=['author', 'author', 'author']
+  ────────────────────────────────────────────────────────────
+  Gold:       <label>25.</label> <author><surname>Spickett-Jones</surname>, <forename>J. G.</forename> &amp; <forename>T.-Y.</forename> <surname>Eng</surname></author> (<date>2006</date>). “<title level="a">SMEs and the Strategic Context for Communication</title>”’, <title level="j">Journal of Marketing Communications</title>, Vol. <biblScope unit="volume">12</biblScope>(<biblScope unit="issue">3</biblScope>), <biblScope unit="page">225 - 243</biblScope>.
+  Annotation: <label>25</label>. <author><surname>Spickett-Jones</surname>, <forename>J. G.</forename> & <forename>T.-Y.</forename> <surname>Eng</surname></author> (<date>2006</date>). “SMEs and the Strategic Context for Communication”’, <title level="j">Journal of Marketing Communications</title>, <biblScope unit="volume">Vol. 12(3)</biblScope>, <biblScope unit="page">225 - 243</biblScope>.
+  F1=0.818  missed=['title', 'biblScope', 'biblScope']  spurious=['biblScope']
+  Completed: 10/10 records
+=== Overall — KISSKI / llama-3.3-70b-instruct ===
+Micro  P=0.932  R=0.911  F1=0.921  (TP=82  FP=6  FN=8)
+Macro  P=0.959  R=0.947  F1=0.952
+Per-element breakdown:
+  author                P=0.750  R=0.818  F1=0.783  (TP=9  FP=3  FN=2)
+  biblScope             P=0.917  R=0.786  F1=0.846  (TP=11  FP=1  FN=3)
+  date                  P=0.900  R=0.818  F1=0.857  (TP=9  FP=1  FN=2)
+  editor                P=1.000  R=1.000  F1=1.000  (TP=1  FP=0  FN=0)
+  forename              P=1.000  R=1.000  F1=1.000  (TP=13  FP=0  FN=0)
+  label                 P=1.000  R=1.000  F1=1.000  (TP=2  FP=0  FN=0)
+  note                  P=1.000  R=1.000  F1=1.000  (TP=1  FP=0  FN=0)
+  orgName               P=1.000  R=1.000  F1=1.000  (TP=2  FP=0  FN=0)
+  pubPlace              P=1.000  R=1.000  F1=1.000  (TP=3  FP=0  FN=0)
+  publisher             P=1.000  R=1.000  F1=1.000  (TP=2  FP=0  FN=0)
+  surname               P=1.000  R=1.000  F1=1.000  (TP=14  FP=0  FN=0)
+  title                 P=0.938  R=0.938  F1=0.938  (TP=15  FP=1  FN=1)
+  Lowest-F1 records (top 5):
+    #  4  F1=0.800  missed=['date', 'date']  spurious=['date', 'title']
+         "Russell, D.A. and Michael Winterbottom 1989 [1972]. Cla..."
+    #  8  F1=0.818  missed=['title', 'biblScope', 'biblScope']  spurious=['biblScope']
+         "25. Spickett-Jones, J. G. & T.-Y. Eng (2006). “SMEs and..."
+    #  2  F1=0.857  missed=['author', 'biblScope']  spurious=[]
+         "Commission Inter-IREM Collège & Commission Inter-IREM S..."
+    #  7  F1=0.857  missed=['author']  spurious=['author', 'author', 'author']
+         "17.Creed PA, Hicks RE, Machin MA. Behavioural plasticit..."
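The `overlap` runs above show that partial matches are not accepted unconditionally. In record #7, three `author` spans that each lie inside the single gold `author` span are all counted as spurious. In KISSKI's record #4, `<date>1989 [1972]</date>` matches neither gold date. Meanwhile, the title with `level="s"` in Gemini's record #2 still counts against the gold `level="j"` title. The sketch below is a hypothetical illustration of one rule that is consistent with these results: greedy one-to-one matching on the element name alone, with a minimum intersection-over-union of the character ranges. The threshold, the greedy strategy and the names `Span`, `iou` and `match_overlap` are assumptions, not the evaluator's documented behaviour.

```python
# Hypothetical overlap matcher; the real evaluator's criterion may differ.
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    element: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def iou(a: Span, b: Span) -> float:
    """Intersection over union of two character ranges."""
    inter = max(0, min(a.end, b.end) - max(a.start, b.start))
    if inter == 0:
        return 0.0
    union = max(a.end, b.end) - min(a.start, b.start)  # overlapping, so hull == union
    return inter / union


def match_overlap(gold: list[Span], predicted: list[Span], min_iou: float = 0.5):
    """Greedy one-to-one matching; returns (tp, missed, spurious) element names."""
    unmatched = list(predicted)
    missed: list[str] = []
    tp = 0
    for g in gold:
        candidates = [p for p in unmatched
                      if p.element == g.element and iou(g, p) >= min_iou]
        if not candidates:
            missed.append(g.element)
            continue
        best = max(candidates, key=lambda p: iou(g, p))  # best-overlapping prediction
        unmatched.remove(best)
        tp += 1
    return tp, missed, [p.element for p in unmatched]


# Record #7: one gold author span vs. three predicted author spans.
text = "Creed PA, Hicks RE, Machin MA"
gold = [Span("author", 0, len(text))]
predicted = [
    Span("author", text.index(name), text.index(name) + len(name))
    for name in ("Creed PA", "Hicks RE", "Machin MA")
]
print(match_overlap(gold, predicted))
```

With the assumed 0.5 threshold, this prints `(0, ['author'], ['author', 'author', 'author'])`, the same missed/spurious pattern as the log. A plain "any overlap" rule would instead have matched one of the three spans.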

.local/gemini-batch10-full.log ADDED

	@@ -0,0 +1,43 @@

+────────────────────────────────────────────────────────────────
+  Provider  : Gemini 2.0 Flash
+  Gold file : tests/fixtures/blbl-examples.tei.xml
+  Records   : 162   match-mode: text
+  Batch size: 10
+  GLiNER    : disabled
+────────────────────────────────────────────────────────────────
+  Completed: 162/162 records
+=== Overall — Gemini 2.0 Flash ===
+Micro  P=0.799  R=0.696  F1=0.744  (TP=1126  FP=283  FN=492)
+Macro  P=0.704  R=0.589  F1=0.627
+Per-element breakdown:
+  author                P=0.522  R=0.623  F1=0.568  (TP=96  FP=88  FN=58)
+  biblScope             P=0.902  R=0.653  F1=0.758  (TP=111  FP=12  FN=59)
+  date                  P=0.891  R=0.780  F1=0.832  (TP=131  FP=16  FN=37)
+  editor                P=0.417  R=0.484  F1=0.448  (TP=15  FP=21  FN=16)
+  forename              P=0.900  R=0.754  F1=0.820  (TP=242  FP=27  FN=79)
+  idno                  P=1.000  R=0.500  F1=0.667  (TP=1  FP=0  FN=1)
+  label                 P=1.000  R=0.727  F1=0.842  (TP=8  FP=0  FN=3)
+  note                  P=0.412  R=0.318  F1=0.359  (TP=7  FP=10  FN=15)
+  orgName               P=0.207  R=0.545  F1=0.300  (TP=6  FP=23  FN=5)
+  page                  P=0.000  R=0.000  F1=0.000  (TP=0  FP=0  FN=1)
+  ptr                   P=1.000  R=0.600  F1=0.750  (TP=3  FP=0  FN=2)
+  pubPlace              P=0.783  R=0.806  F1=0.794  (TP=54  FP=15  FN=13)
+  publisher             P=0.818  R=0.692  F1=0.750  (TP=45  FP=10  FN=20)
+  surname               P=0.977  R=0.796  F1=0.878  (TP=258  FP=6  FN=66)
+  title                 P=0.730  R=0.560  F1=0.634  (TP=149  FP=55  FN=117)
+  Lowest-F1 records (top 5):
+    #141  F1=0.000  missed=['surname', 'forename', 'surname', 'forename', 'surname', 'forename', 'surname', 'forename', 'author', 'title', 'pubPlace', 'surname', 'forename', 'surname', 'forename', 'editor', 'title', 'publisher', 'pubPlace', 'biblScope', 'date']  spurious=[]
+         "Engelhardt, W. v., Hörz, F., Stöffler, D. and Bertsch, ..."
+    #142  F1=0.000  missed=['label', 'surname', 'forename', 'surname', 'forename', 'surname', 'forename', 'surname', 'forename', 'surname', 'forename', 'surname', 'forename', 'author', 'title', 'title', 'date', 'biblScope', 'biblScope']  spurious=[]
+         "5-Decq P, Bokombe D, Nguyen Jp, Djindjian M, Molina P, ..."
+    #143  F1=0.000  missed=['surname', 'forename', 'author', 'title', 'title', 'biblScope', 'biblScope', 'date']  spurious=[]
+         "Hildebrand, A.R., et al., Mapping Chicxulub crater stru..."
+    #144  F1=0.000  missed=['surname', 'forename', 'surname', 'forename', 'surname', 'forename', 'author', 'title', 'title', 'date', 'note']  spurious=[]
+         "Grande BM, Albuquerque M, Morin RD, “Towards a Cloud-re..."
+    #145  F1=0.000  missed=['surname', 'forename', 'author', 'title', 'publisher', 'title', 'pubPlace', 'date']  spurious=[]
+         "GREEN, Christopher, Art in France: 1900-1940, Yale Univ..."

.local/gemini-batch10-full.progress ADDED

@@ -0,0 +1,6 @@
---- , “Chiasmus in the New Testament.” In Ch:  43%|████▎     | 70/162 [05:22<06:52,  4.48s/rec, F1=0.609]
---- , “Chiasmus in the New Testament.” In Ch:  49%|████▉     | 80/162 [06:10<06:15,  4.58s/rec, F1=0.609]
---- , “Chiasmus in the New Testament.” In Ch:  49%|████▉     | 80/162 [06:10<06:15,  4.58s/rec, F1=0.706]
-, « Enjeux socio-culturels des discours amou:  68%|██████▊   | 110/162 [07:59<03:27,  3.99s/rec, F1=0.857]
-, « Enjeux socio-culturels des discours amou:  74%|███████▍  | 120/162 [08:45<02:55,  4.18s/rec, F1=0.857]
-, « Enjeux socio-culturels des discours amou:  74%|███████▍  | 120/162 [08:45<02:55,  4.18s/rec, F1=0.286]

+warning: `VIRTUAL_ENV=/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9` does not match the project environment path `.venv` and will be ignored; use `--active` to target the active environment instead
+  batch_results = _evaluate_batch(
+  batch_results = _evaluate_batch(

.local/kisski-batch1.log ADDED

	@@ -0,0 +1,43 @@

+────────────────────────────────────────────────────────────────
+  Provider  : KISSKI / llama-3.3-70b-instruct
+  Gold file : tests/fixtures/blbl-examples.tei.xml
+  Records   : 162   match-mode: text
+  Batch size: 1
+  GLiNER    : disabled
+────────────────────────────────────────────────────────────────
+  Completed: 162/162 records
+=== Overall — KISSKI / llama-3.3-70b-instruct ===
+Micro  P=0.876  R=0.858  F1=0.867  (TP=1388  FP=196  FN=230)
+Macro  P=0.663  R=0.634  F1=0.636
+Per-element breakdown:
+  author                P=0.731  R=0.831  F1=0.778  (TP=128  FP=47  FN=26)
+  biblScope             P=0.865  R=0.829  F1=0.847  (TP=141  FP=22  FN=29)
+  date                  P=0.958  R=0.940  F1=0.949  (TP=158  FP=7  FN=10)
+  editor                P=0.852  R=0.742  F1=0.793  (TP=23  FP=4  FN=8)
+  forename              P=0.968  R=0.941  F1=0.954  (TP=302  FP=10  FN=19)
+  idno                  P=0.000  R=0.000  F1=0.000  (TP=0  FP=3  FN=2)
+  label                 P=0.333  R=0.182  F1=0.235  (TP=2  FP=4  FN=9)
+  note                  P=0.727  R=0.364  F1=0.485  (TP=8  FP=3  FN=14)
+  orgName               P=0.296  R=0.727  F1=0.421  (TP=8  FP=19  FN=3)
+  page                  P=0.000  R=0.000  F1=0.000  (TP=0  FP=0  FN=1)
+  ptr                   P=0.750  R=0.600  F1=0.667  (TP=3  FP=1  FN=2)
+  pubPlace              P=0.841  R=0.866  F1=0.853  (TP=58  FP=11  FN=9)
+  publisher             P=0.852  R=0.800  F1=0.825  (TP=52  FP=9  FN=13)
+  surname               P=0.988  R=0.978  F1=0.983  (TP=317  FP=4  FN=7)
+  title                 P=0.783  R=0.707  F1=0.743  (TP=188  FP=52  FN=78)
+  Lowest-F1 records (top 5):
+    # 11  F1=0.000  missed=['surname', 'forename', 'author', 'date', 'title', 'forename', 'surname', 'editor', 'title', 'pubPlace', 'publisher', 'biblScope']  spurious=[]
+         "Jakobson, Roman 1960. "Closing Statement: Linguistics a..."
+    # 97  F1=0.182  missed=['surname', 'forename', 'author', 'title', 'ptr']  spurious=['forename', 'surname', 'author', 'title']
+         "York H. Dobyns Journal of Scientific Exploration, 1996 ..."
+    # 39  F1=0.400  missed=['title', 'publisher']  spurious=['author', 'orgName', 'title', 'orgName']
+         "Le Monde. 2016. “Opération Tulipe » : les coulisses de ..."
+    # 93  F1=0.545  missed=['publisher', 'biblScope']  spurious=['orgName', 'author', 'biblScope']
+         "Ministère des ressources naturelles (1996). L’énergie a..."
+    # 85  F1=0.571  missed=['title']  spurious=['author', 'orgName']
+         "PCC Access Points for Expressions Task Group. (2012). “..."
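The summary reports both micro and macro averages. The numbers in this log are consistent with two calculations. Micro scores come from pooling the TP/FP/FN counts of all elements. Macro scores are the unweighted mean of the per-element scores, and an element scores 0 whenever a denominator is zero (as for `page`). The sketch below assumes that interpretation and reuses the per-element counts from the breakdown. The function names are hypothetical and are not taken from `tei_annotator`:

```python
from statistics import mean

# Per-element (TP, FP, FN) counts copied from the KISSKI batch-size-1 log.
COUNTS = {
    "author": (128, 47, 26),
    "biblScope": (141, 22, 29),
    "date": (158, 7, 10),
    "editor": (23, 4, 8),
    "forename": (302, 10, 19),
    "idno": (0, 3, 2),
    "label": (2, 4, 9),
    "note": (8, 3, 14),
    "orgName": (8, 19, 3),
    "page": (0, 0, 1),
    "ptr": (3, 1, 2),
    "pubPlace": (58, 11, 9),
    "publisher": (52, 9, 13),
    "surname": (317, 4, 7),
    "title": (188, 52, 78),
}


def prf(tp: int, fp: int, fn: int) -> tuple:
    """Precision, recall and F1; each is 0.0 when its denominator is zero."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def micro(counts: dict) -> tuple:
    """Pool the counts of all elements, then score once."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return prf(tp, fp, fn)


def macro(counts: dict) -> tuple:
    """Score each element separately, then take the unweighted mean."""
    scores = [prf(*c) for c in counts.values()]
    return tuple(mean(s[i] for s in scores) for i in range(3))


if __name__ == "__main__":
    print("Micro  P=%.3f  R=%.3f  F1=%.3f" % micro(COUNTS))
    print("Macro  P=%.3f  R=%.3f  F1=%.3f" % macro(COUNTS))
```

Run as a script, the sketch reproduces the `Micro` and `Macro` lines of the log. The large gap between the two F1 values comes from rare elements such as `idno`, `page` and `label`. Each counts equally in the macro mean, but together they contribute only a handful of spans to the pooled micro counts.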

.local/kisski-batch1.progress ADDED

@@ -0,0 +1,4 @@
---- , “Chiasmus in the New Testament.” In Ch:  43%|████▎     | 70/162 [21:29<24:34, 16.03s/rec, F1=0.750]
---- , “Chiasmus in the New Testament.” In Ch:  44%|████▍     | 71/162 [21:40<22:00, 14.51s/rec, F1=0.750]
---- , “Chiasmus in the New Testament.” In Ch:  44%|████▍     | 71/162 [21:40<22:00, 14.51s/rec, F1=0.778]
------ , and Haendel, V. (1983). The motives :  49%|████▉     | 79/162 [23:37<22:23, 16.18s/rec, F1=0.880]
------ , and Haendel, V. (1983). The motives :  49%|████▉     | 80/162 [24:04<26:33, 19.44s/rec, F1=0.880]
------ , and Haendel, V. (1983). The motives :  49%|████▉     | 80/162 [24:04<26:33, 19.44s/rec, F1=0.914]
-, « Enjeux socio-culturels des discours amou:  68%|██████▊   | 110/162 [31:39<26:31, 30.60s/rec, F1=0.706]
-, « Enjeux socio-culturels des discours amou:  69%|██████▊   | 111/162 [31:52<21:25, 25.20s/rec, F1=0.706]
-, « Enjeux socio-culturels des discours amou:  69%|██████▊   | 111/162 [31:52<21:25, 25.20s/rec, F1=0.833]
-Luke and the Law (Cambridge: Cambridge Unive:  72%|███████▏  | 116/162 [33:29<14:45, 19.24s/rec, F1=0.800]
-Luke and the Law (Cambridge: Cambridge Unive:  72%|███████▏  | 117/162 [33:34<11:20, 15.12s/rec, F1=0.800]
-Luke and the Law (Cambridge: Cambridge Unive:  72%|███████▏  | 117/162 [33:34<11:20, 15.12s/rec, F1=0.889]


+ warning: `VIRTUAL_ENV=/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9` does not match the project environment path `.venv` and will be ignored; use `--active` to target the active environment instead
+
+ result = annotate(
+

.local/kisski-batch10-t600.log ADDED

	@@ -0,0 +1,43 @@

+────────────────────────────────────────────────────────────────
+  Provider  : KISSKI / llama-3.3-70b-instruct
+  Gold file : tests/fixtures/blbl-examples.tei.xml
+  Records   : 162   match-mode: text
+  Batch size: 10
+  GLiNER    : disabled
+────────────────────────────────────────────────────────────────
+  Completed: 162/162 records
+=== Overall — KISSKI / llama-3.3-70b-instruct ===
+Micro  P=0.869  R=0.839  F1=0.854  (TP=1357  FP=204  FN=261)
+Macro  P=0.652  R=0.612  F1=0.619
+Per-element breakdown:
+  author                P=0.741  R=0.818  F1=0.778  (TP=126  FP=44  FN=28)
+  biblScope             P=0.840  R=0.771  F1=0.804  (TP=131  FP=25  FN=39)
+  date                  P=0.921  R=0.905  F1=0.913  (TP=152  FP=13  FN=16)
+  editor                P=0.889  R=0.774  F1=0.828  (TP=24  FP=3  FN=7)
+  forename              P=0.958  R=0.931  F1=0.945  (TP=299  FP=13  FN=22)
+  idno                  P=0.000  R=0.000  F1=0.000  (TP=0  FP=1  FN=2)
+  label                 P=0.286  R=0.182  F1=0.222  (TP=2  FP=5  FN=9)
+  note                  P=0.727  R=0.364  F1=0.485  (TP=8  FP=3  FN=14)
+  orgName               P=0.333  R=0.727  F1=0.457  (TP=8  FP=16  FN=3)
+  page                  P=0.000  R=0.000  F1=0.000  (TP=0  FP=0  FN=1)
+  ptr                   P=0.667  R=0.400  F1=0.500  (TP=2  FP=1  FN=3)
+  pubPlace              P=0.831  R=0.881  F1=0.855  (TP=59  FP=12  FN=8)
+  publisher             P=0.820  R=0.769  F1=0.794  (TP=50  FP=11  FN=15)
+  surname               P=0.978  R=0.966  F1=0.972  (TP=313  FP=7  FN=11)
+  title                 P=0.785  R=0.688  F1=0.733  (TP=183  FP=50  FN=83)
+  Lowest-F1 records (top 5):
+    #111  F1=0.000  missed=['author', 'title', 'title', 'biblScope', 'date', 'biblScope']  spurious=[]
+         "-, « Enjeux socio-culturels des discours amoureux dans ..."
+    # 97  F1=0.182  missed=['surname', 'forename', 'author', 'title', 'ptr']  spurious=['forename', 'surname', 'author', 'title']
+         "York H. Dobyns Journal of Scientific Exploration, 1996 ..."
+    # 93  F1=0.222  missed=['publisher', 'title', 'date', 'biblScope']  spurious=['orgName', 'author', 'title']
+         "Ministère des ressources naturelles (1996). L’énergie a..."
+    # 39  F1=0.250  missed=['title', 'title', 'publisher']  spurious=['orgName', 'author', 'title']
+         "Le Monde. 2016. “Opération Tulipe » : les coulisses de ..."
+    #152  F1=0.462  missed=['forename', 'surname', 'forename', 'surname', 'forename', 'surname', 'title']  spurious=['surname', 'forename', 'surname', 'forename', 'surname', 'forename', 'title']
+         "Irda Fidrianny, Dian Ayu, Rika Hartati. Antioxidant Cap..."

.local/kisski-batch10-t600.progress ADDED

@@ -0,0 +1,6 @@
---- , “Chiasmus in the New Testament.” In Ch:  43%|████▎     | 70/162 [26:15<35:56, 23.44s/rec, F1=0.917]
---- , “Chiasmus in the New Testament.” In Ch:  49%|████▉     | 80/162 [28:42<28:15, 20.68s/rec, F1=0.917]
---- , “Chiasmus in the New Testament.” In Ch:  49%|████▉     | 80/162 [28:42<28:15, 20.68s/rec, F1=0.778]
-, « Enjeux socio-culturels des discours amou:  68%|██████▊   | 110/162 [38:17<17:53, 20.64s/rec, F1=0.714]
-, « Enjeux socio-culturels des discours amou:  74%|███████▍  | 120/162 [41:33<14:14, 20.33s/rec, F1=0.714]
-, « Enjeux socio-culturels des discours amou:  74%|███████▍  | 120/162 [41:33<14:14, 20.33s/rec, F1=0.000]

+warning: `VIRTUAL_ENV=/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9` does not match the project environment path `.venv` and will be ignored; use `--active` to target the active environment instead
+  annotated_text = inject_xml(plain_text, deduped)
+  annotated_text = inject_xml(plain_text, deduped)

.local/kisski-batch10.log ADDED

	@@ -0,0 +1,68 @@

+────────────────────────────────────────────────────────────────
+  Provider  : KISSKI / llama-3.3-70b-instruct
+  Gold file : tests/fixtures/blbl-examples.tei.xml
+  Records   : 162   match-mode: text
+  Batch size: 10
+  GLiNER    : disabled
+────────────────────────────────────────────────────────────────
+  [1-10/162] ERROR — The read operation timed out
+  [11-20/162] ERROR — The read operation timed out
+  [21-30/162] ERROR — The read operation timed out
+  [31-40/162] ERROR — The read operation timed out
+  [41-50/162] ERROR — The read operation timed out
+  [51-60/162] ERROR — The read operation timed out
+  [61-70/162] ERROR — The read operation timed out
+  [81-90/162] ERROR — The read operation timed out
+  [91-100/162] ERROR — The read operation timed out
+  [101-110/162] ERROR — The read operation timed out
+  [111-120/162] ERROR — The read operation timed out
+  [121-130/162] ERROR — The read operation timed out
+  [141-150/162] ERROR — The read operation timed out
+  [151-160/162] ERROR — The read operation timed out
+  Completed: 22/162 records  (140 failed)
+=== Overall — KISSKI / llama-3.3-70b-instruct ===
+Micro  P=0.877  R=0.811  F1=0.843  (TP=185  FP=26  FN=43)
+Macro  P=0.702  R=0.654  F1=0.672
+Per-element breakdown:
+  author                P=0.783  R=0.900  F1=0.837  (TP=18  FP=5  FN=2)
+  biblScope             P=0.714  R=0.556  F1=0.625  (TP=10  FP=4  FN=8)
+  date                  P=1.000  R=1.000  F1=1.000  (TP=21  FP=0  FN=0)
+  editor                P=1.000  R=0.667  F1=0.800  (TP=6  FP=0  FN=3)
+  forename              P=0.979  R=0.979  F1=0.979  (TP=46  FP=1  FN=1)
+  label                 P=0.000  R=0.000  F1=0.000  (TP=0  FP=0  FN=1)
+  note                  P=0.000  R=0.000  F1=0.000  (TP=0  FP=0  FN=4)
+  orgName               P=0.000  R=0.000  F1=0.000  (TP=0  FP=1  FN=0)
+  pubPlace              P=0.846  R=0.917  F1=0.880  (TP=11  FP=2  FN=1)
+  publisher             P=0.750  R=0.692  F1=0.720  (TP=9  FP=3  FN=4)
+  surname               P=0.979  R=0.979  F1=0.979  (TP=46  FP=1  FN=1)
+  title                 P=0.667  R=0.500  F1=0.571  (TP=18  FP=9  FN=18)
+  Lowest-F1 records (top 5):
+    # 14  F1=0.000  missed=['title']  spurious=['title']
+         "26-Vellin J-F, Achim V, Sinardet D, et al. Rapidly deve..."
+    # 17  F1=0.545  missed=['publisher', 'title']  spurious=['orgName', 'author', 'title']
+         "McGrath, P. 2005 Toronto in the 1850s: A Transcription ..."
+    # 12  F1=0.667  missed=['title', 'biblScope', 'pubPlace']  spurious=['pubPlace', 'publisher']
+         "Bybee, Joan L. 2002. Cognitive processes in grammatical..."
+    #  9  F1=0.667  missed=['title', 'title', 'biblScope', 'publisher', 'title']  spurious=['title', 'biblScope', 'biblScope']
+         "Lillié, F., Analyse tectonique de Gisement Claude (Cluf..."
+    # 19  F1=0.667  missed=['title', 'biblScope', 'title', 'biblScope']  spurious=['title']
+         "Taitt, David. 1916. "Journal of David Taitt's Travels f..."
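This run reports `Completed: 22/162 records  (140 failed)`. The `[start-end/162]` labels suggest two things: the records are split into contiguous 1-based batches, and a timeout discards the whole batch. Under those assumptions the accounting can be reproduced with the sketch below. `batch_ranges` and `completed_records` are hypothetical helpers, not the project's code:

```python
def batch_ranges(total: int, size: int) -> list:
    """1-based inclusive (start, end) ranges, matching labels like [1-10/162]."""
    return [(s, min(s + size - 1, total)) for s in range(1, total + 1, size)]


def completed_records(total: int, size: int, failed_starts: set) -> int:
    """Count the records in batches whose start index did not fail."""
    return sum(
        end - start + 1
        for start, end in batch_ranges(total, size)
        if start not in failed_starts
    )


# Start indices of the batches logged as timed out in kisski-batch10.log.
FAILED = {1, 11, 21, 31, 41, 51, 61, 81, 91, 101, 111, 121, 141, 151}

if __name__ == "__main__":
    done = completed_records(162, 10, FAILED)
    print(f"Completed: {done}/162 records  ({162 - done} failed)")
```

The surviving records are those in batches 71-80, 131-140 and the final partial batch 161-162, which gives 22. With `size=50` the same helper yields the four ranges listed in `kisski-batch50.log`. Every batch in that run timed out, which suggests that larger batches also need a longer per-request timeout.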

.local/kisski-batch10.progress ADDED

@@ -0,0 +1,2 @@
---- , “Chiasmus in the New Testament.” In Ch:  43%|████▎     | 70/162 [14:00<18:24, 12.01s/rec]
---- , “Chiasmus in the New Testament.” In Ch:  49%|████▉     | 80/162 [16:38<18:02, 13.21s/rec]
---- , “Chiasmus in the New Testament.” In Ch:  49%|████▉     | 80/162 [16:38<18:02, 13.21s/rec, F1=0.947]
-, « Enjeux socio-culturels des discours amou:  68%|██████▊   | 110/162 [22:38<10:44, 12.40s/rec, F1=0.947]
-, « Enjeux socio-culturels des discours amou:  74%|███████▍  | 120/162 [24:38<08:35, 12.28s/rec, F1=0.947]


+ warning: `VIRTUAL_ENV=/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9` does not match the project environment path `.venv` and will be ignored; use `--active` to target the active environment instead
+

.local/kisski-batch162.log ADDED

	@@ -0,0 +1,11 @@

+────────────────────────────────────────────────────────────────
+  Provider  : KISSKI / llama-3.3-70b-instruct
+  Gold file : tests/fixtures/blbl-examples.tei.xml
+  Records   : 162   match-mode: text
+  Batch size: 162
+  GLiNER    : disabled
+────────────────────────────────────────────────────────────────
+  [1-162/162] ERROR — The read operation timed out
+  ✗ All records failed — no results to report.

.local/kisski-batch162.progress ADDED

	@@ -0,0 +1,2 @@


+ warning: `VIRTUAL_ENV=/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9` does not match the project environment path `.venv` and will be ignored; use `--active` to target the active environment instead
+

.local/kisski-batch50.log ADDED

	@@ -0,0 +1,17 @@

+────────────────────────────────────────────────────────────────
+  Provider  : KISSKI / llama-3.3-70b-instruct
+  Gold file : tests/fixtures/blbl-examples.tei.xml
+  Records   : 162   match-mode: text
+  Batch size: 50
+  GLiNER    : disabled
+────────────────────────────────────────────────────────────────
+  [1-50/162] ERROR — The read operation timed out
+  [51-100/162] ERROR — The read operation timed out
+  [101-150/162] ERROR — The read operation timed out
+  [151-162/162] ERROR — The read operation timed out
+  ✗ All records failed — no results to report.

.local/kisski-batch50.progress ADDED

	@@ -0,0 +1,2 @@


+ warning: `VIRTUAL_ENV=/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9` does not match the project environment path `.venv` and will be ignored; use `--active` to target the active environment instead
+

.pytest_cache/.gitignore ADDED

	@@ -0,0 +1,2 @@


+ # Created by pytest automatically.
+ *

.pytest_cache/CACHEDIR.TAG ADDED

	@@ -0,0 +1,4 @@

+Signature: 8a477f597d28d172789f06886806bc55
+# This file is a cache directory tag created by pytest.
+# For information about cache directory tags, see:
+#	https://bford.info/cachedir/spec.html

.pytest_cache/README.md ADDED

	@@ -0,0 +1,8 @@

+# pytest cache directory #
+This directory contains data from the pytest's cache plugin,
+which provides the `--lf` and `--ff` options, as well as the `cache` fixture.
+**Do not** commit this to version control.
+See [the docs](https://docs.pytest.org/en/stable/how-to/cache.html) for more information.

.pytest_cache/v/cache/lastfailed ADDED

	@@ -0,0 +1 @@


+ {}

.pytest_cache/v/cache/nodeids ADDED

	@@ -0,0 +1,162 @@

+[
+  "tests/integration/test_pipeline_e2e.py::test_attributes_preserved_end_to_end",
+  "tests/integration/test_pipeline_e2e.py::test_context_longer_than_span_text",
+  "tests/integration/test_pipeline_e2e.py::test_fuzzy_context_match_flags_span",
+  "tests/integration/test_pipeline_e2e.py::test_hallucinated_context_span_rejected",
+  "tests/integration/test_pipeline_e2e.py::test_long_text_entity_in_second_chunk",
+  "tests/integration/test_pipeline_e2e.py::test_multiple_occurrences_disambiguated_by_context",
+  "tests/integration/test_pipeline_e2e.py::test_nested_spans_end_to_end",
+  "tests/integration/test_pipeline_e2e.py::test_plain_text_invariant_with_multiple_entities",
+  "tests/integration/test_pipeline_e2e.py::test_preexisting_xml_preserved",
+  "tests/test_builder.py::test_candidates_appear_in_prompt",
+  "tests/test_builder.py::test_correction_prompt_contains_original_response",
+  "tests/test_builder.py::test_empty_candidates_list_no_section",
+  "tests/test_builder.py::test_extraction_raises",
+  "tests/test_builder.py::test_json_enforced_prompt_contains_schema",
+  "tests/test_builder.py::test_json_enforced_prompt_shorter_than_text_gen",
+  "tests/test_builder.py::test_no_candidate_section_when_none",
+  "tests/test_builder.py::test_text_gen_prompt_contains_example",
+  "tests/test_builder.py::test_text_gen_prompt_contains_json_instruction",
+  "tests/test_builder.py::test_text_gen_prompt_contains_schema_elements",
+  "tests/test_builder.py::test_text_gen_prompt_contains_source_text",
+  "tests/test_chunker.py::test_chunk_boundary_does_not_split_xml_tag",
+  "tests/test_chunker.py::test_chunk_start_offsets_correct",
+  "tests/test_chunker.py::test_exact_chunk_size_no_overflow",
+  "tests/test_chunker.py::test_long_text_covers_all_characters",
+  "tests/test_chunker.py::test_long_text_multiple_chunks",
+  "tests/test_chunker.py::test_overlap_produces_repeated_content",
+  "tests/test_chunker.py::test_short_text_single_chunk",
+  "tests/test_evaluation.py::TestAggregate::test_aggregate_concatenates_lists",
+  "tests/test_evaluation.py::TestAggregate::test_aggregate_empty",
+  "tests/test_evaluation.py::TestAggregate::test_aggregate_sums_counts",
+  "tests/test_evaluation.py::TestCertLowUnionMatch::test_cert_low_unmatched_with_no_merger_is_fn",
+  "tests/test_evaluation.py::TestCertLowUnionMatch::test_merged_pred_scores_as_two_tps",
+  "tests/test_evaluation.py::TestCertLowUnionMatch::test_split_pred_still_matches_normally",
+  "tests/test_evaluation.py::TestComputeMetrics::test_all_wrong_element",
+  "tests/test_evaluation.py::TestComputeMetrics::test_empty_gold_and_pred",
+  "tests/test_evaluation.py::TestComputeMetrics::test_macro_vs_micro_differ_on_imbalanced",
+  "tests/test_evaluation.py::TestComputeMetrics::test_partial_precision",
+  "tests/test_evaluation.py::TestComputeMetrics::test_partial_recall",
+  "tests/test_evaluation.py::TestComputeMetrics::test_per_element_breakdown",
+  "tests/test_evaluation.py::TestComputeMetrics::test_perfect_prediction",
+  "tests/test_evaluation.py::TestComputeMetrics::test_report_returns_string",
+  "tests/test_evaluation.py::TestEvaluateBibl::test_attributes_not_required_for_text_match",
+  "tests/test_evaluation.py::TestEvaluateBibl::test_empty_annotation",
+  "tests/test_evaluation.py::TestEvaluateBibl::test_exact_match_mode",
+  "tests/test_evaluation.py::TestEvaluateBibl::test_missing_span_reduces_recall",
+  "tests/test_evaluation.py::TestEvaluateBibl::test_perfect_annotation",
+  "tests/test_evaluation.py::TestEvaluateBibl::test_plain_text_element",
+  "tests/test_evaluation.py::TestEvaluateBibl::test_spurious_span_reduces_precision",
+  "tests/test_evaluation.py::TestEvaluateBibl::test_wrong_element_type",
+  "tests/test_evaluation.py::TestEvaluateElement::test_ampersand_in_text_is_escaped",
+  "tests/test_evaluation.py::TestEvaluateElement::test_attributes_not_required_for_text_match",
+  "tests/test_evaluation.py::TestEvaluateElement::test_empty_annotation",
+  "tests/test_evaluation.py::TestEvaluateElement::test_exact_match_mode",
+  "tests/test_evaluation.py::TestEvaluateElement::test_missing_span_reduces_recall",
+  "tests/test_evaluation.py::TestEvaluateElement::test_perfect_annotation",
+  "tests/test_evaluation.py::TestEvaluateElement::test_plain_text_element",
+  "tests/test_evaluation.py::TestEvaluateElement::test_spurious_span_reduces_precision",
+  "tests/test_evaluation.py::TestEvaluateElement::test_wrong_element_type",
+  "tests/test_evaluation.py::TestExtractSpans::test_attributes_preserved",
+  "tests/test_evaluation.py::TestExtractSpans::test_element_with_text_and_tail",
+  "tests/test_evaluation.py::TestExtractSpans::test_flat_two_elements",
+  "tests/test_evaluation.py::TestExtractSpans::test_namespace_stripped",
+  "tests/test_evaluation.py::TestExtractSpans::test_nested_elements",
+  "tests/test_evaluation.py::TestExtractSpans::test_no_children",
+  "tests/test_evaluation.py::TestExtractSpans::test_normalized_text_property",
+  "tests/test_evaluation.py::TestExtractSpans::test_plain_text_equals_itertext",
+  "tests/test_evaluation.py::TestMatchSpans::test_both_empty",
+  "tests/test_evaluation.py::TestMatchSpans::test_empty_gold",
+  "tests/test_evaluation.py::TestMatchSpans::test_empty_pred",
+  "tests/test_evaluation.py::TestMatchSpans::test_exact_perfect_match",
+  "tests/test_evaluation.py::TestMatchSpans::test_exact_wrong_element",
+  "tests/test_evaluation.py::TestMatchSpans::test_exact_wrong_offset",
+  "tests/test_evaluation.py::TestMatchSpans::test_greedy_each_span_matched_once",
+  "tests/test_evaluation.py::TestMatchSpans::test_overlap_mode_above_threshold",
+  "tests/test_evaluation.py::TestMatchSpans::test_overlap_mode_below_threshold",
+  "tests/test_evaluation.py::TestMatchSpans::test_text_mode_different_text_no_match",
+  "tests/test_evaluation.py::TestMatchSpans::test_text_mode_matches_despite_offset_difference",
+  "tests/test_evaluation.py::TestMatchSpans::test_text_mode_normalises_whitespace",
+  "tests/test_injector.py::test_attrs_rendered_in_tag",
+  "tests/test_injector.py::test_build_nesting_tree_siblings",
+  "tests/test_injector.py::test_build_nesting_tree_simple",
+  "tests/test_injector.py::test_nested_spans",
+  "tests/test_injector.py::test_no_spans_returns_source",
+  "tests/test_injector.py::test_overlapping_spans_warns_and_skips",
+  "tests/test_injector.py::test_single_span",
+  "tests/test_injector.py::test_span_at_end_of_text",
+  "tests/test_injector.py::test_span_at_start_of_text",
+  "tests/test_injector.py::test_span_covering_entire_text",
+  "tests/test_injector.py::test_two_non_overlapping_spans",
+  "tests/test_parser.py::test_attrs_defaults_to_empty_dict",
+  "tests/test_parser.py::test_invalid_json_no_retry_raises",
+  "tests/test_parser.py::test_markdown_fenced_json_parsed",
+  "tests/test_parser.py::test_missing_fields_items_skipped",
+  "tests/test_parser.py::test_non_list_response_raises",
+  "tests/test_parser.py::test_retry_still_invalid_raises",
+  "tests/test_parser.py::test_retry_triggered_on_first_failure",
+  "tests/test_parser.py::test_strip_fences_json_lang",
+  "tests/test_parser.py::test_strip_fences_no_fences",
+  "tests/test_parser.py::test_strip_fences_no_lang",
+  "tests/test_parser.py::test_strip_fences_with_preamble",
+  "tests/test_parser.py::test_valid_json_parsed_directly",
+  "tests/test_pipeline.py::test_annotate_empty_response",
+  "tests/test_pipeline.py::test_annotate_escapes_bare_ampersand",
+  "tests/test_pipeline.py::test_annotate_fuzzy_spans_surfaced",
+  "tests/test_pipeline.py::test_annotate_no_text_modification",
+  "tests/test_pipeline.py::test_annotate_preserves_existing_entity_references",
+  "tests/test_pipeline.py::test_annotate_preserves_existing_xml",
+  "tests/test_pipeline.py::test_annotate_smoke",
+  "tests/test_pipeline.py::test_annotate_text_generation_endpoint",
+  "tests/test_pipeline.py::test_no_duplicate_tags_when_same_element_detected",
+  "tests/test_pipeline.py::test_overlapping_spans_from_chunks_are_merged",
+  "tests/test_resolver.py::test_attrs_preserved",
+  "tests/test_resolver.py::test_children_start_empty",
+  "tests/test_resolver.py::test_context_not_found_rejected",
+  "tests/test_resolver.py::test_direct_fallback_when_fuzzy_context_misses",
+  "tests/test_resolver.py::test_empty_span_list",
+  "tests/test_resolver.py::test_exact_context_match",
+  "tests/test_resolver.py::test_fuzzy_text_fallback_when_newline_space_mismatch",
+  "tests/test_resolver.py::test_multiple_spans_resolved",
+  "tests/test_resolver.py::test_source_slice_verified",
+  "tests/test_resolver.py::test_text_equals_context_with_whitespace_diff",
+  "tests/test_resolver.py::test_text_not_in_context_window_rejected",
+  "tests/test_tei.py::test_biblstruct_depth0_excludes_children",
+  "tests/test_tei.py::test_biblstruct_depth1_includes_children",
+  "tests/test_tei.py::test_biblstruct_description",
+  "tests/test_tei.py::test_biblstruct_direct_children_present",
+  "tests/test_tei.py::test_biblstruct_has_type_attribute",
+  "tests/test_tei.py::test_biblstruct_in_schema",
+  "tests/test_tei.py::test_biblstruct_model_group_children_expanded",
+  "tests/test_tei.py::test_create_schema_returns_tei_schema",
+  "tests/test_tei.py::test_create_schema_unknown_element_raises",
+  "tests/test_tei.py::test_idno_description",
+  "tests/test_tei.py::test_idno_in_schema",
+  "tests/test_tei.py::test_idno_self_referential_child",
+  "tests/test_tei.py::test_idno_type_attribute_with_allowed_values",
+  "tests/test_tei.py::test_no_duplicate_attributes_on_element",
+  "tests/test_tei.py::test_no_duplicate_elements_in_schema",
+  "tests/test_validator.py::test_empty_span_list",
+  "tests/test_validator.py::test_free_string_attribute_passes",
+  "tests/test_validator.py::test_invalid_attribute_value_rejected",
+  "tests/test_validator.py::test_leading_space_absorbed_into_span_boundary_normalises",
+  "tests/test_validator.py::test_leading_trailing_whitespace_stripped_both_sides",
+  "tests/test_validator.py::test_multiline_source_normalises_same",
+  "tests/test_validator.py::test_multiple_spaces_in_source_normalise",
+  "tests/test_validator.py::test_out_of_bounds_span_rejected",
+  "tests/test_validator.py::test_space_dropped_between_words_raises",
+  "tests/test_validator.py::test_tab_in_source_normalises",
+  "tests/test_validator.py::test_trailing_space_shifted_outside_span_normalises",
+  "tests/test_validator.py::test_unknown_attribute_rejected",
+  "tests/test_validator.py::test_unknown_element_rejected",
+  "tests/test_validator.py::test_valid_constrained_attribute_passes",
+  "tests/test_validator.py::test_valid_span_passes",
+  "tests/test_validator.py::test_validate_output_dropped_word_raises",
+  "tests/test_validator.py::test_validate_output_duplicated_word_raises",
+  "tests/test_validator.py::test_validate_output_empty_passes",
+  "tests/test_validator.py::test_validate_output_error_contains_diff",
+  "tests/test_validator.py::test_validate_output_multiple_tags_passes",
+  "tests/test_validator.py::test_validate_output_plain_source_passes",
+  "tests/test_validator.py::test_validate_output_tags_injected_passes",
+  "tests/test_validator.py::test_validate_output_whitespace_difference_passes"
+]
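Several of the `test_validator.py` cases test a check that the annotated output, with its tags stripped, still contains the source text. These include `test_validate_output_whitespace_difference_passes`, `test_validate_output_dropped_word_raises` and `test_validate_output_error_contains_diff`. The sketch below assumes that whitespace differences are tolerated and that word-level differences are reported as a diff. The function names and the regex-based tag stripping are assumptions, not the library's implementation:

```python
import difflib
import re
from html import unescape

# Naive tag matcher; assumes no '>' occurs inside attribute values.
_TAG = re.compile(r"<[^>]+>")


def text_words(xml: str) -> list:
    """Strip tags, decode entities, and split on any run of whitespace."""
    return unescape(_TAG.sub("", xml)).split()


def validate_output(source: str, annotated: str) -> None:
    """Raise ValueError if tag injection changed the words of the source."""
    src, out = text_words(source), text_words(annotated)
    if src != out:
        # Keep only the removed ("- ") and added ("+ ") words from the diff.
        diff = [d for d in difflib.ndiff(src, out) if d.startswith(("- ", "+ "))]
        raise ValueError("annotated text differs from source:\n" + "\n".join(diff))
```

For example, `validate_output("Smith,  J. 2001.", "<author>Smith, J.</author> <date>2001</date>.")` passes despite the doubled space. If the output drops `J.`, the call raises a `ValueError` whose message contains `- J.`.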

.python-version ADDED

	@@ -0,0 +1 @@


+ 3.12

.releaserc.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "branches": ["main"],
+  "plugins": [
+    "@semantic-release/commit-analyzer",
+    "@semantic-release/release-notes-generator",
+    [
+      "@semantic-release/changelog",
+      {
+        "changelogFile": "CHANGELOG.md"
+      }
+    ],
+    [
+      "@semantic-release/exec",
+      {
+        "prepareCmd": "python scripts/version.py ${nextRelease.version}"
+      }
+    ],
+    [
+      "@semantic-release/git",
+      {
+        "assets": [
+          "package.json",
+          "package-lock.json",
+          "pyproject.toml",
+          "uv.lock",
+          "tei_annotator/__init__.py",
+          "CHANGELOG.md"
+        ],
+        "message": "chore(release): ${nextRelease.version} [skip ci]\n\n${nextRelease.notes}"
+      }
+    ],
+    "@semantic-release/github"
+  ]
+}
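In this configuration, `@semantic-release/commit-analyzer` derives the next version from Conventional Commit messages. `@semantic-release/exec` then passes that version to `scripts/version.py`. The version history in CHANGELOG.md is consistent with the analyzer's default rules. A `feat` commit bumps the minor version, `fix` and `perf` bump the patch version, and a `BREAKING CHANGE` note bumps the major version. The sketch below simplifies those rules for illustration only. The real plugin parses commits with the conventional-changelog parsers and accepts custom `releaseRules`, and the helper names here are hypothetical:

```python
import re
from typing import Optional

# "type(scope): subject"; the scope is optional.
_HEADER = re.compile(r"^(?P<type>\w+)(?:\([^)]*\))?: ")


def bump_level(message: str) -> int:
    """0 = no release, 1 = patch, 2 = minor, 3 = major."""
    if "BREAKING CHANGE" in message:
        return 3
    m = _HEADER.match(message)
    if not m:
        return 0
    return {"feat": 2, "fix": 1, "perf": 1}.get(m.group("type"), 0)


def next_version(current: str, commits: list) -> Optional[str]:
    """Apply the highest bump found among the commits; None means no release."""
    major, minor, patch = map(int, current.split("."))
    level = max((bump_level(c) for c in commits), default=0)
    if level == 3:
        return f"{major + 1}.0.0"
    if level == 2:
        return f"{major}.{minor + 1}.0"
    if level == 1:
        return f"{major}.{minor}.{patch + 1}"
    return None
```

For example, the 1.5.0 entry in CHANGELOG.md combines a `fix` and a `feat`, so the highest bump is minor and 1.4.0 becomes 1.5.0. The release commit itself (`chore(release): ... [skip ci]`) triggers no further release.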

CHANGELOG.md ADDED

	@@ -0,0 +1,93 @@

+# [1.5.0](https://github.com/cboulanger/tei-annotator/compare/v1.4.0...v1.5.0) (2026-05-15)
+### Bug Fixes
+* drop schema-element tags from restore map to prevent invalid nesting ([f047302](https://github.com/cboulanger/tei-annotator/commit/f04730209b69d829eac2a62a50931100f84431f4)), closes [#2](https://github.com/cboulanger/tei-annotator/issues/2) [#4](https://github.com/cboulanger/tei-annotator/issues/4)
+### Features
+* add interactive annotation debugger script ([42ee837](https://github.com/cboulanger/tei-annotator/commit/42ee837f329742ab69154b148f336e3778c43f9c))
+# [1.4.0](https://github.com/cboulanger/tei-annotator/compare/v1.3.1...v1.4.0) (2026-05-15)
+### Features
+* validate injected XML text content matches source after tag stripping ([c749627](https://github.com/cboulanger/tei-annotator/commit/c749627cb5520d834046816a3c60e18cf41063f4))
+## [1.3.1](https://github.com/cboulanger/tei-annotator/compare/v1.3.0...v1.3.1) (2026-05-14)
+### Bug Fixes
+* downgrade diagnostic log messages from INFO to DEBUG ([e3ca37f](https://github.com/cboulanger/tei-annotator/commit/e3ca37fd260f318367062f0722a802f7422385a3)), closes [#2](https://github.com/cboulanger/tei-annotator/issues/2)
+# [1.3.0](https://github.com/cboulanger/tei-annotator/compare/v1.2.0...v1.3.0) (2026-05-14)
+### Features
+* add INFO-level pipeline diagnostics for issue [#2](https://github.com/cboulanger/tei-annotator/issues/2) debugging ([81dc61e](https://github.com/cboulanger/tei-annotator/commit/81dc61ec435c7bfcb1b0ea036de8b1a475e7ada3))
+# [1.2.0](https://github.com/cboulanger/tei-annotator/compare/v1.1.1...v1.2.0) (2026-05-14)
+### Features
+* add warning for span resolver context mismatches ([11fc401](https://github.com/cboulanger/tei-annotator/commit/11fc401dd0ca172306b94f7a0743d22d5a63f5a3)), closes [#2](https://github.com/cboulanger/tei-annotator/issues/2)
+## [1.1.1](https://github.com/cboulanger/tei-annotator/compare/v1.1.0...v1.1.1) (2026-05-14)
+### Bug Fixes
+* merge overlapping spans from chunks to prevent text reordering ([4b612a1](https://github.com/cboulanger/tei-annotator/commit/4b612a1fb5c7fe0114580b453cb0f5c8e7d52ab2)), closes [#2](https://github.com/cboulanger/tei-annotator/issues/2)
+# [1.1.0](https://github.com/cboulanger/tei-annotator/compare/v1.0.0...v1.1.0) (2026-05-14)
+### Features
+* **webservice:** fix timeout handling, reduce default LLM timeout to 60s ([d499dab](https://github.com/cboulanger/tei-annotator/commit/d499dabfbc04e3cb94aa34512b5b8d782e69c82b))
+# 1.0.0 (2026-05-14)
+### Bug Fixes
+* add `@spaces.GPU` decorator to satisfy ZeroGPU spaces check; graceful fallback when spaces not installed ([40d8c92](https://github.com/cboulanger/tei-annotator/commit/40d8c92be6089d52b468f9004582e9f21e7759b7))
+* Add back gemini 2.0 flash model ([e7ad4b5](https://github.com/cboulanger/tei-annotator/commit/e7ad4b5e5334628e95851c0c9a02d53af04a0b44))
+* Add rate limiter to Kisski connector ([cfaba49](https://github.com/cboulanger/tei-annotator/commit/cfaba49b966649d17401ea5af06c995f7ceda375))
+* catch exceptions in do_evaluate to show error in UI instead of crashing ZeroGPU runtime ([82b98f7](https://github.com/cboulanger/tei-annotator/commit/82b98f75f16cfcc739133624e9074c34ca025d94))
+* disable SSR mode in Gradio launch to prevent Node.js server crash on HF Spaces ([01465bb](https://github.com/cboulanger/tei-annotator/commit/01465bbc41a7cb7ebca30c566fd1cf659a933ef9))
+* escape bare & in text nodes without double-encoding existing entities ([7a987f8](https://github.com/cboulanger/tei-annotator/commit/7a987f8743a595aadaeb3050c23fcc606053e834))
+* explicitly set hardware: cpu-basic in Space metadata to suppress spaces.GPU check ([3c798ff](https://github.com/cboulanger/tei-annotator/commit/3c798ff051e7d404c85efa029213cde3a4f8a342))
+* Fix config files ([0b7260e](https://github.com/cboulanger/tei-annotator/commit/0b7260eaa7ac5a61a50ce5c51e0f3f2a316565e1))
+* increase `@spaces.GPU` timeout to 300s to avoid GPU task abort on slow LLM calls ([a395ade](https://github.com/cboulanger/tei-annotator/commit/a395adecb93fee97c78579fc15b2129f7bd77a1d))
+* Increase timeout ([4006797](https://github.com/cboulanger/tei-annotator/commit/4006797de91175d610ac26ea5f8e0cacbd317275))
+* prompt rule improvements from 2026-05-08 evaluation experiments ([d26a27c](https://github.com/cboulanger/tei-annotator/commit/d26a27c9c5b9d92e2855636c1d2ceddd0d5aea82))
+* remove local package install from requirements.txt (HF Spaces copies source directly) ([abc28a1](https://github.com/cboulanger/tei-annotator/commit/abc28a1feb52de3989fb1a6e0b83dd54081f32c9))
+* rename EvaluateRequest.schema → schema_id, guard response construction, raise keepalive ([1c4c16a](https://github.com/cboulanger/tei-annotator/commit/1c4c16aeec1afc37180ed6cb8b3ad5efbe7e3912))
+* replace editable install (-e .) with plain . in requirements.txt for HF Spaces compatibility ([0dcfc4c](https://github.com/cboulanger/tei-annotator/commit/0dcfc4c840ff72ea99f24c7d5188e48dee6e9359))
+* sync batch size with sample size in gradio app ([1659d98](https://github.com/cboulanger/tei-annotator/commit/1659d98a3820a251581bf3c7217aff44c6b7abf2))
+### Features
+* Add batch size configuration in api and frontends ([b530e33](https://github.com/cboulanger/tei-annotator/commit/b530e336135d488aa77fb37d24da07346f993941))
+* Add Gradio app for HF Spaces deployment ([331a802](https://github.com/cboulanger/tei-annotator/commit/331a80280d3a409237da960383ea7532698a19ae))
+* Add registry to support any kind of inference provider ([f65b650](https://github.com/cboulanger/tei-annotator/commit/f65b6501074293efba4c9597f860e0b49874b997))
+* Add security against malicious clients ([a1785f7](https://github.com/cboulanger/tei-annotator/commit/a1785f73c2c4212a6fec304217f2cce801e9c629))
+* Add webservice for demonstration ([c3f33b6](https://github.com/cboulanger/tei-annotator/commit/c3f33b679e71429229bde937ae38a36689c7da53))
+* cert="low" uncertain-boundary evaluation mechanism ([6826850](https://github.com/cboulanger/tei-annotator/commit/68268504ee31c64530bcc8e0fed9fb0bf816e08c))
+* collect_hard_examples.py — find challenging gold examples via mini-batch evaluation ([ccfad34](https://github.com/cboulanger/tei-annotator/commit/ccfad34450269f03d4c17993081a6606a24c1e02))
+* separate evaluation corpora from tests; add schema/corpus selection to webservice ([796d53e](https://github.com/cboulanger/tei-annotator/commit/796d53ee0f9c8a71675f57403436d46eff02453e))
+* Support more providers ([18a607c](https://github.com/cboulanger/tei-annotator/commit/18a607c518e3c18a56948169cf31b3df6395ca44))
+* **webservice:** show-examples mode, model status indicators, hard LLM timeout ([90e4492](https://github.com/cboulanger/tei-annotator/commit/90e4492189940156bbeb84158663b1eefc3be2bb))
+### Performance Improvements
+* cache bibl schema at module load instead of rebuilding per request ([4ba8656](https://github.com/cboulanger/tei-annotator/commit/4ba865620dd719b9972bbed58668a3240cba1d33))

CLAUDE.md ADDED

	@@ -0,0 +1,149 @@

+# CLAUDE.md
+## Package manager
+Uses `uv`. Run tests with `uv run pytest`. Install deps with `uv sync` (add `--extra gliner` or `--extra webservice` for optional extras). API keys go in `.env` (copy from `.env.template`).
+`gh` is available for GitHub operations (issues, PRs, etc.).
+---
+## Project layout
+```
+tei_annotator/          core library
+  models/               TEIAttribute, TEIElement, TEISchema; SpanDescriptor, ResolvedSpan
+  inference/            EndpointConfig, EndpointCapability
+  chunking/             chunk_text()
+  detection/            detect_spans() — GLiNER pre-detection (needs [gliner] extra)
+  prompting/            build_prompt(), make_correction_prompt(); Jinja2 templates
+  postprocessing/       parse_response(), resolve_spans(), validate_spans(), inject_xml()
+  schemas/              build_bibl_schema(), build_bibl_reference_segmenter_schema()
+    registry.py         SCHEMA_REGISTRY — maps schema key → build fn + root/child elements
+  providers/            LLM connectors: hf / gemini / kisski / openai / claude
+  evaluation/           EvaluationSpan, extract_spans(), compute_metrics(), evaluate_file()
+  pipeline.py           annotate() — top-level entry point
+  tei.py                create_schema() — parse RNG → TEISchema
+scripts/
+  evaluate_llm.py       run any provider against a gold-standard TEI file
+  debug_annotation.py   step-by-step pipeline debug for a single text snippet
+  smoke_test_llm.py     quick connectivity check
+  smoke_test_webservice.py
+tests/
+  test_*.py             unit tests (fully mocked, < 0.5 s) — run with: uv run pytest
+  integration/          real GLiNER / end-to-end tests (excluded from CI by default)
+data/
+  corpus/               git-tracked gold-standard TEI corpora (bibl.default.tei.xml, etc.)
+  raw/                  gitignored raw source batches and collected hard examples
+webservice/             FastAPI JSON API + browser UI
+docs/                   see Documentation section below
+```
+---
+## Key design rules
+- The LLM prompt talks about **spans** (emit a span / cover a span), never XML tags. Schema descriptions must match this vocabulary.
+- `SpanDescriptor` is always **flat** — no nesting. `ResolvedSpan.children` is populated later by the injector.
+- Source text is **never modified** by any model call.
+- Cross-element constraints belong in `TEISchema.rules` (rendered as numbered "General Rules" before element descriptions), not duplicated inside individual element descriptions.
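The flat-span rule implies a later step that turns a flat list of spans into a tree before XML is emitted. The following is a minimal sketch of that idea, assuming each span already carries character offsets into the unmodified source text. `FlatSpan`, `Node`, `nest()` and `render()` are hypothetical names for illustration, not the library's `SpanDescriptor`/`ResolvedSpan`/`inject_xml()` API, whose internals may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from xml.sax.saxutils import escape


@dataclass
class FlatSpan:
    element: str
    start: int  # inclusive character offset into the source text
    end: int    # exclusive character offset


@dataclass
class Node:
    span: FlatSpan
    children: list[Node] = field(default_factory=list)


def nest(spans: list[FlatSpan]) -> list[Node]:
    """Build a containment tree from flat spans; crossing spans raise ValueError."""
    roots: list[Node] = []
    stack: list[Node] = []
    # Outer spans first: earlier start, then longer extent.
    for span in sorted(spans, key=lambda s: (s.start, -s.end)):
        node = Node(span)
        # Drop open ancestors that end before this span starts.
        while stack and stack[-1].span.end <= span.start:
            stack.pop()
        # A span that starts inside its parent but ends outside it cannot nest.
        if stack and span.end > stack[-1].span.end:
            raise ValueError(f"crossing spans: {stack[-1].span} / {span}")
        (stack[-1].children if stack else roots).append(node)
        stack.append(node)
    return roots


def render(text: str, nodes: list[Node], start: int = 0, end: int | None = None) -> str:
    """Serialise nodes as XML over text[start:end]; the characters themselves are never altered."""
    end = len(text) if end is None else end
    out: list[str] = []
    pos = start
    for n in nodes:
        out.append(escape(text[pos:n.span.start]))  # untagged text before this span
        inner = render(text, n.children, n.span.start, n.span.end)
        out.append(f"<{n.span.element}>{inner}</{n.span.element}>")
        pos = n.span.end
    out.append(escape(text[pos:end]))  # untagged text after the last span
    return "".join(out)
```

For `"Smith, J. (2020). Title."` with spans `bibl` (0–24), `author` (0–9), `date` (11–15) and `title` (18–23), `render(text, nest(spans))` wraps the whole string in `<bibl>` and the three inner spans as its children, while every source character survives unchanged.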
+---
+## Debugging annotation bugs
+When a text snippet is annotated incorrectly, run `debug_annotation.py` **before**
+touching any code. It executes the full pipeline step-by-step and prints every
+intermediate result so you can pinpoint exactly where accuracy is lost.
+```bash
+uv run scripts/debug_annotation.py --text "<failing snippet>"
+# pass --show-prompt to inspect the full LLM prompt
+# pass --provider / --model to test a different model
+```
+**Read the output top-to-bottom and identify the first stage where the problem
+appears:**
+| Stage | What to look for | Likely fix |
+| --- | --- | --- |
+| **Parsed spans** | LLM emitted the wrong element, wrong text, or missing span | Improve the element description or schema rules |
+| **Resolved spans** | Span parsed correctly but not resolved (context mismatch) | LLM's context string doesn't match source — improve prompt or context instructions |
+| **Validated spans** | Resolved but rejected (unknown element / bad attribute value) | Schema element name or attribute value list is wrong |
+| **Final XML** | All spans correct but XML is malformed or nesting is wrong | `inject_xml` / injector issue |
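The **Resolved spans** row describes a span that parses correctly but whose context string cannot be found in the source. A minimal sketch of why that happens, assuming resolution works by first locating the context in the source and then the span text inside the context; `resolve()` is a hypothetical helper. It uses exact matching only, whereas the real resolver may tolerate small differences (rapidfuzz is listed in `requirements.txt`, but this sketch does not assume how it is used).

```python
from __future__ import annotations


def resolve(source: str, text: str, context: str) -> tuple[int, int] | None:
    """Return (start, end) offsets of `text` in `source`, or None on a context mismatch."""
    ctx_start = source.find(context)
    if ctx_start == -1:
        return None  # the LLM's context string does not occur verbatim in the source
    offset = context.find(text)
    if offset == -1:
        return None  # the span text is not contained in its own context
    start = ctx_start + offset
    return start, start + len(text)
```

The context is what disambiguates repeated strings: in `"Smith, J. (2020). Title. Jones, K. (2021). Title."` the span text `"Title"` occurs twice, and the context `"Jones, K. (2021). Title"` selects the second occurrence. A single dropped character in the context (`"Jones, K (2021)"`) is enough to make resolution fail.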
+Only fix schema descriptions or rules (in `tei_annotator/schemas/`) to address
+**Parsed spans** problems. Do not patch the pipeline code for prompt-quality issues.
+After changing schema descriptions, re-run the debugger on the same snippet to
+confirm the fix, then run the evaluator to check for regressions.
+---
+## Running the evaluator
+```bash
+# quick run: 5 records, gemini, bibl-reference-segmenter schema
+uv run scripts/evaluate_llm.py \
+    --provider gemini --schema bibl-reference-segmenter --max-items 5 --verbose
+# re-run only failing records
+uv run scripts/evaluate_llm.py --verbose --match-mode overlap \
+    --grep "Creed|Robins" --provider kisski
+# all providers, all records
+uv run scripts/evaluate_llm.py --schema bibl --output-file results.txt
+```
+Key flags: `--provider`, `--model`, `--schema`, `--gold-file`, `--max-items`,
+`--batch-size`, `--match-mode`, `--verbose`, `--grep`, `--shuffle`, `--timeout`.
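The evaluator reports F1 against the gold standard, and `--match-mode` changes what counts as a hit. A minimal sketch of span-level precision, recall and F1 under two modes, assuming spans are compared by element name plus character offsets. `Span`, `matches()` and `prf()` are hypothetical names; only `overlap` appears in the commands above, and the name `exact` for strict boundary matching is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    element: str
    start: int  # inclusive
    end: int    # exclusive


def matches(pred: Span, gold: Span, mode: str) -> bool:
    if pred.element != gold.element:
        return False
    if mode == "exact":
        return (pred.start, pred.end) == (gold.start, gold.end)
    if mode == "overlap":
        return pred.start < gold.end and gold.start < pred.end
    raise ValueError(f"unknown match mode: {mode}")


def prf(pred: list[Span], gold: list[Span], mode: str = "exact") -> tuple[float, float, float]:
    """Greedy one-to-one matching: each gold span can be credited at most once."""
    unmatched = list(gold)
    tp = 0
    for p in pred:
        for g in unmatched:
            if matches(p, g, mode):
                unmatched.remove(g)  # safe: we stop iterating immediately
                tp += 1
                break
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With gold spans `author` (0–9) and `title` (18–23), and predictions `author` (0–8), `title` (18–23) and a spurious `date` (11–15), exact matching gives F1 = 0.4, while overlap matching forgives the off-by-one author boundary and gives F1 = 0.8.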
+---
+## Skills
+**`/optimize-element-descriptions`** — iterative workflow for improving schema prompt rules and element descriptions to maximise F1 against a gold standard. Includes guidance for handling genuinely ambiguous gold boundaries via `cert="low"`. See [.claude/skills/optimize-element-descriptions/SKILL.md](.claude/skills/optimize-element-descriptions/SKILL.md).
+---
+## Documentation
+### Module READMEs
+| Path | Topic |
+|------|-------|
+| [tei_annotator/models/README.md](tei_annotator/models/README.md) | TEISchema, TEIElement, TEIAttribute; SpanDescriptor, ResolvedSpan |
+| [tei_annotator/detection/README.md](tei_annotator/detection/README.md) | GLiNER pre-detection |
+| [tei_annotator/chunking/README.md](tei_annotator/chunking/README.md) | Text chunking strategy |
+| [tei_annotator/prompting/README.md](tei_annotator/prompting/README.md) | Prompt templates and builder |
+| [tei_annotator/inference/README.md](tei_annotator/inference/README.md) | EndpointConfig; provider setup examples |
+| [tei_annotator/postprocessing/README.md](tei_annotator/postprocessing/README.md) | Parse → resolve → validate → inject pipeline |
+| [tei_annotator/schemas/README.md](tei_annotator/schemas/README.md) | Built-in schemas, registry, adding a new schema |
+| [tei_annotator/providers/README.md](tei_annotator/providers/README.md) | LLM connectors, adding a new provider |
+| [tei_annotator/evaluation/README.md](tei_annotator/evaluation/README.md) | Evaluation flow, match modes, metrics, `cert="low"` uncertain-boundary handling |
+| [webservice/README.md](webservice/README.md) | FastAPI webservice setup and API |
+### Guides
+| Path | Topic |
+|------|-------|
+| [docs/tei-element-descriptions.md](docs/tei-element-descriptions.md) | Evidence-based guidelines for writing effective TEIElement descriptions |
+### Experiments
+| Path | Summary |
+|------|---------|
+| [docs/experiments/evaluation-results.md](docs/experiments/evaluation-results.md) | Running evaluation results table across models and schemas |
+| [docs/experiments/batch-annotation-experiment.md](docs/experiments/batch-annotation-experiment.md) | Batching multiple records per LLM call to reduce latency |
+| [docs/experiments/2026-05-08-gemini-kisski-bibl-refseg.md](docs/experiments/2026-05-08-gemini-kisski-bibl-refseg.md) | Gemini 2.0 Flash vs KISSKI/Qwen3-Coder on bibl and bibl-reference-segmenter |
+| [docs/experiments/2026-05-08-kisski-model-comparison-bibl-refseg.md](docs/experiments/2026-05-08-kisski-model-comparison-bibl-refseg.md) | KISSKI 4-model comparison on bibl-reference-segmenter |
+### History
+| Path | Topic |
+|------|-------|
+| [docs/history/implementation-plan.md](docs/history/implementation-plan.md) | Original design and implementation plan (historical) |

README.md CHANGED

@@ -9,6 +9,7 @@ pinned: false
 license: mit
 short_description: Annotate plain text with TEI XML tags using an LLM backend
 ---
 A Python library for annotating plain text with [TEI XML](https://tei-c.org/) tags using a two-stage LLM pipeline.
 1. **(Optional) GLiNER pre-detection** — fast CPU-based span labelling generates candidates for the LLM to verify and extend.
@@ -216,7 +217,6 @@ FINAL OUTPUT                          (annotated XML)
 ## Demo and webservice
 - **HuggingFace demo:** <https://huggingface.co/spaces/cmboulanger/tei-annotator>
-- **`app.py`** — Gradio app for HuggingFace Spaces. See [docs/huggingface-deployment.md](docs/huggingface-deployment.md).
 - **`webservice/`** — FastAPI JSON API + browser UI, all five providers. See [webservice/README.md](webservice/README.md).
 ---


package-lock.json ADDED

The diff for this file is too large to render.

package.json ADDED

	@@ -0,0 +1,32 @@

+{
+  "name": "tei-annotator",
+  "version": "1.5.0",
+  "description": "TEI XML annotation library using LLM pipelines",
+  "license": "MIT",
+  "private": true,
+  "repository": {
+    "type": "git",
+    "url": "https://github.com/cboulanger/tei-annotator.git"
+  },
+  "devDependencies": {
+    "@commitlint/cli": "^18.4.0",
+    "@commitlint/config-conventional": "^18.4.0",
+    "@semantic-release/changelog": "^6.0.3",
+    "@semantic-release/exec": "^6.0.3",
+    "@semantic-release/git": "^10.0.1",
+    "commitizen": "^4.3.0",
+    "cz-conventional-changelog": "^3.3.0",
+    "husky": "^8.0.3",
+    "semantic-release": "^22.0.0"
+  },
+  "config": {
+    "commitizen": {
+      "path": "./node_modules/cz-conventional-changelog"
+    }
+  },
+  "scripts": {
+    "test": "uv run pytest",
+    "semantic-release": "semantic-release",
+    "commit": "cz"
+  }
+}

pyproject.toml CHANGED

@@ -18,7 +18,6 @@ webservice = [
     "uvicorn[standard]>=0.30",
     "python-multipart>=0.0.9",
 ]
-gradio = ["gradio>=6.9"]
 [tool.pytest.ini_options]
 addopts = "-m 'not integration'"
@@ -29,7 +28,6 @@ markers = [
 [tool.taskipy.tasks]
 test = "uv run pytest"
 webservice = "uv run python webservice/main.py"
-gradio = "uv run python app.py"
 [dependency-groups]
 dev = [


requirements.txt ADDED

	@@ -0,0 +1,7 @@

+# HuggingFace Spaces: gradio plus tei_annotator's runtime deps (source is copied, not installed).
+# Spaces reads this file automatically; no pyproject.toml extras support needed.
+gradio>=6.9
+jinja2>=3.1
+lxml>=5.0
+python-dotenv>=1.2.2
+rapidfuzz>=3.0

schema/tei-bib.rng ADDED

The diff for this file is too large to render.

tei_annotator/providers/README.md CHANGED

@@ -65,7 +65,7 @@ _ALL_CONNECTORS: list[Connector] = [
 ]
 ```
-That's all. The evaluate script, webservice, and Gradio app pick it up automatically.
+That's all. The evaluate script and webservice pick it up automatically.
 ---

webservice/nginx.conf ADDED

	@@ -0,0 +1,86 @@

+# Nginx reverse-proxy configuration for tei-annotator webservice.
+#
+# All routes are publicly accessible over HTTPS.
+#
+# Setup:
+#   1. sudo cp webservice/nginx.conf /etc/nginx/sites-available/tei-annotator
+#   2. sudo ln -s /etc/nginx/sites-available/tei-annotator /etc/nginx/sites-enabled/
+#   3. Replace YOUR_DOMAIN with your actual domain (4 occurrences):
+#        sudo sed -i 's/YOUR_DOMAIN/your.domain.example/g' /etc/nginx/sites-available/tei-annotator
+#   4. Install certbot if needed:
+#        sudo apt install certbot python3-certbot-nginx   # Debian/Ubuntu
+#        sudo dnf install certbot python3-certbot-nginx   # RHEL/Fedora
+#   5. Obtain a Let's Encrypt certificate using --standalone. Stop nginx first: it
+#      refuses to start while the ssl_certificate files are still missing. The hooks
+#      are saved in the renewal config, because renewals also need port 80 free:
+#        sudo systemctl stop nginx
+#        sudo certbot certonly --standalone -d your.domain.example --pre-hook "systemctl stop nginx" --post-hook "systemctl start nginx"
+#        sudo systemctl start nginx
+#   6. Verify nginx is running and auto-renewal works:
+#        sudo systemctl status nginx
+#        sudo certbot renew --dry-run
+#
+# To run the webservice as a systemd service, see webservice/tei-annotator.service.
+# Rate limit: 6 requests/minute (one per 10 s) per IP, plus a burst of 10 served
+# without delay; enough for interactive use while throttling scripted automation.
+limit_req_zone $binary_remote_addr zone=tei_api:10m rate=6r/m;
+upstream tei_annotator {
+    server 127.0.0.1:8099;
+    keepalive 16;
+}
+# ── HTTP: redirect everything to HTTPS ───────────────────────────────────────
+server {
+    listen 80;
+    listen [::]:80;
+    server_name YOUR_DOMAIN;
+    # Let certbot's ACME challenge through, redirect everything else
+    location /.well-known/acme-challenge/ {
+        root /var/www/certbot;
+    }
+    location / {
+        return 301 https://$host$request_uri;
+    }
+}
+# ── HTTPS ────────────────────────────────────────────────────────────────────
+server {
+    listen 443 ssl http2;
+    listen [::]:443 ssl http2;
+    server_name YOUR_DOMAIN;
+    # Paths written by certbot (--standalone as in step 5, or --webroot/--nginx);
+    # update if using a different certificate tool or path.
+    ssl_certificate     /etc/letsencrypt/live/YOUR_DOMAIN/fullchain.pem;
+    ssl_certificate_key /etc/letsencrypt/live/YOUR_DOMAIN/privkey.pem;
+    # Modern TLS settings
+    ssl_protocols             TLSv1.2 TLSv1.3;
+    ssl_prefer_server_ciphers on;
+    ssl_ciphers               ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305;
+    ssl_session_cache         shared:SSL:10m;
+    ssl_session_timeout       1d;
+    add_header Strict-Transport-Security "max-age=63072000" always;
+    # Reject request bodies larger than 64 KB to cap token usage.
+    client_max_body_size 64k;
+    location / {
+        limit_req zone=tei_api burst=10 nodelay;
+        limit_req_status 429;
+        proxy_pass         http://tei_annotator;
+        proxy_set_header   Host              $host;
+        proxy_set_header   X-Real-IP         $remote_addr;
+        proxy_set_header   X-Forwarded-For   $proxy_add_x_forwarded_for;
+        proxy_set_header   X-Forwarded-Proto $scheme;
+        proxy_buffering    off;
+        proxy_read_timeout 360s;
+    }
+}
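The `limit_req` settings above are easier to reason about with a model of nginx's leaky-bucket accounting. The following Python sketch approximates `ngx_http_limit_req_module` for a single zone with `nodelay`, mirroring its integer milli-request bookkeeping as documented and as the sketch's author understands it. `LimitReqZone` is a hypothetical name; zone memory limits, the `delay=` parameter, and queuing without `nodelay` are ignored.

```python
class LimitReqZone:
    """Approximate nginx limit_req accounting for one zone with `burst=N nodelay`."""

    def __init__(self, rate_per_minute: int, burst: int) -> None:
        self.rate = rate_per_minute * 1000 // 60  # milli-requests drained per second
        self.burst = burst * 1000                 # allowed excess, in milli-requests
        self.state: dict[str, tuple[int, int]] = {}  # key -> (excess, last_ms)

    def allow(self, key: str, now_ms: int) -> bool:
        if key not in self.state:
            self.state[key] = (0, now_ms)  # first request from this key starts with no excess
            return True
        excess, last = self.state[key]
        elapsed = abs(now_ms - last)
        # Drain the bucket for the elapsed time, then add this request (1000 milli-requests).
        new_excess = max(0, excess - self.rate * elapsed // 1000 + 1000)
        if new_excess > self.burst:
            return False  # nginx answers with limit_req_status (429 here); state is unchanged
        self.state[key] = (new_excess, now_ms)
        return True
```

According to this model, with `rate=6r/m` and `burst=10` a single IP can send 11 requests at once before the 12th is rejected, and after that one more request is accepted every 10 seconds. Each `$binary_remote_addr` key is tracked independently, so other clients are unaffected.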