Sprint 6 AP

1. Sprint Action Points

AP-001: Produce LLM Experiment Artifact (Jupyter Notebook)

Owner: Aizat
What: Create a Jupyter notebook (or similar) that:
- Loads a clinical case (from the team’s JSON structure)
- Sends a series of predefined questions to an LLM (DeepSeek free API, local model, or any accessible model)
- Records the model’s responses
- Compares responses against expected answers (from gold standard)
- Includes commentary on what works, what doesn’t, and why the chosen model/prompt is acceptable.
Due: April 19, 2026
Output: .ipynb file committed to docs/research/ or a separate branch.

AP-002: Demonstrate a Working Conversation

Owner: Aizat, Ilnar
What: Show a simple interactive conversation (could be CLI script or basic web UI) where a user can ask questions and the LLM responds based on the case context. This does not require full backend integration – a standalone script is acceptable.
Due: April 19, 2026
Output: Screen recording or live demo during next mentor meeting.

AP-003: Define Evaluation Criteria for LLM Responses

Owner: Alina, Aizat
What: Write a short document (1-2 pages) specifying:
- What “correct” vs “incorrect” response means for a given case.
- How the team will measure LLM performance (e.g., accuracy on expected symptoms, handling of out-of-scope questions).
- A simple scoring rubric (e.g., pass/fail per question).
Due: April 19, 2026
Output: Markdown file in docs/ai-evaluation.md.

AP-004: Finalize Case JSON Schema and Document It

Owner: Ilnar, Alina
What: Based on the mentor’s feedback and research, finalize the JSON schema for clinical cases. Ensure it separates “patient view” (visible to LLM) from “gold standard” (used for evaluation). Add documentation to docs/case-schema.md.
Due: April 16, 2026

AP-005: Implement Chat UI Contract Without Waiting for Backend

Owner: Timur
What: Even though the backend chat endpoints are not ready, implement the frontend chat UI component with mock data or a placeholder service. This will demonstrate the UI flow and allow the team to define the required API contract (fields, endpoints).
Due: April 16, 2026

AP-006: Adopt Pull Request Workflow for All Changes

Owner: All team members
What: For any code change (including documentation, research notebooks), create a branch and open a Pull Request. Do not push directly to main. Use PR descriptions to explain what was done and link to related issues.
Due: Immediately (ongoing)

AP-007: Prepare Russian Language Test

Owner: Aizat
What: After the basic conversation works in English, run the same prompts in Russian. Document any differences in response quality (e.g., hallucinations, refusal to answer). Update the research artifact accordingly.
Due: April 23, 2026

AP-008: Request API Key (Conditional)

Owner: Alina
Trigger: Once AP-001 and AP-002 are completed and shown to the mentor, the mentor will provide an API key (or proxy endpoint) for production integration.
What to prepare: Link to the notebook, a short summary of findings, and a demo recording.

2. Critical Priorities (Next Week)

Priority	Task	Owner
P0	Produce Jupyter notebook with LLM experiments	Aizat
P0	Show a working conversation (any form)	Aizat, Ilnar
P1	Finalize case JSON schema	Ilnar, Alina
P1	Implement chat UI with mock data	Timur
P2	Define LLM evaluation criteria	Alina, Aizat
P2	Switch to PR-based workflow	All

3. Summary for the Team

The mentor’s message is clear: “I don’t see any working conversational engine yet. You say you have experimented, but without an artifact, it doesn’t count. Show me a notebook, show me a conversation – even with a free model – and I will give you the API key. Also, open Pull Requests so everyone can see who is doing what.”

Next mentor meeting expected in one week (April 19). By then, the team must deliver:

A Jupyter notebook with LLM experiments.
A visible conversation demo (script or UI).
Updated case schema documentation.
At least one open PR showing active work.