# TextIQ Onboarding Guide
> Sample document for testing the KB Browser markdown + mermaid preview
> (UP-MD-MERMAID). Covers GFM features, multiple Mermaid diagram types,
> non-mermaid code blocks, and edge cases.
---
## 1. Overview
TextIQ is a knowledge-management platform that ingests enterprise documents
into a vector store and serves them back through chat, search, and document
generation. This guide walks new content specialists through the **upload →
review → publish** workflow.
**You should be able to complete onboarding in roughly 30 minutes.**
---
## 2. Component map
```mermaid
graph LR
U[User] -->|JWT cookie| BFF[textiq-user-portal-api]
A[Content Specialist] -->|JWT cookie| APA[textiq-admin-portal-api]
APA -->|HMAC| DE["textiq-doc-extraction<br/>Airflow plugin"]
DE -->|PUT object| S3[("SeaweedFS<br/>bucket: source-files")]
DE -->|chunks + embeddings| MIL[(Milvus)]
DE -->|metadata row| PG[("Postgres<br/>documents")]
BFF -->|read| PG
BFF -->|stream| S3
BFF -->|search| MIL
BFF --> U
```
The user portal (UP) is **read-only**; uploads and edits flow through the
admin portal (AP) and its API (`APA` in the diagram). Both portals share one
source of truth: the Postgres `documents` table and the SeaweedFS bucket.
---
## 3. Upload state machine
```mermaid
stateDiagram-v2
[*] --> validating: POST /documents/upload
validating --> rejected: extension / size / magic-byte fail
validating --> queued: passed
queued --> processing: Airflow DAG picks task
processing --> failed: parse / embed error
processing --> completed: chunks + embeddings written
failed --> processing: manual retry
rejected --> [*]
completed --> [*]
```
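
The diagram above can also be read as a transition table. The following is a minimal sketch of that table in Python; the `Status` enum, `ALLOWED`, and `can_transition` are illustrative names, not part of the TextIQ codebase, and the edges are copied from the diagram only.

```python
from enum import Enum


class Status(str, Enum):
    VALIDATING = "validating"
    REJECTED = "rejected"
    QUEUED = "queued"
    PROCESSING = "processing"
    FAILED = "failed"
    COMPLETED = "completed"


# One entry per state; each set lists the states reachable in one step.
ALLOWED = {
    Status.VALIDATING: {Status.REJECTED, Status.QUEUED},
    Status.QUEUED: {Status.PROCESSING},
    Status.PROCESSING: {Status.FAILED, Status.COMPLETED},
    Status.FAILED: {Status.PROCESSING},  # manual retry
    Status.REJECTED: set(),              # terminal
    Status.COMPLETED: set(),             # terminal
}


def can_transition(current: Status, target: Status) -> bool:
    """Return True if the diagram has an edge from current to target."""
    return target in ALLOWED[current]
```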
Statuses are observable in the BFF response from
`GET /api/v1/kb/documents?status=...`:
| Status | Returned when | Visible in KB Browser? |
| --- | --- | --- |
| `queued` | Doc Extraction (DE) accepted the upload | Yes (greyed) |
| `processing` | DAG is running | Yes (spinner) |
| `completed` | Chunks + embeddings persisted | **Yes (full)** |
| `failed` | DAG threw, message in `error_message` | Yes (red) |
| `rejected` | Failed validation, no DB row | No |
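
The `rejected` row corresponds to the three validation checks named in the state diagram: extension, size, and magic bytes. The sketch below shows one way such a check could look. The allow-list, the 50 MiB cap, and the `validate_upload` function are assumptions made for illustration; this guide does not state the real limits or the real implementation.

```python
from pathlib import Path
from typing import Optional

# Hypothetical limits: the real allow-list and size cap are not given in this guide.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".md", ".txt"}
MAX_SIZE_BYTES = 50 * 1024 * 1024

# Leading bytes for formats that have a fixed signature.
MAGIC_BYTES = {
    ".pdf": b"%PDF-",
    ".docx": b"PK\x03\x04",  # DOCX files are ZIP archives
}


def validate_upload(filename: str, data: bytes) -> Optional[str]:
    """Return a rejection reason, or None if the upload may be queued."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return "extension"
    if len(data) > MAX_SIZE_BYTES:
        return "size"
    magic = MAGIC_BYTES.get(ext)
    # Text formats have no signature, so only check when one is defined.
    if magic is not None and not data.startswith(magic):
        return "magic-byte"
    return None
```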
---
## 4. Approval review flow (sequence)
```mermaid
sequenceDiagram
autonumber
actor CS as Content Specialist
participant AP as Admin Portal UI
participant APA as Admin Portal API
participant DE as Doc Extraction
participant DB as Postgres
CS->>AP: Open Review Inbox
AP->>APA: GET /pending-files
APA->>DB: SELECT * FROM documents WHERE status='completed' AND approved=false
DB-->>APA: 12 rows
APA-->>AP: 12 pending files
AP-->>CS: render queue
CS->>AP: click "Approve" on row 3
AP->>APA: POST /documents/{id}/approve
APA->>DE: HMAC-signed approve hook
DE->>DB: UPDATE documents SET approved=true, approver_id=:cs
DE-->>APA: 200 OK
APA-->>AP: success toast
AP-->>CS: row removed from queue
```
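
Step 9 of the sequence sends an HMAC-signed hook from the Admin Portal API to Doc Extraction. The guide does not specify the hash algorithm, the encoding, or which bytes are signed, so the sketch below assumes HMAC-SHA256 over the raw request body with a hex digest. The `sign` and `verify` helpers are hypothetical names.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Return the hex HMAC-SHA256 of body (assumed scheme, see lead-in)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature: str) -> bool:
    """Check a signature using a constant-time comparison."""
    return hmac.compare_digest(sign(secret, body), signature)
```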
---
## 5. Configuration snippet
The BFF needs the following env vars to stream from SeaweedFS. This is a plain
code block (not mermaid), so it should render as `<pre><code>` with normal styling:
```yaml
SEAWEEDFS_ENDPOINT_URL: http://seaweedfs-s3.seaweedfs.svc.cluster.local:8333
SEAWEEDFS_ACCESS_KEY_ID:
SEAWEEDFS_SECRET_ACCESS_KEY:
SEAWEEDFS_UPLOAD_BUCKET: source-files
```
Python client wiring:
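```python
# app/core/object_storage.py
# Sketch: assumes FastAPI; `stream_document` is a hypothetical handler name.
from fastapi.responses import StreamingResponse

async def stream_document(file_path: str) -> StreamingResponse:
    storage = ObjectStorage(settings)  # project-local class and settings
    metadata, stream = await storage.open_object_stream(file_path)
    # Fall back to PDF when the stored object has no content type.
    return StreamingResponse(stream, media_type=metadata.content_type or "application/pdf")
```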
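
For illustration, the four env vars from the YAML block could be read into a typed settings object as sketched below. `SeaweedFSSettings` and `load_settings` are hypothetical names; the project's actual `settings` object is not shown in this guide and may be built differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SeaweedFSSettings:
    endpoint_url: str
    access_key_id: str
    secret_access_key: str
    upload_bucket: str


def load_settings(env: Mapping[str, str] = os.environ) -> SeaweedFSSettings:
    """Read the four SEAWEEDFS_* variables; fail fast if any is missing or empty."""
    names = [
        "SEAWEEDFS_ENDPOINT_URL",
        "SEAWEEDFS_ACCESS_KEY_ID",
        "SEAWEEDFS_SECRET_ACCESS_KEY",
        "SEAWEEDFS_UPLOAD_BUCKET",
    ]
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing env vars: " + ", ".join(missing))
    return SeaweedFSSettings(
        endpoint_url=env["SEAWEEDFS_ENDPOINT_URL"],
        access_key_id=env["SEAWEEDFS_ACCESS_KEY_ID"],
        secret_access_key=env["SEAWEEDFS_SECRET_ACCESS_KEY"],
        upload_bucket=env["SEAWEEDFS_UPLOAD_BUCKET"],
    )
```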
---
## 6. Class diagram — domain model
```mermaid
classDiagram
class Document {
+UUID id
+UUID tenant_id
+UUID node_id
+string filename
+string file_path
+string file_type
+int file_size_bytes
+string status
+datetime created_at
+datetime processed_at
}
class Chunk {
+UUID id
+UUID document_id
+int sequence
+string content
+vector embedding
}
class AuditLog {
+UUID id
+UUID actor_user_id
+UUID target_document_id
+string action
+datetime timestamp
}
Document "1" --> "many" Chunk
Document "1" --> "many" AuditLog : delete / approve
```
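
The same domain model can be sketched as Python dataclasses. Field names and types follow the class diagram; the `chunks` list, the optional `processed_at`, and the use of `List[float]` for the embedding are assumptions made for illustration, not the project's actual ORM models.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional
from uuid import UUID, uuid4


@dataclass
class Chunk:
    id: UUID
    document_id: UUID
    sequence: int
    content: str
    embedding: List[float]


@dataclass
class AuditLog:
    id: UUID
    actor_user_id: UUID
    target_document_id: UUID
    action: str  # "delete" or "approve" per the diagram
    timestamp: datetime


@dataclass
class Document:
    id: UUID
    tenant_id: UUID
    node_id: UUID
    filename: str
    file_path: str
    file_type: str
    file_size_bytes: int
    status: str
    created_at: datetime
    processed_at: Optional[datetime] = None  # assumed unset until processing ends
    chunks: List[Chunk] = field(default_factory=list)  # the 1-to-many side
```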
---
## 7. Edge cases the renderer must handle
- **Empty mermaid fence** — should fall back to raw display:
```mermaid
```
- **Invalid mermaid syntax** — should show error alert + raw source, not crash:
```mermaid
this is not valid mermaid syntax!
```
- **Inline HTML** — must be escaped (rehype-raw disabled per security review):
`<script>alert("xss")</script>` should appear as literal text, not run.
- **Task list (GFM)**:
  - [x] Upload PDF
  - [x] Click preview
  - [ ] Verify mermaid renders
  - [ ] Verify table cells align
- **Autolink (GFM)**: https://textiq.fpt-aic.com should become a clickable link.
- **Emphasis**: *italic*, **bold**, ~~strikethrough~~, `inline code`.
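
The first and third edge cases above can be expressed as two small rules. The sketch below is only an illustration in Python: `classify_fence` and `escape_inline_html` are hypothetical helpers, and the real renderer (not shown in this guide) may apply these rules differently. Invalid Mermaid syntax is not detected here, since that requires the Mermaid parser itself.

```python
import html


def classify_fence(lang: str, body: str) -> str:
    """Decide how a fenced block is displayed: 'diagram', 'raw', or 'code'."""
    if lang.strip().lower() != "mermaid":
        return "code"     # yaml, python, etc. keep normal code styling
    if not body.strip():
        return "raw"      # empty mermaid fence falls back to raw display
    return "diagram"      # syntax errors surface later, in the renderer


def escape_inline_html(text: str) -> str:
    """Escape markup so tags show up as literal text instead of executing."""
    return html.escape(text)
```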
---
## 8. Pie chart
```mermaid
pie title KB content mix
"PDFs": 62
"Word/Office": 22
"Markdown": 11
"Plain text": 5
```
---
## 9. Troubleshooting
| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Preview hangs on spinner | BFF can't reach SeaweedFS | Check `SEAWEEDFS_ENDPOINT_URL` in the secret and the NetworkPolicy in `metis-dev` |
| `content_unavailable` 404 | `documents.file_path` points to a missing S3 key | Check the DE worker log for the upload's `key=...` line |
| Mermaid block shows raw source | Syntax error, or rendering exceeded the 3 s timeout | Validate the diagram at https://mermaid.live first |
| GFM table renders without borders | Tailwind typography plugin missing | Confirm `@plugin "@tailwindcss/typography"` in `tailwind.css` |
| Sidebar primary color still navy | Stale `active_theme` cookie | In browser DevTools, open Application → Cookies and delete `active_theme` |
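
For the first row, a quick reachability probe can help separate a wrong endpoint URL from a blocked network path. The sketch below is an assumption about how one might test this from inside the BFF pod; `endpoint_host_port` and `can_connect` are hypothetical helpers, and a successful TCP connection does not prove that the S3 credentials are valid.

```python
import socket
from typing import Optional, Tuple
from urllib.parse import urlparse


def endpoint_host_port(url: str) -> Tuple[Optional[str], int]:
    """Extract host and port, defaulting the port from the URL scheme."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    return parsed.hostname, port


def can_connect(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the endpoint succeeds within timeout."""
    host, port = endpoint_host_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```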
---
## 10. Done?
If the preview opens, the Mermaid diagrams render, the GFM tables align, and
the script tag in section 7 is escaped (not executed), the UP-MD-MERMAID
feature is working end-to-end. 🎉