
# Translation memories

This document describes how **Translation Memories** (TM) are indexed (written) and fetched (read). A translation memory is a per-project store of previously-translated lines so that the same source line, encountered again, can be auto-filled with its prior translation.

Phrases are stored in the `phrases` collection (`Phrases` model). Each phrase records a single source line (`ov`) and its translation (`text`), scoped by `project` and `language`.

***

## Data Model

`Phrases` (`api/models/Phrases.php`)

| Field         | Type    | Notes                                                                                   |
| ------------- | ------- | --------------------------------------------------------------------------------------- |
| `_id`         | id      |                                                                                         |
| `project`     | id      | The project the phrase belongs to. **Match scope.**                                     |
| `asset`       | id      | The asset it was indexed from. Recorded but **not used when fetching.**                 |
| `translation` | id      | The translation document it was indexed from. Used to wipe a doc's phrases on re-index. |
| `language`    | string  | Target language code. **Match scope.**                                                  |
| `ov`          | string  | The tag/newline-stripped source (OV) line. **Match key.**                               |
| `text`        | string  | The translated line returned to the editor.                                             |
| `print`       | boolean | `true` if generated from a print asset's `printLines`. Written but **never read.**      |

**Validation:** `project` and `asset` must exist; `language`, `ov`, and `text` must be non-empty. A blank cleaned-OV or blank translation silently fails to save.
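To make the schema and validation rules concrete, here is a minimal JavaScript sketch. It is not the `Phrases` model itself (which is PHP): `validatePhrase` and the sample ids are hypothetical, and the "must exist" check for `project` and `asset` is approximated as a simple presence check.

```javascript
// Hypothetical sketch of the validation rules above; not the Phrases.php source.
function validatePhrase(phrase) {
  const errors = [];
  // `project` and `asset` must exist (approximated here as "must be present").
  for (const field of ["project", "asset"]) {
    if (phrase[field] == null) errors.push(`${field} is required`);
  }
  // `language`, `ov`, and `text` must be non-empty strings.
  for (const field of ["language", "ov", "text"]) {
    if (typeof phrase[field] !== "string" || phrase[field] === "") {
      errors.push(`${field} must be non-empty`);
    }
  }
  return errors;
}

// Example document shape; all ids and values are made up for illustration.
const examplePhrase = {
  project: "p1",
  asset: "a1",
  translation: "t1",
  language: "GER",
  ov: "Coming soon",
  text: "Demnächst",
  print: false,
};
```

A phrase with a blank `text` or blank cleaned `ov` would return a non-empty error list here, which corresponds to the silent save failure described above.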

***

## Indexing (writing phrases)

### Trigger

Indexing runs through the `GeneratePhrases` save filter, wired into the `Translations` model save chain (`api/models/Translations.php:41`, implemented in `api/models/translations/save/GeneratePhrases.php`). It fires on **every translation save**, but indexes only if **all** of the following hold:

1. `status === 'submitted'` — a `for-review` save does **not** index. The later approval save (which flips status to `submitted`) is what indexes a reviewed translation.
2. `multiTranslation` is empty — **dual-language / multiTranslation submissions never index.**
3. `translator !== "Pixwel"` — transcriptions (OV authoring) are excluded.
4. The asset has an OV transcription of the matching type. The type is resolved from the work request:
   * graphics translation → `autogfx` (if `auto`) or `graphics`
   * dialogue translation → `dialogue` (graphics asset) or `_id`
   * otherwise → `_id`

For **image** assets, indexing runs twice: `indexByOV` **and** `indexByPrintOV`.
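The gate conditions can be sketched as a single predicate. This is an illustrative JavaScript approximation, not the `GeneratePhrases` filter: `shouldIndex` and `isEmpty` are hypothetical names, the shape of `multiTranslation` is an assumption, and condition 4 (a matching OV transcription) is passed in as a precomputed boolean because the type resolution depends on the work request.

```javascript
// Hypothetical approximation of an "is empty" check; the real field type is an assumption.
function isEmpty(value) {
  if (Array.isArray(value)) return value.length === 0;
  return !value;
}

// Hypothetical predicate mirroring gate conditions 1-4 above.
function shouldIndex(translation, hasMatchingOv) {
  return (
    translation.status === "submitted" &&      // 1. only submitted saves index
    isEmpty(translation.multiTranslation) &&   // 2. multiTranslation must be empty
    translation.translator !== "Pixwel" &&     // 3. transcriptions are excluded
    hasMatchingOv === true                     // 4. OV transcription of the matching type
  );
}
```

For example, a `for-review` save returns `false` here, and the same translation returns `true` once its status becomes `submitted`.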

### Per-line rules (`indexByOV` / `indexByPrintOV`)

`api/models/Translations.php:394` and `:456`.

1. **Bail entirely** if `count(translation lines) !== count(OV lines)`, or the translation doesn't exist. Matching is strictly by line **position** (`k`), not by content.
2. Wipe all existing phrases for this translation first (`Phrases::remove(['translation' => _id])`, plus `print: true` for the print variant).
3. Per line, **skip** if `text` is empty.
4. Per line, **skip** if `custom === false && machine === false`. Only lines that were **user-edited** (`custom`) or **machine-translated** (`machine`) are indexed. Lines taken verbatim from a TM match (`auto`) or left as untranslated OV are skipped. (See [Line source flags](#line-source-flags).)
5. Compute the match key: `cleanLine(OV_line[k].text)` — strips HTML tags, `\r`, and `\n`.
6. Remove any prior phrase with the same `{ov, project, language}` (+ `print: true` for print) — so a `{project, language, ov}` triple holds at most one phrase.
7. Create the new `Phrases` document.
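The per-line rules above can be sketched as follows. This is a JavaScript illustration over an in-memory array, not the PHP `indexByOV` implementation: the `store` array stands in for the `phrases` collection, the `cleanLine` regexes approximate the described behavior, and the non-print variant is shown.

```javascript
// Approximation of cleanLine: strip HTML tags, \r and \n (assumed regexes).
function cleanLine(text) {
  return text.replace(/<[^>]*>/g, "").replace(/[\r\n]/g, "");
}

// Hypothetical sketch; `store` is a plain array standing in for the phrases collection.
function indexByOv(store, translation, ovLines) {
  const { _id, project, asset, language, lines } = translation;
  // Rule 1: bail on missing data or mismatched line counts (store is left untouched).
  if (!lines || !ovLines || lines.length !== ovLines.length) return store;
  // Rule 2: wipe all existing phrases for this translation.
  let next = store.filter((p) => p.translation !== _id);
  lines.forEach((line, k) => {
    if (!line.text) return;                                       // Rule 3
    if (line.custom === false && line.machine === false) return;  // Rule 4
    const ov = cleanLine(ovLines[k].text);                        // Rule 5: positional match
    // Rule 6: keep at most one phrase per {project, language, ov}.
    next = next.filter(
      (p) => !(p.ov === ov && p.project === project && p.language === language)
    );
    if (ov === "") return; // a blank cleaned OV fails validation and is not saved
    // Rule 7: create the new phrase.
    next.push({ project, asset, translation: _id, language, ov, text: line.text, print: false });
  });
  return next;
}
```

Note that the OV line is looked up by index `k`, so a translation whose lines are shifted relative to the OV would index wrong pairs; the line-count check in rule 1 is the only guard against that.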

### Line source flags

The subtitler maps the editor's `translationFrom` value onto the stored flags when saving (`ui/3x/modules/services/subtitle-service.js:307-309`):

| `translationFrom`                             | `auto` | `custom` | `machine` | Indexed?  |
| --------------------------------------------- | ------ | -------- | --------- | --------- |
| `'custom'` (user-edited)                      | false  | **true** | false     | ✅         |
| `'mt'` (machine translation)                  | false  | false    | **true**  | ✅         |
| `'tm'` (filled from a TM match)               | true   | false    | false     | ❌ skipped |
| `'ov'` (untranslated / fell back to original) | false  | false    | false     | ❌ skipped |
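The table can be expressed as a small mapping. This is a sketch of the described behavior, not the code in `subtitle-service.js`; `flagsFor` and `isIndexed` are hypothetical names.

```javascript
// Hypothetical mapping from the editor's translationFrom value to the stored flags.
function flagsFor(translationFrom) {
  return {
    auto: translationFrom === "tm",
    custom: translationFrom === "custom",
    machine: translationFrom === "mt",
  };
}

// A line is indexed only when it is user-edited or machine-translated.
function isIndexed(flags) {
  return flags.custom || flags.machine;
}
```

With this mapping, `'ov'` yields all three flags `false`, which is exactly the case the indexer skips.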

### Bulk reindex

`Assets::indexTranslations()` (`api/models/Assets.php:580`) wipes all phrases for an asset and re-runs `indexByOV` for every submitted, non-OV translation on it. It is invoked only by the `RegeneratePhrases` migration (`api/migrations/RegeneratePhrases.php`), **never from the UI.**

***

## Fetching (reading phrases)

### Endpoint

`GET /translations/translate?id=…&language=…` (`api/controllers/Translations.php:18`).
The endpoint resolves to `Translations::translate()` → `getMemoryTranslation()` (`api/models/Translations.php:239`). Adding `machine=true` routes the request to AWS Translate instead of TM.
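As a rough illustration of the request shape, the query string could be built like this. The base URL and the `buildTranslateUrl` helper are hypothetical, only a single `id` is shown, and how the frontend encodes multiple ids is not covered here.

```javascript
// Hypothetical helper; the base URL below is a placeholder, not the real API host.
function buildTranslateUrl(baseUrl, { id, language, machine = false }) {
  const url = new URL("/translations/translate", baseUrl);
  url.searchParams.set("id", id);
  url.searchParams.set("language", language);
  // machine=true switches the lookup from TM to machine translation.
  if (machine) url.searchParams.set("machine", "true");
  return url.toString();
}

const tmUrl = buildTranslateUrl("https://api.example.test", { id: "t1", language: "GER" });
const mtUrl = buildTranslateUrl("https://api.example.test", { id: "t1", language: "GER", machine: true });
```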

### Frontend input

`getTranslationId(workRequest)` (`ui/3x/utils/orders.js:175`) supplies the `id`. It is the **target translation's** `_id` (dialogue, graphics, or both, depending on order mode):

| Order mode        | `id` returned                                |
| ----------------- | -------------------------------------------- |
| `print`, `script` | `workRequest.translation._id`                |
| `gfx`, `autogfx`  | `workRequest.graphicsTranslation._id`        |
| `script+gfx`      | `[translation._id, graphicsTranslation._id]` |

For **dual-language** orders (e.g. `GER-PFR`), the frontend splits the language, makes one call per language, and merges results positionally with a `<span></span>` separator to match the dual-language editor rendering (`ui/3x/modules/services/translation-service.js` `getTranslationMemories`).
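The id selection and the dual-language merge can be sketched as below. These are illustrative JavaScript functions, not the code in `orders.js` or `translation-service.js`: passing the order mode as an explicit `mode` argument, splitting the language code on `-`, and treating a missing match as an empty string when merging are all assumptions.

```javascript
// Hypothetical sketch of the order-mode table; `mode` as a parameter is an assumption.
function getTranslationIdSketch(workRequest, mode) {
  switch (mode) {
    case "print":
    case "script":
      return workRequest.translation._id;
    case "gfx":
    case "autogfx":
      return workRequest.graphicsTranslation._id;
    case "script+gfx":
      return [workRequest.translation._id, workRequest.graphicsTranslation._id];
    default:
      return undefined; // modes not covered by the table
  }
}

// Assumed split of a dual-language code such as "GER-PFR" into its parts.
function splitLanguages(language) {
  return language.split("-");
}

// Merge two per-language result arrays positionally with the <span></span> separator.
// Null handling (missing match becomes "") is an assumption for illustration.
function mergeDualLanguage(first, second) {
  return first.map((text, k) => `${text ?? ""}<span></span>${second[k] ?? ""}`);
}
```

Because the merge is positional, both per-language calls need to return one entry per line in the same order.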

### Server lookup rules (`getMemoryTranslation`)

1. Load the document by the passed `_id`. Use `printLines` for `document`/`image` media types, otherwise `lines`.
2. For each line, look up a `Phrases` match on exactly **`{ project, ov: cleanLine(line.text), language }`**. Matching is by **exact tag/newline-stripped source-text equality**, scoped to **project + language** — **not asset.** This is what enables reuse of a translation across different assets in the same project.
3. `order: ['_id' => 'desc']` — on multiple matches, the **most recently created** phrase wins.
4. Return one entry per line: the phrase's `text`, or `null` when there is no match.
5. The lookup does **not** filter on `print`.
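The lookup rules can be sketched over an in-memory array. This is a JavaScript illustration, not the PHP `getMemoryTranslation`: the `mediaType` field name is an assumption, numeric `_id` values stand in for database ids so that "descending `_id`" means "newest first", and the `cleanLine` regexes approximate the described normalization.

```javascript
// Approximation of cleanLine: strip HTML tags, \r and \n (assumed regexes).
function cleanLine(text) {
  return text.replace(/<[^>]*>/g, "").replace(/[\r\n]/g, "");
}

// Hypothetical sketch; `phrases` is a plain array standing in for the collection.
function getMemoryTranslation(phrases, doc, language) {
  // Rule 1: print lines for document/image media types, otherwise subtitle lines.
  const usePrint = doc.mediaType === "document" || doc.mediaType === "image";
  const lines = (usePrint ? doc.printLines : doc.lines) || [];
  return lines.map((line) => {
    const ov = cleanLine(line.text);
    // Rule 2: exact match on {project, ov, language}; asset and print are ignored.
    const matches = phrases
      .filter((p) => p.project === doc.project && p.language === language && p.ov === ov)
      .sort((a, b) => b._id - a._id); // Rule 3: newest phrase first
    // Rule 4: the phrase text, or null when nothing matches.
    return matches.length > 0 ? matches[0].text : null;
  });
}
```

Since the filter never looks at `print`, a print fetch in this sketch can return a phrase that was indexed from subtitle lines, which mirrors rule 5.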

***

## Subtitler Editor — auto-translation & provenance

This section covers how the subtitler *applies* TM and MT to the editor and how each line's source is shown. (Distinct from indexing/fetching above, which is the API side.)

### Toggles and layering

The subtitler has two independent toggles — **Translation Memories** and **Machine Translations** — that can both be on at once. `TranslationService.autoTranslate()` resolves each line to the highest-priority source that has data:

**Precedence: `Custom > TM > MT > OV`**

* **Custom** — a manual edit. Always wins and is **never overwritten** by a toggle; `autoTranslate` returns custom lines untouched.
* **TM** — a translation-memory match (when the TM toggle is on).
* **MT** — a machine translation (when the MT toggle is on). MT is applied first, then TM overwrites per line where a match exists, so with both toggles on TM takes precedence and MT backfills the rest.
* **OV** — the fallback when no higher source applies (and the line isn't a custom edit).

Turning a toggle off recomputes non-custom lines (reverting them to OV when nothing else applies); custom edits remain.
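The precedence can be sketched as a per-line resolver. This is an illustration, not `TranslationService.autoTranslate()`: `resolveLine`, the line shape `{ ov, text, translationFrom }`, and the option names are hypothetical. The sketch checks sources in priority order rather than applying MT first and letting TM overwrite it, which gives the same result per line.

```javascript
// Hypothetical per-line resolver for Custom > TM > MT > OV.
function resolveLine(line, { tmOn, mtOn, tmText, mtText }) {
  // Custom edits always win and are returned untouched.
  if (line.translationFrom === "custom") return line;
  // TM applies when its toggle is on and a match exists.
  if (tmOn && tmText != null) return { ...line, text: tmText, translationFrom: "tm" };
  // MT backfills lines that have no TM match.
  if (mtOn && mtText != null) return { ...line, text: mtText, translationFrom: "mt" };
  // Otherwise fall back to the original (OV) text.
  return { ...line, text: line.ov, translationFrom: "ov" };
}
```

Calling the resolver again with a toggle switched off recomputes a non-custom line from scratch, which is how a TM-filled line falls back to MT or OV.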

### History

Layering and custom-edit preservation are the **original** behavior. PR **#3187** (`6ad8027b6`, "allow user to enable only one of the two toggles…", PLATFORM-3916, May 2026) made the toggles mutually exclusive and changed `autoTranslate` to overwrite custom edits (tracking a `customText` field to restore them on toggle-off). That PR has been reverted — the toggles are independent again and custom edits are preserved by precedence, so the `customText` machinery is gone. Every reverted behavior (mutual exclusion, custom overwrite, `customText`, the related tests) traces solely to #3187.

### Provenance colors

Each line's source is shown by a **colored left accent bar** on the translation field, using fixed semantic colors:

| Source | Color  | Hex       |
| ------ | ------ | --------- |
| Custom | green  | `#2f9e44` |
| TM     | violet | `#9d4edd` |
| MT     | orange | `#f08c00` |
| OV     | none (neutral) | none |

The colors were chosen for strong separation in **both hue and lightness** (green = dark, violet = medium, orange = light) so that the three sources remain distinguishable for colorblind users and in grayscale; this was verified against deuteranopia/protanopia/tritanopia simulations.

These colors live in **`ui/3x/constants/provenance.js`** as `PROVENANCE_COLORS` and are **intentionally not part of the themeable palette** (`~/theme`) — provenance is a semantic status signal that must stay stable across themes/white-labeling.

The same colors are reused for:

* A **color key** (`ProvenanceKey`, `data-testid="sub-provenance-key"`) in the actions bar — Custom / Memories / Machine swatches + labels, the always-visible legend for the accent-bar colors.
* The **TM / MT toggles** — each toggle's checked state takes its source color (`accent` prop on `TranslationToggle`), so a toggle visually matches the lines it produces.

The left accent is declared before the `:focus` / `.is-editing` rules so the blue active-cell border still wins while editing.

**Notes:**

* There is **no per-source icon**. An earlier version colored a per-line icon (`circleCheck` / `translationMemories` / `machineTranslations`); it became redundant once the accent bar + color key carried the signal, and was removed. Split/merge icons remain (yellow).
* Non-color fallback: the field carries a `title` tooltip with the source label (Custom/Memories/Machine/OV), and the edit menu shows the same label as text for the active row.
* `Icon` resolves `Theme[color] || color`, so it accepts both theme keys and raw hex (kept for the split/merge icons and future use).

***

## Key Rules

1. **Match scope is `{project, language, ov-text}`** — never asset. TM is reused project-wide.
2. **Match key is the cleaned source line** — HTML tags and line breaks are stripped on both write and read, so matching is exact on the visible source text only.
3. **Only user-edited (`custom`) or machine (`machine`) lines are indexed.** Untouched OV lines and lines accepted verbatim from a TM suggestion are not written back.
4. **`submitted` status indexes; `for-review` does not.** Reviewed translations index when they are later approved to `submitted`.
5. **multiTranslations never index** — but the fetch path fully supports reading dual-language TM.
6. **Most recent phrase wins** on duplicate matches (`_id desc`).
7. **Image assets index both subtitle and print phrases.**
8. **Line count must match the OV** or the whole translation is skipped during indexing.

***

## Known Asymmetries

These are mismatches between the write and read paths, relevant to ongoing TM work:

* **`print` is written but never read.** `indexByPrintOV` tags phrases `print: true`, but `getMemoryTranslation` never filters on it. A print fetch can therefore return a non-print phrase (and vice versa) — whichever is newest.
* **multiTranslations are read but never written.** Dual-language submissions contribute nothing to the memory, even though the fetch path does elaborate per-language merging to read them.
* **Index records `asset`; fetch ignores it.** Reuse is project-wide by design — confirm that is the intended boundary for any given workflow.
* **Index keys on the OV transcription's line text; fetch keys on the passed translation document's line text.** They align only because matching is positional on index and source-text-equality on read.

***

## Code References

* **`GeneratePhrases::filter()`** — `api/models/translations/save/GeneratePhrases.php` — indexing trigger and gate conditions
* **`Translations::indexByOV()` / `indexByPrintOV()`** — `api/models/Translations.php:394` / `:456` — per-line indexing
* **`Translations::translate()` / `getMemoryTranslation()`** — `api/models/Translations.php:217` / `:239` — fetch logic
* **`Translations::cleanLine()`** — `api/models/Translations.php:294` — match-key normalization
* **`Assets::indexTranslations()`** — `api/models/Assets.php:580` — bulk reindex (migration only)
* **`Phrases`** — `api/models/Phrases.php` — phrase schema and validation
* **`translate` route** — `api/controllers/Translations.php:18` — endpoint binding
* **`TranslationService.getTranslationMemories()`** — `ui/3x/modules/services/translation-service.js` — frontend fetch + dual-language merge
* **`getTranslationId()`** — `ui/3x/utils/orders.js:175` — resolves which translation id to fetch
* **`fetchTranslationMemories()`** — `ui/3x/modules/hooks/use-subtitler-queries.js:789` — assembles TM into the editor
* **`SubtitleService.to2xTranslation()`** — `ui/3x/modules/services/subtitle-service.js:298` — maps editor source onto `custom`/`machine`/`auto` flags
* **`TranslationService.autoTranslate()`** — `ui/3x/modules/services/translation-service.js` — applies TM/MT to the editor (Custom > TM > MT > OV)
* **`PROVENANCE_COLORS`** — `ui/3x/constants/provenance.js` — fixed semantic source colors (not themed)
* **Provenance rendering** — `ui/3x/modules/components/subtitler/segment/index.js` (`source-*` classes + `title` on the field) and `segment.css.js` (left accent bar)
* **Toggles & color key** — `ui/3x/pages/subtitler/index.js` and `subtitler.css.js` (`TranslationToggle` `accent` prop, `ProvenanceKey`)