Chapter 12. Working with formatted text

1. Formatting tags
2. Tag operations
3. Tag group nesting
4. Tag group overlapping
5. Tag validation options
6. Tag group validation
7. Hints for tags management

Formatting information present in the source file usually needs to be reproduced in the target file. OmegaT presents the in-line formatting of the supported formats (at the time of writing, in particular DocBook, HTML, XHTML, Open Document Format (ODF) and Office Open XML (MS Office 2007 and later)) as tags. Normally, tags are ignored when the similarity between texts is assessed for matching purposes. Tags reproduced in the translated segment will be present in the translated document.

1. Formatting tags

Tag naming:

Each tag consists of one to three characters and a number. The unique numbering allows tags that correspond to each other to be grouped together, and it distinguishes tags that share the same shortcut character but are in fact different. The shortcut characters try to reflect the underlying meaning of the tag (e.g. b for bold, i for italics).

Tag numbering:

Tags are numbered incrementally by tag group. A "tag group" in this context is either a single tag (such as <br1>) or a pair (such as <i0> and </i0>). Within a segment, the first group (pair or single) receives the number 0, the second the number 1, and so on. The first example below has three tag groups (a pair, a single, and then another pair); the second example has only one group (a pair).

Pairs and singles:

Tags are always either singles or paired. Single tags indicate formatting information that does not affect the surrounding text (an extra space or line break for example).

<b0><Ctr+N></b0>, <br1><b2><Enter></b2><segment 2132>

<br1> is a single tag and does not affect any surrounding text. Paired tags usually indicate style information that applies to the text between the opening tag and the closing tag of a pair. <b0> and </b0> below are paired and affect the text log.txt. Note that the opening tag must always come before the corresponding closing tag:

Log file (<b0>log.txt</b0>) for tracking operations and errors.<segment 3167>
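
The naming and numbering rules above can be illustrated with a small Python sketch. It is not OmegaT's own code: the regular expression and the function name tag_groups are assumptions made for illustration, based only on the tag notation shown in this chapter (an optional slash, one to three letters, and a number). It assumes a well-formed source segment, so any tag number that has a closing tag is treated as a pair.

```python
import re

# Assumed pattern for the tag notation in this chapter:
# optional "/", one to three letters, then a number (<b0>, </b0>, <br1>).
TAG_RE = re.compile(r"<(/?)([a-zA-Z]{1,3})(\d+)>")

def tag_groups(segment):
    """Map each tag number in the segment to 'pair' or 'single'."""
    opened, closed = set(), set()
    for slash, _name, number in TAG_RE.findall(segment):
        # A leading slash marks a closing tag.
        (closed if slash else opened).add(int(number))
    return {
        n: ("pair" if n in closed else "single")
        for n in sorted(opened | closed)
    }

# The two example segments from this section (segment markers omitted).
first = tag_groups("<b0><Ctr+N></b0>, <br1><b2><Enter></b2>")
second = tag_groups("Log file (<b0>log.txt</b0>) for tracking operations and errors.")
print(first)
print(second)
```

Text in angle brackets that does not end in a number, such as <Enter>, is not matched by the assumed pattern and is therefore treated as ordinary text.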

OmegaT creates its tags before it segments the text into sentences. Depending on the segmentation rules, the two tags of a pair may end up in two consecutive segments; tag validation will then err on the side of caution and mark both segments.

2. Tag operations

Care must be exercised with tags. If they are accidentally changed, the formatting of the final file may be corrupted. The basic rule is that the sequence of tags must be preserved in the same order. However, it is possible, if certain rules are strictly followed, to deviate from this basic rule.

Tag duplication:

To duplicate a tag group, copy it to the position of your choice. Keep in mind that in a pair group, the opening tag must come before the closing tag. The formatting represented by the duplicated group will be applied to both sections.

Example:

<b0>This formatting</b0> is going to be duplicated here.<segment 0001>

After duplication:

<b0>This formatting</b0> has been <b0>duplicated here</b0>.<segment 0001>

Tag group deletion:

To delete a tag group, remove it from the segment. Keep in mind that a pair group must have both its opening and its closing tag deleted so that all traces of the formatting are properly erased; otherwise the translated file may become corrupted. Deleting a tag group removes the related formatting from the translated file.

Example:

<b0>This formatting</b0> is going to be deleted.<segment 0001>

After deletion:

This formatting has been deleted.<segment 0001>

3. Tag group nesting

Modifying the order of tag groups may result in one tag group being nested within another. This is acceptable, provided the enclosing group totally encloses the enclosed group. In other words, when moving paired tags, ensure that the opening and the closing tag are either both inside or both outside other tag pairs, or the translated file may be corrupted and fail to open.

Example:

<b0>Formatting</b0> <b1>one</b1> is going to be nested inside formatting zero.<segment 0001>

After nesting:

<b0>Formatting <b1>one</b1></b0> has been nested inside formatting zero.<segment 0001>

4. Tag group overlapping

Overlapping is the result of bad manipulations of tag pairs and is guaranteed to result in formatting corruption and sometimes in the translated file not opening at all.

Example:

<b0>Formatting</b0> <b1>one</b1> is going to be messed up.<segment 0001>

After a bad manipulation:

<b0>Formatting <b1>one</b0> </b1>is very messed up now.<segment 0001>
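
The difference between nesting and overlapping can be sketched in Python with a simple stack: every closing tag of a pair must match the most recently opened pair tag that is still open. This is only an illustration of the rule described above, not the check OmegaT itself performs; the pattern and the function name pairs_properly_nested are assumptions.

```python
import re

# Assumed pattern for the tag notation used in this chapter.
TAG_RE = re.compile(r"<(/?)([a-zA-Z]{1,3}\d+)>")

def pairs_properly_nested(segment):
    """Return True if paired tags are nested rather than overlapping."""
    tags = TAG_RE.findall(segment)
    # Any tag that has a closing form in the segment is treated as paired.
    paired = {name for slash, name in tags if slash}
    stack = []
    for slash, name in tags:
        if name not in paired:
            continue  # single tags neither open nor close a group
        if not slash:
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False  # closes a tag that is not the innermost open one
    return not stack  # every opened pair must also be closed

nested = "<b0>Formatting <b1>one</b1></b0> has been nested inside formatting zero."
overlapping = "<b0>Formatting <b1>one</b0> </b1>is very messed up now."
print(pairs_properly_nested(nested))
print(pairs_properly_nested(overlapping))
```

With the two segments from sections 3 and 4, the nested example passes the check and the overlapping example fails it, because </b0> appears while <b1> is still open.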

5. Tag validation options

To customize how tags are handled, you can set some of the rules in the Options > Tag validation... window.

The behaviour described here applies to all source files, not just to certain file types such as formatted text.

  • Printf variables - do not check, check simple, check all

    OmegaT can check that programming variables (such as %s) in the source also exist in the translation. You can decide not to check at all, to check only simple printf variables (such as %s or %d), or to check printf variables of all types.

  • Check simple java MessageFormat patterns

    Activating this check box causes OmegaT to check whether simple Java MessageFormat tags (such as {0}) are processed correctly.

  • Custom tag(s) regular expression

    A regular expression entered here causes OmegaT to treat the detected instances as custom tags. OmegaT checks that the number of tags and their order are identical in source and translation, just as it does for OmegaT tags.

  • Fragment(s) that should be removed from the translation regular expression

    You can enter a regular expression for unwanted content in the target. Any matches in the target segment are then painted red, which makes them easy to identify and correct. When looking for fuzzy matches, the remove pattern is ignored. A fixed penalty of 5 is added if the removed part does not match some other segment, so the match does not show up as 100%.
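
As an illustration of the kind of check the printf option describes, the following Python sketch compares the simple printf variables found in a source segment with those in its translation. It is not OmegaT's implementation: the pattern SIMPLE_PRINTF and the function name missing_printf_variables are assumptions, and the pattern deliberately ignores flags, widths, and the %% escape.

```python
import re
from collections import Counter

# Assumed, simplified pattern: "%" followed by one conversion letter.
# Flags, widths and the "%%" escape are not handled in this sketch.
SIMPLE_PRINTF = re.compile(r"%[sdifcux]")

def missing_printf_variables(source, target):
    """Return the variables that occur more often in source than in target."""
    src = Counter(SIMPLE_PRINTF.findall(source))
    tgt = Counter(SIMPLE_PRINTF.findall(target))
    # Counter subtraction keeps only the positive differences.
    return sorted((src - tgt).elements())

source = "Copied %d files to %s"
complete = "%s: %d Dateien kopiert"      # variables reordered, none lost
incomplete = "Dateien nach %s kopiert"   # %d was dropped
print(missing_printf_variables(source, complete))
print(missing_printf_variables(source, incomplete))
```

The sketch only compares counts, so a translation that reorders the variables is not reported; whether reordering is acceptable depends on the program that consumes the strings.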

6. Tag group validation

The validate tags function detects changes to tag sequences (whether deliberate or accidental) and shows the affected segments. Launching this function with Ctrl+T opens a window listing all segments in the file whose translation contains suspected broken or bad tags. The validate tags function makes it easy to repair the tags and recreate the target documents. The window features a three-column table with a link to the segment, the original segment, and the target segment.

Figure 12.1. Tag validation entry

The tags are highlighted in bold blue for easy comparison between the original and the translated contents. Click on the link to activate the segment in the Editor. Correct the error if necessary (in the case above it is the missing <i2></i2> pair) and press Ctrl+T to return to the tag validation window and correct other errors. A tag error is a tag sequence in the translation that does not reproduce the order and number of tags in the original segment. Some tag manipulations are necessary and benign; others will cause problems when the translated document is created.
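
The definition of a tag error given above (a different tag order or number than in the original segment) can be sketched in Python as a comparison of tag sequences. This is an illustrative approximation, not the code behind Ctrl+T; the pattern, the function names, and the example segments are all hypothetical.

```python
import re

# Assumed pattern for the tag notation used in this chapter.
TAG_RE = re.compile(r"</?[a-zA-Z]{1,3}\d+>")

def tag_sequence(segment):
    """Return the tags of a segment in order of appearance."""
    return TAG_RE.findall(segment)

def tag_report(source, target):
    """Compare the tag sequences of a source segment and its translation."""
    src, tgt = tag_sequence(source), tag_sequence(target)
    return {
        "same_sequence": src == tgt,
        "missing": [t for t in src if t not in tgt],
        "extra": [t for t in tgt if t not in src],
    }

# Hypothetical segments: the <i2></i2> pair was left out of the translation.
source = "Press <i0>Save</i0> or <i2>Cancel</i2>."
target = "Appuyez sur <i0>Enregistrer</i0> ou Annuler."
report = tag_report(source, target)
print(report)
```

A report like this only shows that the sequences differ. As noted above, a difference may be a deliberate and benign change, such as a deleted pair, so the translator still has to judge each flagged segment.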

7. Hints for tags management

Simplify the original text

Tags generally represent some form of formatting in the original text. Simplifying the original formatting greatly reduces the number of tags. Where circumstances permit, consider unifying the fonts, font sizes, colors, etc. used in the document, as this can simplify the translation and reduce the potential for tag errors. Read the tag operations section to see what can be done with tags. Remember that if you find tags a problem in OmegaT and formatting is not particularly important for the current translation, removing tags may be the easiest way out.

Pay extra attention to tag pairs

If you need to see tags in OmegaT but do not need to retain most of the formatting in the translated document, you are free not to include tags in the translation. In this case, pay extra attention to tag pairs, since deleting one side of a pair but forgetting to delete the other is guaranteed to corrupt your document's formatting. Since tags are included in the text itself, it is possible to use segmentation rules to create segments with fewer tags. This is an advanced feature, and some experience is required to apply it properly.

OmegaT is not yet able to detect mistakes in formatting fully automatically, so it will not prompt you if you make an error or change formatting to fit your target language better. Sometimes, however, your translated file may look strange, and – in the worst case – may even refuse to open.