Run Edit Distance Report

Overview: String Edit Distance

String edit distance, often shortened to edit distance, measures how different two strings (such as words or sentences) are. It is commonly used to quantify how much a piece of text changed when it was edited.

Edit distance has many forms and interpretations; within the Enterprise TMS, we use the Levenshtein distance. In simple terms, the Levenshtein distance between two words is the minimum number of single-character operations (insertions, deletions, or substitutions) needed to change one word into the other.

For example, changing HAT to CAP has an edit distance of 2: one substitution changes H to C, and a second substitution changes T to P.
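To make the definition concrete, here is a minimal Python sketch of the classic dynamic-programming Levenshtein algorithm. It illustrates the metric only and is not the Enterprise TMS's own implementation; the function name `levenshtein` is our own, and the sketch assumes exact character comparison (so it is case-sensitive).

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # prev[j] = distance between the current prefix of a and b[:j];
    # before reading any of a, that is just j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string (i deletions)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1  # substitution is free if chars match
            curr.append(min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("HAT", "CAP"))  # 2
```

The two-row table keeps memory proportional to the length of the second string, which matters little for single words but helps for long segments.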

In practice, you could use the string edit distance metric to:

  • Measure the quality of machine translation over time (e.g., when training an MT engine)

      • In this case, a lower edit distance indicates better MT, because reviewers have to make fewer edits.

  • Measure the degree of change during the human translation or review process

      • In this case, a higher edit distance means more edits were made to a segment, which, in theory, suggests reviewers were more thorough during the review/edit phase.

Edit Distance Report within the Enterprise TMS

Project Managers can generate the report at any time for an entire project, multiple documents, or a single document. Keep in mind that the report compares translations from two phases of the workflow, so translations must exist (be completed) in at least two phases.

The Edit Distance report feature must be enabled for a Community by a System Administrator. Contact sales@lingotek.com to learn more.

There are three ways for a Project Manager to generate the Edit Distance Report:

  1. Project level (all documents in a Project)

    1. Home

    2. Projects

    3. Select a Project

    4. In the Project Summary section, click the “Actions” dropdown and select “Run Edit Distance Report”

    5. Configure and run the report

    6. Navigate to the Processes Queue page to download the report when it is complete

  2. The selected document(s)

    1. Home

    2. Projects

    3. Select a Project

    4. In the Documents section, select the checkboxes next to the documents you want to include

    5. Click the “Actions” dropdown and select “Run Edit Distance Report”

    6. Configure and run the report

    7. Navigate to the Processes Queue page to download the report when it is complete

  3. A single document

    1. Home

    2. Projects

    3. Select a Project

    4. In the Documents section, click the document

    5. In the Document Summary section, click the “Actions” dropdown and select “Run Edit Distance Report”

    6. Configure and run the report

    7. Navigate to the Processes Queue page to download the report when it is complete

String Edit Distance Normalized

In the exported Edit Distance Report, we’ve included a “String Edit Distance Normalized” column. This column gives every segment, regardless of its length, a normalized metric so that segments can be compared against each other. The normalized number shows what proportion of a segment was changed.

As an example, say you want to compare two segment translations that both have a string edit distance of 20. On its own, the edit distance doesn’t tell you how much of each segment changed proportionately: did the whole translation change, or only half of it? Suppose the first segment was 25 characters long and the second was 200 characters long. Although both segments had 20 edit operations, the first segment had a much greater proportion of its original translation changed. The normalized number gives you a clearer picture of the amount of change across all segments in the report.

We calculate this “normalized” number by dividing the string edit distance by the total number of characters in the segment’s translation from the first phase.
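Written as a formula, with $d$ the string edit distance and $n_1$ the number of characters in the first-phase translation (symbol names chosen here for illustration only), the two 20-operation segments from the comparison above work out as follows:

```latex
\text{String Edit Distance Normalized} = \frac{d}{n_1}
\qquad
\text{25-character segment: } \frac{20}{25} = 0.8 \;(80\%\ \text{changed})
\qquad
\text{200-character segment: } \frac{20}{200} = 0.1 \;(10\%\ \text{changed})
```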

Example 1

String Edit Distance = 20

Phase 1 Translation = 200 characters

20/200 = 0.1 (or 10% changed)

Example 2

String Edit Distance = 250

Phase 1 Translation = 200 characters

250/200 = 1.25 (or 125% changed)
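The calculation above can be combined into a short Python sketch that computes both values for a segment from its first-phase and later-phase translations. The function names (`levenshtein`, `normalize`, `segment_row`) are hypothetical. Returning no row when the first-phase translation is missing or empty, or when the later-phase translation is missing, is an assumption based on the requirement that translations exist in at least two phases; the actual report may handle such segments differently.

```python
from typing import Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum single-character insertions, deletions, or substitutions to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalize(edit_distance: int, phase1_chars: int) -> float:
    """Edit distance divided by the first-phase character count (1.0 means 100%)."""
    if phase1_chars <= 0:
        raise ValueError("first-phase translation must contain at least one character")
    return edit_distance / phase1_chars


def segment_row(phase1: Optional[str], phase2: Optional[str]) -> Optional[Tuple[int, float]]:
    """Return (edit distance, normalized value), or None if a phase has no translation.

    Skipping such segments is an assumption of this sketch, not documented report behavior.
    """
    if not phase1 or phase2 is None:
        return None
    distance = levenshtein(phase1, phase2)
    return distance, normalize(distance, len(phase1))


print(normalize(20, 200))   # 0.1  (Example 1: 10% changed)
print(normalize(250, 200))  # 1.25 (Example 2: 125% changed)
print(segment_row("HAT", "CAP"))  # 2 edits on a 3-character segment
```

Normalized values above 1.0, as in Example 2, are possible because the Levenshtein distance is bounded by the length of the longer string: when the later-phase translation is longer than the first-phase translation, the insertions can push the distance past the first-phase character count.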