ISOs on translation

In February 2024, the International Organization for Standardization (ISO) published ISO 5060 Translation services, Evaluation of translation output—General guidance. It gives guidance on evaluating human translation output, post-edited machine translation output, and unedited machine translation output.

I searched the ISO's online store for other ISOs dealing with translation. I have prepared the summaries below from the landing page for each ISO and from the ‘free sample’ available from each landing page. The free sample is typically just some introductory material from the beginning of that ISO. The ISOs themselves are behind a paywall. I have grouped the summaries into the following categories:

  • producing and editing translation;
  • translation services;
  • specialised translation;
  • terminology;
  • managing language resources; and
  • information technology.

Before preparing these summaries, I hadn’t been aware of the importance of some of the topics covered, including a typology of language registers (ISO 20694), standardising the presentational format of translations (ISO 2384), evaluating translation output (ISO 5060), post-editing of machine translation output (ISO 18587), word segmentation (ISO 24614), persistent identifiers of digital locations (ISO 24619) and various aspects of linguistic annotation (including ISOs 24610-24617).

Producing and editing translations

Translation-oriented writing

The ISO has a project to create an ISO on translation-oriented writing for authors and editors. It will:

  • recommend how to produce and evaluate technical texts intended for translation into other languages by human and/or machine translation; and
  • address text production and translation interface requirements for technical texts.

Translators and translation service providers will be able to use the document in assessing the suitability of technical texts for translation, and tool manufacturers will be able to use it in developing automatic language testing and verification procedures. This document will not cover specific requirements for producing fiction, journalistic, advertising and other non-technical texts.

ISO 20539:2023 Translation, interpreting and related technology
Vocabulary

This document defines terms for International Standards on translation, interpreting and related technology.

ISO 24495-1 Plain language — Part 1: Governing principles and guidelines

I covered ISO 24495-1 last year at https://languagemiscellany.com/2023/06/iso-on-plain-language/

ISO/TR 20694:2018 A typology of language registers

ISO/TR 20694:2018 gives the general principles for language registers in both descriptive and prescriptive environments. It defines key concepts and describes examples of different language registers that can be applied across all or many languages and of registers that are specific to some languages.

The document defines:

  • a language register as a language variety used for a particular purpose or in an event of language use, depending on the type of situation, especially its degree of formality;
  • a language variety as the largest subset of an individual language that is homogeneous both with regard to a certain criterion for linguistic variation and with regard to a certain structural criterion for linguistic variation.

The document notes that people usually have more than one language register in their verbal repertoire and can vary their use of register for different purposes or domains.

Computational management of language resources requires appropriate descriptors and tags for different language varieties. A description of a language register needs to state whether it is a written or a spoken register, or expressed by some other modality. It is therefore multifaceted and polyhierarchical. To attain maximum impact, it fits in with existing ISO standards such as ISO/TS 24620-1:2015 Language resource management, Controlled natural language (CNL), Part 1: Basic concepts and principles and ISO 639:2023 Code for individual languages and language groups.

For more on ISO 639, please see ‘Russian: an official language at the ISO’ on Language Miscellany.

The document lays down guidelines for language registers needed in a wide range of environments, including:

  • terminology work, where it contributes to the development of a wide range of standards;
  • translation, so that appropriate language levels can be chosen in target languages, to match that of the source language;
  • lexicography, to improve descriptors of non-geographic language variants;
  • second language teaching and learning, to help students avoid pitfalls of using inappropriate language;
  • software, to improve tagging of language variants in computer applications.

A typology of language registers will aid appropriate communication in business and commerce, for example where:

  • a marketing campaign needs to address consumers in a friendly, informal register; or
  • communication between medical professionals needs to differ from simple communication for public health campaigns.

ISO 2384:1977 Documentation
Presentation of translations

This ISO sets out rules for presenting translations in a standard form to simplify their use by different categories of user. It applies to the translation of all documents, whether the translation is complete, partial or abridged. The document discusses translations of:

  • books and other separately issued publications;
  • periodical and other serial publications;
  • contributions and articles; and
  • patents and similar documents.

The topics covered include presentation, notes (and footnotes and bibliographical references), formulae (and equations, symbols and units), figures (and legends and titles of figures and tables), transliteration, names and symbols of organizations, abbreviations, terminology, identification of authors, retranslation, geographical names, dates, name(s) of translator(s) and authority to publish a translation.

Translation services

ISO/TS 11669:2012 Translation projects
General guidance

This Technical Specification provides general guidance for all phases of a translation project. Its main purpose is to facilitate communication among the parties involved in a project.

It is intended for use by all stakeholders of the translation project, including those who request translation services, those who provide the services and those who make use of the results of the project — in particular, the translation product. It applies to multiple sectors, including the commercial and government sectors, and non-profit organizations.

It provides a framework for developing structured specifications for translation projects, but does not cover legally binding contracts between parties involved in a translation project.

It addresses quality assurance and provides the basis for qualitative assessment, but does not provide procedures for quantitative measures of the quality of a translation product.

ISO 17100:2015 Translation services
Requirements for translation services

ISO 17100 covers the core processes, resources, and other aspects necessary for the delivery of a quality translation service that meets applicable specifications. Applicable specifications can include those of the client, of the translation service provider (TSP) itself, and of any relevant industry codes, best-practice guides, or legislation.

The use of raw output from machine translation plus post-editing is outside the scope of ISO 17100:2015.

ISO 5060:2024 Translation services
Evaluation of translation output
General guidance

This ISO gives guidance on evaluating human translation output, post-edited machine translation output, and unedited machine translation output. It also provides guidance on the qualifications and competences of evaluators, and discusses the role of sampling.

This ISO can also support the evaluation of source texts intended for translation and the human evaluation of translation output. The document does not apply to related elements such as assuring the quality of translation output and corrective actions.

This document focuses on an analytic translation evaluation approach using error types and penalty points configured to produce an error score and a quality rating.
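
As a rough illustration of how an analytic evaluation of this kind could be computed, here is a minimal sketch in Python. The severity levels, penalty points, normalisation per 1,000 words and pass threshold are all my own assumptions for illustration. They are not taken from ISO 5060, whose actual configuration sits behind the paywall.

```python
from dataclasses import dataclass

# Hypothetical penalty points per severity level (NOT taken from ISO 5060).
PENALTY_POINTS = {"minor": 1, "major": 5, "critical": 25}


@dataclass
class ErrorAnnotation:
    error_type: str  # illustrative label, e.g. "terminology" or "accuracy"
    severity: str    # must be a key of PENALTY_POINTS


def error_score(errors, evaluated_words, per_words=1000):
    """Sum the penalty points and normalise them to `per_words` words."""
    if evaluated_words <= 0:
        raise ValueError("evaluated_words must be positive")
    total = sum(PENALTY_POINTS[e.severity] for e in errors)
    return total * per_words / evaluated_words


def quality_rating(score, threshold=20.0):
    """Map an error score to a rating using an assumed pass threshold."""
    return "pass" if score <= threshold else "fail"


# A 500-word sample in which the evaluator marked three errors.
sample = [
    ErrorAnnotation("terminology", "minor"),
    ErrorAnnotation("accuracy", "major"),
    ErrorAnnotation("style", "minor"),
]
score = error_score(sample, evaluated_words=500)
rating = quality_rating(score)
print(score, rating)
```

With these assumed weights, the three errors carry 7 penalty points in 500 words, which gives an error score of 14 per 1,000 words and a rating of "pass".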

This ISO applies to translation service providers (TSPs), including individual translators, translation companies or in-house translation services, their clients and other interested parties in the translation sector, such as translator education and training institutions. To make the document applicable for many users, its approach is designed to reflect minimum complexity.

ISO 18587:2017 Translation services
Post-editing of machine translation output
Requirements

ISO 18587:2017 provides requirements for the process of full, human post-editing of machine translation output and for post-editors’ competences. It is for use by TSPs, their clients, and post-editors.

Interpreting services

Most of the ISOs mentioned above say they don’t cover interpreting services. ISOs on interpreting services include ones on general requirements and recommendations for interpreting services (ISO 18841), community interpreting (ISO 13611), legal interpreting (ISO 20228), healthcare interpreting (ISO 21998) and conference interpreting (ISO 23155).

Specialised translation

ISO 20771:2020 Legal translation
Requirements

This document specifies requirements for the competences and qualifications of legal translators, revisers and reviewers, best translation practices and the translation process directly affecting the quality and delivery of legal translation services.

It specifies the core processes, resources, confidentiality, professional development requirements, training and other aspects of the legal translation service provided by individual translators.

The document does not cover use of output from machine translation, even with post-editing.

Terminology

ISO 12616-1:2021 Terminology work in support of multilingual communication
Part 1: Fundamentals of translation-oriented terminography

This document specifies requirements and recommendations on fundamentals of translation-oriented terminography for producing bilingual or multilingual terminology collections. It:

  • deals with the main tasks, skills, processes and technologies for translation-oriented terminography practiced in low-complexity settings as part of non-terminological activities.
  • does not cover terminology management involving sophisticated workflows, a multitude of roles, or advanced terminological skills and competences.

Managing language resources

ISO 24614-1:2010 Language resource management
Word segmentation of written texts
Part 1: Basic concepts and general principles

ISO 24614 consists of the following parts:

  • Part 1: Basic concepts and general principles
  • Part 2: Word segmentation for Chinese, Japanese and Korean

Word segmentation for other languages is to form the subject of a future Part 3.

Part 1 presents the basic concepts and general principles of word segmentation, and provides language-independent guidelines to enable written texts to be segmented, in a reliable and reproducible manner, into word segmentation units (WSU). Many applications and fields need to segment texts into words. They include translation, content management, speech technologies, computational linguistics and lexicography.

In segmenting a text into words, it is critical to define what comprises a word. Rules based only on spaces and punctuation do not account for situations such as hyphenated compounds, abbreviations, idioms or word-like expressions that contain symbols or numbers. Word segmentation is even more problematic for:

  • languages (such as Chinese and Japanese) that do not use spaces to separate words; and
  • agglutinative languages (such as Korean) that express some functional word classes as affixes.
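
To make the problem concrete, here is a small Python sketch of a naive rule that segments only on spaces and punctuation. The function and the sample sentences are my own illustration, not anything taken from ISO 24614.

```python
def naive_segment(text):
    """Split on white space, then strip punctuation from each token's edges."""
    tokens = (tok.strip(".,;:!?\"'()") for tok in text.split())
    return [tok for tok in tokens if tok]


english = "The state-of-the-art e.g. model costs $3.50."
chinese = "我喜欢翻译"  # "I like translation", written without spaces

print(naive_segment(english))
print(naive_segment(chinese))
```

On the English sentence, the rule keeps the hyphenated compound as a single token, truncates the abbreviation "e.g." to "e.g" and leaves "$3.50" as one word-like unit, so the word count depends on decisions the rule never made explicitly. On the Chinese sentence, it returns the whole sentence as a single token.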

The need to segment texts into words arises in many applications and fields, including:

  • Translation. Word count is often used in calculating the cost of a translation. Word segmentation is a standard function in translation memory systems and computer-assisted translation (CAT) tools. Word segmentation is performed by term extraction tools, which are sometimes provided in terminology management systems and CAT tools.
  • Content management. Most content management systems and databases allow searching by individual words.
  • Speech technologies. Text-to-speech systems require segmentation into words for lexicon lookup, stress assignment, prosodic pattern assignment, etc.
  • Computational linguistics. Various natural language processing (NLP) systems need to segment text into words. NLP systems include morphosyntactic processors, syntactic parsers, spellcheckers, text classification systems, and corpus linguistics annotators.
  • Lexicography. Lexical resources are often evaluated by size, usually by referring to the number of words.

ISO 24614-2:2011 Language resource management
Word segmentation of written texts
Part 2: Word segmentation for Chinese, Japanese and Korean

The basic concepts and general principles defined in ISO 24614-1 apply to Chinese, Japanese and Korean. ISO 24614-2 covers further factors to consider in segmenting text into words in texts written in these languages:

  • There is no white space between words in Chinese, Japanese or pre-modern Korean texts. This creates a need for a consistent way of identifying word segmentation units (WSUs) for those languages. On the other hand, in modern-day Korean text, word forms or verbal stems are agglutinated with grammatical affixes (‘eojeol’ or ‘malmadi’) that are separated by white space in English written texts.
  • Korean and Japanese borrow or derive many words from Chinese words and base their internal structures on the word formation principles of Chinese. Thus, general rules for identifying WSUs in Chinese, especially internal WSUs embedded in larger WSUs, also apply to some extent in processing of Japanese and Korean texts.
  • The use of characters does not play a real role in identifying WSUs in a text. Many Korean words can be written either in Chinese or in Korean characters, but the same principles of analysing Chinese-derived words and identifying sub-WSUs of those words apply.
  • A newspaper published in Beijing is written in simplified Chinese characters, but a Hong Kong newspaper may be written in traditional Chinese characters. Here again, the same principles of identifying WSUs apply to both newspapers.

ISO 24614-2:2011 first sets out the general rules for identifying WSUs in all three languages, then addresses specific rules for each language.

Segmenting a text into words or other WSUs is distinct:

  • from morphological or syntactic analysis, although it greatly depends on morphosyntactic analysis.
  • from constructing a lexicon and identifying its lexical entries, namely lemmas and lexemes. Frameworks for the latter tasks are provided by ISOs on other aspects of language resource management, including feature structures (ISO 24610), morpho-syntactic annotation (ISO 24611), linguistic annotation framework (ISO 24612), lexical markup (ISO 24613) and syntactic annotation (ISO 24615).

ISO 24616:2012 Language resources management
Multilingual information framework

ISO 24616:2012 provides a generic platform for modelling and managing multilingual information in various domains: localization, translation, multimedia annotation, document management, digital library support, and information or business modelling applications. It provides:

  • a metamodel and a set of generic data categories for various application domains.
  • strategies for the interoperability and/or linking of models including, but not limited to, XLIFF (XML Localization Interchange File Format), TMX (Translation Memory eXchange), smilText (Synchronized Multimedia Integration Language) and ITS (Internationalization Tag Set).
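
For readers who have not seen one of these formats, here is a minimal Python sketch that reads translation units from a tiny TMX document. It illustrates TMX only, not the ISO 24616 metamodel itself, and the sample content and the helper name read_units are my own assumptions.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A tiny hand-written TMX document (illustrative content only).
TMX_SAMPLE = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Good morning.</seg></tuv>
      <tuv xml:lang="fr"><seg>Bonjour.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_units(tmx_text):
    """Return one {language: segment text} dict per translation unit."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        units.append(
            {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
        )
    return units


units = read_units(TMX_SAMPLE)
print(units)
```

Each translation unit (tu) holds one variant (tuv) per language, so a framework that links models mainly needs to agree on how such units and their language tags map onto its own data categories.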

ISO 24617-14:2023 Language resource management
Semantic annotation framework (SemAF)
Part 14: Spatial semantics

ISO 24617-7:2020 Language resource management, Semantic annotation framework, Part 7: Spatial information specifies ways of annotating spatial information used in natural language such as English. This information includes references to locations, general spatial entities, spatial relations (involving topological, orientational, and metric values), dimensional information, motion events, paths, and event-paths triggered by motions.

ISO 24617-14:2023 extends ISO 24617-7:2020 by establishing a formal semantics for its abstract syntax. The proposed semantics:

  • translates annotation structures to semantic forms (represented in a type-theoretic first-order logic); and
  • interprets semantic forms (with respect to a model for part of the world to which an annotated language is referentially, or denotationally, anchored).

ISO 24617 has the general title Language resource management—Semantic annotation framework and consists of the following parts:

  • 1 Time and events
  • 2 Dialogue acts
  • 4 Semantic roles
  • 5 Discourse structures
  • 6 Principles of semantic annotation
  • 7 Spatial information
  • 8 Semantic relations in discourse, core annotation schema
  • 9 Reference Annotation Framework
  • 11 Measurable quantitative information

The following parts are in preparation:

  • 10 Visual Information
  • 12 Quantification
  • 15 Measurable quantitative information extraction

I have found no reference to parts 3 or 13.

ISO 24619:2011 Language resource management
Persistent identification and sustainable access (PISA)

ISO 24619:2011 specifies requirements for the persistent identifier (PID) framework and for using PIDs as references and citations of language resources in documents as well as in language resources themselves. Examples of language resources include digital dictionaries, language-purposed terminological resources, machine-translation lexica, annotated multimedia/multimodal corpora, and text corpora that have been annotated with, for example, morpho-syntactic information.

PID frameworks provide:

  • references to internet addresses that will persist even when web resources are relocated.
  • references to parts of a resource on the internet. This is especially important for language resources, because several layers of granularity are usually superimposed on the same data set or resource collection.
  • metadata, for instance copyright information.

ISO 24619:2011 mentions several possible existing standards and de-facto standards, including the Digital Object Identifier (DOI, https://doi.org).
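
As a small illustration of a persistent reference, here is a Python sketch that turns a DOI into a link through the doi.org resolver. The helper name doi_to_url and its basic sanity check are my own assumptions. The sample DOI, 10.1000/182, is the one assigned to the DOI Handbook.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi):
    """Build a resolver URL for a DOI; the resolver redirects to the current location."""
    doi = doi.strip()
    # Basic sanity check only: DOIs start with "10." and contain a "/".
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return DOI_RESOLVER + quote(doi, safe="/")


url = doi_to_url("10.1000/182")
print(url)
```

A citation stores only the identifier. If the resource moves, its registered location is updated at the resolver and the citation itself stays unchanged.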

Information technology

ISO/IEC 20382-1:2017 Information technology
User interfaces
Face-to-face speech translation
Part 1: User interface

ISO/IEC 20382-1:2017 specifies face-to-face speech translation designed to interoperate among multiple translation systems with different languages. It also specifies the speech translation features, general requirements and functionality, thus providing a framework to support a convenient speech translation service in face-to-face situations. It:

  • applies to user interfaces for speech translation and communication protocols for setting up a translation session among users; but
  • does not apply to defining the speech translation engine itself.

A flexible and interoperable standardized framework is needed to work with all different languages, utilizing many speech translation systems already developed in many countries.

ISO/IEC 30122-3:2017 Information technology
User interfaces
Voice commands
Part 3: Translation and localization

This document contains requirements and recommendations on translating and localising spoken words or phrases for voice commands used for controlling information and communication technology (ICT) devices.
