Using regular expressions in translation and localization
Regular expressions are a formal language for defining search patterns, built from literal characters and wildcard characters. They are supported by most modern text editors and CAT tools, including Microsoft Word, Notepad++, SDL Trados Studio, MemoQ, and XBench. In word processors, regular expressions are used only to search and replace text; in CAT tools they can also change segmentation rules or define what is treated as a tag.
Many translators believe that regular expressions are hard to master. In fact, anyone can learn them, and they can save hours of routine work. In this article, we have collected examples of how regular expressions can be useful to a translator.
Delete text typed in Latin/Cyrillic characters
Delete text typed in Latin characters | Delete text typed in Cyrillic characters | |
^[^А-Яа-яЁё\r\n]+$ | -> | |
<- | ^[^A-Za-z\r\n]+$ |
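The same idea can be tried outside a text editor. Below is a minimal sketch in Python, assuming the standard `re` module and character classes written with explicit ranges (`A-Za-z`, `А-Яа-яЁё`) and no carets inside the class other than the leading negation. The function names and the sample text are ours, chosen for illustration only.

```python
import re

# A whole line that contains no Cyrillic letters
# (so Latin-only lines, but also lines made of digits or punctuation).
NO_CYRILLIC_LINE = re.compile(r"^[^А-Яа-яЁё\r\n]+$", re.MULTILINE)

# A whole line that contains no Latin letters.
NO_LATIN_LINE = re.compile(r"^[^A-Za-z\r\n]+$", re.MULTILINE)


def delete_latin_lines(text: str) -> str:
    """Empty every line that has no Cyrillic letters in it."""
    return NO_CYRILLIC_LINE.sub("", text)


def delete_cyrillic_lines(text: str) -> str:
    """Empty every line that has no Latin letters in it."""
    return NO_LATIN_LINE.sub("", text)


sample = "Hello world\nПривет мир\nContract\nДоговор"
print(delete_latin_lines(sample))
print(delete_cyrillic_lines(sample))
```

Note that the patterns work line by line: a line that mixes both alphabets is left untouched by either function, and the line breaks themselves are kept, so deleted lines become empty lines.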
Replace decimal separator in numbers
925.10 | 925,10 | |
([0-9])[.]([0-9]) | -> | \1,\2 |
\1.\2 | <- | ([0-9])[,]([0-9]) |
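For readers who want to test the replacement before running it on a real file, here is a small sketch in Python, assuming the standard `re` module. The function names are hypothetical. The replacement string contains a plain comma or dot, because square brackets in a replacement are inserted literally.

```python
import re


def dot_to_comma(text: str) -> str:
    """Turn a decimal point between two digits into a decimal comma."""
    return re.sub(r"([0-9])[.]([0-9])", r"\1,\2", text)


def comma_to_dot(text: str) -> str:
    """Turn a decimal comma between two digits into a decimal point."""
    return re.sub(r"([0-9])[,]([0-9])", r"\1.\2", text)


print(dot_to_comma("Total: 925.10 USD."))
print(comma_to_dot("Итого: 925,10 руб."))
```

The pattern only looks at one digit on each side of the separator, so it also matches dates such as 15.01.2007, version numbers, and commas used as thousands separators. It is safer to review the matches than to replace everything at once.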
Add commas to separate groups of thousands
3421876925 | 3,421,876,925 | |
[0-9](?=(?:[0-9]{3})+(?![0-9])) | -> | $&, |
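The lookahead in this pattern can be checked with a short Python sketch, assuming the standard `re` module. Python does not understand `$&` in replacement strings, so the sketch uses `\g<0>`, which likewise stands for the whole match. The function name is ours.

```python
import re

# A digit followed by one or more complete groups of three digits,
# after which the number ends (no further digit).
THOUSANDS = re.compile(r"[0-9](?=(?:[0-9]{3})+(?![0-9]))")


def add_thousands_separators(text: str) -> str:
    """Insert a comma after every digit that starts a group of thousands."""
    return THOUSANDS.sub(r"\g<0>,", text)


print(add_thousands_separators("3421876925"))
print(add_thousands_separators("Population: 1000000"))
```

The pattern treats any run of four or more digits as a number, so a year such as 2007 or a long fractional part would also receive commas. Restricting the search to selected segments avoids this.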
Change date format
15.01.2007 | 15/01/2007 | |
([0-9]{1,2})[\.]([0-9]{1,2})[\.]([0-9]{2,4}) | -> | \1/\2/\3 |
\1.\2.\3 | <- | ([0-9]{1,2})[\/]([0-9]{1,2})[\/]([0-9]{2,4}) |
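A quick way to try both directions is the Python sketch below, assuming the standard `re` module. The function names are hypothetical. Each captured group keeps its position, so only the separator changes.

```python
import re

# Day, month, and year separated by dots or by slashes.
DOTTED_DATE = re.compile(r"([0-9]{1,2})[.]([0-9]{1,2})[.]([0-9]{2,4})")
SLASHED_DATE = re.compile(r"([0-9]{1,2})/([0-9]{1,2})/([0-9]{2,4})")


def dots_to_slashes(text: str) -> str:
    """15.01.2007 becomes 15/01/2007."""
    return DOTTED_DATE.sub(r"\1/\2/\3", text)


def slashes_to_dots(text: str) -> str:
    """15/01/2007 becomes 15.01.2007."""
    return SLASHED_DATE.sub(r"\1.\2.\3", text)


print(dots_to_slashes("Signed on 15.01.2007."))
print(slashes_to_dots("Signed on 15/01/2007."))
```

Because the day and the month are captured separately, writing the replacement as `\2/\1/\3` instead would also swap them, which may be useful when the target locale puts the month first.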
Search for capitalized words
Contract | \<[A-Z][a-z]+\> |
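The word-boundary markers `\<` and `\>` are not available in every regular expression flavor. As an illustration, here is a minimal Python sketch, assuming the standard `re` module, where `\b` plays the role of both markers. The function name and the sample sentence are ours.

```python
import re

# An uppercase ASCII letter followed by one or more lowercase ASCII
# letters, as a whole word. \b replaces the \< and \> markers here.
CAPITALIZED_WORD = re.compile(r"\b[A-Z][a-z]+\b")


def find_capitalized(text: str) -> list[str]:
    """Return every capitalized word found in the text."""
    return CAPITALIZED_WORD.findall(text)


sentence = "The Contract is signed by the Buyer and ACME Ltd."
print(find_capitalized(sentence))
```

Such a search can help spot defined terms like Contract or Buyer in legal texts. Words written entirely in capitals are skipped, and the class `[A-Z][a-z]` covers only unaccented Latin letters, so other alphabets would need their own ranges.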
Useful materials:
A brief description of regular expressions:
http://www.pnotepad.org/docs/search/regular_expressions/
Using regular expressions in MemoQ:
https://help.memoq.com/current/en/Places/regular-expressions.html