Using regular expressions in translation and localization
Regular expressions are a formal language for defining search patterns, built from ordinary characters and special characters that act as wildcards. They are supported by most modern text editors and CAT tools, including Microsoft Word (which uses its own wildcard syntax), Notepad++, SDL Trados Studio, memoQ and Xbench. In word processors, regular expressions are used only to search for and replace text; in CAT tools, they can also define segmentation rules or specify which parts of a segment are treated as tags.
Many translators believe that regular expressions are very hard to master. In fact, anyone can learn them, and they can save hours of routine work. This article collects examples of how regular expressions can be useful to a translator. In the replacement strings below, \1, \2 and so on insert the text captured by the first, second, etc. pair of parentheses. Notepad++ accepts this form, while .NET-based tools such as Trados Studio and memoQ expect $1, $2 instead.
Delete lines typed in Latin or Cyrillic characters
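| Task | Find | Replace with |
|---|---|---|
| Delete lines typed in Latin characters (lines with no Cyrillic letters) | ^[^А-яЁё\r\n]+$ | (leave empty) |
| Delete lines typed in Cyrillic characters (lines with no Latin letters) | ^[^A-Za-z\r\n]+$ | (leave empty) |

Ё and ё are listed separately because they lie outside the А-я range. A-Za-z is used instead of A-z because A-z also covers the symbols [ \ ] ^ _ and `.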
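To try the two patterns outside a CAT tool, here is a minimal Python sketch using the standard re module. It assumes plain-text input with one segment per line; re.MULTILINE makes ^ and $ match at every line boundary, as they do by default in Notepad++. The helpers drop_latin_lines and drop_cyrillic_lines are hypothetical names introduced only for illustration.

```python
import re

# Lines that contain no Cyrillic letters (Latin text, digits, punctuation).
# Ё/ё (U+0401/U+0451) lie outside А-я (U+0410..U+044F), so they are listed explicitly.
NO_CYRILLIC = re.compile(r"^[^А-яЁё\r\n]+$", re.MULTILINE)
# Lines that contain no Latin letters (Cyrillic text, digits, punctuation).
NO_LATIN = re.compile(r"^[^A-Za-z\r\n]+$", re.MULTILINE)


def drop_latin_lines(text: str) -> str:
    """Hypothetical helper: empty every line that has no Cyrillic letters."""
    return NO_CYRILLIC.sub("", text)


def drop_cyrillic_lines(text: str) -> str:
    """Hypothetical helper: empty every line that has no Latin letters."""
    return NO_LATIN.sub("", text)


sample = "Hello world\nПривет, мир\nЁлка\n"
print(repr(drop_latin_lines(sample)))     # '\nПривет, мир\nЁлка\n'
print(repr(drop_cyrillic_lines(sample)))  # 'Hello world\n\n\n'
```

Because \r and \n are excluded from the character class, the match stops at the end of the line and the line break itself survives, so the deleted lines remain as empty lines.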
Replace decimal separator in numbers
| Conversion | Find | Replace with |
|---|---|---|
| 925.10 → 925,10 | ([0-9])[.]([0-9]) | \1,\2 |
| 925,10 → 925.10 | ([0-9])[,]([0-9]) | \1.\2 |
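The same two substitutions can be checked with Python's re.sub. This is a minimal sketch in which dot_to_comma and comma_to_dot are hypothetical helper names; like Notepad++, Python writes back-references in the replacement as \1 and \2.

```python
import re

# A dot (or comma) with a digit on each side; the digits are captured
# so they can be written back around the new separator.
DOT_DECIMAL = re.compile(r"([0-9])[.]([0-9])")
COMMA_DECIMAL = re.compile(r"([0-9])[,]([0-9])")


def dot_to_comma(text: str) -> str:
    """Hypothetical helper: 925.10 -> 925,10."""
    return DOT_DECIMAL.sub(r"\1,\2", text)


def comma_to_dot(text: str) -> str:
    """Hypothetical helper: 925,10 -> 925.10."""
    return COMMA_DECIMAL.sub(r"\1.\2", text)


print(dot_to_comma("Total: 925.10 EUR"))  # Total: 925,10 EUR
print(comma_to_dot("Total: 925,10 EUR"))  # Total: 925.10 EUR
```

The pattern looks only at one digit on each side, so it cannot tell a decimal separator from other dots and commas: comma_to_dot turns 3,421,876 into 3.421.876, and dot_to_comma turns the date 15.01.2007 into 15,01,2007. It is safer to review each match than to replace all at once.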
Add commas to separate groups of thousands
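| Conversion | Find | Replace with |
|---|---|---|
| 3421876925 → 3,421,876,925 | [0-9](?=(?:[0-9]{3})+(?![0-9])) | $&, |

Here $& stands for the whole match, so each matched digit is written back followed by a comma.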
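The lookahead does the real work here: it accepts a digit only when the digits that follow it, up to the end of the number, form one or more complete groups of three. The lookahead consumes nothing, so every qualifying digit is found in a single pass. A minimal Python sketch, with group_thousands as a hypothetical helper name; Python does not understand $& in replacements and spells the whole match \g<0> instead.

```python
import re

# A digit followed by one or more complete groups of three digits
# and then a non-digit (or the end of the text).
THOUSANDS = re.compile(r"[0-9](?=(?:[0-9]{3})+(?![0-9]))")


def group_thousands(text: str) -> str:
    """Hypothetical helper: write each matched digit back with a comma after it."""
    return THOUSANDS.sub(r"\g<0>,", text)


print(group_thousands("3421876925"))  # 3,421,876,925
```

The pattern ignores context, so a year such as 2007 becomes 2,007, and digits after a decimal point are grouped too (1234.5678 becomes 1,234.5,678).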
Change date format
| Conversion | Find | Replace with |
|---|---|---|
| 15.01.2007 → 15/01/2007 | ([0-9]{1,2})[\.]([0-9]{1,2})[\.]([0-9]{2,4}) | \1/\2/\3 |
| 15/01/2007 → 15.01.2007 | ([0-9]{1,2})[\/]([0-9]{1,2})[\/]([0-9]{2,4}) | \1.\2.\3 |
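Both date patterns keep the day, month and year in their captured groups and rewrite only the separators. A short Python sketch can confirm this; the patterns are copied from the table, and dots_to_slashes and slashes_to_dots are hypothetical helper names.

```python
import re

# Day and month: 1-2 digits; year: 2-4 digits. Only the separators change.
DOTTED_DATE = re.compile(r"([0-9]{1,2})[\.]([0-9]{1,2})[\.]([0-9]{2,4})")
SLASHED_DATE = re.compile(r"([0-9]{1,2})[\/]([0-9]{1,2})[\/]([0-9]{2,4})")


def dots_to_slashes(text: str) -> str:
    """Hypothetical helper: 15.01.2007 -> 15/01/2007."""
    return DOTTED_DATE.sub(r"\1/\2/\3", text)


def slashes_to_dots(text: str) -> str:
    """Hypothetical helper: 15/01/2007 -> 15.01.2007."""
    return SLASHED_DATE.sub(r"\1.\2.\3", text)


print(dots_to_slashes("Signed on 15.01.2007."))  # Signed on 15/01/2007.
print(slashes_to_dots("15/01/2007"))             # 15.01.2007
```

Since each part of the date sits in its own group, the order can also be changed in the replacement string, for example \2/\1/\3 to put the month first.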
Search for capitalized words
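| Example | Find |
|---|---|
| Contract | \<[A-Z][a-z]+\> |

Run this search with case sensitivity turned on (Match case in Notepad++); otherwise [A-Z] also matches lowercase letters. \< and \> mark the start and end of a word in Notepad++; in .NET-based tools such as Trados Studio and memoQ, use \b instead: \b[A-Z][a-z]+\b.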
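Python's re module has no \< and \>, so this minimal sketch uses the \b word boundary in their place. Python patterns are case-sensitive unless re.IGNORECASE is passed, so [A-Z] matches only capital letters here; the sample sentence is made up for illustration.

```python
import re

# \b marks a word boundary; [A-Z][a-z]+ is one capital followed by lowercase letters.
CAPITALIZED = re.compile(r"\b[A-Z][a-z]+\b")

text = "This Contract is signed by the Buyer and the SELLER."
print(CAPITALIZED.findall(text))  # ['This', 'Contract', 'Buyer']
```

Words written entirely in capitals, such as SELLER, are not found because [a-z]+ requires lowercase letters after the first one. With re.IGNORECASE the pattern would also match ordinary lowercase words.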
Useful materials:
A brief description of regular expressions:
http://www.pnotepad.org/docs/search/regular_expressions/
Using regular expressions in memoQ:
https://help.memoq.com/current/en/Places/regular-expressions.html
