Using regular expressions in translation and localization

Regular expressions are a formal language built on wildcard characters and used to define search patterns. They are supported by most modern text editors and CAT tools, including Microsoft Word, Notepad++, SDL Trados Studio, MemoQ, and XBench. In word processors, regular expressions are used only to search and replace text; in CAT tools, they can also change segmentation rules or define what counts as a tag.

Many translators believe that regular expressions are hard to master. In fact, anyone can learn them, and they can save hours of routine work. In this article, we have collected examples of how regular expressions can be useful to a translator. In each example, the arrow points from the search pattern to the replacement string.

Delete text typed in Latin/Cyrillic characters

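Delete text typed in Latin characters		Delete text typed in Cyrillic characters
^[^А-Яа-яЁё\r\n]+$	->	(empty)
(empty)	<-	^[^A-Za-z\r\n]+$

Each pattern matches a whole line that contains no letters of the other alphabet, and the replacement field is left empty. Write the ranges out in full: А-я misses Ё and ё, and A-z also matches the characters [ \ ] ^ _ ` that lie between Z and a.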
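
The two deletions above can also be tried outside an editor. Here is a minimal Python sketch, assuming the standard re module, text with \n line endings, and letter ranges written out in full (A-Za-z and А-Яа-яЁё). The function names are made up for illustration.

```python
import re

# Assumption: input uses "\n" line endings; re.MULTILINE makes ^ and $
# match at the start and end of every line instead of the whole text.
LINE_WITHOUT_CYRILLIC = re.compile(r"^[^А-Яа-яЁё\r\n]+$", re.MULTILINE)
LINE_WITHOUT_LATIN = re.compile(r"^[^A-Za-z\r\n]+$", re.MULTILINE)


def delete_latin_lines(text: str) -> str:
    """Blank out every line that contains no Cyrillic letters."""
    return LINE_WITHOUT_CYRILLIC.sub("", text)


def delete_cyrillic_lines(text: str) -> str:
    """Blank out every line that contains no Latin letters."""
    return LINE_WITHOUT_LATIN.sub("", text)


sample = "Contract\nДоговор"
print(repr(delete_latin_lines(sample)))     # '\nДоговор'
print(repr(delete_cyrillic_lines(sample)))  # 'Contract\n'
```

The line break itself is not part of the match, so each deleted line leaves an empty line behind. Lines that contain only digits or punctuation have no letters of either alphabet and are removed by both functions.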

Replace decimal separator in numbers

925.10		925,10
([0-9])[.]([0-9])	->	\1,\2
\1.\2	<-	([0-9])[,]([0-9])
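
The same swap can be sketched in Python, assuming the standard re module. The replacement string holds the separator as a plain character, since square brackets have no special meaning there. The function names are illustrative only.

```python
import re


def dot_to_comma(text: str) -> str:
    """925.10 -> 925,10: swap a dot that sits between two digits."""
    return re.sub(r"([0-9])\.([0-9])", r"\1,\2", text)


def comma_to_dot(text: str) -> str:
    """925,10 -> 925.10: swap a comma that sits between two digits."""
    return re.sub(r"([0-9]),([0-9])", r"\1.\2", text)


print(dot_to_comma("Total: 925.10 EUR"))  # Total: 925,10 EUR
print(comma_to_dot("Total: 925,10 EUR"))  # Total: 925.10 EUR
```

The pattern cannot tell a decimal separator from any other dot between digits, so dates such as 15.01.2007 and version numbers are changed too. It is worth reviewing the matches before replacing all of them.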

Add commas to separate groups of thousands

3421876925		3,421,876,925
[0-9](?=(?:[0-9]{3})+(?![0-9]))	->	$&,
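
The lookahead pattern above matches every digit that is followed by a whole number of three-digit groups, and the replacement puts a comma after it. Below is a hedged Python sketch of the same idea. Python's re module does not understand $&, so the sketch assumes \g<0>, which also stands for the whole match. The function name is hypothetical.

```python
import re

# A digit followed by one or more complete groups of three digits,
# after which the number must end (no further digit).
THOUSANDS = re.compile(r"[0-9](?=(?:[0-9]{3})+(?![0-9]))")


def add_thousands_separators(text: str) -> str:
    """Insert a comma after every digit that starts a group boundary."""
    return THOUSANDS.sub(r"\g<0>,", text)


print(add_thousands_separators("3421876925"))  # 3,421,876,925
```

The pattern treats every run of digits alike, so a year such as 2007 becomes 2,007. Restrict the search to the segments that really hold amounts.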

Change date format

15.01.2007		15/01/2007
([0-9]{1,2})[\.]([0-9]{1,2})[\.]([0-9]{2,4})	->	\1/\2/\3
\1.\2.\3	<-	([0-9]{1,2})[\/]([0-9]{1,2})[\/]([0-9]{2,4})
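
Both directions of the date conversion can be sketched in Python as follows, assuming the standard re module. The three captured groups (day, month, year) are written back unchanged and only the separator differs. The function names are made up for this example.

```python
import re

# Day and month: 1-2 digits; year: 2-4 digits, as in the patterns above.
DOTTED_DATE = re.compile(r"([0-9]{1,2})\.([0-9]{1,2})\.([0-9]{2,4})")
SLASHED_DATE = re.compile(r"([0-9]{1,2})/([0-9]{1,2})/([0-9]{2,4})")


def dots_to_slashes(text: str) -> str:
    """15.01.2007 -> 15/01/2007"""
    return DOTTED_DATE.sub(r"\1/\2/\3", text)


def slashes_to_dots(text: str) -> str:
    """15/01/2007 -> 15.01.2007"""
    return SLASHED_DATE.sub(r"\1.\2.\3", text)


print(dots_to_slashes("Signed on 15.01.2007"))  # Signed on 15/01/2007
print(slashes_to_dots("Signed on 15/01/2007"))  # Signed on 15.01.2007
```

The order of day and month is kept as it is. To switch between day-first and month-first formats, swap \1 and \2 in the replacement string.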

Search for capitalized words

Contract

\<[A-Z][a-z]+\>
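
The \< and \> word-boundary markers are not available in every regular expression flavor. As a rough Python sketch, assuming the standard re module, the same search can be written with \b, which matches at either edge of a word. The function name is illustrative.

```python
import re

# One uppercase Latin letter followed by one or more lowercase letters,
# bounded on both sides by a word boundary.
CAPITALIZED_WORD = re.compile(r"\b[A-Z][a-z]+\b")


def find_capitalized(text: str) -> list[str]:
    """Return every capitalized word in the order it appears."""
    return CAPITALIZED_WORD.findall(text)


print(find_capitalized("The Contract is signed by the Buyer."))
# ['The', 'Contract', 'Buyer']
```

The first word of a sentence is matched as well, and all-caps abbreviations such as USA are not. This kind of search helps check that defined terms like Contract or Buyer are capitalized consistently in the translation.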

Useful materials:

A brief description of regular expressions:
http://www.pnotepad.org/docs/search/regular_expressions/

Using regular expressions in MemoQ:
https://help.memoq.com/current/en/Places/regular-expressions.html