Today I learned that Go's regexp package (like other RE2-based engines) supports Unicode character classes, which essentially means "match any character of script X", where a script is a writing system such as Latin or Arabic.
The syntax is \p{Name}, where Name is a class name.
E.g. to match both English and Arabic words, use:
var re = regexp.MustCompile(`[\p{Arabic}\p{Latin} ]+`)
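To see what that pattern actually picks up, here is a minimal runnable sketch. The helper name firstRun and the sample strings are my own, made up for illustration; only the pattern itself comes from the line above.

```go
package main

import (
	"fmt"
	"regexp"
)

// wordsRe matches runs of Arabic-script letters, Latin-script letters and spaces.
var wordsRe = regexp.MustCompile(`[\p{Arabic}\p{Latin} ]+`)

// firstRun is a hypothetical helper: it returns the leftmost run matched
// by wordsRe, or "" when the input contains no such run.
func firstRun(s string) string {
	return wordsRe.FindString(s)
}

func main() {
	// The run stops at the first character outside both scripts (here, '!').
	fmt.Printf("%q\n", firstRun("hello مرحبا!"))
	// Cyrillic is neither Latin nor Arabic, so nothing matches.
	fmt.Printf("%q\n", firstRun("привет"))
}
```

Note that the character class also contains a plain space, so a run may include leading or trailing spaces; trim the result if that matters.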
For the full list, see the "Unicode character class names" table in the RE2 syntax reference. It turns out that classes like "punctuation", "currency symbol" and "letter number" also exist, which can be quite useful when you work with languages like Arabic, which has its own numerals (the Arabic-Indic digits fall under the "decimal digit" class, \p{Nd}).
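The general-category classes work the same way as the script ones. A small sketch, assuming the short names \p{Nd} (decimal digit), \p{P} (punctuation) and \p{Sc} (currency symbol) from that table; the helper names and sample strings are hypothetical, chosen only for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	digitsRe   = regexp.MustCompile(`\p{Nd}+`) // decimal digits of any script
	punctRe    = regexp.MustCompile(`\p{P}`)   // any punctuation character
	currencyRe = regexp.MustCompile(`\p{Sc}`)  // any currency symbol
)

// firstDigits returns the leftmost run of decimal digits, or "" if none.
func firstDigits(s string) string { return digitsRe.FindString(s) }

// countPunct returns how many punctuation characters s contains.
func countPunct(s string) int { return len(punctRe.FindAllString(s, -1)) }

// hasCurrency reports whether s contains at least one currency symbol.
func hasCurrency(s string) bool { return currencyRe.MatchString(s) }

func main() {
	// ١٢٣ are Arabic-Indic digits (U+0661..U+0663), ، is the Arabic comma (U+060C).
	s := "price: ١٢٣ €، ok?"
	fmt.Println(firstDigits(s))
	fmt.Println(countPunct(s))
	fmt.Println(hasCurrency(s))
}
```

Because \p{Nd} is script-independent, the same pattern picks up both ASCII digits and Arabic-Indic digits without listing either range by hand.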