April 9, 2020

Java Joy: Using Named Capturing Groups In Regular Expressions

In Java we can define capturing groups in a regular expression and refer to a matched group by its index, which follows the order of the groups in the expression. Instead of relying on the index we can give a capturing group a name and use that name to reference the group. We name a group by placing ?<name> directly after the opening parenthesis of the group, as in (?<name>...). The name can be passed to the group method of the Matcher class to get the matched value. We can also use the name in a replacement string, for example with the replaceAll method of a Matcher object, using the syntax ${name}. Finally, we can use a named capturing group as a backreference inside the regular expression itself with the syntax \k<name>.
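
As a quick, minimal sketch of these three syntaxes side by side: the class, the method names and both patterns below are hypothetical and only serve as illustration, assuming ISO-style dates and text wrapped in single or double quotes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupSketch {
    // Hypothetical date pattern; the group names year, month and day are our own choice.
    private static final Pattern DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    // Hypothetical pattern: \k<quote> requires the closing quote to equal the opening quote.
    private static final Pattern QUOTED =
            Pattern.compile("(?<quote>['\"])(?<text>[^'\"]*)\\k<quote>");

    // Read a single group by name with Matcher.group(String).
    public static String yearOf(String date) {
        Matcher m = DATE.matcher(date);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a date: " + date);
        }
        return m.group("year");
    }

    // Reference groups by name in the replacement string with ${name}.
    public static String toDayFirst(String text) {
        return DATE.matcher(text).replaceAll("${day}/${month}/${year}");
    }

    // True only when the text is wrapped in a matching pair of quotes.
    public static boolean isQuoted(String text) {
        return QUOTED.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(yearOf("2020-04-09"));
        System.out.println(toDayFirst("Posted on 2020-04-09"));
        System.out.println(isQuoted("'joy'"));
        System.out.println(isQuoted("'joy\""));
    }
}
```

Because the groups are referenced by name, we could reorder them in the pattern without touching the code that reads them or the replacement string.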

In the following example we define a regular expression with named groups and use them with several methods:

package mrhaki.pattern;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedPatterns {
    public static void main(String[] args) {
        // Define pattern and use names for the capturing groups.
        // The groups are named project, sep (the separator), org (the org unit)
        // and num (the project number).
        // The format is (?<name>...).
        // To make sure the separator is - or / (and not a combination)
        // we capture it in the group named sep and use the backreference \k<sep> to match.
        Pattern issuePattern = Pattern.compile("(?<project>[A-Z]{3})(?<sep>[-/])(?<org>\\w{3})\\k<sep>(?<num>\\d+)$");

        // Create Matcher with a string value.
        Matcher issueMatcher = issuePattern.matcher("PRJ-CLD-42");

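        assert issueMatcher.matches();
        // We can use the capturing group names to get the matched values.
        assert issueMatcher.group("project").equals("PRJ");
        assert issueMatcher.group("org").equals("CLD");
        assert issueMatcher.group("num").equals("42");

        // Using separator / also matches.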
        assert issuePattern.matcher("EUR/ACC/91").matches();
        // But we cannot mix - and /.
        assert !issuePattern.matcher("EUR-ACC/91").matches();

        // References to the capturing groups can be used in the
        // replacement string by their names, using the syntax ${name}.
        assert issueMatcher.replaceAll("${project} ${num} in ${org}.").equals("PRJ 42 in CLD.");
    }
}

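The named groups work the same way when we scan a longer text with the find method instead of matching a whole string. The following is only a sketch under our own assumptions: the class IssueScanner and the method describe are hypothetical, and the pattern is the one from the example above with the trailing $ removed so that an issue key can appear anywhere in the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IssueScanner {
    // Same named groups as the example above, without the trailing $ (our assumption),
    // so that issue keys can be found in the middle of a sentence.
    private static final Pattern ISSUE =
            Pattern.compile("(?<project>[A-Z]{3})(?<sep>[-/])(?<org>\\w{3})\\k<sep>(?<num>\\d+)");

    // Collect a short description for every issue key found in the text.
    public static List<String> describe(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = ISSUE.matcher(text);
        // Each successful find() moves to the next match; the named groups
        // then refer to that match.
        while (m.find()) {
            result.add(m.group("project") + " " + m.group("num") + " in " + m.group("org"));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(describe("Fixed PRJ-CLD-42 and EUR/ACC/91 today."));
    }
}
```
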
Written with Java 14.