Java Intermediate

Introduction to Regular Expressions in Java

Regular expressions (regex) are sequences of characters that define search patterns. They are used for searching, validating, and manipulating text. In Java, regex support is provided by the Pattern and Matcher classes in the java.util.regex package. To read or write a regex, you need to understand its main components, such as meta characters, quantifiers, and flags, and how these elements work together to form a pattern.

Before we create and use regular expressions, let's learn a few key terms used in building them.

Meta Characters

Meta characters are special symbols that have specific meanings in a regex. They help control how the pattern matches text. Here’s a table summarizing some common meta characters along with explanations:

Metacharacter   Description                                                                       Example
|               Alternation operator: matches any one of the patterns separated by |.             cat|dog|fish matches "cat", "dog", or "fish".
.               Matches any single character (except a newline).                                  a.c matches "abc", "aXc", etc.
^               Anchors the match at the beginning of a string or line.                           ^Hello matches any string that starts with "Hello".
$               Anchors the match at the end of a string or line.                                 World$ matches any string that ends with "World".
\d              Matches any digit (0-9).                                                          \d matches "5" in "a5b".
\s              Matches any whitespace character (space, tab, newline).                           \s matches a space in "Hello World".
\b              Matches a word boundary (the position between a word and a non-word character).   \bword\b matches "word" as a whole word.
\uxxxx          Matches the Unicode character specified by the hexadecimal number xxxx.           \u0041 matches "A".
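
To see a few of these meta characters in action, here is a minimal sketch built on Pattern and Matcher from java.util.regex. The class name MetaCharacterDemo and the helpers containsMatch and firstMatch are made up for this illustration and are not part of the Java API. Note that inside a Java string literal every backslash must be doubled, so the regex \d is written as "\\d".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: tries out the meta characters from the table above.
class MetaCharacterDemo {

    // Returns true if the regex matches anywhere in the input (a search, not a full match).
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Returns the first substring matched by the regex, or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(containsMatch("cat|dog|fish", "hot dog"));   // true  (alternation)
        System.out.println(containsMatch("a.c", "aXc"));                // true  (any single character)
        System.out.println(containsMatch("^Hello", "Hello World"));     // true  (start anchor)
        System.out.println(containsMatch("World$", "Hello World"));     // true  (end anchor)
        System.out.println(firstMatch("\\d", "a5b"));                   // 5     (digit)
        System.out.println(containsMatch("\\bword\\b", "a word here")); // true  (whole word)
        System.out.println(containsMatch("\\bword\\b", "swordfish"));   // false (inside a longer word)
        System.out.println(firstMatch("\\u0041", "BCA"));               // A     (Unicode escape)
    }
}
```

The helpers use find(), which looks for the pattern anywhere in the input. That is why the anchors ^ and $ matter here: without them, a pattern can match in the middle of the text.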

Quantifiers

Quantifiers define the number of times a character or group should appear in the input for a match to be valid. The table below explains the most common quantifiers:

Quantifier   Description                                                                   Example
n+           Matches one or more occurrences of the preceding element n.                   a+ matches "a", "aa", "aaa", etc.
n*           Matches zero or more occurrences of the preceding element n.                  a* matches "", "a", "aa", "aaa", etc.
n?           Matches zero or one occurrence of the preceding element n.                    a? matches "" or "a".
n{x}         Matches exactly x occurrences of the preceding element n.                     a{3} matches "aaa".
n{x,y}       Matches between x and y occurrences (inclusive) of the preceding element n.   a{2,4} matches "aa", "aaa", or "aaaa".
n{x,}        Matches at least x occurrences of the preceding element n.                    a{2,} matches "aa", "aaa", etc.
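
The quantifier examples in the table can be checked with a short sketch like the one below. It uses Pattern.matches, which succeeds only when the whole input matches the regex. The class name QuantifierDemo and the helper fullMatch are illustrative names, not part of the Java API.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: checks the quantifier examples from the table above.
class QuantifierDemo {

    // Returns true only if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("a+", "aaa"));       // true  (one or more)
        System.out.println(fullMatch("a+", ""));          // false (needs at least one "a")
        System.out.println(fullMatch("a*", ""));          // true  (zero or more)
        System.out.println(fullMatch("a?", "a"));         // true  (zero or one)
        System.out.println(fullMatch("a?", "aa"));        // false (more than one)
        System.out.println(fullMatch("a{3}", "aaa"));     // true  (exactly three)
        System.out.println(fullMatch("a{2,4}", "aaaaa")); // false (five is outside 2 to 4)
        System.out.println(fullMatch("a{2,}", "aaaaa"));  // true  (at least two)
    }
}
```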

Flags/Modifiers

Flags, also known as modifiers, adjust the default behavior of a regex pattern. They can be embedded in the pattern itself or passed as parameters when compiling the regex. Here is a summary of common flags:

Flag/Modifier                      Description                                                                                                Example
(?i) or Pattern.CASE_INSENSITIVE   Enables case-insensitive matching, so uppercase and lowercase letters are treated as equal.                (?i)cat matches "Cat", "cAt", "CAT", etc.
(?m) or Pattern.MULTILINE          Changes the behavior of ^ and $ so that they match the start and end of each line, respectively.           Useful when working with multi-line text.
(?s) or Pattern.DOTALL             Allows the dot . to match newline characters, making it match any character, including line terminators.   Enables . to match across lines.
(?x) or Pattern.COMMENTS           Permits whitespace and comments within the regex pattern, which are ignored, thus enhancing readability.   Helps in writing complex regex patterns with comments.
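
The sketch below shows both ways of applying a flag: embedded in the pattern, such as (?i), or passed as the second argument to Pattern.compile. Several flags can be combined with the bitwise OR operator (|). The class name FlagDemo and the helper countMatches are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: compares matching with and without each flag.
class FlagDemo {

    // Counts how many times the regex is found in the input; pass 0 for "no flags".
    static int countMatches(String regex, int flags, String input) {
        Matcher matcher = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "cat\nCat\nCAT";

        // Case-insensitive matching, as a compile flag and as an embedded flag.
        System.out.println(countMatches("cat", 0, text));                        // 1
        System.out.println(countMatches("cat", Pattern.CASE_INSENSITIVE, text)); // 3
        System.out.println(countMatches("(?i)cat", 0, text));                    // 3

        // Without MULTILINE, ^ and $ refer to the whole input, not to each line.
        int both = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE;
        System.out.println(countMatches("^cat$", Pattern.CASE_INSENSITIVE, text)); // 0
        System.out.println(countMatches("^cat$", both, text));                     // 3

        // Without DOTALL, the dot does not match the newline between "a" and "b".
        System.out.println(countMatches("a.b", 0, "a\nb"));              // 0
        System.out.println(countMatches("a.b", Pattern.DOTALL, "a\nb")); // 1

        // COMMENTS mode ignores whitespace and anything from # to the end of the line.
        String commented = "\\d{3}   # three digits\n"
                         + "-        # a literal hyphen\n"
                         + "\\d{4}   # four digits";
        System.out.println(countMatches(commented, Pattern.COMMENTS, "555-1234")); // 1
    }
}
```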

Constructing Regex Patterns for Phone Numbers and Email Addresses

Phone Numbers:

  • Simple 10-Digit Number (Digits Only):

    • Regex Pattern: ^\d{10}$
    • Explanation:
      • ^ asserts the start of the string.
      • \d{10} matches exactly 10 digits.
      • $ asserts the end of the string.
  • Formatted with Hyphens (e.g., 123-456-7890):

    • Regex Pattern: ^\d{3}-\d{3}-\d{4}$
    • Explanation:
      • ^ and $ assert the start and end of the string, as in the previous pattern.
      • \d{3}- matches exactly 3 digits followed by a hyphen (-); it appears twice, for the first two groups.
      • \d{4} matches the final 4 digits.
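
As a rough illustration, the two phone number patterns above could be wrapped in small validation helpers like these. The class name PhoneValidator and its method names are hypothetical. The patterns are compiled once into constants so they can be reused across calls.

```java
import java.util.regex.Pattern;

// Hypothetical helper class: validates the two phone number formats described above.
class PhoneValidator {

    // Backslashes are doubled because these are Java string literals.
    private static final Pattern DIGITS_ONLY = Pattern.compile("^\\d{10}$");
    private static final Pattern HYPHENATED = Pattern.compile("^\\d{3}-\\d{3}-\\d{4}$");

    // True if the input is exactly 10 digits with nothing else.
    static boolean isTenDigits(String input) {
        return input != null && DIGITS_ONLY.matcher(input).matches();
    }

    // True if the input has the form 123-456-7890.
    static boolean isHyphenated(String input) {
        return input != null && HYPHENATED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isTenDigits("1234567890"));    // true
        System.out.println(isTenDigits("123456789"));     // false (only 9 digits)
        System.out.println(isTenDigits("123-456-7890"));  // false (contains hyphens)
        System.out.println(isHyphenated("123-456-7890")); // true
        System.out.println(isHyphenated("1234567890"));   // false (hyphens missing)
    }
}
```

Because matches() already requires the whole input to match, the ^ and $ anchors are not strictly needed here. They are kept so the patterns read the same as in the explanation above and still behave as intended if used with find().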

Email Addresses:

  • Common Email Format:
    • Regex Pattern: ^[A-Za-z0-9+_.-]+@[A-Za-z0-9.-]+$
    • Explanation:
      • ^ asserts the start of the string.
      • [A-Za-z0-9+_.-]+ matches one or more allowed characters (letters, digits, plus, underscore, dot, or hyphen) before the @ symbol.
      • @ matches the literal character @.
      • [A-Za-z0-9.-]+ matches one or more allowed characters for the domain.
      • $ asserts the end of the string.
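
The email pattern can be tried out in the same way. This is only a sketch: EmailValidator and isValidEmail are illustrative names, and the pattern is the simple one from this lesson, so it is quite permissive. For example, it does not require a dot in the domain part.

```java
import java.util.regex.Pattern;

// Hypothetical helper class: applies the simple email pattern described above.
class EmailValidator {

    private static final Pattern EMAIL =
            Pattern.compile("^[A-Za-z0-9+_.-]+@[A-Za-z0-9.-]+$");

    // True if the whole input matches the simple email pattern; null is treated as invalid.
    static boolean isValidEmail(String input) {
        return input != null && EMAIL.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidEmail("user.name+tag@example.com")); // true
        System.out.println(isValidEmail("user@"));                     // false (no domain)
        System.out.println(isValidEmail("@example.com"));              // false (nothing before @)
        System.out.println(isValidEmail("user name@example.com"));     // false (space not allowed)
        System.out.println(isValidEmail("user@localhost"));            // true  (pattern does not require a dot)
    }
}
```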

Regular expressions are versatile tools for text processing in Java. By understanding the roles of meta characters, quantifiers, and flags, you can build complex patterns to validate and manipulate text.

In the next lesson, we will learn how to use regular expressions for pattern matching and searching.
