There may come a time in your application development journey where you need to find patterns in strings. Some methods can take time to execute, while others save you time.
One time-saving technique called regex — regular expressions — can be used to find or validate string patterns. This article takes a look at what regular expressions are, how they might be used in Java, and resources to help you with writing Java-flavored regex.
What are Regular Expressions?
At a very high level, regular expressions are just pattern matchers — characters that match a pattern. The actual metacharacters used to build a regular expression are fairly language agnostic for those languages that support using it. This means that the pattern in Java is most likely the same in JavaScript or Ruby.
The methods in Java are fairly similar to those found in other languages as well. It’s mainly the syntax that’s different.
What are Regular Expressions Used for?
A Regular Expression is useful when you need to search and replace a pattern in a string, and when you need to validate a form. Depending on the circumstances, you can test your regex pattern in a number of different ways.
Search and Replace
The first use case for using regular expressions would be if you want to search for a particular pattern and then replace it with something else.
Let’s say we have a customer database that has some email address for each customer:
Sample Customer Database
id | first_name | last_name | email |
1 | Doris | Symcox | dsymcox0@elpais |
2 | Linnell | Friar | lfriar1@pinterest.com |
3 | Paulie | Mathes | pmathes2hud.gov |
4 | Davina | Boam | dboam3@people.com.cn |
5 | Hulda | Coneybeare | hconeybeare4@taobao.com |
6 | Jaymee | Barnes | jbarnes5@vistaprint.com |
7 | Wendy | Fley | wfley6@unesco..com |
8 | Torr | Rustich | trustich7@java.com |
9 | Norene | Redwing | nredwing8@umn.edu |
10 | Philbert | Merveille | pmerveille9@jimdo.com |
If you take a quick look at the database, you can see there are some typos in the email field. Now imagine having 1,000 of those records! It would be too time-consuming to check each one by hand.
If we are unsure as to whether or not all of the addresses are valid email addresses, we can use regex methods to make sure it has the correct format. If it does not, we can replace it with something else — either a null value or something of your choosing to indicate that the email is incorrect.
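The clean-up described above could be sketched in Java roughly as follows. The pattern is a deliberately simple assumption (something before an @, then at least two dot-separated domain labels), not a complete email validator, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class EmailCleaner {
    // Assumed, simplified email shape: local part, "@", then two or more
    // dot-separated labels. Real-world address rules are more permissive.
    private static final Pattern EMAIL = Pattern.compile(
            "^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)+$");

    // Returns the address unchanged if it fits the pattern, otherwise null.
    public static String clean(String email) {
        return EMAIL.matcher(email).matches() ? email : null;
    }

    public static void main(String[] args) {
        String[] emails = {
            "dsymcox0@elpais", "lfriar1@pinterest.com",
            "pmathes2hud.gov", "wfley6@unesco..com"
        };
        for (String email : emails) {
            System.out.println(email + " -> " + clean(email));
        }
    }
}
```

Of the four sample addresses, only lfriar1@pinterest.com is kept; the other three are replaced with null.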
Validation
The other way to use regular expressions is to validate something. When we validate, we want to make sure it follows the correct format. This is an optimal time to make sure a user is giving you the proper format for their input fields.
Take, for instance, when a user inputs a phone number into a form. You can use regex to write a function that makes certain that the input from the user is in the format we want. When working with databases, it’s important to have the same format for all the fields. It makes working with the data much easier.
For validators, you can either raise an error to the user that the input needs to be entered in a certain format or you can write a function that will take the user’s input and save it in the format you’d like. This is fairly easy to do if you use regular expressions.
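Both approaches can be combined in one small helper. The sketch below assumes a ten-digit phone number and a 555-123-4567 storage format; both are illustrative choices, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class PhoneFormatter {
    // Assumption: a valid number has exactly ten digits once punctuation is removed.
    private static final Pattern TEN_DIGITS = Pattern.compile("\\d{10}");

    // Strips every non-digit, then reformats the digits as 555-123-4567.
    // Throws IllegalArgumentException when the input does not contain ten digits.
    public static String normalize(String input) {
        String digits = input.replaceAll("\\D", "");
        if (!TEN_DIGITS.matcher(digits).matches()) {
            throw new IllegalArgumentException("Expected a 10-digit phone number: " + input);
        }
        // $1, $2, $3 refer to the three parenthesized groups in the regex.
        return digits.replaceAll("(\\d{3})(\\d{3})(\\d{4})", "$1-$2-$3");
    }

    public static void main(String[] args) {
        System.out.println(normalize("(555) 123-4567")); // 555-123-4567
        System.out.println(normalize("555.123.4567"));   // 555-123-4567
    }
}
```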
Types of RegEx Pattern Matchers
There are several different types of matchers that will assist you with writing your regular expressions, which include: literal characters, metacharacters, and quantifiers.
Literal Characters
The most simple example of a regex is a literal string. The string “hello” can be an example of a regex, as can this whole paragraph.
Examples:
/hello/
⇒ collection of five distinct characters.
When a regex pattern is applied here, it looks for each of these characters in succession. “hello”, “helloing”, or “helloed” passes, but “Hello”, “helo”, or “HeLlo” would not.
/The most simple example of a regex is a literal string. The string ‘hello’ can be an example of a regex, as can this whole paragraph\./
⇒ collection of characters that make up a paragraph can also be a regex.
The same applies to this second example. The pattern looks for exactly that arrangement of characters in that exact same way when it’s tested against a string.
When a match fails, no error is thrown. In Java, the matching method simply returns false.
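Here is a minimal Java sketch of the literal "hello" example. It uses find(), which reports the outcome as a boolean; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class LiteralMatch {
    private static final Pattern HELLO = Pattern.compile("hello");

    // True when the literal sequence h-e-l-l-o appears anywhere in the input.
    public static boolean containsHello(String input) {
        return HELLO.matcher(input).find();
    }

    public static void main(String[] args) {
        for (String word : new String[] {"hello", "helloing", "Hello", "helo"}) {
            System.out.println(word + " -> " + containsHello(word));
        }
    }
}
```

The first two words print true, and the last two print false because the case or the letters differ.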
Take another look at the second example. Do you notice the backslash (\) next to the period (.)? This is what it means to escape a character. For the regex engine to treat it as a literal period rather than as a special character, we have to escape it with a backslash. The period, or dot, as we will learn, is a defined pattern matcher in regex.
Here are some other characters that need to be escaped if you want the literal character instead of the translated meaning that the regex engine compiles it to.
- Asterisk *
- Backslash \
- Plus +
- Caret ^
- Dollar Sign $
- Dot/Period .
- Pipe |
- Question Mark ?
- Parentheses – both types ()
- Curly Braces – both types {}
Literal characters return exactly that: the literal collection of characters you are looking for. If you need to look for special characters in addition to alphanumeric characters, be sure to escape them so that the regex pattern does not go looking for the wrong thing.
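In Java source code the escaping backslash itself has to be doubled, because the backslash is also the escape character inside string literals. The sketch below compares an unescaped dot with an escaped one, and also shows Pattern.quote, which escapes a whole string at once. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // Unescaped dot: matches any character, so "3x14" also passes.
    public static boolean looseMatch(String input) {
        return Pattern.matches("3.14", input);
    }

    // Escaped dot: the regex is 3\.14, written "3\\.14" in a Java string literal.
    public static boolean strictMatch(String input) {
        return Pattern.matches("3\\.14", input);
    }

    // Pattern.quote escapes an entire string so every character is treated literally.
    public static boolean quotedMatch(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        System.out.println(looseMatch("3x14"));           // true
        System.out.println(strictMatch("3x14"));          // false
        System.out.println(strictMatch("3.14"));          // true
        System.out.println(quotedMatch("1+1=2", "1+1=2")); // true
    }
}
```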
Common Matchers
The purpose of a matcher is to match multiple letters in a pattern. This collection of pattern matching symbols is fairly consistent among the programming languages that use regex.
Matcher | Description | Example |
. | Matches any character | /n.w/ would match now, naw, or new, etc. Any character passes the test |
^regex | Looks for pattern at beginning of the line | /^hello/ would match hello in a line that started with that pattern |
regex$ | Looks for pattern at end of the line | /world$/ would match world in a line that ended with that pattern |
[abc] | Matches a, b, or c | /[misp]/ would match any string that has those characters in it. For example, it could match all the individual letters in mississippi, and miss, but only some of the letters in marsh, and missouri |
[abc][xyz] | Matches a, b, or c followed by x, y, or z | /[Mm].s/ would match any string that contains M or m, followed by any character, followed by an s. Mass, miss, and moss would all match |
[^abc] | Not a, b, or c | /[^rstlne]/ would match any character that is not r, s, t, l, n, or e |
[a-zA-Z0-9] | Matches any character within the range | /[a-n]/ would match any character between a and n. The words and, end, blind, and can consist entirely of characters that match here |
A|B | A or B | /^M|m./ would match either an M at the beginning of a line, or an m followed by any character anywhere in the line. To anchor both alternatives, group them: /^(M|m)./ |
CAT | Matches C, followed by A, followed by T | /hello world/ would match hello world exactly |
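A few rows of the table can be tried out in Java with a small helper. This is only a sketch; the helper name found is made up for this example.

```java
import java.util.regex.Pattern;

public class MatcherDemo {
    // Convenience helper: true when the regex is found anywhere in the input.
    public static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("n.w", "new"));            // true: the dot matches "e"
        System.out.println(found("^hello", "hello world")); // true: pattern is at the start
        System.out.println(found("^hello", "oh hello"));    // false: pattern is not at the start
        System.out.println(found("world$", "hello world")); // true: pattern is at the end
        System.out.println(found("[Mm].s", "moss"));        // true: m, any character, s
        System.out.println(found("[^rstlne]", "rest"));     // false: every letter is excluded
    }
}
```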
Metacharacters
Regular expressions also use metacharacters to describe a pattern. Metacharacters have some sort of meaning behind them and will describe the shape of the pattern.
Metacharacter | Description | Example |
\d | Matches any digit | /\d/ would match 1, 2, or 3, etc. Shorthand for [0-9] |
\D | Matches any non-digit character | /\D/ would match A, B, g, etc. Shorthand for [^0-9] |
\s | Matches any whitespace character | /\s/ would match new lines, tabs, spaces, etc. Shorthand for [ \t\n\x0b\r\f] |
\S | Matches any non-whitespace character | /\S/ shorthand for [^ \t\n\x0b\r\f] |
\w | Matches any word character | A word character, short for [a-zA-Z_0-9] |
\W | Matches any non-word character | /[\W]/ would match any special characters. Shorthand for [^\w] |
Note: Capital letter metacharacters (\W, \D, etc) usually correspond to the opposite of what the lowercase letter metacharacters do (\w, \d, etc).
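When these metacharacters are written in Java source code, each backslash is doubled inside the string literal, so \d becomes "\\d". The sketch below uses String methods that accept a regex; the class and method names are hypothetical.

```java
public class MetacharDemo {
    // Counts the digits in the input by removing every non-digit (\D).
    public static int countDigits(String input) {
        return input.replaceAll("\\D", "").length();
    }

    // Splits on runs of whitespace (\s+) after trimming the ends.
    public static String[] words(String input) {
        return input.trim().split("\\s+");
    }

    // True when the whole input consists only of word characters (\w).
    public static boolean isWordOnly(String input) {
        return input.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(countDigits("Order 66, aisle 7"));      // 3
        System.out.println(words("  find \t these\nwords ").length); // 3
        System.out.println(isWordOnly("snake_case_42"));           // true
        System.out.println(isWordOnly("not-a-word"));              // false
    }
}
```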
Quantifiers
Quantifier | Description | Example |
+ | One or more of preceding character | /\d+/ would match one or more digits |
* | Zero or more of preceding character | /.*/ would match any character 0 or more times |
? | Zero or one of the preceding character | /a?.*/ would match a, any, hello, world |
{number} | Matches preceding character exactly number of times | /\d{3}/ matches exactly three digits [0-9] |
{num1,num2} | Matches preceding character in a range of nums | /\d{3,5}/ matches 3 to 5 digits that are [0-9] |
Use the quantifiers, metacharacters, and other matchers as the building blocks for your regular expressions. The syntax mentioned here is largely the same across multiple languages. However, some features that are available in Java, for instance, do not transfer to JavaScript.
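To see how the quantifiers behave, here is a small Java sketch that tests whole strings. It uses Pattern.matches, which succeeds only when the entire input fits the regex; the helper name fits is made up for illustration.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Whole-string check: true only if the entire input fits the regex.
    public static boolean fits(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fits("\\d+", "7"));         // true: one digit is enough for +
        System.out.println(fits("\\d+", ""));          // false: + needs at least one digit
        System.out.println(fits("\\d*", ""));          // true: * accepts zero digits
        System.out.println(fits("colou?r", "color"));  // true: the u is optional
        System.out.println(fits("\\d{3}", "1234"));    // false: exactly three digits required
        System.out.println(fits("\\d{3,5}", "1234"));  // true: between three and five digits
    }
}
```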
Let’s learn a little more about how regular expressions work in Java.
How to use Regex in Java
When using regular expressions in Java, use the java.util.regex API. The actual regular expression pattern itself relies mostly on the syntax mentioned in the previous section. The actual creation of an instance of a regular expression and the actual match operation is reliant upon the three classes and one interface that are in the java.util.regex package.
These classes and interface are:
Pattern.
A compiled representation of a regular expression. The Pattern class contains a compile method that takes the pattern you write and compiles it into a regular expression that can be reused elsewhere.
Matcher.
A class that is created from a Pattern instance and performs matching operations against an input string.
PatternSyntaxException.
An unchecked exception that is thrown when a syntax error occurs in a regular expression.
MatchResult.
An interface that represents the result of a match operation. Its query methods let you inspect a match but not modify it.
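As an illustration of PatternSyntaxException, the sketch below tries to compile a regex and reports the engine's complaint if compilation fails. The class and method names are hypothetical, and the exact wording of the description comes from the JDK.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class SyntaxCheck {
    // Returns null when the regex compiles, otherwise a short error summary.
    public static String describeError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            // getDescription() explains the problem; getIndex() is the approximate position.
            return e.getDescription() + " near index " + e.getIndex();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeError("hello"));  // null: the pattern is valid
        System.out.println(describeError("(hello")); // an error summary: the group is never closed
    }
}
```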
Here is an example of this API in action (adapted from Oracle’s documentation):
// Copyright (c) 1995, 2008, Oracle and/or its affiliates.
// All rights reserved.
import java.io.Console;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.util.regex.MatchResult;

public class Main {
    public static void main(String[] args) {
        Console console = System.console();
        // Compile the regex once; the flag makes matching ignore letter case.
        Pattern pattern = Pattern.compile("hEllO wORld", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher("hello world");

        // System.console() returns null when no interactive console is attached.
        if (console == null) {
            System.err.println("No console");
            System.exit(1);
        }

        boolean found = false;
        // Each call to find() moves on to the next match in the input.
        while (matcher.find()) {
            MatchResult matchresult = matcher.toMatchResult();
            console.format("%s%nRegex: %s%n", matchresult.group(), pattern.pattern());
            found = true;
        }
        if (!found) {
            console.format("No expression matches found%nRegex: %s", pattern.pattern());
        }
    }
}
This code snippet, adapted from the tutorial and documentation found on Oracle’s website, illustrates use of the MatchResult interface along with the Pattern and Matcher classes.
The Pattern object is the actual compiled regex itself, seen here with the regex pattern as the first argument. The second argument is a flag that indicates whether the method will treat upper and lowercase characters as the same character. Flags are optional but can be helpful if you have a particular need for one.
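Flags are plain int constants on the Pattern class, so more than one can be passed by combining them with the | operator. The following sketch compares results with and without flags; the helper name found is an assumption made for this example.

```java
import java.util.regex.Pattern;

public class FlagDemo {
    // True when the regex, compiled with the given flags, is found in the input.
    // Passing 0 means "no flags".
    public static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        // Without a flag, letter case must match exactly.
        System.out.println(found("hEllO wORld", 0, "hello world"));                        // false
        System.out.println(found("hEllO wORld", Pattern.CASE_INSENSITIVE, "hello world")); // true

        // MULTILINE lets ^ match at the start of every line, not just the start of the input.
        int both = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE;
        System.out.println(found("^world", both, "hello\nWORLD"));                         // true
    }
}
```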
The Pattern object's matcher method takes the input string and creates a Matcher object for the compiled regex. This gives us access to all of the methods in the Matcher class. One of these methods is the find() method.
The find() method returns a Boolean that tells us whether the regex pattern can be found in the string you passed. Each call resumes the search where the previous match ended, which is why the example calls it in a loop.
In our code, we capture each match in a MatchResult instance. The MatchResult is an immutable value that can only be queried and not changed. The group() method returns the portion of the string that matched the regex pattern. If you need to know the indexes of the beginning and end of the matched portion, you can get them with start() and end().
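The sketch below shows find(), group(), start(), and end() working together to list every match with its position. It does not need a console, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAllDemo {
    // Collects every match as "text@start-end". The end index is exclusive.
    public static List<String> describeMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> results = new ArrayList<>();
        while (matcher.find()) {
            results.add(matcher.group() + "@" + matcher.start() + "-" + matcher.end());
        }
        return results;
    }

    public static void main(String[] args) {
        // prints [12@5-7, 3@15-16]
        System.out.println(describeMatches("\\d+", "room 12, floor 3"));
    }
}
```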
This code can be used to test regex in your IDE. Test it out with other regex and strings to learn how it works.
Java Regex Resources
The documentation that surrounds the use of regex in Java requires some navigation. Many of the top search results on the subject are older documentation that may not be up to date. Take a look at these resources to supplement the documentation that is on Oracle’s website.
Java Regex Tester
Use the Java Regex Tester that was adapted from Oracle’s RegexTestHarness. Career Karma’s version updates the code from where it was when Oracle’s version was written and posted to its Regex tutorial. Replace the arguments in lines 41 and 42 with your regex and your string to test your regex pattern. Add flags as needed to your compile function.
Regex Planet
Regex Planet gives us the ability to test regular expressions in Java. All you have to do is give it the regular expression, select the flags you want to enable, and then test it against multiple inputs that you give it.
Java Regular Expressions Cheatsheet
This is a useful at-a-glance cheatsheet that compiles the methods, classes, and everything else related to Java regex onto a single-page website. It is helpful when you need to create your own regex and use it in business logic.
Conclusion
In this article we’ve covered what regular expressions are and how regular expressions are tested in Java. In short, regular expressions are just ways to match patterns in our code that help us to validate or search and replace values.
Java’s regular expression syntax is no more or no less difficult than it is for other languages. The adjustments mainly come from using Java’s regex classes to use regular expressions in your logic.
The best thing to do is to play with Java’s flavor of regular expressions for a while. Once you see the patterns and realize what the characters stand for, regular expressions become less overwhelming and more useful.