Java 9 Regular Expressions
eBook - ePub

  1. 158 pages
  2. English
  3. ePUB (mobile friendly)

About this book

Solve real-world problems using regex in Java.

About This Book
  • Discover regular expressions and how they work
  • Implement regular expressions with Java in your code base
  • Learn to use regular expressions with emails, URLs, paths, and IP addresses

Who This Book Is For
This book is for Java developers who would like to understand and use regular expressions. A basic knowledge of Java is assumed.

What You Will Learn
  • Understand the semantics, rules, and core concepts of writing Java code involving regular expressions
  • Learn about the java.util.regex package using the Pattern class, Matcher class, code snippets, and more
  • Match and capture text in regex and use back-references to the captured groups
  • Explore regex using Java String methods and regex capabilities in the Java Scanner API
  • Use zero-width assertions and lookarounds in regex
  • Test and optimize a poorly performing regex, and learn various other performance tips

In Detail
Regular expressions are a powerful tool in the programmer's toolbox and allow pattern matching. They are also used for manipulating text and data. This book will provide you with the know-how (and practical examples) to solve real-world problems using regex in Java.

You will begin by discovering what regular expressions are and how they work with Java. This easy-to-follow guide is a great place from which to familiarize yourself with the core concepts of regular expressions and to master their implementation with the features of Java 9. You will learn how to match, extract, and transform text by matching specific words, characters, and patterns. You will learn when and where to apply the methods for finding patterns in digits, letters, Unicode characters, and string literals. Going forward, you will learn to use zero-length assertions and lookarounds, parse source code, and process log files. Finally, you will master tips, tricks, and best practices in regex with Java.

Style and approach
This book will take readers through this learning journey using simple, easy-to-understand, step-by-step instructions and hands-on examples at every stage.

Understanding the Core Constructs of Java Regular Expressions

In this chapter, using Java as our regular expression flavor, we will go a bit deeper and learn in detail about anchors, quantifiers, boundary matchers, all the available character classes, negated character classes, predefined character classes, and the escaping rules for character classes. You will also learn how to match Unicode text using regular expressions in Java. We will also cover greedy versus non-greedy (lazy) matching and how lazy matching changes the behavior of a regular expression.
We will cover the following topics in this chapter:
  • Anchors and quantifiers
  • Boundary matchers
  • Character classes
  • Regex escaping rules
  • Escaping inside character classes
  • Negated character classes
  • Predefined character classes
  • Unicode character matching
  • Greedy quantifiers
  • Lazy quantifiers
  • Possessive quantifiers
  • Various embedded modes in regular expressions and their meaning
  • Enabling/disabling regular expression modes inside the regex

Understanding the core constructs of regular expressions

Java regular expressions support the following escape constructs, each of which matches a single literal character:
Construct    Meaning
\0c          A character with the octal value c
\0cc         A character with the octal value cc
\0ncc        A character with the octal value ncc, where n cannot be more than 3
\xhh         A character with the hexadecimal value 0xhh
\uhhhh       A character with the hexadecimal value 0xhhhh
\x{h...h}    A character with the hexadecimal value 0xh...h, where 0xh...h must be a valid Unicode code point
\n           Newline (line feed) character, \u000A
\t           Tab character, \u0009
\r           Carriage return character, \u000D
\f           Form feed character, \u000C
\e           Escape character, \u001B
\a           Bell (alert) character, \u0007
\cx          The control character corresponding to x (for example, \cI is the tab character)
In the octal forms, each c and n is an octal digit (0 to 7); each h is a hexadecimal digit.
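The following minimal sketch checks several of these constructs with Pattern.matches(), which succeeds only when the entire input matches. The class name EscapeDemo is illustrative. In Java source code, every regex backslash must be doubled inside a string literal, so the regex \x41 is written as "\\x41".

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    public static void main(String[] args) {
        // Each line below prints true.
        System.out.println(Pattern.matches("\\0101", "A"));   // octal 101 is 65, the letter A
        System.out.println(Pattern.matches("\\x41", "A"));    // two-digit hexadecimal escape
        System.out.println(Pattern.matches("\\u0041", "A"));  // four-digit hexadecimal escape

        // The braced hexadecimal form accepts any valid code point,
        // including supplementary ones outside the Basic Multilingual Plane.
        String grinning = new String(Character.toChars(0x1F600));
        System.out.println(Pattern.matches("\\x{1F600}", grinning));

        System.out.println(Pattern.matches("\\t", "\t"));                          // tab
        System.out.println(Pattern.matches("\\e", String.valueOf((char) 0x1B)));  // escape
        System.out.println(Pattern.matches("\\a", String.valueOf((char) 0x07)));  // bell
        System.out.println(Pattern.matches("\\cI", "\t"));    // control-I is the tab character
    }
}
```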

Quantifiers

We briefly looked at quantifiers in the first chapter. Quantifiers specify how many times the preceding element must occur for a match. We can match the input in various ways, such as an optional match, an open-ended range, a closed range, or a fixed number of occurrences. Let's take a closer look at them, as quantifiers are integral to most regular expressions.

Basic quantifiers

The following table lists the basic quantifiers available in Java regular expressions:
Quantifier   Meaning
m*           Match m zero or more times
m+           Match m one or more times
m?           Match m zero or one time (also called an optional match)
m{X}         Match m exactly X times
m{X,}        Match m X or more times
m{X,Y}       Match m at least X and at most Y times
In all the aforementioned cases, m can be a single character or a group of characters. We will discuss grouping in more detail later.
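A small sketch that exercises each basic quantifier against a whole input; the last case applies + to the group (ab), showing that m can be a group. The names QuantifierDemo and fullMatch are illustrative, not part of any library.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // True when the regex matches the entire input (Pattern.matches is implicitly anchored).
    public static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"a*", ""},          // true: zero occurrences are allowed
            {"a+", ""},          // false: at least one occurrence is required
            {"a?", "aa"},        // false: at most one occurrence
            {"a{3}", "aaa"},     // true: exactly three
            {"a{2,}", "aaaa"},   // true: two or more
            {"a{2,3}", "aaaa"},  // false: at most three
            {"(ab)+", "ababab"}  // true: quantifier applied to a group
        };
        for (String[] c : cases) {
            System.out.printf("%-8s on \"%s\" -> %b%n", c[0], c[1], fullMatch(c[0], c[1]));
        }
    }
}
```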

Examples using quantifiers

Let's look at a few examples to understand these basic quantifiers better.
Which regex pattern should be used to match a two-digit year or a four-digit year?
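 \d{2}|\d{4} 
This alternation works when it must match the entire input, as with String.matches(). In an unanchored search, such as Matcher.find(), the first alternative wins: for the input 1998, \d{2} matches 19 before \d{4} is tried. Anchoring both alternatives, as in ^(\d{2}|\d{4})$, prevents this.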
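The following sketch, using an illustrative YearDemo class, contrasts whole-input matching with an unanchored Matcher.find() search for the same alternation, and then with an anchored version:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class YearDemo {
    // Returns the first substring found by a search, or null if there is none.
    public static String firstFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String alternation = "\\d{2}|\\d{4}";
        // Whole-input match: the engine backtracks into the second alternative.
        System.out.println("1998".matches(alternation));             // true
        // Unanchored search: the first alternative succeeds first.
        System.out.println(firstFind(alternation, "1998"));           // 19
        // Anchored alternation: only a complete two- or four-digit input qualifies.
        System.out.println(firstFind("^(\\d{2}|\\d{4})$", "1998"));   // 1998
        System.out.println(firstFind("^(\\d{2}|\\d{4})$", "199"));    // null
    }
}
```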
Which regex pattern should be used to match a signed decimal number? The pattern should also match a signed integer number:
 ^[+-]?\d*\.?\d+$ 
Here is the breakdown of the preceding regex pattern:
  • The ^ and $ symbols are the start/end anchors
  • The [+-]? pattern matches an optional + or - sign at the start (it is optional because of ?)
  • The \d* pattern matches zero or more digits
  • The \.? pattern matches an optional dot (.) literally
  • The \d+ pattern matches one or more digits
The preceding regex will match all of these inputs:
  • .45
  • 123789
  • 5
  • 123.45
  • +67.66
  • -987.34
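A minimal sketch that precompiles the signed-decimal pattern and runs it over the inputs listed above, plus a few it should reject. The class name is illustrative. Note that 5. is rejected, because the pattern requires at least one digit after the optional dot.

```java
import java.util.regex.Pattern;

public class SignedDecimalDemo {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern SIGNED_DECIMAL = Pattern.compile("^[+-]?\\d*\\.?\\d+$");

    public static boolean isSignedDecimal(String s) {
        return SIGNED_DECIMAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            ".45", "123789", "5", "123.45", "+67.66", "-987.34", // accepted
            "5.", "+", "1.2.3", "12a"                            // rejected
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + isSignedDecimal(s));
        }
    }
}
```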
What would be the regex to match a number that is at least 10 but not more than 9999?
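 ^[1-9]\d{1,3}$ 
The [1-9] pattern requires a non-zero first digit, and \d{1,3} allows one to three more digits. Since we have a minimum of two digits, 10 is the smallest match, whereas the maximum number of digits allowed is four, and hence, 9999 is the highest match. A pattern that only counts digits, such as ^\d{2,4}$, is not enough, because it also accepts inputs with leading zeros, such as 00 or 0999.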
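To check the boundaries, the following sketch (class name illustrative) compares a digit-count-only pattern, ^\d{2,4}$, with ^[1-9]\d{1,3}$, which additionally requires a non-zero first digit:

```java
import java.util.regex.Pattern;

public class RangeDemo {
    // Digit count only: two to four digits, leading zeros allowed.
    private static final Pattern DIGIT_COUNT = Pattern.compile("^\\d{2,4}$");
    // Non-zero first digit followed by one to three digits: 10 through 9999.
    private static final Pattern TEN_TO_9999 = Pattern.compile("^[1-9]\\d{1,3}$");

    public static boolean inRange(String s) {
        return TEN_TO_9999.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"9", "10", "09", "0999", "9999", "10000"}) {
            System.out.printf("%-6s digitCount=%b inRange=%b%n",
                    s, DIGIT_COUNT.matcher(s).matches(), inRange(s));
        }
    }
}
```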
What is the regex for an input that has seven digits and that can have + or - at the start?
 ^[+-]?\d{7}$ 
The [+-]? pattern makes it an optional match at the start before we match the seven digits using \d{7}.
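The preceding regex can also be written as ^[+-]?[0-9]{7}$, because \d is a predefined character class for [0-9]. In Java, \d matches only the ASCII digits 0 to 9 unless the UNICODE_CHARACTER_CLASS flag is enabled.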
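The sketch below, with an illustrative class name, confirms that the \d and [0-9] forms agree on a few inputs, and shows the one case where they differ in Java: with the Pattern.UNICODE_CHARACTER_CLASS flag, \d also matches non-ASCII decimal digits such as U+0663 (ARABIC-INDIC DIGIT THREE).

```java
import java.util.regex.Pattern;

public class SevenDigitDemo {
    private static final Pattern WITH_SHORTHAND = Pattern.compile("^[+-]?\\d{7}$");
    private static final Pattern WITH_RANGE = Pattern.compile("^[+-]?[0-9]{7}$");

    public static boolean matchesShorthand(String s) {
        return WITH_SHORTHAND.matcher(s).matches();
    }

    public static boolean matchesRange(String s) {
        return WITH_RANGE.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1234567", "+1234567", "-123456", "12345678"}) {
            System.out.printf("%-9s shorthand=%b range=%b%n", s, matchesShorthand(s), matchesRange(s));
        }

        // Code point 0x0663 is a Unicode decimal digit outside 0-9.
        String arabicThree = String.valueOf((char) 0x0663);
        System.out.println(Pattern.matches("\\d", arabicThree));    // false by default
        System.out.println(Pattern.compile("\\d", Pattern.UNICODE_CHARACTER_CLASS)
                .matcher(arabicThree).matches());                   // true with the flag
    }
}
```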

Greedy versus reluctant (lazy) matching using quantifiers

So far, we have discussed all the quantifiers available to us in a regular expression to match fixed-size or variable-length text. These quantifiers are, by default, greedy in nature. Greediness is...

Table of contents

  1. Title page
  2. Copyright
  3. Credits
  4. About the Author
  5. About the Reviewer
  6. www.PacktPub.com
  7. Customer Feedback
  8. Preface
  9. Getting Started with Regular Expressions
  10. Understanding the Core Constructs of Java Regular Expressions
  11. Working with Groups, Capturing, and References
  12. Regular Expression Programming Using Java String and Scanner APIs
  13. Introduction to Java Regular Expression APIs - Pattern and Matcher Classes
  14. Exploring Zero-Width Assertions, Lookarounds, and Atomic Groups
  15. Understanding the Union, Intersection, and Subtraction of Character Classes
  16. Regular Expression Pitfalls, Optimization, and Performance Improvements

Java 9 Regular Expressions is written by Anubhava Srivastava and is available in PDF and ePUB formats.