
Regular Expressions

Regular expressions are sequences of characters that define a search pattern, used for string manipulation and searching within text. They provide a powerful way to match, search, and manipulate text based on patterns, allowing for complex and flexible text processing. In computer science, regular expressions are widely used in tasks such as text parsing, data validation, and pattern matching.

Written by Perlego with AI-assistance
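The three uses named in the definition above (matching, searching, and manipulating text) can be illustrated with a minimal Python sketch using the standard-library re module. The sample sentence and the YYYY-MM-DD date pattern are illustrative assumptions. They are not taken from the excerpts below.

```python
import re

text = "Order 66 shipped on 2021-03-14; order 67 is pending."

# Match: does the whole string have the shape of a YYYY-MM-DD date?
is_date = re.fullmatch(r"\d{4}-\d{2}-\d{2}", "2021-03-14") is not None

# Search: find every run of digits anywhere in the text.
numbers = re.findall(r"\d+", text)

# Manipulate: replace each date with a placeholder.
redacted = re.sub(r"\d{4}-\d{2}-\d{2}", "<DATE>", text)

print(is_date)   # True
print(numbers)   # ['66', '2021', '03', '14', '67']
print(redacted)  # Order 66 shipped on <DATE>; order 67 is pending.
```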

3 Key excerpts on "Regular Expressions"

Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.
  • Language and Computers
    • Markus Dickinson, Chris Brew, Detmar Meurers (Authors)
    • 2012 (Publication Date)
    • Wiley-Blackwell (Publisher)

    ...In formal language theory, language is treated mathematically, and a set of strings defines a language. For instance, English is defined as the set of all legitimate English sentences. As in other formalisms, regular expressions as such have no linguistic content; they are simply descriptions of a set of strings encoding a natural language text. While some patterns cannot be specified using regular expressions (see Under the Hood 4 on the complexity of grammar), regular expressions are quite suitable for our purposes. Regular expressions are used throughout the computer world, and thus a variety of Unix tools (grep, sed, etc.), editors (Emacs, jEdit, etc.), and programming languages (Perl, Python, Java, etc.) incorporate them. There is even some support for regular expression usage on Windows platforms (e.g., wingrep). The various tools and languages differ with respect to the exact syntax of the regular expressions they allow, but the principles are the same. Implementations are very efficient, so large text files can be searched quickly, but they are generally not efficient enough for web searching.

    4.4.1 Syntax of Regular Expressions

    We can now turn to how regular expressions are used to describe strings. In this section we will discuss the basics of the syntax, and in the next section we will walk through the use of regular expressions with one particular tool. Note that while some of the symbols are the same as with basic search operators (e.g., *), as outlined in section 4.2, they often have different meanings. Unlike search operators, whose definitions can vary across systems, regular expressions have a mathematical grounding, so the definition of the operators does not change. In fact, regular expressions can consist of a variety of different types of special characters, but there is a very small set of them...
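A short Python sketch (an illustration, not code from Dickinson et al.) makes the set-of-strings view concrete. It assumes a hypothetical pattern, colou?r, which describes exactly two strings: color and colour. The function re.fullmatch tests whether a whole string belongs to that set. The function re.search behaves more like grep and reports a match anywhere in a line.

```python
import re

# Under the formal-language view, a regular expression denotes a set of strings.
# "colou?r" denotes exactly two strings: "color" and "colour".
pattern = re.compile(r"colou?r")

def in_language(s: str) -> bool:
    """True if the whole string s belongs to the set described by the pattern."""
    return pattern.fullmatch(s) is not None

candidates = ["color", "colour", "colouur", "colors"]
members = [s for s in candidates if in_language(s)]
print(members)  # ['color', 'colour']

# grep-style search: keep lines where the pattern matches some substring.
# Matching is case-sensitive by default, so "Color TV" is not a hit.
lines = ["The colour of the sky", "A grey morning", "Color TV"]
hits = [line for line in lines if pattern.search(line)]
print(hits)  # ['The colour of the sky']
```

The word "colors" is outside the set because fullmatch must consume the entire string. However, pattern.search("colors") would still succeed, because "color" occurs inside it.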

  • Understanding Corpus Linguistics
    • Danielle Barth, Stefan Schnell (Authors)
    • 2021 (Publication Date)
    • Routledge (Publisher)

    ...Searching for some constructions or grammatical phenomena may require more than PoS-tagged corpora. For many languages there are no tagged corpora available, so to find grammatical phenomena, corpus linguists may have to rely on string searches. Many constructions use particular words or strings, or a limited set of these. When this is a small number, regular expressions can be helpful (cf. 5.11). Another way to find grammatical constructions or search for grammatical patterns is to first identify the possible structures in a smaller tagged corpus and then use string information in a larger corpus (à la Bresnan et al. 2007, described in 4.2.5). Some kinds of grammatical phenomena are difficult to find with string searches. In that case, additional annotation for specific categories is probably needed. That is why many corpora are hand-annotated for a limited set of phenomena, although this can be time-consuming (cf. Chapter 7).

    5.11 Regular expressions and specialised query languages

    Some corpus programs and interfaces have their own query languages that you need to learn if you want to search for more than basic strings. Many programs, however, use a standard kind of query called regular expressions, also abbreviated as regex or regexp, which are used for finding patterns in data. Regular expressions can be combined with text strings to pull out information or replace it with other information. Regular expressions treat everything as a character, and certain regular expressions refer to certain types of characters, such as letters ([a-zA-Z]), and to classes of characters (such as digits, with \d). But regular expressions also allow us to match less obvious things, like every non-digit character (\D). Letters, digits, and underscores can be referenced with \w to match so-called word characters, and non-word characters with \W...
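The character classes listed above can be tried directly with Python's re module. This is a minimal sketch on a made-up sample string. The corpus programs mentioned in the excerpt may use a slightly different regex dialect.

```python
import re

sample = "Room 12B, floor 3_a!"

# Character classes from the text, applied to the sample string.
letters = re.findall(r"[a-zA-Z]", sample)   # ASCII letters only
digits = re.findall(r"\d", sample)          # digits
non_digits = re.findall(r"\D", sample)      # every character that is not a digit
word_chars = re.findall(r"\w", sample)      # letters, digits and underscore
non_word = re.findall(r"\W", sample)        # everything else: spaces, punctuation

print("".join(letters))     # RoomBfloora
print("".join(digits))      # 123
print("".join(non_digits))  # Room B, floor _a!
print("".join(word_chars))  # Room12Bfloor3_a
print(non_word)             # [' ', ',', ' ', ' ', '!']

# Pulling information out: runs of digits become whole numbers.
numbers = re.findall(r"\d+", sample)

# Replacing information: mask every digit with '#'.
masked = re.sub(r"\d", "#", sample)
print(numbers)  # ['12', '3']
print(masked)   # Room ##B, floor #_a!
```

In Python 3, \d and \w are Unicode-aware for str patterns, so \w also matches a letter such as é. By contrast, [a-zA-Z] covers ASCII letters only. Passing the re.ASCII flag restricts \w and \d to ASCII.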

  • Bioinformatics Algorithms: Design and Implementation in Python
    • Miguel Rocha, Pedro G. Ferreira (Authors)
    • 2018 (Publication Date)
    • Academic Press (Publisher)

    ...And, of course, the possibilities are endless. There are other ways to select groups of characters, using the \ symbol followed by a letter. Some examples of this syntax are given below:
      • \s – matches any white-space character (spaces, newlines, tabs, etc.);
      • \S – the negation of the previous, so it matches any non-white-space character;
      • \d – matches digits;
      • \D – matches non-digits.
    Other important metacharacters include |, which works as a logical or (disjunction): the pattern can match either the expression on its left or the expression on its right. In addition, $ matches the end of a line and ^ matches the beginning of a line. Some examples of strings representing regular expressions, and possible matching strings, are given in Table 5.1.

    Table 5.1 Examples of regular expressions and matching strings.
      RE               Matching strings
      ACTG             ACTG
      AC.TC            ACCTC, ACXTC, ...
      A[AC]A           AAA, ACA
      A*CCC            CCC, ACCC, AACCC, ...
      ACC|G.C          ACC, GAC, GCC, ...
      AC(AC){1,2}A     ACACA, ACACACA
      [AC]{3}          CAC, AAA, ACC, ...
      [actg]*          a, ac, tg, gcgctgc, ...

    Python includes, within the re module, a number of tools for working with regular expressions (REs), which let you test whether they match given strings. The main functions and their descriptions are provided in Table 5.2.

    Table 5.2 Functions/methods working over regular expressions.
      Function                   Description
      re.search(regexp, str)     checks if regexp matches anywhere in str; returns the result for the first match
      re.match(regexp, str)      checks if regexp matches at the beginning of str
      re.findall(regexp, str)    checks if regexp matches str; returns all matches as a list
      re.finditer(regexp, str)   same as the previous, but returns an iterator over the match results

    In these functions, the result of a match is kept in a Python object that holds relevant information about the match. The methods m.group() and m.span(), applied to an object m returned from a match, retrieve the matched substring and its start and end positions in the string...
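As a hedged check, the Python sketch below does two things. First, it uses re.fullmatch to confirm that each example string in Table 5.1 is matched in full by its pattern. Unlike the functions in Table 5.2, re.fullmatch requires the whole string to match. In this sketch, the three-letter row is written as [AC]{3}, the quantifier form that matches CAC, AAA and ACC. Second, it exercises re.search, re.match, re.findall and re.finditer on a short, hypothetical DNA sequence. Note that Python's functions take the pattern first and the string second. Also note that span() returns a half-open (start, end) pair.

```python
import re

# Table 5.1 rows: pattern -> example strings it should match in full.
TABLE_5_1 = {
    "ACTG": ["ACTG"],
    "AC.TC": ["ACCTC", "ACXTC"],
    "A[AC]A": ["AAA", "ACA"],
    "A*CCC": ["CCC", "ACCC", "AACCC"],
    "ACC|G.C": ["ACC", "GAC", "GCC"],
    "AC(AC){1,2}A": ["ACACA", "ACACACA"],
    "[AC]{3}": ["CAC", "AAA", "ACC"],
    "[actg]*": ["a", "ac", "tg", "gcgctgc"],
}

# re.fullmatch succeeds only if the pattern covers the entire string.
for regexp, examples in TABLE_5_1.items():
    for s in examples:
        assert re.fullmatch(regexp, s), (regexp, s)
assert re.fullmatch("A[AC]A", "AGA") is None  # G is not in the class [AC]

# The Table 5.2 functions on a short, hypothetical DNA sequence.
seq = "TTACCGACCA"

m = re.search("ACC", seq)             # first match anywhere in seq
print(m.group(), m.span())            # ACC (2, 5)
start, end = m.span()
print(seq[start:end])                 # ACC  (end position is exclusive)

print(re.match("ACC", seq))           # None: seq does not start with ACC
print(re.match("TTA", seq).span())    # (0, 3)

print(re.findall("ACC", seq))         # ['ACC', 'ACC']
spans = [mo.span() for mo in re.finditer("ACC", seq)]
print(spans)                          # [(2, 5), (6, 9)]
```

The finditer variant is useful when positions are needed (for example, to report where a motif occurs in a sequence). The findall variant returns only the matched text.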