Difference between revisions of "Regular Expressions"

From TRCCompSci - AQA Computer Science

Revision as of 18:56, 22 May 2017

A regular expression is a notation for defining all the valid strings of a formal language.

Examples of Regular Expression Notation

Regular Expression    Meaning
a                     Matches a string consisting of just the symbol a
b                     Matches a string consisting of just the symbol b
ab                    Matches a string consisting of the symbol a followed by the symbol b
a*                    Matches a string consisting of zero or more a’s
a+                    Matches a string consisting of one or more a’s
abb?                  Matches the string ab or the string abb. The ? symbol indicates zero or one of the preceding element
a|b                   Matches a string consisting of the symbol a or the symbol b
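
The rows above can be checked with a short sketch. It assumes Python's built-in re module, which the original notes do not mention, and a hypothetical helper named in_language. re.fullmatch succeeds only when the whole string matches the pattern, which is roughly the same as asking whether the string belongs to the language the expression defines.

```python
import re


def in_language(pattern, s):
    """Return True if the whole of s matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, s) is not None


print(in_language("ab", "ab"))      # True: a followed by b
print(in_language("a*", ""))        # True: * allows zero a's
print(in_language("a+", ""))        # False: + needs at least one a
print(in_language("abb?", "ab"))    # True: the second b is optional
print(in_language("abb?", "abbb"))  # False: at most two b's are allowed
print(in_language("a|b", "b"))      # True: either a or b
```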

Precedence Rules

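When using regular expressions, the order of precedence is as follows:

*, + and ? are done first (each applies only to the element immediately before it)

Concatenation (i.e. joining elements together) is done next

| comes last

Brackets ( ) can be used to override this order, just as in arithmetic.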
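
A minimal sketch of how these rules change what is matched, again assuming Python's re module and a hypothetical in_language helper (neither comes from the original notes). Each pair compares an expression with its bracketed counterpart.

```python
import re


def in_language(pattern, s):
    """Return True if the whole of s matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, s) is not None


# * is applied before concatenation, so ab* means a(b*), not (ab)*
print(in_language("ab*", "abbb"))    # True
print(in_language("ab*", "abab"))    # False
print(in_language("(ab)*", "abab"))  # True

# Concatenation is applied before |, so ab|cd means (ab)|(cd), not a(b|c)d
print(in_language("ab|cd", "cd"))     # True
print(in_language("ab|cd", "acd"))    # False
print(in_language("a(b|c)d", "acd"))  # True
```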

More Examples

Examples of regular expressions using the alphabet {a, b, c}

  • abc defines the language with only the string ‘abc’
  • abc | cba defines the language with two strings: ‘abc’ and ‘cba’
  • (a | b) c (a | b) gives four strings: ‘aca’, ‘acb’, ‘bca’, ‘bcb’
  • a+ gives an infinite number of strings: ‘a’, ‘aa’, ‘aaa’, etc.
  • ab* gives an infinite number of strings: ‘a’, ‘ab’, ‘abb’, ‘abbb’, etc.
  • (ab)* gives an infinite number of strings: ‘’, ‘ab’, ‘abab’, ‘ababab’, etc.
  • (a | c)+ gives every non-empty string made up of a’s and c’s: ‘a’, ‘c’, ‘aa’, ‘ac’, ‘ca’, ‘cc’, etc.
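
The languages listed above can be enumerated by brute force. The sketch below assumes Python's re and itertools modules; the function name language_up_to and the length limit are illustrative choices, not part of the original notes. It builds every string over {a, b, c} up to a given length and keeps those that fully match. The spaces around | are dropped because Python treats a space in a pattern as a literal character.

```python
import re
from itertools import product

ALPHABET = "abc"


def language_up_to(pattern, max_len):
    """List every string over ALPHABET with length <= max_len that fully matches pattern."""
    found = []
    for n in range(max_len + 1):
        for letters in product(ALPHABET, repeat=n):
            s = "".join(letters)
            if re.fullmatch(pattern, s):
                found.append(s)
    return found


print(language_up_to("abc|cba", 3))      # ['abc', 'cba']
print(language_up_to("(a|b)c(a|b)", 3))  # ['aca', 'acb', 'bca', 'bcb']
print(language_up_to("(ab)*", 4))        # ['', 'ab', 'abab']
print(language_up_to("(a|c)+", 2))       # ['a', 'c', 'aa', 'ac', 'ca', 'cc']
```

Infinite languages such as (ab)* can only be listed up to a chosen length, which is why the function takes a limit.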

Regular expression meta-characters

Symbol  Meaning    Example
|       Used to separate alternatives    a|b (means a or b)
?       Used to denote zero or one of the preceding element    a? (zero or one a; matches ‘’ and ‘a’)
*       Used to denote zero or more of the preceding element    a* (zero or more a’s; matches ‘’, ‘a’, ‘aa’, etc.)
+       Used to denote one or more of the preceding element    a+ (one or more a’s; matches ‘a’, ‘aa’, etc.)
( )     Used to group characters together, to indicate the scope of another operator    (ab)* (zero or more repetitions of ab; matches ‘’, ‘ab’, ‘abab’, etc.)
[ ]     Another way of denoting alternatives (instead of the vertical bar). Defines a character class    [ab] (means a or b)
\       The escape character (turns a metacharacter into an ordinary character)    a\* (the a character followed by the * character; the \ is needed because a* would mean zero or more a’s)
^       Used to indicate the negation of a character class. Also used to match the position before the first character in a string    a[^bc] (a followed by a character that is not b or c); ^abc (matches abc only if it is at the beginning of a string)
$       Used to match the position after the last character in a string    abc$ (matches abc only if it is at the end of a string)
.       Matches any single character    a.a (matches any string that has an a followed by any single character followed by an a, e.g. ‘aca’, ‘aba’)
-       Used to specify a range of values in a character class    [A-Z] (any one character in the range A to Z)
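
Several of these meta-characters can be tried out with the sketch below. It assumes Python's re module, and the helpers whole and found are hypothetical names. whole tests the entire string, while found looks for a match anywhere inside it, which is where the ^ and $ anchors make a difference. Python's handling of newlines differs slightly from the table (for example, . does not match a newline by default), so the sketch sticks to single-line strings.

```python
import re


def whole(pattern, s):
    """True if the entire string s matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, s) is not None


def found(pattern, s):
    """True if pattern matches somewhere inside s (hypothetical helper)."""
    return re.search(pattern, s) is not None


# \ escapes a metacharacter: a\* is the letter a followed by a literal *
print(whole(r"a\*", "a*"))  # True
print(whole(r"a\*", "aa"))  # False

# [^bc] is any single character other than b or c
print(whole(r"a[^bc]", "ad"))  # True
print(whole(r"a[^bc]", "ab"))  # False

# ^ and $ tie a match to the start or end of the string
print(found(r"^abc", "abcdef"))  # True
print(found(r"^abc", "xabc"))    # False
print(found(r"abc$", "xabc"))    # True

# . is any single character; [A-Z] is one upper-case letter
print(whole(r"a.a", "aca"))  # True
print(whole(r"[A-Z]", "Q"))  # True
print(whole(r"[A-Z]", "q"))  # False
```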