peerlibrary/xregexp

Name: xregexp

Owner: PeerLibrary

Description: Extended JavaScript regular expressions

Created: 2015-05-26 03:32:58.0

Updated: 2015-05-26 03:32:59.0

Pushed: 2014-10-14 08:55:28.0

Homepage: http://xregexp.com/

Size: 5410

Language: JavaScript

README

XRegExp 3.0.0-pre

XRegExp provides augmented and extensible JavaScript regular expressions. You get new syntax, flags, and methods beyond what browsers support natively. XRegExp is also a regex utility belt with tools to make your client-side grepping simpler and more powerful, while freeing you from worrying about pesky cross-browser inconsistencies and the dubious lastIndex property.

XRegExp supports all native ES5 regular expression syntax. It works with Internet Explorer 5.5+, Firefox 1.5+, Chrome, Safari 3+, and Opera 11+. You can also use it on the server with Node.js, or as a RequireJS module. The base library is about 3.8 KB, minified and gzipped.

See what's new in version 3.0.0-pre.

Performance

XRegExp regexes compile to native RegExp objects, and therefore perform just as fast as native regular expressions. There is a tiny extra cost when compiling a pattern for the first time.

Usage examples
// Using named capture and flag x (free-spacing and line comments)
var date = XRegExp('(?<year>  [0-9]{4} ) -?  # year  \n\
                    (?<month> [0-9]{2} ) -?  # month \n\
                    (?<day>   [0-9]{2} )     # day   ', 'x');

// XRegExp.exec gives you named backreferences on the match result
var match = XRegExp.exec('2014-02-22', date);
match.year; // -> '2014'

// It also includes optional pos and sticky arguments
var pos = 3, result = [];
while (match = XRegExp.exec('<1><2><3><4>5<6>', /<(\d+)>/, pos, 'sticky')) {
    result.push(match[1]);
    pos = match.index + match[0].length;
}
// result -> ['2', '3', '4']

// XRegExp.replace allows named backreferences in replacements
XRegExp.replace('2014-02-22', date, '${month}/${day}/${year}'); // -> '02/22/2014'
XRegExp.replace('2014-02-22', date, function(match) {
    return match.month + '/' + match.day + '/' + match.year;
}); // -> '02/22/2014'

// In fact, XRegExps compile to RegExps and work perfectly with native methods
date.test('2014-02-22'); // -> true

// The *only* caveat is that named captures must be referenced using numbered backreferences
'2014-02-22'.replace(date, '$2/$3/$1'); // -> '02/22/2014'

// If you want, you can extend native methods so you don't have to worry about this.
// Doing so also fixes numerous browser bugs in the native methods
XRegExp.install('natives');
'2014-02-22'.replace(date, '${month}/${day}/${year}'); // -> '02/22/2014'
'2014-02-22'.replace(date, function(match) {
    return match.month + '/' + match.day + '/' + match.year;
}); // -> '02/22/2014'
date.exec('2014-02-22').year; // -> '2014'

// Extract every other digit from a string using XRegExp.forEach
XRegExp.forEach('1a2345', /\d/, function(match, i) {
    if (i % 2) this.push(+match[0]);
}, []); // -> [2, 4]

// Get numbers within <b> tags using XRegExp.matchChain
XRegExp.matchChain('1 <b>2</b> 3 <b>4 a 56</b>', [
    XRegExp('(?is)<b>.*?</b>'),
    /\d+/
]); // -> ['2', '4', '56']

// You can also pass forward and return specific backreferences
var html = '<a href="http://xregexp.com/">XRegExp</a>' +
           '<a href="http://www.google.com/">Google</a>';
XRegExp.matchChain(html, [
    {regex: /<a href="([^"]+)">/i, backref: 1},
    {regex: XRegExp('(?i)^https?://(?<domain>[^/?#]+)'), backref: 'domain'}
]); // -> ['xregexp.com', 'www.google.com']

// Merge strings and regexes into a single pattern, safely rewriting backreferences
XRegExp.union(['a+b*c', /(dog)\1/, /(cat)\1/], 'i');
// -> /a\+b\*c|(dog)\1|(cat)\2/i

These examples should give you the flavor of what's possible, but XRegExp has more syntax, flags, methods, options, and browser fixes that aren't shown here. You can even augment XRegExp's regular expression syntax with addons (see below) or write your own. See xregexp.com for more details.
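To make the backreference rewriting performed by XRegExp.union more concrete, here is a minimal sketch of the idea in plain JavaScript, with no dependency on XRegExp. The names unionSketch and escapeRegex are hypothetical and are not part of the library; the sketch only counts plain (unnamed) capturing groups and is not XRegExp's actual implementation.

```javascript
// Hypothetical helper: escape regex metacharacters so a string matches literally.
function escapeRegex(str) {
  return str.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Hypothetical sketch of a union: strings are escaped, regexes keep their
// source, and numbered backreferences are shifted by the number of capturing
// groups contributed by earlier patterns.
// Assumption: only plain "(...)" groups are counted; named groups are ignored.
function unionSketch(patterns, flags) {
  let groupsSoFar = 0;
  const parts = patterns.map((pattern) => {
    if (typeof pattern === 'string') return escapeRegex(pattern);
    const offset = groupsSoFar;
    let localGroups = 0;
    // Tokens: backreference, other escape, character class, capturing "(".
    const rewritten = pattern.source.replace(
      /\\(\d+)|\\[\s\S]|\[(?:[^\\\]]|\\[\s\S])*\]|\((?!\?)/g,
      (token, backref) => {
        if (backref !== undefined) return '\\' + (Number(backref) + offset);
        if (token === '(') localGroups++;
        return token;
      }
    );
    groupsSoFar += localGroups;
    return rewritten;
  });
  return new RegExp(parts.join('|'), flags);
}

const combined = unionSketch(['a+b*c', /(dog)\1/, /(cat)\1/], 'i');
console.log(combined.source);         // a\+b\*c|(dog)\1|(cat)\2
console.log(combined.test('CATCAT')); // true
```

The second regex keeps \1 because no groups precede it, while the third has its \1 shifted to \2 because (dog) already occupies group 1 in the combined pattern.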

Addons

You can either load addons individually, or bundle all addons together with XRegExp by loading xregexp-all.js. XRegExp's npm package uses xregexp-all.js, so addons are always available when XRegExp is installed using npm.

Unicode

In browsers, first include the Unicode Base script and then one or more of the addons for Unicode blocks, categories, properties, or scripts.

<script src="src/xregexp.js"></script>
<script src="src/addons/unicode/unicode-base.js"></script>
<script src="src/addons/unicode/unicode-categories.js"></script>
<script src="src/addons/unicode/unicode-scripts.js"></script>

Then you can do this:

// Test the Unicode category L (Letter)
var unicodeWord = XRegExp('^\\pL+$');
unicodeWord.test('Русский'); // -> true
unicodeWord.test('日本語'); // -> true
unicodeWord.test('العربية'); // -> true

// Test some Unicode scripts
XRegExp('^\\p{Hiragana}+$').test('ひらがな'); // -> true
XRegExp('^[\\p{Latin}\\p{Common}]+$').test('Über Café.'); // -> true

By default, \p{…} and \P{…} support the Basic Multilingual Plane (i.e., code points up to U+FFFF). You can opt in to full 21-bit Unicode support (with code points up to U+10FFFF) on a per-regex basis by using flag A. In XRegExp, this is called astral mode. You can implicitly apply astral mode for all new regexes by running XRegExp.install('astral'). When in astral mode, \p{…} and \P{…} always match a full code point rather than a code unit, using surrogate pairs for code points above U+FFFF.

// Using flag A. The test string uses a surrogate pair to represent U+1F4A9
XRegExp('^\\pS$', 'A').test('\uD83D\uDCA9'); // -> true

// Implicit flag A
XRegExp.install('astral');
XRegExp('^\\pS$').test('\uD83D\uDCA9'); // -> true
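The difference between a code unit and a code point, which is what astral mode is about, can be seen with plain JavaScript alone. The following sketch does not use XRegExp; it assumes an ES2015+ engine, and the native u flag it uses is a separate feature from XRegExp's flag A. The helper name fromSurrogates is hypothetical.

```javascript
// U+1F4A9 written as a surrogate pair (two UTF-16 code units).
const astralChar = '\uD83D\uDCA9';

// Hypothetical helper: combine a high and low surrogate into one code point.
function fromSurrogates(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

console.log(astralChar.length);             // 2 (code units)
console.log(Array.from(astralChar).length); // 1 (code points)
console.log(fromSurrogates(0xD83D, 0xDCA9).toString(16)); // 1f4a9
console.log(astralChar.codePointAt(0) === 0x1F4A9);       // true

// Without the native u flag, "." consumes a single code unit,
// so a one-character pattern cannot span the whole pair.
console.log(/^.$/.test(astralChar));  // false
console.log(/^.$/u.test(astralChar)); // true
```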

Opting in to astral mode disables the use of \p{…} and \P{…} within character classes. In astral mode, use e.g. (\pL|[0-9_])+ instead of [\pL0-9_]+.

XRegExp uses Unicode 7.0.0.

XRegExp.build

In browsers, first include the script:

<script src="src/xregexp.js"></script>
<script src="src/addons/build.js"></script>

You can then build regular expressions using named subpatterns, for readability and pattern reuse:

var time = XRegExp.build('(?x)^ {{hours}} ({{minutes}}) $', {
    hours: XRegExp.build('{{h12}} : | {{h24}}', {
        h12: /1[0-2]|0?[1-9]/,
        h24: /2[0-3]|[01][0-9]/
    }),
    minutes: /^[0-5][0-9]$/
});

time.test('10:59'); // -> true
XRegExp.exec('10:59', time).minutes; // -> '59'

Named subpatterns can be provided as strings or regex objects. A leading ^ and trailing unescaped $ are stripped from subpatterns if both are present, which allows embedding independently-useful anchored patterns. {{…}} tokens can be quantified as a single unit. Any backreferences in the outer pattern or provided subpatterns are automatically renumbered to work correctly within the larger combined pattern. The syntax ({{name}}) works as shorthand for named capture via (?<name>{{name}}). Named subpatterns cannot be embedded within character classes.
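Two of the rules above, stripping a matching pair of ^ and $ anchors and treating each {{name}} token as a single quantifiable unit, can be illustrated with a small sketch in plain JavaScript. The function buildSketch is hypothetical and is not XRegExp.build: it does not handle flag x, backreference renumbering, or the ({{name}}) named-capture shorthand.

```javascript
// Hypothetical sketch: substitute {{name}} tokens with subpattern sources.
function buildSketch(pattern, subs, flags) {
  const source = pattern.replace(/\{\{(\w+)\}\}/g, (token, name) => {
    if (!Object.prototype.hasOwnProperty.call(subs, name)) {
      throw new ReferenceError('Undefined subpattern: ' + name);
    }
    const value = subs[name];
    let sub = value instanceof RegExp ? value.source : String(value);
    // Strip a leading ^ and a trailing unescaped $ only when both are present.
    // "Unescaped" here means preceded by an even number of backslashes.
    const hasLeadingCaret = sub.charAt(0) === '^';
    const hasTrailingDollar = /(?:^|[^\\])(?:\\\\)*\$$/.test(sub);
    if (hasLeadingCaret && hasTrailingDollar) sub = sub.slice(1, -1);
    // A non-capturing group lets the token be quantified as one unit.
    return '(?:' + sub + ')';
  });
  return new RegExp(source, flags);
}

const timeSketch = buildSketch('^{{hours}}:{{minutes}}$', {
  hours: /1[0-2]|0?[1-9]/,
  minutes: /^[0-5][0-9]$/ // anchors are stripped when embedded
});
console.log(timeSketch.source);        // ^(?:1[0-2]|0?[1-9]):(?:[0-5][0-9])$
console.log(timeSketch.test('10:59')); // true
console.log(timeSketch.test('10:60')); // false

// Quantifying a token applies to the whole subpattern.
console.log(buildSketch('^{{pair}}+$', {pair: 'ab'}).test('abab')); // true
```

Wrapping every substitution in a non-capturing group is the design choice that keeps alternations inside a subpattern from leaking into the outer pattern.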

See also: Creating Grammatical Regexes Using XRegExp.build.

XRegExp.matchRecursive

In browsers, first include the script:

<script src="src/xregexp.js"></script>
<script src="src/addons/matchrecursive.js"></script>

You can then match recursive constructs using XRegExp pattern strings as left and right delimiters:

var str = '(t((e))s)t()(ing)';
XRegExp.matchRecursive(str, '\\(', '\\)', 'g');
// -> ['t((e))s', '', 'ing']

// Extended information mode with valueNames
str = 'Here is <div> <div>an</div></div> example';
XRegExp.matchRecursive(str, '<div\\s*>', '</div>', 'gi', {
    valueNames: ['between', 'left', 'match', 'right']
});
/* -> [
{name: 'between', value: 'Here is ',       start: 0,  end: 8},
{name: 'left',    value: '<div>',          start: 8,  end: 13},
{name: 'match',   value: ' <div>an</div>', start: 13, end: 27},
{name: 'right',   value: '</div>',         start: 27, end: 33},
{name: 'between', value: ' example',       start: 33, end: 41}
] */

// Omitting unneeded parts with null valueNames, and using escapeChar
str = '...{1}\\{{function(x,y){return y+x;}}';
XRegExp.matchRecursive(str, '{', '}', 'g', {
    valueNames: ['literal', null, 'value', null],
    escapeChar: '\\'
});
/* -> [
{name: 'literal', value: '...', start: 0, end: 3},
{name: 'value',   value: '1',   start: 4, end: 5},
{name: 'literal', value: '\\{', start: 6, end: 8},
{name: 'value',   value: 'function(x,y){return y+x;}', start: 9, end: 35}
] */

// Sticky mode via flag y
str = '<1><<<2>>><3>4<5>';
XRegExp.matchRecursive(str, '<', '>', 'gy');
// -> ['1', '<<2>>', '3']

XRegExp.matchRecursive throws an error if it scans past an unbalanced delimiter in the target string.
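The general technique behind matching balanced delimiters is a depth counter: each left delimiter increases the depth, each right delimiter decreases it, and a top-level match is complete when the depth returns to zero. The sketch below illustrates that idea in plain JavaScript for single-character delimiters only. The function matchBalancedSketch is hypothetical; it is not XRegExp.matchRecursive and supports none of its options (regex delimiters, valueNames, escapeChar, sticky mode).

```javascript
// Hypothetical sketch: return the contents of each top-level balanced pair.
// Assumption: left and right are distinct single characters.
function matchBalancedSketch(str, left, right) {
  const results = [];
  let depth = 0;
  let start = -1;
  for (let i = 0; i < str.length; i++) {
    const ch = str[i];
    if (ch === left) {
      if (depth === 0) start = i + 1; // content begins after the delimiter
      depth++;
    } else if (ch === right) {
      if (depth === 0) {
        throw new Error('Unbalanced right delimiter at index ' + i);
      }
      depth--;
      if (depth === 0) results.push(str.slice(start, i));
    }
  }
  if (depth !== 0) {
    throw new Error('Unbalanced left delimiter at index ' + (start - 1));
  }
  return results;
}

console.log(matchBalancedSketch('(t((e))s)t()(ing)', '(', ')'));
// -> [ 't((e))s', '', 'ing' ]
```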

Installation and usage

In browsers:

<script src="src/xregexp.js"></script>

Or, to bundle XRegExp with all of its addons:

<script src="xregexp-all.js"></script>

Using npm:

npm install xregexp

In Node.js:

var XRegExp = require('xregexp'); // Requires XRegExp 3.0

The CommonJS-style require('xregexp').XRegExp also works, and is the only method supported by XRegExp 2.0.

In an AMD loader like RequireJS:

require({paths: {xregexp: 'xregexp-all'}}, ['xregexp'], function(XRegExp) {
    console.log(XRegExp.version);
});

Changelog
About

XRegExp copyright 2007-2014 by Steven Levithan.

Tools: Unicode range generators by Mathias Bynens, adapted from his unicode-data project. Source file concatenator by Bjarke Walling.

Tests: Uses Jasmine for unit tests, and Benchmark.js for performance tests.

Prior art: XRegExp.build inspired by Lea Verou's RegExp.create. XRegExp.union inspired by Ruby. XRegExp's syntax extensions and flags come from Perl, .NET, etc.

All code, including addons, tools, and tests, is released under the terms of the MIT License.

Fork me to show support, fix, and extend.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.