This is a documentation subpage for Template:Regex. It may contain usage information, categories and other content that is not part of the original template page. |
This template helps find details in the wikitext of any page on the wiki. Normal searches ignore non-alphanumeric characters, but regular expressions (regex) accept all characters, plus metacharacters.
This template acts as a doorway: it helps you develop a database query before running it on the wiki, by way of a search link that can also be used to share such discoveries. It can also be used to learn the regular expression syntax of this version of CirrusSearch. You could use a bare ((search link)) to do all this, but this template saves a lot of typing (see below), so you only need to focus on entering a regexp.
An important alternative to using this template is to search directly with insource:"quotes-delimited arguments". These find wikitext without resorting to the regex searches that this template performs with insource:/slash-delimited arguments/ (a common syntax for regex searches). See § About CirrusSearch below for a better understanding of when this template is not needed. See below for other search tools.
Regular expressions are little computer programs, so it is characteristic of regex searches that they must always be tested to achieve their potential precision and thoroughness. But only a few of these intensive searches are technically able to run at a time against the database. This template minimizes your footprint, and guarantees that you will never run an untested regexp on every namespace in the wiki, even if your default search would let you do that. Use of this template enables the smallest possible footprint by using filters to limit the search domain. The first domain it targets is its own page in an ad hoc sandbox. Once your regexp pattern is honed, you add a search domain by setting |prefix=.
Further information: Help:Parameters |
|pattern= or {{{1}}} |
a regexp search pattern. Pattern is also the first positional parameter. |
|prefix= or {{{2}}} |
search domain. Prefix accepts a namespace number, or n for the current namespace, or : for mainspace, plus it has the usual prefix: meaning. Defaults to its current page (fullpagename) if a pattern is given alone.
|
|label= or {{{3}}} |
search link label. Label is also a positional parameter. |
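The way |prefix= is interpreted can be sketched in Python. This is only an illustration of the rules described above, not the template's actual Lua code: the function names `resolve_prefix` and `build_query` are hypothetical, and the namespace table is a small hand-picked subset of the standard MediaWiki namespace numbers.

```python
# Hypothetical sketch; none of these names come from the template's own module.
# Small subset of standard MediaWiki namespace numbers (assumption: only these three are needed here).
NAMESPACES = {0: "", 10: "Template", 12: "Help"}


def resolve_prefix(prefix, fullpagename):
    """Turn a |prefix= value into the text that follows 'prefix:' in the query."""
    if prefix is None or prefix == "":
        return fullpagename                      # default: only the current page
    prefix = str(prefix)
    if prefix == ":":
        return ":"                               # mainspace
    if prefix == "n":
        # Current namespace: the text before the first colon, if there is one.
        # (A mainspace title that itself contains a colon would be misread here.)
        namespace = fullpagename.split(":", 1)[0] if ":" in fullpagename else ""
        return namespace + ":"
    if prefix.isdigit():
        return NAMESPACES[int(prefix)] + ":"     # KeyError for numbers outside the subset
    return prefix                                # the usual prefix: meaning


def build_query(pattern, prefix=None, fullpagename="Template:Regex/doc"):
    """Assemble a regex search query with its prefix filter."""
    return f"insource:/{pattern}/ prefix:{resolve_prefix(prefix, fullpagename)}"


print(build_query(". commas"))          # pattern given alone: targets the current page
print(build_query("fmt", prefix=10))    # namespace number: targets the Template namespace
```

With a pattern alone the sketch targets the fullpagename; with a namespace number it targets that whole namespace, which is why a pattern is best honed before a prefix is added.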
Decide whether you really need a thoroughly precise regexp search, or whether you can find the general wikitext of interest with a plain insource: filter. Examples of the plain insource: search are in § Parameters hastemplate and insource. In those cases, ((search link)) is sufficient, and sandboxing is not being suggested.
Namespace plus pagename equals fullpagename.
The procedure here is an iterative, read-evaluate-modify cycle.
|pattern=. Prefix will be added later.
|prefix=. Start with a namespace. For the complete query, trim results via the first letter(s) of pagenames tacked onto the namespace's automatically-given colon.
Step 6 is the core provision of this template. Caveat emptor: if you change the target, you'll have to re-save it to the database. If you target it again immediately, you'll want to purge that target. You never have to purge if you just change |pattern=. Note that you can target any single page using prefix:.
Regular expressions are little computer programs, so it is characteristic of regex searches that they must be written while studying the target data, and tested to achieve their potential precision and thoroughness. However, only a few of these intensive searches are technically able to run at a time against the database.[1] A sandbox minimizes your footprint, and guarantees that you will never run an untested regexp on every namespace in the wiki, even if your default search would let you do that.
Although a normal search targeting the entire wiki will run quickly, a regexp search should target as few pages as possible by using filters in order to run quickly. A filter is part or whole of a database query. Filters include:
Order is not important because the search is optimized by the software before it is run.
To target just one page while experimenting with or developing a regex search, target a fullpagename. From the search box use the filter prefix:fullpagename. From the edit box (of any section of the page with the target data), you can always just write prefix:((FULLPAGENAME)) and it will "expand" for you to the fullpagename. Although you can edit a history page, technically a "history page" is not a page (in the database), and so ((FULLPAGENAME)) there will point to the database version (not its own rendering). For the same reason, you cannot search for the wikitext on a page that is not already saved (to the database), although you can certainly change the search parameters again and again with no need to save them.
Fullpagename is namespace:pagename. Knowing this, you can adjust your Prefix parameter. Although prefix can filter down to one page, it can also filter up to a whole namespace, and it accepts the beginning letter(s) of a set of pagenames if you want to reduce the namespace search domain.
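How a prefix widens or narrows the search domain can be modeled in a few lines of Python. This is a simplified sketch: the page titles are made up for illustration, `prefix_filter` is a hypothetical helper, and plain string matching only approximates what the search software really does with a prefix: filter.

```python
def prefix_filter(titles, prefix):
    """Keep the fullpagenames that begin with the given prefix (simplified model)."""
    return [title for title in titles if title.startswith(prefix)]


# Hypothetical fullpagenames, each of the form namespace:pagename.
titles = [
    "Help:Searching",
    "Help:Searching/Regex",
    "Help:Templates",
    "Template:Regex/doc",
]

print(prefix_filter(titles, "Help:"))                 # a whole namespace
print(prefix_filter(titles, "Help:S"))                # first letter(s) of pagenames
print(prefix_filter(titles, "Template:Regex/doc"))    # a single page
```

The shorter the prefix, the larger the domain, so a regexp under development is best paired with the longest prefix that still covers the target data.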
Regex sandboxing uses an ad hoc sandbox made by editing any page containing the target data, and using it as a "sandbox" (not editing it to save it). The pattern is then developed by adding a search link that includes insource:/regexp/, with the filter prefix:((FULLPAGENAME)) alongside.
Use of a sandbox enables the smallest possible footprint by using filters to limit the search domain. Once your regexp pattern is honed, you increase the search domain. A regex search is best run with filters, not alone, even if it is a polished regexp.
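A search link is, in the end, a URL that carries the query text. The Python sketch below shows one way such a shareable link could be assembled. It rests on assumptions: the host `example.org` is a placeholder, `search_url` is a hypothetical helper, and the `title=Special:Search` and `search=` parameters are the common MediaWiki form rather than anything this template is documented to emit.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = "https://example.org/w/index.php"  # placeholder host, not a real wiki


def search_url(query, base=BASE):
    """Percent-encode a search query into a shareable search URL."""
    return base + "?" + urlencode({"title": "Special:Search", "search": query})


query = "insource:/ul?=ft/ prefix:Template:Regex/doc"
url = search_url(query)
print(url)

# Decoding the URL again recovers the query unchanged.
print(parse_qs(urlsplit(url).query)["search"][0])
```

Encoding matters because regexp metacharacters such as ? and / have their own meaning inside a URL.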
Entering an equals sign, a pipe character, or "quotes around phrases" is a straightforward matter in the search box. Even so, it is easiest to use a regex-based search-link template, ((regex)) or ((tlusage)), on the page with sample data, because then you can focus on the target data there and on writing the regexp pattern. It is easier, that is, if you already understand how templates "escape" the pipe character and the equals sign. See Help:Template#Parameters for other important details.
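The escaping mentioned above can be illustrated with a small Python sketch. It assumes the usual wikitext substitutes, {{!}} for the pipe and {{=}} for the equals sign (shown elsewhere on this page with parentheses in place of braces); `escape_for_template` is a hypothetical helper, not part of any template.

```python
def escape_for_template(pattern):
    """Replace the characters a template call would otherwise treat as syntax."""
    # The pipe separates parameters; the equals sign separates a name from a value.
    return pattern.replace("|", "{{!}}").replace("=", "{{=}}")


# A pattern with an alternation and an equals sign, as typed in the search box.
print(escape_for_template("ul? *= *(ft|m)"))
```

When the pattern is passed as a named parameter such as |pattern=, the equals sign generally does not need escaping, but the pipe still does.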
The procedure here is an iterative, read-evaluate-modify cycle. Regex development requires that you study the target data while writing and rewriting its pattern.
Caveat emptor: if you change the target for an immediate retesting, you'll have to save and purge, but not if you just change the regexp.
To use a section like this one (already saved in the database) as an ad hoc sandbox, show its wikitext, modify some of the patterns in the regex search-link template calls on this page, do a Show Preview, and see what matches when you click on the newly formed regex search link. All of this is quite safe and changes nothing in the database.
The template calls that produce "1 ft/s, 2 sq ft, 3 m/s, 4 m*s-2, 5 ft.s-2, 6 °C/J, and 7 J/C" appear in the wikitext of this section like this:
Note how the above targets are |numbered|, then click on the links below.
Query | Search link | Answer |
---|---|---|
Q1 Using ((search link)), does this page employ template Val ? | ((sl|hastemplate: Val)) → hastemplate: Val
|
A. No, because this pagename is in Help not Article space. (Search link default.) 1300 search results. |
Q2 Using ((search link)) responsibly, does this page use Val's fmt parameter? | ((sl|insource:/\{[Vv]al\((!))[^}]*fmt/ prefix:((FULLPAGENAME)))) →
|
A2.1. Look for 1 and 3 in the search results in bold text. (Adds an appropriate filter.) |
Using ((regex)) instead... | ((slre|\{[Vv]al\((!))[^}]*fmt)) →
|
A2.2 Less typing than ((search link)). |
Using ((template usage)) instead... | ((tlre|Val|pattern=fmt)) →
|
A2.3 Easiest for templates. |
Q3. Who uses u=ft OR ul=ft? (one-letter differs) | ((regex|ul?=ft)) →
|
A. Look for 1, 2, and 5 in bold text. |
Using ((template usage))... | ((tlre|val|pattern = ul?=ft)) →
|
Finds same pattern, but only inside a Val template. |
Q4. AND of these, who also uses fmt=commas after that? | ((slre|ul?=ft.*commas)) →
|
A. No context shown, but article title is shown. A half a Bug? |
Who has one space before the word "commas"? | ((slre|. commas)) → insource:/. commas/ prefix:Template:Regex/doc
|
A. 1 but not 2. |
Q5. Who uses either u or ul with "ft" OR uses "fmt=commas". | ((slre|(ul? *= *ft((!))fmt *= *commas)))
|
A. 1, 2, 3, and 5. (The pattern matches all possible spacing.) |
Q6. Who uses ft or m, in |u= or |ul= ?
|
((slre|ul? *((=)) *(ft((!))m)))
|
A. 1, 2, 3, 4, and 5.
Used ((!)) for the alternation metacharacter. Used ((=)). (Could have used named |
Q7. Who uses . or * in the unit code? | ((tlre|val|pattern = u *= *(\.((!))\*)/))
|
A. 4 and 5. |
Who uses a pipe? | ((regex|\|)) → insource:/\/ prefix:Template:Regex/doc
|
All of them |
Q8. Who uses / or - within the |u= or |ul= parameter?
|
((tlre|val|ul? *= *[^((!))}]+(\/((!))-)))
|
A. 1, 3, 4, 5, 6, and 7. |
Q9. Where is Val used in the template namespace for numbers only, (no u, ul, up, or upl parameters). | ((tlre|val|pattern = ~(u[lp].)|prefix = 10))
→ hastemplate:"val" insource:/\{\{ *[Vv]al *\|[^}]*~(u[lp].)/ prefix:Template: |
A. In the 30 or so templates listed. |
Q10. Which articles use ((Convert))'s and(-) option? | ((tlre|convert|pattern=and\(-\)| prefix=0))
→ hastemplate:"convert" insource:/\{\{ *[Cc]onvert *\|[^}]*and\(-\)/ prefix:: |
A. Coast Range Arc and Skipjack shad |
In Q2, notice how the MediaWiki software ignores the spaces around parameters, but how in Q4 the same MediaWiki software processes the spaces inside parameters. Q2 might have been solved with a plain insource:val fmt search because "fmt" and "val" are whole words, and fmt is rarely seen apart from inside Val. How about hastemplate:val insource:fmt?
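Simple patterns like those in the table can be tried offline before any search is run. The Python sketch below is only an approximation under stated assumptions: the three sample template calls are invented for illustration (they are not the numbered targets of this page), `matching_ids` is a hypothetical helper, and Python's `re` module is not the regex engine the search uses. For instance, the ~ operator in Q9 has no direct equivalent in `re`, where it is an ordinary character.

```python
import re

# Invented sample wikitext, numbered in the same spirit as the targets above.
SAMPLES = {
    1: "{{val|1|ul=ft/s}}",
    2: "{{val|2|u=sqft}}",
    3: "{{val|3|u=m/s|fmt=commas}}",
}


def matching_ids(pattern, samples=SAMPLES):
    """Return the numbers of the samples in which the pattern matches anywhere."""
    regex = re.compile(pattern)
    return [number for number, text in samples.items() if regex.search(text)]


print(matching_ids(r"ul?=ft"))             # u or ul, then ft with no spaces
print(matching_ids(r"ul? *= *(ft|m)"))     # optional spaces, ft or m
print(matching_ids(r"fmt *= *commas"))     # the fmt parameter set to commas
```

A quick offline check like this costs the wiki nothing, but the final pattern still needs testing through a search link, because the two regex dialects differ.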