WikiSyntaxTextMod: Syntax polishing → Step 4

Links

In the fourth step of syntax polishing, all links are processed. Candidate links are detected by searching for [ first and then for the string ://.

One goal is to adapt link targets; another is to format links in a common, readable manner that other scripts and bots can detect easily.


If not explicitly mentioned, in this section the term “bracket” means square brackets [[]].

Syntax correction


Sometimes an external URL is used where a wikilink would do, like

[http://en.wikipedia.org/wiki/Main_page

The same applies to [https: and protocol-relative URLs.

This is turned into wikilink format if possible.

Links given as URLs do not appear in WhatLinksHere and GlobalUsage.
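
The conversion described above could be sketched as follows. This is only an illustration, not the script's own code: the constant LOCAL_HOST, the function name url_to_wikilink and the restriction to /wiki/ paths without a query string are assumptions made for the example.

```python
import re
from urllib.parse import unquote

# Assumption: the wiki being edited is en.wikipedia.org.
LOCAL_HOST = "en.wikipedia.org"

# Bracketed link "[URL optional title]" to /wiki/<page> on LOCAL_HOST, written
# with http:, https: or as a protocol-relative URL. URLs carrying a query
# string ("?") are deliberately not matched.
LOCAL_URL = re.compile(
    r"\[(?:https?:)?//" + re.escape(LOCAL_HOST)
    + r"/wiki/([^\s\]?]+)(?:\s+([^\]]*))?\]"
)


def url_to_wikilink(wikitext):
    """Turn bracketed URLs into the local project into wikilinks."""
    def convert(match):
        # Decode percent escapes and replace underscores in the page name.
        target = unquote(match.group(1)).replace("_", " ")
        title = (match.group(2) or "").strip()
        if title and title != target:
            return f"[[{target}|{title}]]"
        return f"[[{target}]]"

    return LOCAL_URL.sub(convert, wikitext)


print(url_to_wikilink("[http://en.wikipedia.org/wiki/Main_page the main page]"))
```

The sketch ignores cases that need extra care, such as targets in the category or file namespace (which would need a leading colon) and untitled bracketed URLs, whose rendered appearance changes when they become wikilinks.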


If a (usually invisible) bidi character is present directly before or after a wikilink target, it is discarded. This does not affect functionality. For a link or an old-fashioned interlanguage link into the Arabic-language Wikipedia, the link target begins with :ar: and is not affected anyway.
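
A minimal sketch of this clean-up, assuming that “bidi character” covers the Unicode directional marks, embedding and override controls, and isolate controls. The names BIDI and strip_bidi are invented for the example.

```python
import re

# Directional formatting characters: LRM, RLM, the embedding/override controls
# U+202A..U+202E and the isolate controls U+2066..U+2069.
BIDI = "\u200e\u200f\u202a-\u202e\u2066-\u2069"

# A wikilink target is everything between "[[" and the first "|" or "]]".
TARGET = re.compile(r"\[\[([^\[\]|]+)")
# Bidi characters at the very start or very end of the target only.
EDGE_BIDI = re.compile(f"^[{BIDI}]+|[{BIDI}]+$")


def strip_bidi(wikitext):
    """Drop bidi characters directly before or after each wikilink target."""
    return TARGET.sub(
        lambda m: "[[" + EDGE_BIDI.sub("", m.group(1)), wikitext
    )
```

Characters inside the target are left alone; only the edges are trimmed, which matches the description above.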

Wikipedia in other languages and major sister projects


Correct external links like

[http://de.wikipedia.org/wiki/Schur%E2%80%93Zassenhaus-Theorem

are not enclosed in <ref> or moved as external links into other sections by this script.

Not only Wikipedia, but also other major sister projects (with a shortcut) linked by URL are detected and transformed into wikilink format.

A uniform format p:lang:Lemma is used, where p is the project shortcut (one letter, or wikt or meta).[1][2]

A leading colon ahead of the project identifier is used by some authors, but it is redundant and will be discarded.

The inverted order :lang:p:Lemma is quite rare and will be brought into the usual sequence, although both orders work.[3]
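
The normalisation of project prefixes might look roughly like the sketch below. The tables SHORTCUTS and LANGUAGES are small illustrative samples, and the function name normalize_interwiki is hypothetical; the script's real tables are not shown in this document.

```python
# Illustrative, incomplete table: long project name -> common shortcut.
SHORTCUTS = {
    "wikipedia": "w",
    "wiktionary": "wikt",
    "wikisource": "s",
    "wikiquote": "q",
    "wikibooks": "b",
    "wikinews": "n",
    "wikiversity": "v",
    "m": "meta",  # meta: is preferred over m: for readability
}
PROJECTS = set(SHORTCUTS.values())
# Assumption: a tiny sample of language codes; a real list is much longer.
LANGUAGES = {"ar", "de", "en", "es", "fr", "it"}


def _project(name):
    """Return the preferred project prefix for name, or None."""
    key = name.strip().lower()
    key = SHORTCUTS.get(key, key)
    return key if key in PROJECTS else None


def normalize_interwiki(target):
    """Bring a sister-project link target into the form p:lang:Lemma."""
    parts = target.lstrip(":").split(":")
    if len(parts) >= 3 and parts[0].lower() in LANGUAGES and _project(parts[1]):
        # Inverted order lang:p:Lemma becomes p:lang:Lemma.
        return ":".join([_project(parts[1]), parts[0].lower()] + parts[2:])
    if len(parts) >= 2 and _project(parts[0]):
        # Long name replaced by shortcut; a leading colon is dropped.
        return ":".join([_project(parts[0])] + parts[1:])
    # Not a sister-project link: leave it alone, leading colon included.
    return target


print(normalize_interwiki(":de:s:Lemma"))  # s:de:Lemma
```

Targets that do not start with a known project prefix, such as :de:Lemma, are returned unchanged, so the leading colon of a plain interlanguage link survives.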


This means something like

Gem%C3%A4ldegalerie_%28Berlin%29#Die_Gem.C3.A4ldegalerie_in_Dahlem

This mixture of URL escapes and UTF-8 is made more readable.

It typically arises when authors copy the URL of the target page into a wikilink. Underscores are replaced by spaces. Escape sequences are identified and replaced by UCS characters.
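
As an illustration, the decoding could be sketched like this. The function name prettify_target is invented, and treating every .HH pair in the anchor as an escaped byte is a simplifying assumption: a literal dot followed by two hexadecimal digits would be misread, so a real implementation needs more care.

```python
import re
from urllib.parse import unquote


def _decode(text):
    """Decode percent escapes, but only if the bytes form valid UTF-8."""
    try:
        return unquote(text, errors="strict")
    except UnicodeDecodeError:
        return text


def prettify_target(target):
    """Make a link target copied from a URL readable again."""
    title, sep, fragment = target.partition("#")
    title = _decode(title)
    if sep:
        # Anchors copied from a URL may use ".HH" where URLs use "%HH".
        fragment = _decode(re.sub(r"\.([0-9A-F]{2})", r"%\1", fragment))
    # Finally, underscores become spaces in both parts.
    return (title + sep + fragment).replace("_", " ")


print(prettify_target(
    "Gem%C3%A4ldegalerie_%28Berlin%29#Die_Gem.C3.A4ldegalerie_in_Dahlem"
))
# Gemäldegalerie (Berlin)#Die Gemäldegalerie in Dahlem
```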


This means a wikilink targeting the current page (self):

[[self]]

will be unlinked. A link with a differing title such as

[[self|Alter Ego]]

becomes

Alter Ego

Often this occurs as

[[self#section|

which is replaced by

[[#section|

Within an includeonly or onlyinclude region a link to the page itself is permitted and required, and it is kept.
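
The three cases above can be sketched as follows. The function unlink_self and its page_name parameter are hypothetical, and the sketch does not look for includeonly or onlyinclude regions; a caller would have to exclude those before applying it.

```python
import re

# [[target#section|title]] with optional section and optional title.
WIKILINK = re.compile(
    r"\[\[([^\[\]|#]*)(#[^\[\]|]*)?(?:\|([^\[\]]*))?\]\]"
)


def unlink_self(wikitext, page_name):
    """Unlink or shorten wikilinks that point to page_name itself."""
    def same_page(name):
        # Underscores equal spaces; the first letter is case-insensitive.
        a, b = name.strip().replace("_", " "), page_name.strip()
        return bool(a) and a[:1].upper() + a[1:] == b[:1].upper() + b[1:]

    def convert(m):
        target, section, title = m.group(1), m.group(2), m.group(3)
        if not same_page(target):
            return m.group(0)
        if section:
            # Keep the section link but drop the redundant page name.
            shown = "|" + title if title is not None else ""
            return "[[" + section + shown + "]]"
        # Plain self link: only the visible text remains.
        return title if title else target

    return WIKILINK.sub(convert, wikitext)


print(unlink_self("[[self]], [[self|Alter Ego]], [[self#section|there]]", "self"))
# self, Alter Ego, [[#section|there]]
```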


Titled wikilinks to other pages like

[[pointing device|pointing devices]]

are simplified as

[[pointing device]]s

The same rules as implemented in the parser are applied here, so the appearance does not change.

This goes especially for

For a human reader, the matching target word sometimes splits the link title at odd positions that do not coincide with syllable boundaries.

For titled links the resulting clickable (blue) part shall be the same as the bracketed title, merging

[[Component (software)|component]]s

into

[[Component (software)|components]]
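
Both directions, pulling a suffix out of the brackets and merging a trailing suffix into an explicit title, can be sketched together. The sketch assumes the English link trail (lowercase ASCII letters directly after the closing brackets); other wikis use other trail characters. The name simplify_titled_link is invented for the example.

```python
import re

# [[target|title]] followed by an optional link trail of lowercase letters.
PIPED = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]([a-z]*)")


def _lower_first(text):
    return text[:1].lower() + text[1:]


def simplify_titled_link(wikitext):
    """Simplify [[a|as]] to [[a]]s; merge [[a|b]]s into [[a|bs]]."""
    def convert(m):
        target = m.group(1)
        shown = m.group(2) + m.group(3)  # text rendered as the blue link
        head, tail = shown[:len(target)], shown[len(target):]
        if (_lower_first(head) == _lower_first(target)
                and re.fullmatch(r"[a-z]*", tail)):
            # The title is the target plus a trail: no pipe needed.
            return f"[[{head}]]{tail}"
        # Otherwise keep the pipe and move the trail into the title.
        return f"[[{target}|{shown}]]"

    return PIPED.sub(convert, wikitext)


print(simplify_titled_link("[[pointing device|pointing devices]]"))
# [[pointing device]]s
print(simplify_titled_link("[[Component (software)|component]]s"))
# [[Component (software)|components]]
```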

Pipe trick


In the early days of Wikipedia the pipe trick was invented: if a link target contains an expression in round parentheses () or a comma, the part before it is displayed as the link title when an empty link title is given, that is, when the pipe symbol is immediately followed by the closing brackets |]].

This was supposed to reduce typing. However, only a few authors are familiar with this notation, and the small pipe symbol might be overlooked easily. This script evaluates the construct by the same rules as the parser does and inserts the resulting displayed link title explicitly.

Even authors who swear by the abbreviated format often do not know that the pipe trick does not work within “tag extensions” like <ref> or <gallery> (and some other conveniences do not work there either). In this case the explicit title produces the intended behaviour for the first time.
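
A simplified sketch of the expansion, covering only the two cases named above (a trailing parenthesised expression, otherwise the first comma). The parser additionally strips namespace and interwiki prefixes, which is omitted here; expand_pipe_trick is a hypothetical name.

```python
import re

# A wikilink with an empty title: [[target|]]
PIPE_TRICK = re.compile(r"\[\[([^\[\]|]+)\|\]\]")


def expand_pipe_trick(wikitext):
    """Write out the title that the pipe trick would generate."""
    def convert(m):
        target = m.group(1)
        # A trailing "(...)" takes precedence over a comma.
        shown = re.sub(r"\s*\([^()]*\)\s*$", "", target)
        if shown == target:
            shown = target.split(",", 1)[0]
        shown = shown.strip()
        return f"[[{target}|{shown}]]" if shown else m.group(0)

    return PIPE_TRICK.sub(convert, wikitext)


print(expand_pipe_trick("[[Component (software)|]]"))
# [[Component (software)|Component]]
```

Because the title is now explicit, the result also behaves as intended inside <ref> or <gallery>.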

Formatting


One of the general rules later text search may rely on:


For URL recognition only the following protocols are considered: http, https, ftp, git, mms, svn, and protocol-relative [//. Other schemes are permitted in wikitext but quite rare.
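
The recognition of bracketed URLs with these protocols could be expressed as a single pattern. This is a sketch under the assumption that the scheme comparison is case-insensitive; the names URL_START and find_bracketed_urls are made up for the example.

```python
import re

# "[" followed by one of the listed schemes and "://", or by a bare "//".
URL_START = re.compile(r"\[(?:(http|https|ftp|git|mms|svn):)?//", re.IGNORECASE)


def find_bracketed_urls(wikitext):
    """Return (offset, scheme) pairs; scheme is '' if protocol-relative."""
    return [
        (m.start(), (m.group(1) or "").lower())
        for m in URL_START.finditer(wikitext)
    ]


sample = "[http://example.org a] [//example.org b] [mailto:x@example.org c]"
print([scheme for _, scheme in find_bracketed_urls(sample)])
# ['http', '']
```

Links with any other scheme, such as the mailto: link in the sample, are simply not reported.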

If not explicitly mentioned, in this section the term “bracket” means square brackets [].


Two of the general rules later text search may rely on:

URL formatting


For weblinks with brackets related to wiki projects the following action is taken:

A WMF URL without brackets that could be formatted as a wikilink is left unchanged, but a warning is issued.


User-defined modifications of wikilinks, URLs, or the adjoining text segments are applied immediately to any detected link target.

If needed, the link target is protected against textual modification.

Remarks

  1. ^ A longer project name is replaced by the common shortcut.
    Instead of [[wikisource:lang:Title, something like
    ''[[s:lang:Title|Title]]'' for the *** language [[Wikisource]]
    etc. should be written, to show any reader clearly which language a link leads to.
  2. ^ Both m: and meta: are possible, but meta: is used for easier readability.
  3. ^ See also recommendations at meta:Help:Interwiki linking #Prefixes.
