The contents of the Mish (function) page were merged into Rectifier (neural networks) on 29 July 2022. For the contribution history and old versions of the redirected page, please see its history; for the discussion at that location, see its talk page.
This is the talk page for discussing improvements to the Rectifier (neural networks) article. This is not a forum for general discussion of the article's subject.
This article is rated Start-class on Wikipedia's content assessment scale.
In the "Advantages" section, the article states that "Rectified linear units, compared to sigmoid function or similar activation functions, allow faster and effective training of deep neural architectures on large and complex datasets.", which is quite a bold claim, and there is no justification. I don't necessarily think this is incorrect, in fact, I think it is correct, but I think evidence would be very helpful. — Preceding unsigned comment added by Gsmith14 (talk • contribs) 00:55, 8 August 2022 (UTC)
Use of the softplus function as an approximator of the rectifier function is not warranted by any of the current four references. Note also that the softplus is a fairly bad approximation for values roughly between -2 and 2, while in general we can expect such small values. I propose to remove the section about the softplus. Angelorf (talk) 09:41, 2 July 2013 (UTC)
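To put a number on the approximation gap mentioned above, here is a minimal sketch that compares the two functions at a few points. It assumes the usual definitions, rectifier max(0, x) and softplus log(1 + exp(x)); the helper name `gap` is only illustrative. Under those definitions the difference is largest at x = 0, where it equals ln 2 (about 0.693), and it shrinks as |x| grows.

```python
import math

def relu(x):
    # Rectifier: max(0, x)
    return max(0.0, x)

def softplus(x):
    # log(1 + exp(x)), written in a numerically stable form
    return max(0.0, x) + math.log1p(math.exp(-abs(x)))

def gap(x):
    # Illustrative helper: how far softplus sits above the rectifier at x
    return softplus(x) - relu(x)

for x in (-4.0, -2.0, 0.0, 2.0, 4.0):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  softplus={softplus(x):.4f}  gap={gap(x):.4f}")
```

The gap is symmetric in x, so the region around zero is where the two functions disagree most.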
This article was marked as a stub in 2012. It seems to have enough information to no longer be considered a stub. I propose removing the "stub" marking at the end of the article.Cajunbill (talk) 07:53, 20 March 2015 (UTC)
I read that the function is not differentiable at 0 which confused me as I was looking at the image. Then I read the actual function max(0, x) and realized that the image is flawed. Please upload a non-flawed image. — Preceding unsigned comment added by 24.4.21.209 (talk) 00:17, 24 August 2015 (UTC)
@User:Ita140188 why did you revert my edit? I just changed one word to clarify this misleading sentence:
"Non-differentiable at zero; however it is differentiable anywhere else, and a [value → slope] of 0 or 1 can be chosen arbitrarily to fill the point where the input is 0."
This sentence is wrong. You cannot choose the value because it's defined by the function. The value at x = 0 must always be zero, because this results from the definition max(0, x).
What's wrong with my edit? Some explanation would be nice. --2003:CB:770E:3983:F88F:E5D4:E07C:7E39 (talk) —Preceding undated comment added 14:14, 2 April 2019 (UTC)
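The value/slope distinction in this thread can be illustrated with a small sketch. It assumes the definition max(0, x); the parameter name `slope_at_zero` is hypothetical and only stands for the arbitrary choice the quoted sentence refers to. The function value at 0 is fixed by the definition, while the derivative at 0 is the thing that has to be picked by convention.

```python
def relu(x):
    # The value is fully determined by the definition, including at x = 0.
    return max(0.0, x)

def relu_grad(x, slope_at_zero=0.0):
    # The derivative is 1 for x > 0 and 0 for x < 0.
    # At x = 0 it does not exist, so a slope (hypothetical parameter,
    # commonly 0 or 1) is chosen by convention.
    if x > 0:
        return 1.0
    if x < 0:
        return 0.0
    return slope_at_zero

print(relu(0.0), relu_grad(0.0), relu_grad(0.0, slope_at_zero=1.0))
```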
The ReLU can be viewed as a switch rather than an activation function. For a particular input to the neural network, each switch is then in a particular state, thrown or not thrown, and weighted sums connect together (or not) in certain ways. A weighted sum of weighted sums is still a linear system. Hence there is a particular linear projection from the input to the output for a particular input, and for every input within a neighborhood around it in which no switch changes state. A ReLU neural network is then a system of switched linear projections. Since ReLU switches at zero, there is no sudden discontinuity in the output as the input changes gradually, despite switches being definitely thrown on or off. For a particular output neuron and a particular input, there is a linear composite of weighted sums giving the output value. Those multiple weighted sums can be combined into a single weighted sum. You could then go and see what that single weighted sum was looking at in the input. There are a number of metrics you can look at, such as the angle between the input vector and the weight vector of the single weighted sum. S. O'Connor — Preceding unsigned comment added by 113.190.132.240 (talk) 16:20, 25 September 2019 (UTC)
Further information and examples: https://ai462qqq.blogspot.com/2019/11/artificial-neural-networks.html For example, the dot products being switched need not be simple weighted sums; they could derive from fast transforms like the FFT. S. O'Connor — Preceding unsigned comment added by 14.162.218.184 (talk) 08:42, 8 April 2020 (UTC)
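The "switched linear projection" view can be checked on a toy case. The sketch below is only an illustration under stated assumptions: a two-layer network without biases, hand-picked weights `W1` and `W2`, and a helper named `effective_matrix`, none of which come from the comment above. For a given input it records which hidden units are on, folds those switch states into a single matrix W2 · diag(switches) · W1, and compares that matrix's output with the ordinary forward pass.

```python
def matvec(W, x):
    # Plain matrix-vector product on nested lists
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(W1, W2, x):
    # Two-layer ReLU network without biases (assumption for brevity)
    hidden = [max(0.0, h) for h in matvec(W1, x)]
    return matvec(W2, hidden)

def effective_matrix(W1, W2, x):
    # Switch state of each hidden unit for this particular input
    switches = [1.0 if h > 0 else 0.0 for h in matvec(W1, x)]
    n_in, n_hidden = len(W1[0]), len(W1)
    # W_eff = W2 . diag(switches) . W1
    return [[sum(W2[o][j] * switches[j] * W1[j][i] for j in range(n_hidden))
             for i in range(n_in)]
            for o in range(len(W2))]

# Illustrative weights and inputs (not taken from any source)
W1 = [[1.0, -2.0], [0.5, 1.0], [-1.0, 0.5]]
W2 = [[1.0, -1.0, 2.0]]
x = [2.0, 0.5]
x_near = [2.1, 0.4]  # close enough that no hidden unit changes state

W_eff = effective_matrix(W1, W2, x)
print(forward(W1, W2, x), matvec(W_eff, x))
print(forward(W1, W2, x_near), matvec(W_eff, x_near))
```

Each row of `W_eff` is the single combined weight vector for one output neuron, so metrics such as its angle to the input vector can be computed from it directly.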
Currently the article focuses on the publication of Hahnloser as the earliest usage of the ReLU, but should we also mention Fukushima, who already used this activation function in the "Neocognitron"-network about twenty years earlier? (see eq (2) https://www.rctn.org/bruno/public/papers/Fukushima1980.pdf and also the discussion in https://stats.stackexchange.com/questions/447674) --Feudiable (talk) 14:38, 29 May 2020 (UTC)
In the field of real-time rendering of signed distance fields (SDFs), there is the concept of signed max and signed min. It is an operator, or actually just a function, that smoothly blends two values, ensuring both a smooth transition between them and a continuous first derivative. It is usually defined as . The is a smoothing factor. It has a property of , , (uniformly for all and ), and equal left and right derivatives at the point of . Similar can be made for . A ReLU can be easily implemented efficiently using function, as . However, this might not be suitable for practical uses, because this function is 0, and has all derivatives 0, for x < -k. So many solvers that use derivatives will not work too well. 81.6.34.172 (talk) 17:05, 31 May 2020 (UTC)
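The formulas in the comment above did not survive on this page, so the following is only a sketch under an assumption: it uses the quadratic polynomial smooth maximum that is common in SDF rendering code, which may or may not be the exact form the commenter had in mind. The names `smooth_max` and `smooth_relu` are illustrative. This form does match the stated behaviour: it is exactly 0 for x < -k, exactly x for x > k, and has equal left and right derivatives where the two arguments meet.

```python
def smooth_max(a, b, k):
    # Quadratic polynomial smooth maximum (assumed form, see lead-in).
    # k > 0 is the smoothing factor; blending happens where |a - b| < k.
    h = max(k - abs(a - b), 0.0) / k
    return max(a, b) + h * h * k * 0.25

def smooth_relu(x, k=1.0):
    # Smoothed rectifier: smooth maximum of 0 and x
    return smooth_max(0.0, x, k)

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  smooth_relu={smooth_relu(x):.4f}")
```

With this form the output at x = 0 is k/4 rather than 0, and the gradient vanishes entirely below -k, which is the practical concern raised in the comment.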
I've removed the following bullet, which was listed as an "advantage" of ReLU:
An unbounded function is certainly not biologically plausible (the firing rate of a neuron has an upper limit). And the comment about tanh was marked as a non sequitur by someone last October. So it looks useful to delete this bullet.
I removed the following text from the article:
I originally had replaced it with this text, but then I chose to remove the paragraph entirely:
I'm not sure what the author of the original text meant by a "unit." When I read their reference, I could not discern whether it meant an activation function or a connection function (such as a fully-connected or convolutional function) followed by an activation function. Does anyone have expertise on this? Thank you! --Yoderj (talk) 18:32, 8 April 2021 (UTC)
It would be nice to have an image showing a graph of each function inline with the equations. --157.131.95.172 (talk) 21:25, 20 July 2021 (UTC)
"Sparse activation: For example, in a randomly initialized network, only about 50% of hidden units are activated (have a non-zero output)."
Isn't this a disadvantage? Half the neurons are wasted and computed for no reason, and contribute nothing to the output, making the model less accurate? — Omegatron (talk) 18:59, 25 September 2023 (UTC)
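For anyone who wants to check the "about 50%" figure in the quoted sentence, here is a minimal simulation sketch. The setup is an assumption made for illustration: one hidden layer, zero-mean Gaussian weights, no biases, and a single random input; the function name `active_fraction` and the layer sizes are hypothetical. With zero-mean weights each pre-activation is equally likely to be positive or negative, so roughly half the units end up with a non-zero output.

```python
import random

def active_fraction(n_inputs=20, n_hidden=1000, seed=0):
    # Fraction of hidden ReLU units with non-zero output for one random input.
    # Weights are zero-mean Gaussian and there are no biases (assumptions).
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
    active = 0
    for _ in range(n_hidden):
        w = [rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
        pre_activation = sum(wi * xi for wi, xi in zip(w, x))
        if pre_activation > 0:
            active += 1
    return active / n_hidden

print(active_fraction())
```

This only illustrates the statistic at initialization; it says nothing about whether sparsity helps or hurts accuracy after training.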
The following function could work also:
You could see its basic properties here
Maybe someone more experienced in ReLU could add it. 45.181.122.234 (talk) 22:41, 26 March 2024 (UTC)
The redirect Rectifier (neural networks has been listed at redirects for discussion to determine whether its use and function meets the redirect guidelines. Readers of this page are welcome to comment on this redirect at Wikipedia:Redirects for discussion/Log/2024 April 9 § Rectifier (neural networks until a consensus is reached. Utopes (talk / cont) 01:49, 9 April 2024 (UTC)