In regularization, why do we use θ^2 rather than θ?

C.J


The regularization term is lambda*sum(θ^2).

bakkal

I've already answered this in your previous question (see last paragraph), but I'll try again.

The problem with regularizing with sum(θ) is that you may have θ parameters that cancel each other out.

Example:

θ_1 = +1000000
θ_2 = -1000001

The sum(θ) here is +1000000 - 1000001 = -1, which is small.

The sum(θ²) is 1000000² + (-1000001)², which is very big.

If you use sum(θ), you may end up with effectively no regularization (which was the whole goal), because large θ values escape the penalty when their terms cancel each other out.
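To make the cancellation concrete, here is a minimal sketch that evaluates both candidate penalties on the two example values. The variable names are illustrative only, and the regularization weight lambda is left out since it just scales both results.

```python
# The two example parameters from above.
theta = [1_000_000, -1_000_001]

# Plain sum: the large positive and negative values cancel.
plain_sum = sum(theta)

# Sum of squares: every term is non-negative, so nothing can cancel.
sum_of_squares = sum(t ** 2 for t in theta)

print(plain_sum)       # -1
print(sum_of_squares)  # 2000002000001
```

A penalty of -1 tells the optimiser these weights are fine, while the squared penalty correctly flags them as huge.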

You may use sum(|θ|) (the L1 norm) depending on your search/optimisation algorithm. But sum(θ²) (the squared L2 norm) is popular and works well with gradient descent, because it is differentiable everywhere.
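As a rough illustration of how the two penalties behave under gradient descent, the sketch below runs descent on the penalty term alone, with no data loss. The function names, the starting values, and the settings `lam`, `lr` and `steps` are assumptions made for this example, not part of the original answer.

```python
def penalty_grad_l2(theta, lam):
    # Gradient of lam * sum(t^2) is 2 * lam * t: proportional to each weight.
    return [2 * lam * t for t in theta]


def penalty_grad_l1(theta, lam):
    # Subgradient of lam * sum(|t|) is lam * sign(t); we use 0 at t == 0.
    return [lam * ((t > 0) - (t < 0)) for t in theta]


def shrink(theta, grad_fn, lam=0.1, lr=0.5, steps=100):
    # Plain gradient descent on the penalty only (hypothetical settings).
    theta = list(theta)
    for _ in range(steps):
        grad = grad_fn(theta, lam)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


start = [3.0, -2.0]
l2_result = shrink(start, penalty_grad_l2)
l1_result = shrink(start, penalty_grad_l1)
print(l2_result)  # both weights decay smoothly toward 0
print(l1_result)  # both weights move in fixed-size steps toward 0
```

With the squared penalty, each step multiplies a weight by a constant factor, so large weights shrink fast and small ones settle smoothly. With the absolute-value penalty, the step size is constant and the gradient is undefined at 0, which is why some optimisers need special handling for it.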
