How can you get the canonical URL for a web page (Rails)?

Peter Nixey

I need to store a distinct URL for an external webpage

I need to put the URL into the database. I don't want to store the same page twice so I need to strip all fluff off the URL.

# if I have
url_1 = "http://scientificamerican.com/royal-baby/?utm_campaign=promo"

# and
url_2 = "http://scientificamerican.com/royal-baby/?utm_source=email"

# then they should map to:
url_canonical = "http://scientificamerican.com/royal-baby/"

...it's not as simple as just stripping query parameters though

To get a single canonical URL regardless of what had been appended to it, I tried stripping the query string. The problem is that some CMSs still use the query string to identify the page.

e.g.

url_1 = "https://www.scientificamerican.com/article.cfm?id=obama-budget"

# strip the query string and it becomes
url_1 = "https://www.scientificamerican.com/article.cfm"

# which is obviously the same for all articles :(

Is there any Rails tool for getting a page's canonical URL?

This is obviously a problem that a number of people have had to solve, not least the search engines. How do you reduce the URL down such that all that remains is the data for the page?

Philip Hallstrom

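You can't. There's no way to know which query parameters are needed to distinguish one URL from another. There are obviously plenty you can strip on purpose (e.g. utm_campaign and the like), but not all of them.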
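As a rough illustration of the "strip what you know is fluff" part, here is a minimal sketch using only Ruby's standard library. The `strip_tracking_params` helper and the `TRACKING_PARAMS` pattern are hypothetical names, and the assumption that every `utm_`-prefixed parameter is tracking-only is mine, not something a site guarantees. Any parameter not on the denylist is kept, so CMS-style URLs such as `article.cfm?id=...` survive intact.

```ruby
require "uri"

# Hypothetical denylist: parameters assumed to carry tracking data only.
# Extend it for your own data; there is no universal list.
TRACKING_PARAMS = /\Autm_/i

# Removes denylisted query parameters and leaves everything else untouched.
def strip_tracking_params(url)
  uri = URI.parse(url)
  return url if uri.query.nil?

  kept = URI.decode_www_form(uri.query).reject { |key, _value| key.match?(TRACKING_PARAMS) }
  # Drop the "?" entirely when nothing is left after filtering.
  uri.query = kept.empty? ? nil : URI.encode_www_form(kept)
  uri.to_s
end

puts strip_tracking_params("http://scientificamerican.com/royal-baby/?utm_campaign=promo")
puts strip_tracking_params("https://www.scientificamerican.com/article.cfm?id=obama-budget")
```

This only reduces duplicates; it cannot tell whether an unknown parameter changes the page.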

Your best bet is to load the page's HTML and look for the canonical link element. If it exists, you've got your canonical URL.

http://en.wikipedia.org/wiki/Canonical_link_element
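The lookup described above might be sketched like this. `canonical_url_from_html` is a hypothetical helper, and it uses a regular-expression scan only to stay within the standard library; a real HTML parser (Nokogiri, for instance) would be more robust against unusual markup. It assumes attribute values are quoted and resolves a relative `href` against the URL the page was fetched from.

```ruby
require "uri"

# Returns the absolute canonical URL declared in +html+, or nil if the page
# has no <link rel="canonical" href="..."> element.
# +page_url+ is the address the HTML came from, used to resolve relative hrefs.
def canonical_url_from_html(html, page_url)
  html.scan(/<link\b[^>]*>/i).each do |tag|
    # Collect quoted attributes of this <link> tag into a hash with lowercase keys.
    attrs = tag.scan(/([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/)
               .to_h { |name, double_quoted, single_quoted| [name.downcase, double_quoted || single_quoted] }

    # rel may hold several space-separated tokens; look for "canonical" among them.
    rel_values = attrs.fetch("rel", "").downcase.split
    next unless rel_values.include?("canonical") && attrs["href"]

    return URI.join(page_url, attrs["href"].strip).to_s
  end
  nil
end

html = '<html><head><link rel="canonical" href="/royal-baby/"></head></html>'
puts canonical_url_from_html(html, "http://scientificamerican.com/royal-baby/?utm_source=email")
```

The HTML itself could come from something like `Net::HTTP.get(URI(page_url))`. When the function returns `nil`, the page declares no canonical link and you are back to storing the URL as given (or with known tracking parameters removed).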
