Is there a way to make it clear to web crawlers/bots that content contained in an article or section is unrelated to that article?
<article>
<section>
<div>
<span>Amy Neville</span>
<img src="http://www.example.com/amy.png">
<span>Joined <time>5 Days</time> ago</span>
<span>41525 Points</span>
</div>
<p>Mary, the only surviving legitimate child of King James V of Scotland, was six days old when her father died and she acceded to the throne. She spent most of her childhood in France while Scotland was ruled by regents, and in 1558, she married the Dauphin of France, Francis. He ascended the French throne as King Francis II in 1559, and Mary briefly became queen consort of France, until his death in December 1560.</p>
</section>
</article>
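In the example above I have a forum post. Next to it is a <div> holding some unrelated information about the poster. It is unrelated, but it could be confused with the actual article content.
Is there any tag or attribute that makes this clear?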
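Generally speaking, if a sectioning element contains information that is completely unrelated to that section's content, the closest you can get is an <aside> element.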
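As a rough sketch of that option, a block that really is tangential could be wrapped in <aside> inside the section. The "Related threads" block and its URL are hypothetical and not part of the question's markup:
<article>
<section>
<p>Mary, the only surviving legitimate child of King James V of Scotland, was six days old when her father died and she acceded to the throne.</p>
<!-- Hypothetical tangential content: not part of the post itself -->
<aside>
<h2>Related threads</h2>
<ul>
<li><a href="http://www.example.com/threads/42">Other Scottish monarchs</a></li>
</ul>
</aside>
</section>
</article>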
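Information about an article's poster is related to that article, because it describes the article's author. It does not form part of the article's content, but it is still related.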
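That said, you can use a <header> or <footer> to mark up the author information within the sectioning element. You can even place the <footer> at the beginning of the section; it may look strange, but it is perfectly fine (see the specification describing the <article> element).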
<article>
<section>
<footer>
<span>Amy Neville</span>
<img src="http://www.example.com/amy.png">
<span>Joined <time>5 Days</time> ago</span>
<span>41525 Points</span>
</footer>
<p>Mary, the only surviving legitimate child of King James V of Scotland, was six days old when her father died and she acceded to the throne. She spent most of her childhood in France while Scotland was ruled by regents, and in 1558, she married the Dauphin of France, Francis. He ascended the French throne as King Francis II in 1559, and Mary briefly became queen consort of France, until his death in December 1560.</p>
</section>
</article>
There is no dedicated element for marking up author information other than <address>, and <address> is intended for contact information.
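If the poster's name links to something that works as contact information, <address> could be nested inside the <footer>. This is only a sketch: the profile URL is hypothetical, and it assumes the profile page is a way to reach the poster. Details that are not contact information, such as the points, stay outside <address>:
<article>
<section>
<footer>
<!-- Hypothetical profile link, assumed to act as contact information -->
<address>
Posted by <a href="http://www.example.com/users/amy-neville">Amy Neville</a>
</address>
<span>41525 Points</span>
</footer>
<p>Mary, the only surviving legitimate child of King James V of Scotland, was six days old when her father died and she acceded to the throne.</p>
</section>
</article>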