crawler4j prints a large amount of system output

Posted by debugcn in Dev

Username

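I started using Crawler4j and played around with the BasicCrawler example for a while. I removed all output from the BasicCrawler.visit() method, then added some URL processing that I had already written. Now, when I start the program, it suddenly prints a huge amount of internal processing information that I don't actually need. See the example below: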

Auth cache not set in the context
Target auth state: UNCHALLENGED
Proxy auth state: UNCHALLENGED
Attempt 1 to execute request
Sending request: GET /section.aspx?cat=7 HTTP/1.1
"GET /section.aspx?cat=7 HTTP/1.1[\r][\n]"
>> "Accept-Encoding: gzip[\r][\n]"
>> "Host: www.dailytech.com[\r][\n]"
>> "Connection: Keep-Alive[\r][\n]"
>> "User-Agent: crawler4j (http://code.google.com/p/crawler4j/)[\r][\n]"
>> "Cookie: DTLASTVISITED=11/20/2013 6:16:52 AM; DTLASTVISITEDSYS=11/20/2013 6:16:48 AM;     MF2=vaxc1b832fex; dtusession=dcef3fc0-dc04-4f13-8028-186aea942c3f[\r][\n]"
>> "[\r][\n]"
>> GET /section.aspx?cat=7 HTTP/1.1
>> Accept-Encoding: gzip
>> Host: www.dailytech.com
>> Connection: Keep-Alive
>> User-Agent: crawler4j (http://code.google.com/p/crawler4j/)
>> Cookie: DTLASTVISITED=11/20/2013 6:16:52 AM; DTLASTVISITEDSYS=11/20/2013 6:16:48 AM;     MF2=vaxc1b832fex; dtusession=dcef3fc0-dc04-4f13-8028-186aea942c3f
<< "HTTP/1.1 200 OK[\r][\n]"
<< "Cache-Control: private[\r][\n]"
<< "Content-Type: text/html; charset=utf-8[\r][\n]"
<< "Content-Encoding: gzip[\r][\n]"
<< "Vary: Accept-Encoding[\r][\n]"
<< "Server: Microsoft-IIS/7.5[\r][\n]"
<< "X-AspNet-Version: 4.0.30319[\r][\n]"
<< "Set-Cookie: DTLASTVISITED=11/20/2013 6:16:54 AM; domain=dailytech.com; expires=Tue,     20-Nov-2018 11:16:54 GMT; path=/[\r][\n]"
<< "Set-Cookie: DTLASTVISITEDSYS=11/20/2013 6:16:48 AM; domain=dailytech.com; path=/[\r][\n]"
<< "X-UA-Compatible: IE=EmulateIE7[\r][\n]"
<< "Date: Wed, 20 Nov 2013 11:16:54 GMT[\r][\n]"
<< "Content-Length: 8235[\r][\n]"
<< "[\r][\n]"
Receiving response: HTTP/1.1 200 OK
<< HTTP/1.1 200 OK
<< Cache-Control: private
<< Content-Type: text/html; charset=utf-8
<< Content-Encoding: gzip
<< Vary: Accept-Encoding
<< Server: Microsoft-IIS/7.5
<< X-AspNet-Version: 4.0.30319
<< Set-Cookie: DTLASTVISITED=11/20/2013 6:16:54 AM; domain=dailytech.com;
expires=Tue,20-Nov-2018 11:16:54 GMT; path=/
<< Set-Cookie: DTLASTVISITEDSYS=11/20/2013 6:16:48 AM; domain=dailytech.com; path=/
<< X-UA-Compatible: IE=EmulateIE7
<< Date: Wed, 20 Nov 2013 11:16:54 GMT
<< Content-Length: 8235
Cookie accepted: "[version: 0][name: DTLASTVISITED][value: 11/20/2013 6:16:5
AM][domain:dailytech.com][path: /][expiry: Tue Nov 20 12:16:54 CET 2018]".
Cookie accepted: "[version: 0][name: DTLASTVISITEDSYS][value: 11/20/2013 6:16:48
AM][domain: dailytech.com][path: /][expiry: null]". 
Connection can be kept alive indefinitely
<< "[0x1f]"
<< "[0x8b]"
<< "[0x8]"
<< "[0x0]"
<< "[0x0][0x0][0x0][0x0][0x4][0x0]"
<< "[0xed][0xbd][0x7]`[0x1c]I[0x96]%&/m[0xca]{J[0xf5]J[0xd7][0xe0]t[0xa1]
[0x8][0x80]`[0x13]$[0xd8][0x90]@[0x10][0xec][0xc1][0x88][0xcd][0xe6][0x92][0xec]
[0x1d]iG#)[0xab]*[0x81][0xca]eVe]f[0x16]@[0xcc][0xed][0x9d][0xbc][0xf7][0xde]{[0xef]
[0xbd][0xf7][0xde]{[0xef][0xbd][0xf7][0xba];[0x9d]N'[0xf7][0xdf][0xff]?\fd[0x1]l[0xf6]
[0xce]J[0xda][0xc9][0x9e]![0x80][0xaa][0xc8][0x1f]?~|[0x1f]?"~[0xe3][0xe4]7N[0x1e]
[0xff][0xae]O[0xbf]<y[0xf3][0xfb][0xbc]<M[0xe7][0xed][0xa2]L_~[0xf5][0xe4][0xf9]

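Is there a way to disable all of this output? Or does anyone know what causes it? Could this even be a bug that I should report to the community?

Thanks for your time.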

Username

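I found the answer to my problem. I had renamed the method from main(String[] args) to crawl(). After that, crawler4j started printing all sorts of debug output. Once I changed logger4j.properties, the messages disappeared.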
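The answer does not show what was changed in the properties file, so the following is only a sketch of what such a configuration might look like. It assumes the file is a log4j 1.x configuration on the classpath (conventionally named log4j.properties), and that the request/response dump above comes from the Apache HttpClient loggers `org.apache.http.headers` and `org.apache.http.wire`, which is what the `>>` and `<<` lines suggest. The levels, the appender name `stdout`, and the output pattern are illustrative choices, not the poster's actual settings.

```properties
# Root logger: print only warnings and errors to the console
# (assumption: a single console appender named "stdout")
log4j.rootLogger=WARN, stdout

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p [%c{1}] %m%n

# Apache HttpClient: suppress the header and raw wire dumps
log4j.logger.org.apache.http=WARN
log4j.logger.org.apache.http.headers=ERROR
log4j.logger.org.apache.http.wire=ERROR

# crawler4j's own classes: keep informational messages only
log4j.logger.edu.uci.ics.crawler4j=INFO
```

If the debug lines still appear with a file like this in place, it may be that a different configuration file is being picked up from the classpath, or that logging is routed through another backend.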


Edited on 2021-06-04
