Webmaster Tools API: getting more than 1000 crawl errors

Posted by debugcn in Dev

Johannes

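I'm using the new Webmaster Tools API to fetch the crawl errors (plus details) for all of my sites. Unfortunately it only gives me 1000, but I have around 10000. Is there a way to get all of them?

Here is the code I'm using: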

package main;

import com.google.api.client.googleapis.auth.oauth2.GoogleAuthorizationCodeFlow;
import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.googleapis.auth.oauth2.GoogleTokenResponse;
import com.google.api.client.http.HttpTransport;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.api.client.json.JsonFactory;
import com.google.api.client.json.jackson2.JacksonFactory;

import com.google.api.services.webmasters.Webmasters;
import com.google.api.services.webmasters.model.UrlCrawlErrorsSample;
import com.google.api.services.webmasters.model.UrlCrawlErrorsSamplesListResponse;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;


public class WebmastersCommandLine {

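  // OAuth client credentials from the Google API Console (elided here).
  private static final String CLIENT_ID = "...";
  private static final String CLIENT_SECRET = "...";

  // "Out of band" redirect: the user copies the authorization code by hand.
  private static final String REDIRECT_URI = "urn:ietf:wg:oauth:2.0:oob";

  // Read-only access to Webmaster Tools (Search Console) data.
  private static final String OAUTH_SCOPE = "https://www.googleapis.com/auth/webmasters.readonly";

  // The verified site whose crawl errors are listed.
  private static final String PAGE_URL = "...";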

  public static void main(String[] args) throws IOException {
    HttpTransport httpTransport = new NetHttpTransport();
    JsonFactory jsonFactory = new JacksonFactory();

    GoogleAuthorizationCodeFlow flow = new GoogleAuthorizationCodeFlow.Builder(
        httpTransport, jsonFactory, CLIENT_ID, CLIENT_SECRET, Arrays.asList(OAUTH_SCOPE))
        .setAccessType("online")
        .setApprovalPrompt("auto").build();

    String url = flow.newAuthorizationUrl().setRedirectUri(REDIRECT_URI).build();
    System.out.println("open URL:");
    System.out.println("  " + url);
    System.out.println("code:");
    BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
    String code = br.readLine();

    GoogleTokenResponse response = flow.newTokenRequest(code).setRedirectUri(REDIRECT_URI).execute();
    GoogleCredential credential = new GoogleCredential().setFromTokenResponse(response);

    // Create a new authorized API client
    Webmasters service = new Webmasters.Builder(httpTransport, jsonFactory, credential)
        .setApplicationName("WebmastersCommandLine")
        .build();

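    // List the sampled "notFound" (404) crawl errors for the web platform.
    // The API returns at most 1000 samples, not the complete list.
    Webmasters.Urlcrawlerrorssamples.List req2 =
        service.urlcrawlerrorssamples().list(PAGE_URL, "notFound", "web");

    try
    {
        UrlCrawlErrorsSamplesListResponse urlList = req2.execute();

        System.out.println("start");

        // The sample list may be absent (null) when there are no errors.
        if (urlList.getUrlCrawlErrorSample() != null)
        {
            for (UrlCrawlErrorsSample sample : urlList.getUrlCrawlErrorSample())
            {
                // One extra request per sample to fetch its details (linking pages etc.).
                Webmasters.Urlcrawlerrorssamples.Get req3 = service.urlcrawlerrorssamples()
                    .get(PAGE_URL, sample.getPageUrl(), "notFound", "web");
                UrlCrawlErrorsSample details = req3.execute();

                // urlDetails can be missing; guard against a NullPointerException.
                String linkedFrom = details.getUrlDetails() == null
                    ? "[]"
                    : String.valueOf(details.getUrlDetails().getLinkedFromUrls());
                System.out.println(sample.getPageUrl() + "," + linkedFrom);
            }
        }
    }
    catch (IOException e)
    {
        System.out.println("An error occurred: " + e);
    }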

    System.out.println("done");
  }

}

However, this only lists 1000 errors, and I need all 10000. Does anyone know a way to do this?

John Mueller

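The Webmaster Tools API URL crawl error samples method returns a sample of 1000 crawl errors. It is not meant to return the complete list (which you could compile from your server logs). If you want more samples through the API, one thing you can do is mark those errors as fixed and check back in a day. It will then generate a new set of samples based on the remaining crawl errors.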
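Compiling the complete list from server logs, as suggested above, can be sketched in Java. This is a minimal sketch, assuming your web server writes the Apache/nginx "combined" log format; the class name `NotFoundFromLogs` and the log file path are hypothetical. It collects every path that returned 404 along with the distinct Referer headers seen for it, which loosely mirrors the `pageUrl, linkedFromUrls` pairs printed by the code in the question (referers come from visitors and bots, not from Google's own link data).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NotFoundFromLogs {

    // One line in "combined" log format (assumed server configuration):
    // host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
    private static final Pattern COMBINED = Pattern.compile(
        "^\\S+ \\S+ \\S+ \\[[^\\]]+\\] \"\\S+ (\\S+)[^\"]*\" (\\d{3}) \\S+ \"([^\"]*)\".*$");

    /** Maps every path that returned 404 to the distinct referers seen for it. */
    public static Map<String, Set<String>> collectNotFound(List<String> lines) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (String line : lines) {
            Matcher m = COMBINED.matcher(line);
            if (!m.matches() || !m.group(2).equals("404")) {
                continue; // skip non-404 hits and lines in a different format
            }
            Set<String> referers = result.computeIfAbsent(m.group(1), k -> new LinkedHashSet<>());
            String referer = m.group(3);
            if (!referer.isEmpty() && !referer.equals("-")) {
                referers.add(referer); // "-" means no Referer header was sent
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to an access log (hypothetical, e.g. "access.log")
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (Map.Entry<String, Set<String>> e : collectNotFound(lines).entrySet()) {
            System.out.println(e.getKey() + "," + e.getValue());
        }
    }
}
```

Run it as `java NotFoundFromLogs access.log`. `readAllLines` keeps the example short; for very large logs, streaming the file with `Files.lines` avoids loading it all into memory.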

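The samples are in the same order as in the user interface, so the more important ones are the first you see. This means there are diminishing returns as you go further: later crawl errors are either similar to earlier ones or at least considered less critical. The original blog post has more on prioritization: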

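We determine this based on a number of factors, including whether you included the URL in a Sitemap, how many places it is linked from (and whether any of those are also on your site), and whether the URL has recently gotten any traffic from search.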


Edited 2021-06-9
