Downloading files from a website (game mods)

Posted by debugcn in Dev

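My goal: run a script or command every day to fetch the latest mods published in the Transport Fever 2 mod section. https://www.transportfever.net/filebase/index.php?filebase/80-transport-fever-2/

Here is an example of a mod; the file you can download is at the bottom of the page. https://www.transportfever.net/filebase/index.php?entry/5107-%C3%B6bb-%C3%A4ra-valousek-%C3%B6bb-1012-%C3%B6bb-1014-%C3%B6bb-1163/

I have played around with wget, but I can only get it to download the index.php file (I am a Linux beginner).

I think the problem is that they host the files with a third-party hosting provider.

Does anyone know how I can achieve my goal? :)

Thanks in advance!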

user391836

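https://www.transportfever.net/filebase/index.php?filebase/80-transport-fever-2/ provides links to the latest files. You can use curl to download the site's HTML document, pipe the output through grep to extract a download link (done below in a fragile way), and use command substitution to pass that link to a second curl command.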

curl -OJ \
    $(curl -fs \
    'https://www.transportfever.net/filebase/index.php?filebase/80-transport-fever-2/' | \
    grep -om1 '[^"]*entry-download/[^"]*')

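Hopefully this gives you something to build on.

grep options used:

-o/--only-matching: print only the matched text, not the whole line that contains it
-m 1/--max-count=1: stop reading input after the first line that contains a match
The pattern matched, [^"]*entry-download/[^"]*: the download links all appear to be given as href="https://www.transportfever.net/filebase/index.php?entry-download/<number><...>", so the pattern above seems to be sufficient: zero or more of any character other than a double quote ", then entry-download/, followed by zero or more of any character other than "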
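To see what the pattern extracts, here is a small offline check against a made-up HTML fragment. The example.invalid URLs and entry names are placeholders and are not taken from the real page.

```shell
#!/bin/sh
# Made-up HTML with two download links on separate lines (placeholder URLs).
sample='<a href="https://example.invalid/index.php?entry-download/1-first/">A</a>
<a href="https://example.invalid/index.php?entry-download/2-second/">B</a>'

# Print only the matched text (-o) and stop after the first matching line (-m1).
first_link=$(printf '%s\n' "$sample" | grep -om1 '[^"]*entry-download/[^"]*')
printf '%s\n' "$first_link"
```

Note that -m1 counts lines rather than matches, so if the first matching line held several links, -o would print each of them.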

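curl options used (first pass, inside the command substitution):

-f/--fail: output nothing if we receive a 4xx/5xx HTTP response; the request failed, and we don't want to grep an HTML document that describes the failure
-s/--silent: this is the first stage, so we don't want to see a progress bar or anything else

Second-pass curl options. These download links use the Content-Disposition header to tell us the filename, so:

-O/--remote-name: save the file under the same name as the remote file
-J/--remote-header-name: lets the -O option use the filename the server specifies in Content-Disposition instead of extracting a filename from the URL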
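One weakness of nesting the two commands is that if the first pass finds nothing, the second curl still runs, just without a URL. A minimal sketch of the same two passes with a guard follows. The function names fetch_latest and first_download_link are hypothetical, and head -n 1 is added on the assumption that only one link is wanted even if several share a line.

```shell
#!/bin/sh
# Hypothetical helper: print the first entry-download link found in the HTML on stdin.
first_download_link() {
    grep -o '[^"]*entry-download/[^"]*' | head -n 1
}

# Hypothetical wrapper around the two curl passes.
# $1 is the listing page URL; returns non-zero if no link could be extracted.
fetch_latest() {
    link=$(curl -fs "$1" | first_download_link)
    if [ -z "$link" ]; then
        echo "no entry-download link found" >&2
        return 1
    fi
    # Second pass: save under the server-supplied filename.
    curl -OJ "$link"
}

# Usage (performs network requests):
# fetch_latest 'https://www.transportfever.net/filebase/index.php?filebase/80-transport-fever-2/'
```

Splitting the passes also makes it easier to log or inspect the extracted link before anything is downloaded.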

There is actually more than one entry-download/ link. To download all of them, we can drop -m1 from grep and adjust the second curl's options to use --remote-name-all, like this:

curl --remote-name-all -J \
    $(curl -fs \
    'https://www.transportfever.net/filebase/index.php?filebase/80-transport-fever-2/' | \
    grep -o '[^"]*entry-download/[^"]*')
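If the page happens to list the same link more than once (an assumption, not something verified here), the list can be de-duplicated before it is handed to curl. A small offline sketch with placeholder URLs; unique_download_links is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: print every distinct entry-download link in the HTML on stdin.
unique_download_links() {
    grep -o '[^"]*entry-download/[^"]*' | sort -u
}

# Made-up HTML in which the first link appears twice (placeholder URLs).
sample='<a href="https://example.invalid/index.php?entry-download/1-first/">A</a>
<a href="https://example.invalid/index.php?entry-download/2-second/">B</a>
<a href="https://example.invalid/index.php?entry-download/1-first/">A again</a>'

links=$(printf '%s\n' "$sample" | unique_download_links)
printf '%s\n' "$links"
```

With the real page, piping the output of the first curl through unique_download_links could stand in for the plain grep stage inside the command substitution.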

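Checking for file conflicts:

If we want to know in advance the filename described by the Content-Disposition header, an extra step is needed. We can use curl to send a HEAD request: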

# get first url from the page, storing it to
# the parameter 'url' so we can use it again later
url=$(curl -fs \
    'https://www.transportfever.net/filebase/index.php?filebase/80-transport-fever-2/' | \
    grep -om1 '[^" ]*entry-download/[^" ]*')

# head request to determine filename
filename=$(curl -Is "$url" | grep -iom1 '^content-disposition:.*filename="[^"]*' | grep -o '[^"]*$')

# 'if' statement using the 'test' / '[' command as the condition
if test -e "$filename"; then
    echo "$filename exists!"
else
    # a file named $filename doesn't exist,
    # so we'll download it
    curl -o "$filename" "$url"
fi

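This is a simple example that checks for a conflicting file before attempting the download.
It isn't strictly necessary, because curl -J will not overwrite an existing file, but I suspect you may want to check whether "$filename" exists in some other directory, or is listed in some text file, perhaps without the .zip suffix: "${filename%.zip}"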
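As a sketch of the text-file variant: the snippet below strips a trailing .zip with ${1%.zip} and looks the result up in a list of already-downloaded mods. The function name already_listed, the mod names, and the list file (created here as a temporary file) are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the mod name (filename without .zip)
# is already listed, one name per line, in the file given as $2.
already_listed() {
    name=${1%.zip}
    # -F: fixed string, -x: match the whole line, -q: no output, only an exit status
    grep -Fxq -- "$name" "$2"
}

# Temporary stand-in for a real list of downloaded mods.
log=$(mktemp)
printf '%s\n' 'some_mod_1' > "$log"

for f in some_mod_1.zip other_mod_2.zip; do
    if already_listed "$f" "$log"; then
        echo "$f: already listed, skipping"
    else
        echo "$f: not listed, would download"
    fi
done
rm -f "$log"
```

Matching whole lines with -x avoids treating one mod name as present merely because it is a prefix of another.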

If you want to do this for all of the extracted entry-download/ URLs, you can build on the above as follows:

# extract all urls, placing them in an array parameter 'urls'
urls=( $(curl -fs \
    'https://www.transportfever.net/filebase/index.php?filebase/80-transport-fever-2/' | \
    grep -o '[^" ]*entry-download/[^" ]*') )

# loop over extracted urls
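for i in "${urls[@]}"; do
    # do filename extraction for "$i"
    filename=$(curl -Is "$i" | grep -iom1 '^content-disposition:.*filename="[^"]*' | grep -o '[^"]*$')
    # use filename to determine if you want to download "$i";
    # here: skip it if no name was found or the file already exists
    if test -n "$filename" && ! test -e "$filename"; then
        curl -o "$filename" "$i"
    fi
done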
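The filename extraction inside the loop could be factored into a helper. The sketch below is one possible version, not the only way to do it: it reads response headers on stdin, strips the carriage returns that HTTP header lines end with, and accepts the filename with or without quotes. The function name filename_from_headers and the sample headers are made up for illustration, and the extended filename*= form is not handled.

```shell
#!/bin/sh
# Hypothetical helper: read HTTP response headers on stdin and print the
# filename from the first Content-Disposition header, if there is one.
filename_from_headers() {
    tr -d '\r' |
        grep -i -m1 '^content-disposition:' |
        sed -n 's/.*filename="\{0,1\}\([^";]*\).*/\1/p'
}

# Made-up headers in the shape `curl -Is "$url"` prints them (CRLF line endings).
printf 'HTTP/1.1 200 OK\r\nContent-Disposition: attachment; filename="some_mod_1.zip"\r\n\r\n' |
    filename_from_headers
```

Inside the loop this would be used as filename=$(curl -Is "$i" | filename_from_headers), with an empty result meaning the server sent no usable name.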
