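I am trying to access some satellite data in a read-only archive. I am only interested in files whose coordinates, as listed in an .xml file inside each zip, match my study area.
There are multiple files for every day of the year. For now I am concentrating on the 2015/07 folder, which contains a separate folder for each day of the month. Each day's folder contains many .zip files along with other file types.
The naming convention and structure of the zip files is always the same: every contained file uses the .zip file's name, and only the suffix/file extension changes, as follows: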
$ unzip -l S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.zip
Archive: S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.zip
Length Date Time Name
--------- ---------- ----- ----
0 07-08-2015 15:05 S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/
16099 07-08-2015 15:04 S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/manifest.safe
0 07-08-2015 15:05 S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/measurement/
861899961 07-08-2015 15:05 S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/measurement/s1a-iw-grd-vv-20150701t135110-20150701t135135-006618-008d39-001.tiff
0 07-08-2015 15:04 S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/annotation/
1685172 07-08-2015 15:04 S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/annotation/s1a-iw-grd-vv-20150701t135110-20150701t135135-006618-008d39-001.xml
0 07-08-2015 15:04 S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/annotation/calibration/
1013267 07-08-2015 15:04 S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/annotation/calibration/calibration-s1a-iw-grd-vv-20150701t135110-20150701t135135-006618-008d39-001.xml
317418 07-08-2015 15:05 S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/annotation/calibration/noise-s1a-iw-grd-vv-20150701t135110-20150701t135135-006618-008d39-001.xml
0 07-08-2015 15:05 S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/preview/
2437 07-08-2015 15:04 S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/preview/product-preview.html
124584 07-08-2015 15:05 S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/preview/quick-look.png
0 07-08-2015 15:05 S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/preview/icons/
95280 07-08-2015 15:05 S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/preview/icons/logo.png
1026 07-08-2015 15:04 S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/preview/map-overlay.kml
20088 07-08-2015 15:04 S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE-report-20150701T155156.pdf
0 07-08-2015 15:05 S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/support/
440 07-08-2015 15:04 S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/support/s1-product-preview.xsd
450 07-08-2015 15:04 S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/support/s1-map-overlay.xsd
471 07-08-2015 15:04 S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/support/s1-level-1-measurement.xsd
62654 07-08-2015 15:04 S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/support/s1-object-types.xsd
469 07-08-2015 15:04 S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/support/s1-level-1-quicklook.xsd
6427 07-08-2015 15:04 S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/support/s1-level-1-calibration.xsd
147222 07-08-2015 15:04 S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/support/s1-level-1-product.xsd
3956 07-08-2015 15:05 S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/support/s1-level-1-noise.xsd
So if I pick one day of the month, I can check the coordinates in each .kml file with the following command:
unzip -p S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.zip S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/preview/map-overlay.kml
which gives the whole content of the .kml file:
<?xml version="1.0" encoding="UTF-8"?>0_20150701T135135_006618_008D39_BE79.SAFE<kml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:gml="http://wwsa.int/safe/sentinel-1.0/sentinel-1" xmlns:s1sar="http://www.esa.int/safe/sentia.int/safe/sentinel-1.0/sentinel-1/sar/level-2" xmlns:gx="http://www.google.com
<Document>
<name>Sentinel-1 Map Overlay</name>
<Folder>
<name>Sentinel-1 Scene Overlay</name>
<GroundOverlay>
<name>Sentinel-1 Image Overlay</name>
<Icon>
<href>quick-look.png</href>
</Icon>
<gx:LatLonQuad>
<coordinates>-115.928909,35.970608 -118.750404,36.374107 -118.459686,
</gx:LatLonQuad>
</GroundOverlay>
</Folder>
</Document>
</kml>
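However, I need to do this for every day of 2015 and 2016, so what I would like to do is loop over the zip files and print the name of each .zip file together with the lines of the contained .xml file that hold the coordinates: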
<coordinates>-115.928909,35.970608 -118.750404,36.374107 -118.459686,
</gx:LatLonQuad>
I don't expect anyone to write this for me in full, but some help getting started would be appreciated.
Start with something like this:
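for zf in *.zip ; do
  # strip the trailing .zip to get the name shared by the .SAFE directory
  base=${zf%.zip}
  echo "$zf"
  # -p writes the named member to stdout without extracting anything to disk
  unzip -p "$zf" "$base.SAFE/preview/map-overlay.kml" |
    sed -ne '/<gx:/,/<\/gx:/p'
done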
This pipes the .../map-overlay.kml file from each .zip file into sed, which prints only the lines from the one matching <gx: through the one matching </gx: (inclusive).
Or, if you only want the <coordinates> line, change the sed script to:
sed -ne '/<coordinates>/p'
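Note, however, that while these sed scripts work for the sample data, extracting even a few simple lines from an XML file with regular expressions is fragile and prone to failure. I would be remiss if I didn't say:
Don't parse XML or HTML with regular expressions. It is not reliable in general.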
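To make the warning concrete, here is a minimal Python sketch comparing a line-based match (the same idea as the sed one-liner) with a real XML parser. The KML fragment is made up for the demonstration: it wraps the coordinate list over two lines, which XML permits, and its namespace URIs are an assumption about what the real files declare. The function names are hypothetical too. The parser version matches on the element's local name, so it should not depend on the exact namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the <coordinates> text is wrapped over two lines.
KML = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
  <Document>
    <gx:LatLonQuad>
      <coordinates>-115.9,35.9 -118.7,36.3
        -118.4,37.8 -115.6,37.4</coordinates>
    </gx:LatLonQuad>
  </Document>
</kml>
"""


def coordinates_by_line(text):
    """Line-based matching, equivalent in spirit to sed -ne '/<coordinates>/p'."""
    return [line.strip() for line in text.splitlines() if "<coordinates>" in line]


def coordinates_by_parser(text):
    """Parse the XML and return the whitespace-normalised text of each coordinates element."""
    root = ET.fromstring(text)
    found = []
    for element in root.iter():
        # Tags look like '{namespace-uri}coordinates'; compare the local name only.
        if element.tag.rsplit("}", 1)[-1] == "coordinates":
            found.append(" ".join((element.text or "").split()))
    return found


print(coordinates_by_line(KML))    # only the first half of the list
print(coordinates_by_parser(KML))  # all four corner points
```

The line-based version silently drops the second line of coordinates, while the parser returns the complete element text regardless of how it is wrapped.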
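Using xmlstarlet would be better. A perl or python script using one of their XML parsing libraries would be better still. By the way, both perl and python also have library modules for working with .zip files, so the whole job could be done in either language.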
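As a rough illustration of doing the whole job in one language, here is a sketch in Python (3.6 or later) using only the standard library modules zipfile and xml.etree.ElementTree. The function names footprint and scan are my own, and the sketch assumes every archive follows the layout shown in the question, with the overlay at <name>.SAFE/preview/map-overlay.kml. Treat it as a starting point rather than a finished tool.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumption: the overlay always lives at <name>.SAFE/preview/map-overlay.kml
KML_SUFFIX = ".SAFE/preview/map-overlay.kml"


def footprint(zip_path):
    """Return the coordinates text from the map-overlay.kml in zip_path, or None."""
    with zipfile.ZipFile(zip_path) as archive:
        members = [name for name in archive.namelist() if name.endswith(KML_SUFFIX)]
        if not members:
            return None
        # Only this one member is read; nothing is extracted to disk.
        with archive.open(members[0]) as handle:
            root = ET.parse(handle).getroot()
    for element in root.iter():
        # Match on the local name so the exact KML namespace does not matter.
        if element.tag.rsplit("}", 1)[-1] == "coordinates":
            return " ".join((element.text or "").split())
    return None


def scan(top):
    """Yield (zip path, coordinates) for every .zip file found below top."""
    for zip_path in sorted(Path(top).rglob("*.zip")):
        try:
            coords = footprint(zip_path)
        except (zipfile.BadZipFile, ET.ParseError) as exc:
            # Report unreadable archives on stderr and carry on with the rest.
            print(f"{zip_path}: skipped ({exc})", file=sys.stderr)
            continue
        if coords is not None:
            yield zip_path, coords
```

A loop such as `for path, coords in scan("2015/07"): print(path.name, coords)` would then print each archive name followed by its footprint. Because rglob descends into subdirectories, pointing it at a year folder should cover every month and day below it. Comparing the footprint against a study area is left out here, since the question does not say how that area is defined.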