Using PowerShell to read HTML content

Tazugan

Sorry for my limited knowledge of PowerShell. I am trying to read HTML content from a website and output it as a CSV file. Right now I can successfully download the whole HTML code with my PowerShell script:

$url = "http://cloudmonitor.ca.com/en/ping.php?vtt=1392966369&varghost=www.yahoo.com&vhost=_&vaction=ping&ping=start";
$Path = "$env:userprofile\Desktop\test.txt"

$ie = New-Object -com InternetExplorer.Application 
$ie.visible = $true
$ie.navigate($url)

while($ie.ReadyState -ne 4) { start-sleep -s 10 }

#$ie.Document.Body.InnerText | Out-File -FilePath $Path
$ie.Document.Body | Out-File -FilePath $Path
$ie.Quit()

The HTML I get looks something like this:

  ........
  <tr class="light-grey-bg">
  <td class="right-dotted-border">Stockholm, Sweden (sesto01):</td>
  <td class="right-dotted-border"><span id="cp20">Okay</span>
  </td>
  <td class="right-dotted-border"><span id="minrtt20">21.8</span>
  </td>
  <td class="right-dotted-border"><span id="avgrtt20">21.8</span>
  </td>
  <td class="right-dotted-border"><span id="maxrtt20">21.9</span>
  </td>
  <td><span id="ip20">2a00:1288:f00e:1fe::3001</span>
  </td>
  </tr>
  ........

But what I really want is to extract the content and write it to a CSV file like this:

Stockholm Sweden (sesto01),Okay,21.8,21.8,21.9,2a00:1288:f00e:1fe::3001
........

What command can help me achieve this?

JPBlanc

This was interesting for me too, thanks for the CA site. I wrote this quickly at the corner of my desk, so it needs improvements.

Here is one way, using Html Agility Pack. The script assumes that HtmlAgilityPack.dll is in an Html-Agility-Pack directory located next to the script file.
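The location of the DLL can be checked before it is loaded. This is a minimal sketch, assuming PowerShell 3.0 or later (where `$PSCommandPath` is set when running from a script file). The names `$scriptDir` and `$dllPath` are illustrative and not part of the script itself.

```powershell
# $PSCommandPath is empty in an interactive session, so fall back to the
# current directory in that case (an assumption, adjust as needed).
$scriptDir = if ($PSCommandPath) { Split-Path -Parent $PSCommandPath } else { (Get-Location).Path }

# Build the expected path: <script dir>\Html-Agility-Pack\HtmlAgilityPack.dll
$dllPath = Join-Path (Join-Path $scriptDir 'Html-Agility-Pack') 'HtmlAgilityPack.dll'

if (Test-Path $dllPath) {
    Add-Type -Path $dllPath
} else {
    Write-Warning "HtmlAgilityPack.dll not found at $dllPath"
}
```

Failing early with a clear warning is easier to diagnose than a later error from `New-Object` about an unknown type.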

# PingFromTheCloud.ps1

$url = "http://cloudmonitor.ca.com/en/ping.php?vtt=1392966369&varghost=www.silogix.fr&vhost=_&vaction=ping&ping=start";
$Path = "c:\temp\Pingtest.htm"

$ie = New-Object -com InternetExplorer.Application 
$ie.visible = $true
$ie.navigate($url)

while($ie.ReadyState -ne 4) { start-sleep -s 10 }

#$ie.Document.Body.InnerText | Out-File -FilePath $Path
$ie.Document.Body | Out-File -FilePath $Path
$ie.Quit()

Add-Type -Path "$(Split-Path -parent $PSCommandPath)\Html-Agility-Pack\HtmlAgilityPack.dll"


$webGraber = New-Object -TypeName HtmlAgilityPack.HtmlWeb
$webDoc = $webGraber.Load("c:\temp\Pingtest.htm")
$Thetable = $webDoc.DocumentNode.ChildNodes.Descendants('table') | where {$_.XPath -eq '/div[3]/div[1]/div[5]/table[1]/table[1]'}

$trDatas = $Thetable.ChildNodes.Elements("tr")

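# Start from a clean file; ignore the error if it does not exist yet
Remove-Item "c:\temp\Pingtest.csv" -ErrorAction SilentlyContinue

foreach ($trData in $trDatas)
{
  # Collect the trimmed text of every <td> cell in this row
  $cells = @()
  foreach ($tdData in $trData.Elements("td"))
  {
    $cells += $tdData.InnerText.Trim()
  }
  # Skip rows without <td> cells (for example header rows)
  if ($cells.Count -gt 0)
  {
    $cells -join ',' | Out-File -FilePath "c:\temp\Pingtest.csv" -Append
  }
}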
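The location cell in the question ("Stockholm, Sweden (sesto01):") contains a comma and a trailing colon, while the desired CSV line has neither, so each cell probably needs a little cleanup before joining. The sketch below shows that cleanup on the sample row from the question. To stay self-contained it uses a regular expression instead of Html Agility Pack, which is a fragile substitute and only reasonable for simple, regular markup like this. `ConvertTo-CsvLine` is a hypothetical helper name.

```powershell
# Sample row copied from the question; in practice this would be the saved page.
$html = @'
<tr class="light-grey-bg">
<td class="right-dotted-border">Stockholm, Sweden (sesto01):</td>
<td class="right-dotted-border"><span id="cp20">Okay</span>
</td>
<td class="right-dotted-border"><span id="minrtt20">21.8</span>
</td>
<td class="right-dotted-border"><span id="avgrtt20">21.8</span>
</td>
<td class="right-dotted-border"><span id="maxrtt20">21.9</span>
</td>
<td><span id="ip20">2a00:1288:f00e:1fe::3001</span>
</td>
</tr>
'@

# Hypothetical helper: turn one <tr> fragment into one CSV line.
function ConvertTo-CsvLine([string]$Row) {
    $cells = [regex]::Matches($Row, '(?s)<td[^>]*>(.*?)</td>') | ForEach-Object {
        $text = $_.Groups[1].Value -replace '<[^>]+>', ''   # drop inner tags such as <span>
        $text = $text.Trim().TrimEnd(':')                   # drop whitespace and a trailing colon
        $text -replace ',', ''                              # remove commas, as in the desired output
    }
    $cells -join ','
}

$lines = [regex]::Matches($html, '(?s)<tr[^>]*>.*?</tr>') |
    ForEach-Object { ConvertTo-CsvLine $_.Value }
$lines
# -> Stockholm Sweden (sesto01),Okay,21.8,21.8,21.9,2a00:1288:f00e:1fe::3001
```

The same three cleanup steps can be applied to `$tdData.InnerText` in the Html Agility Pack loop. Note that `TrimEnd(':')` would also shorten an address that ends in a colon, and that HTML entities are not decoded here.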
