(The Invoke-WebRequest result, piped to Get-Member)
   TypeName: Microsoft.PowerShell.Commands.HtmlWebResponseObject
Name              MemberType Definition
----              ---------- ----------
Equals            Method     bool Equals(System.Object obj)
GetHashCode       Method     int GetHashCode()
GetType           Method     type GetType()
ToString          Method     string ToString()
AllElements       Property   Microsoft.PowerShell.Commands.WebCmdletElementC...
BaseResponse      Property   System.Net.WebResponse BaseResponse {get;set;}
Content           Property   string Content {get;}
Forms             Property   Microsoft.PowerShell.Commands.FormObjectCollect...
Headers           Property   System.Collections.Generic.Dictionary[string,st...
Images            Property   Microsoft.PowerShell.Commands.WebCmdletElementC...
InputFields       Property   Microsoft.PowerShell.Commands.WebCmdletElementC...
Links             Property   Microsoft.PowerShell.Commands.WebCmdletElementC...
ParsedHtml        Property   mshtml.IHTMLDocument2 ParsedHtml {get;}
RawContent        Property   string RawContent {get;}
RawContentLength  Property   long RawContentLength {get;}
RawContentStream  Property   System.IO.MemoryStream RawContentStream {get;}
Scripts           Property   Microsoft.PowerShell.Commands.WebCmdletElementC...
StatusCode        Property   int StatusCode {get;}
StatusDescription Property   string StatusDescription {get;}
(The AllElements property, piped to Get-Member)
   TypeName: System.Management.Automation.PSCustomObject
Name        MemberType   Definition
----        ----------   ----------
Equals      Method       bool Equals(System.Object obj)
GetHashCode Method       int GetHashCode()
GetType     Method       type GetType()
ToString    Method       string ToString()
innerHTML   NoteProperty  innerHTML=null
innerText   NoteProperty  innerText=null
outerHTML   NoteProperty  outerHTML=null
outerText   NoteProperty  outerText=null
tagName     NoteProperty System.String tagName=!
Here we have a list of every tag element on the page. An element can be as wide as <html> and everything inside it, or as narrow as a leaf element like <img>.
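To get a feel for working with such a list without hitting a live site, here is a minimal offline sketch. The three objects are mock stand-ins I made up (including the /x.jpg path); real entries would come from the AllElements property of an Invoke-WebRequest result and carry the same NoteProperty names shown above.

```powershell
# Mock stand-ins for AllElements entries (assumed sample data, not from a real page).
$elements = @(
    [pscustomobject]@{ tagName = 'HTML';  innerText = 'whole page text'; outerHTML = '<html>...</html>' }
    [pscustomobject]@{ tagName = 'LABEL'; innerText = 'Solat Time, KL';  outerHTML = '<label class="SolatTime">Solat Time, KL</label>' }
    [pscustomobject]@{ tagName = 'IMG';   innerText = $null;             outerHTML = '<img src="/x.jpg" />' }
)

# -eq on strings is case-insensitive by default, so 'label' matches 'LABEL'.
$labels = $elements | Where-Object { $_.tagName -eq 'label' }
$labels.outerHTML

# Count elements per tag to see what the page is made of.
$tagCounts = $elements | Group-Object -Property tagName | Select-Object Name, Count
$tagCounts
```

Because string comparison with -eq ignores case, it does not matter whether you write "Label", "label" or "LABEL" in the filter.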
Going back to parsing well-designed sites: I wanted to write something to check prayer times. In Malaysia a few sites post them; one is www.e-solat.gov.my. In my experience that site is frequently broken, and since its recent redesign it looks too complicated to parse. www.bankislam.com.my, on the other hand, labels the various parts of its page, so it's easy to pull the data out. In the raw HTML, we have this:
<label class="SolatTime">Solat Time, KL <img src="/_layouts/AtQuest/BankIslam/Images/greyarrow3.jpg" /> Imsak 5:59 | Subuh 6:09 | Syuruk 7:28 | Zuhur 1:29 | Asar 4:51 | Maghrib 7:27 | Isyak 8:39</label><br />
We can see that the data sits in a <label> element with the class "SolatTime". So we can grab that element (matching on the tag name and the word "Solat" in its content), split up its text, and return a PSObject of times.
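# Fetch the page
$biSiteData = Invoke-WebRequest -Uri http://www.bankislam.com.my
# Pick the <label> element whose HTML mentions "Solat"
$htmlData = $biSiteData.AllElements | Where-Object { $_.tagName -eq "Label" -and $_.innerHTML -match "Solat" }
$result = New-Object PSObject
# Split on "|". The first piece holds the "Solat Time, KL" caption together
# with Imsak, so filtering it out drops Imsak as well.
$htmlData.innerText.Split("|") | Where-Object { $_ -notmatch "Solat Time" } | ForEach-Object {
    $entry = $_.Trim().Split()   # e.g. "Subuh", "6:09"
    Add-Member -InputObject $result -MemberType NoteProperty -Name $entry[0] -Value $entry[1]
}
$result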
Subuh : 6:09
Syuruk : 7:28
Zuhur : 1:29
Asar : 4:51
Maghrib : 7:27
Isyak : 8:39
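As a more tolerant variation, here is a minimal offline sketch that pulls the name/time pairs out with a regular expression instead of splitting on "|". The sample text is copied from the label shown earlier (minus the <img> tag), and the function name ConvertFrom-SolatText is my own invention for illustration, not part of any module. Unlike the split approach, it keeps Imsak too.

```powershell
# Sample text taken from the label above; no network access needed.
$labelText = 'Solat Time, KL  Imsak 5:59 | Subuh 6:09 | Syuruk 7:28 | Zuhur 1:29 | Asar 4:51 | Maghrib 7:27 | Isyak 8:39'

# Hypothetical helper (an assumption, not from the original script):
# turns "<Name> <h:mm>" pairs into properties on a single object.
function ConvertFrom-SolatText {
    param([string]$Text)

    $times = [ordered]@{}
    # A word followed by whitespace and a clock time, e.g. "Subuh 6:09".
    foreach ($m in [regex]::Matches($Text, '([A-Za-z]+)\s+(\d{1,2}:\d{2})')) {
        $times[$m.Groups[1].Value] = $m.Groups[2].Value
    }
    [pscustomobject]$times
}

$parsed = ConvertFrom-SolatText -Text $labelText
$parsed
```

The regular expression only accepts a word that is directly followed by a time, so the "Solat Time, KL" caption is skipped without needing a separate filter.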