Tuesday, January 29, 2013

Invoke-WebRequest example (Finding time for solat in Kuala Lumpur)

On and off over the last few years, I have had to script various HTML interactions to pull data from sites, navigate through them and parse the results.  Previously I had done this with a variety of tools: scripting Internet Explorer through COM, .NET's WebClient, and various Perl modules.  Each had its own limitations or difficulties to overcome when parsing.  Recently I started playing around with PowerShell's Invoke-WebRequest.  For simplicity, features and easy parsing, it looks quite promising.  Parsing is especially easy when sites are well designed and the various components have unique names.  When you run the request, the result contains a collection of the page's elements under AllElements.
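The two member listings in this post look like Get-Member output, so something like the following should reproduce them. This is a minimal sketch: the function name Show-WebResponseMembers and the variable $response are made up for illustration, and it assumes Windows PowerShell 3.0 to 5.1, where the response object exposes AllElements.

```powershell
# Hypothetical helper, not part of the original script.
# Assumes Windows PowerShell 3.0-5.1, where AllElements is available.
function Show-WebResponseMembers {
    param(
        [Parameter(Mandatory)]
        [string]$Uri
    )

    $response = Invoke-WebRequest -Uri $Uri

    # Members of the response object itself
    $response | Get-Member

    # Members of the individual elements inside the AllElements collection
    $response.AllElements | Get-Member
}

# Example call (needs network access):
# Show-WebResponseMembers -Uri 'http://www.bankislam.com.my'
```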

(The webrequest result)

   TypeName: Microsoft.PowerShell.Commands.HtmlWebResponseObject


Name              MemberType Definition
----              ---------- ----------
Equals            Method     bool Equals(System.Object obj)
GetHashCode       Method     int GetHashCode()
GetType           Method     type GetType()
ToString          Method     string ToString()
AllElements       Property   Microsoft.PowerShell.Commands.WebCmdletElementC...
BaseResponse      Property   System.Net.WebResponse BaseResponse {get;set;}
Content           Property   string Content {get;}
Forms             Property   Microsoft.PowerShell.Commands.FormObjectCollect...
Headers           Property   System.Collections.Generic.Dictionary[string,st...
Images            Property   Microsoft.PowerShell.Commands.WebCmdletElementC...
InputFields       Property   Microsoft.PowerShell.Commands.WebCmdletElementC...
Links             Property   Microsoft.PowerShell.Commands.WebCmdletElementC...
ParsedHtml        Property   mshtml.IHTMLDocument2 ParsedHtml {get;}
RawContent        Property   string RawContent {get;}
RawContentLength  Property   long RawContentLength {get;}
RawContentStream  Property   System.IO.MemoryStream RawContentStream {get;}
Scripts           Property   Microsoft.PowerShell.Commands.WebCmdletElementC...
StatusCode        Property   int StatusCode {get;}
StatusDescription Property   string StatusDescription {get;}



(The AllElements property)

   TypeName: System.Management.Automation.PSCustomObject

Name        MemberType   Definition
----        ----------   ----------
Equals      Method       bool Equals(System.Object obj)
GetHashCode Method       int GetHashCode()
GetType     Method       type GetType()
ToString    Method       string ToString()
innerHTML   NoteProperty  innerHTML=null
innerText   NoteProperty  innerText=null
outerHTML   NoteProperty  outerHTML=null
outerText   NoteProperty  outerText=null
tagName     NoteProperty System.String tagName=!

Here we have a list of all tag elements on the page. These can be as broad as <html> and everything inside it, or as narrow as a leaf element like <img>.
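To get a feel for filtering such a list without hitting a real site, here is a small sketch that uses hand-built stand-in objects with the same tagName and innerText properties. The sample data and the variable names ($elements, $labels, $tagCounts) are assumptions for illustration only; on a real response you would pipe $response.AllElements instead.

```powershell
# Stand-in for a response's AllElements collection: hypothetical sample
# objects that only mimic the tagName and innerText properties.
$elements = @(
    [pscustomobject]@{ tagName = 'HTML';  innerText = 'whole page' }
    [pscustomobject]@{ tagName = 'LABEL'; innerText = 'Solat Time, KL' }
    [pscustomobject]@{ tagName = 'IMG';   innerText = $null }
)

# -eq on strings is case-insensitive in PowerShell, so 'label' matches 'LABEL'
$labels = @($elements | Where-Object { $_.tagName -eq 'label' })

# Count how many elements of each tag are in the collection
$tagCounts = $elements | Group-Object -Property tagName | Select-Object Name, Count

$labels.innerText
$tagCounts
```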

Going back to parsing well-designed sites, I wanted to write something to check prayer times.  In Malaysia there are a few sites that post them; one is www.e-solat.gov.my.  In my experience, this site is frequently broken, and since its recent redesign it looks too complicated to try to parse.  www.bankislam.com.my, on the other hand, labels the various parts of its page, so it's easy to pull the data.  In the raw HTML, we have this:

<label class="SolatTime">Solat Time, KL <img src="/_layouts/AtQuest/BankIslam/Images/greyarrow3.jpg" /> Imsak 5:59 | Subuh 6:09 | Syuruk 7:28 | Zuhur 1:29 | Asar 4:51 | Maghrib 7:27 | Isyak 8:39</label><br />

We can see here that they have the data in a label with the class "SolatTime".  So we can grab that label and split up its text, returning a PSObject of times.


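# Fetch the page
$biSiteData = Invoke-WebRequest -Uri http://www.bankislam.com.my
# Keep only the <label> element whose HTML mentions "solat" (-match is case-insensitive)
$htmlData = $biSiteData.AllElements | Where-Object { $_.tagName -eq "Label" -and $_.innerHTML -match "SOLAT" }
$result = New-Object psobject
# Split on "|" and drop the first piece, which holds the "Solat Time, KL" heading (and the Imsak time with it)
$htmlData.innerText.Split("|") | Where-Object { $_ -notmatch "Solat time" } | ForEach-Object {
    # " Subuh 6:09 " splits on whitespace into an empty string, the name and the time
    $entry = $_.Split()
    Add-Member -InputObject $result -MemberType NoteProperty -Name $entry[1] -Value $entry[2]
}
$result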

And our results:


Subuh   : 6:09
Syuruk  : 7:28
Zuhur   : 1:29
Asar    : 4:51
Maghrib : 7:27
Isyak   : 8:39
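
Note that Imsak is missing from the results, because the first piece of the split carries both the "Solat Time, KL" heading and the Imsak time and is filtered out as a whole. A possible alternative, sketched here against the label text quoted earlier (with the <img> tag removed), is to pull name/time pairs out with a regular expression. The pattern and the variable names ($labelText, $pattern, $times) are assumptions based on that one sample line, not something the site guarantees. Because it only needs a string, the same idea should also work on the response's Content property where AllElements is not available.

```powershell
# Label text copied from the raw HTML quoted earlier, minus the <img> tag
$labelText = 'Solat Time, KL  Imsak 5:59 | Subuh 6:09 | Syuruk 7:28 | Zuhur 1:29 | Asar 4:51 | Maghrib 7:27 | Isyak 8:39'

# Assumed shape of each entry: a word followed by an h:mm time.
# "Solat Time, KL" has no time after it, so it never matches.
$pattern = '(?<name>[A-Za-z]+)\s+(?<time>\d{1,2}:\d{2})'

$times = New-Object psobject
foreach ($m in [regex]::Matches($labelText, $pattern)) {
    # One NoteProperty per prayer time, e.g. Imsak = 5:59
    Add-Member -InputObject $times -MemberType NoteProperty `
        -Name $m.Groups['name'].Value -Value $m.Groups['time'].Value
}
$times
```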
