I googled web scraping and it returned a lot of hits using Python, VBScript, Twill(?), etc. What I want to do is pull the HTML from web pages so I can parse it and find the data I need to validate my tests.
A former co-worker used COM in a VBScript:
Code:
' Find the Internet Explorer window whose handle matches varHwnd.
Set objWindowsShell = CreateObject("Shell.Application")
For varObjectIndex = 0 To objWindowsShell.Windows.Count - 1
    If objWindowsShell.Windows(varObjectIndex).HWND = varHwnd Then
        Set objIe = objWindowsShell.Windows(varObjectIndex)
        Exit For
    End If
Next
' Use the top-level document, or the document of the named frame.
If varFrameContext = "" Then
    Set objDocument = objIe.Document
Else
    Set objDocument = objIe.Document.Frames(varFrameContext).Document
End If
' Pick the table by index from all TABLE elements in the document.
Set objTables = objDocument.All.Tags("TABLE")
Set objTable = objTables.Item(varItemId)
' varRow/varCol are checked against the row and cell counts;
' varRowIndex/varColIndex do the indexing and are not set in this excerpt.
If Not objTable.Rows.Length < varRow Then
    If Not objTable.Rows(varRowIndex).Cells.Length < varCol Then
        Select Case varPropertyName
            Case "innerText"
                varHtmlTableCell = Trim(objTable.Rows(varRowIndex).Cells(varColIndex).innerText)
            Case "Image.Name"
                varHtmlTableCell = objTable.Rows(varRowIndex).Cells(varColIndex).Images(0).name
        End Select
    Else
        ' Column out of range: report the table index as 1-based and bail out.
        If IsNumeric(varItemId) Then varItemId = varItemId + 1
        varErrorDetail = """" & "Type=HTMLTable;Index=" & varItemId & """" & ", " & """" & "Row=" & varRow & ";Col=" & varCol & """" & vbCrLf & "Col not found."
        If Not ErrorMessagePersist_True(ErrorMessage, constrProcedureName, mconlngAutomationError, varErrorDetail) Then Exit Function
        Exit Function
    End If
Else
    ' Row out of range: same reporting as above.
    If IsNumeric(varItemId) Then varItemId = varItemId + 1
    varErrorDetail = """" & "Type=HTMLTable;Index=" & varItemId & """" & ", " & """" & "Row=" & varRow & ";Col=" & varCol & """" & vbCrLf & "Row not found."
    If Not ErrorMessagePersist_True(ErrorMessage, constrProcedureName, mconlngAutomationError, varErrorDetail) Then Exit Function
    Exit Function
End If
' Release the COM objects.
Set objArguments = Nothing
Set objWindowsShell = Nothing
Set objIe = Nothing
Set objDocument = Nothing
Set objTables = Nothing
I had a really tough time finding out the properties and methods of the objects in use. Can someone tell me whether the above code works for what I need, or tell me the best way to parse HTML using KiXtart? Thanks!
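For comparison, here is a rough sketch of the same "get the text of table N, row R, column C" lookup done without a browser, in Python (one of the languages the search turned up). It is not KiXtart and not a translation of the co-worker's function. The names `TableCollector`, `fetch_html` and `get_table_cell` are made up for illustration, the UTF-8 decoding is an assumption about the page, and nested tables are not handled. Like the VBScript, it takes a 0-based table index and 1-based row and column numbers.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TableCollector(HTMLParser):
    """Collects cell text as tables[t][r][c]. Nested tables are not handled."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
            self._in_cell = False
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
            self._in_cell = False
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self.tables[-1][-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current cell.
        if self._in_cell:
            self.tables[-1][-1][-1] += data


def fetch_html(url):
    """Download a page as text. UTF-8 is an assumption about the page's encoding."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def get_table_cell(html, table_index, row, col):
    """Return trimmed cell text. table_index is 0-based; row and col are 1-based."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    if table_index >= len(parser.tables):
        raise LookupError("Table not found.")
    rows = parser.tables[table_index]
    if len(rows) < row:
        raise LookupError("Row not found.")
    if len(rows[row - 1]) < col:
        raise LookupError("Col not found.")
    return rows[row - 1][col - 1].strip()


# Hypothetical sample markup standing in for a downloaded page.
sample = (
    "<table><tr><th>Name</th><th>Qty</th></tr>"
    "<tr><td> Widget </td><td>3</td></tr></table>"
)
print(get_table_cell(sample, 0, 2, 1))  # Widget
```

For a live page the call would be `get_table_cell(fetch_html(url), 0, 2, 1)`. The main difference from the COM approach is that this only sees the HTML as downloaded, not a page that is already open in Internet Explorer, so anything the browser builds with script after loading would be missing.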