#162324 - 2006-05-23 08:42 PM
Web scraping in KiX
pearly
Getting the hang of it
Registered: 2004-02-04
Posts: 92
I googled web scraping and it returned a lot of hits using Python, VBScript, Twill?, etc. What I want to do is pull HTML from web pages so I can parse them for the data I need to validate testing.
A former co-worker used COM in a VBScript:
Code:
' Excerpt - assumes varHwnd, varFrameContext, varItemId, varRow/varRowIndex,
' varCol/varColIndex and the error-handling helpers are defined by the caller.
Set objWindowsShell = CreateObject("Shell.Application")
For varObjectIndex = 0 To objWindowsShell.Windows.Count - 1
    If objWindowsShell.Windows(varObjectIndex).HWND = varHwnd Then
        Set objIe = objWindowsShell.Windows(varObjectIndex)
        Exit For
    End If
Next

If varFrameContext = "" Then
    Set objDocument = objIe.Document
Else
    Set objDocument = objIe.Document.Frames(varFrameContext).Document
End If

Set objTables = objDocument.All.Tags("TABLE")
Set objTable = objTables.Item(varItemId)

If Not objTable.Rows.Length < varRow Then
    If Not objTable.Rows(varRowIndex).Cells.Length < varCol Then
        Select Case varPropertyName
            Case "innerText"
                varHtmlTableCell = Trim(objTable.Rows(varRowIndex).Cells(varColIndex).innerText)
            Case "Image.Name"
                varHtmlTableCell = objTable.Rows(varRowIndex).Cells(varColIndex).Images(0).name
        End Select
    Else
        If IsNumeric(varItemId) Then varItemId = varItemId + 1
        varErrorDetail = """" & "Type=HTMLTable;Index=" & varItemId & """" & ", " & """" & "Row=" & varRow & ";Col=" & varCol & """" & vbCrLf & "Col not found."
        If Not ErrorMessagePersist_True(ErrorMessage, constrProcedureName, mconlngAutomationError, varErrorDetail) Then Exit Function
        Exit Function
    End If
Else
    If IsNumeric(varItemId) Then varItemId = varItemId + 1
    varErrorDetail = """" & "Type=HTMLTable;Index=" & varItemId & """" & ", " & """" & "Row=" & varRow & ";Col=" & varCol & """" & vbCrLf & "Row not found."
    If Not ErrorMessagePersist_True(ErrorMessage, constrProcedureName, mconlngAutomationError, varErrorDetail) Then Exit Function
    Exit Function
End If

Set objArguments = Nothing
Set objWindowsShell = Nothing
Set objIe = Nothing
Set objDocument = Nothing
Set objTables = Nothing
I had a really tough time finding out the properties and methods of the objects in use. Can someone tell me whether the above code does what I need, or what the best way to parse HTML with KiXtart is? Thanks!
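For reference, here is my rough attempt at the same table-scraping in KiXtart. It's an untested sketch: it assumes the IE window's handle is already in $hwnd, that the table index and cell coordinates are known, and that KiX simply drops VBScript's Set keyword for COM assignments.
Code:
; Untested sketch - assumes $hwnd holds the HWND of an already-open IE window
$shell = CreateObject("Shell.Application")
For $i = 0 To $shell.Windows.Count - 1
    If $shell.Windows($i).HWND = $hwnd
        $ie = $shell.Windows($i)        ; keep the matching window
    EndIf
Next
$doc = $ie.Document
$tables = $doc.All.Tags("TABLE")
$table = $tables.Item(0)                ; first table on the page
$table.Rows(0).Cells(0).innerText ?     ; text of row 0, column 0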
#162326 - 2006-05-23 09:01 PM
Re: Web scraping in KiX
pearly
Getting the hang of it
Registered: 2004-02-04
Posts: 92
Quote:
what do you really want? that code above is not the right way if you want to get the whole of the html of a page. for that, a simple 3 line kixtart script will do just fine.
Can you show me the three lines of code?
#162331 - 2006-05-23 09:24 PM
Re: Web scraping in KiX
pearly
Getting the hang of it
Registered: 2004-02-04
Posts: 92
Quote:
it was a single registry value... just can't remember which one.
Oh, so it can't be done with a property set on the xmlhttp object?
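The closest thing I could find on the xmlhttp object itself is setting request headers after Open(), something like the sketch below, though I don't know whether the WinInet cache actually honors them (which may be why the registry value is needed).
Code:
; Untested sketch - ask for a fresh copy via request headers
$http = CreateObject("microsoft.xmlhttp")
$http.Open("GET", "http://www.kixtart.org/", Not 1)
$http.setRequestHeader("Cache-Control", "no-cache")
$http.setRequestHeader("Pragma", "no-cache")
$http.Send
$http.ResponseText ?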
#162333 - 2006-05-23 10:17 PM
Re: Web scraping in KiX
pearly
Getting the hang of it
Registered: 2004-02-04
Posts: 92
Quote:
no. but you can avoid the registry value by pulling always a different url. that is, add something extra to the end, like "?some=fake&values=here" so the above example becomes:
Code:
$http = CreateObject("microsoft.xmlhttp")
$http.Open("GET", "http://www.kixtart.org/?some=fake&values=here", Not 1)
$http.Send
$http.ResponseBody ?
anyways, the registry setting is the best choice. it's a per user setting and you can always reset it once you don't need it anymore.
I tried entering fake values, but it didn't work. I think I may have a special case: the URL contains a dll reference.
http://[ipaddress]/[navigation]/[dllname]?Login
ex: http://10.10.10.1/kix/kix64.dll?Login
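Since the URL already carries the ?Login query, I assume any fake values have to be appended with & rather than a second ?. This is roughly what I tried (the nocache parameter name is made up; @TICKS just makes each URL unique), but the dll may simply ignore or reject extra parameters:
Code:
; Untested sketch - append a made-up, always-changing parameter
$url = "http://10.10.10.1/kix/kix64.dll?Login" + "&nocache=" + @TICKS
$http = CreateObject("microsoft.xmlhttp")
$http.Open("GET", $url, Not 1)
$http.Send
$http.ResponseText ?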
Edited by pearly (2006-05-23 10:18 PM)
#162339 - 2006-05-25 09:24 PM
Re: Web scraping in KiX
pearly
Getting the hang of it
Registered: 2004-02-04
Posts: 92
According to the Scripting Guy (), SyncMode5 can be set to one of four possible values:
3 = Every visit to the page
2 = Every time you start Internet Explorer
4 = Automatically
0 = Never
I've tried all of the values, but none of them work. Here is my code:
Code:
Break On

GetPage("http://10.10.10.1/kix/kix64.dll?Login") ?
Sleep 10

Function GetPage($URL)
    Dim $HTML, $IECacheKey, $IECacheVal
    $IECacheKey = "HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings"
    ; remember the current cache setting so it can be restored afterwards
    $IECacheVal = ReadValue($IECacheKey, "SyncMode5")
    $IECacheVal ?                       ; debug: show the current value
    If $IECacheVal <> 3
        ; 3 = check for a newer version on every visit to the page
        $nul = WriteValue($IECacheKey, "SyncMode5", "3", "REG_DWORD")
    EndIf
    $HTML = CreateObject("microsoft.XMLhttp")
    $HTML.Open("GET", $URL, Not 1)      ; Not 1 = 0, i.e. synchronous
    $HTML.Send
    ; restore the original setting before any early exit
    $nul = WriteValue($IECacheKey, "SyncMode5", $IECacheVal, "REG_DWORD")
    If $HTML.Status = 200
        $GetPage = $HTML.ResponseText   ; or ResponseBody
    Else
        $GetPage = "HTTP Status Code: " + $HTML.Status + " (" + $HTML.StatusText + ")"
        Exit 1
    EndIf
EndFunction
#162341 - 2006-05-25 10:34 PM
Re: Web scraping in KiX
pearly
Getting the hang of it
Registered: 2004-02-04
Posts: 92
Quote:
ok, let me ask, what is this dll? if you have the dll custom made there, why you need to pull out the html it produces? why can't you directly give it the info it wants or why don't you ask it directly?
Hmmm, good question. Unfortunately I have no idea what's inside the dll; I'm new on the QA team. The testing tool we use can capture the HTML content, but I need KiX or some other third-party tool to do it, so I can run it without having to install the testing tool and then parse the HTML content to pull out the version spec.
Is there any way to programmatically mimic IE's view-source feature (View > Source)?
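Something like driving IE itself and reading back the rendered document might do it. This is an untested sketch (ReadyState 4 means the page has finished loading; note the source you get back is the live DOM, which can differ from the raw HTML if scripts have run):
Code:
; Untested sketch - mimic View > Source via IE automation
$ie = CreateObject("InternetExplorer.Application")
$ie.Visible = 0
$ie.Navigate("http://10.10.10.1/kix/kix64.dll?Login")
While $ie.Busy Or $ie.ReadyState <> 4
    Sleep 0.5
Loop
$ie.Document.documentElement.outerHTML ?
$ie.Quit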
Edited by pearly (2006-05-25 10:35 PM)