
Exbrowser Scraping


4 replies to this topic

#1 gavner25

gavner25

    Member

  • Members
  • PipPip
  • 24 posts
  • OS:Windows 7
  • Total Memory:8Gb
  • Framework:v3.5
  • License:Standard Edition

Posted 07 July 2019 - 02:11 PM

Hi,

 

For some reason I'm stuck trying to scrape the URLs from the blog's h2 titles on this page using ExBrowser:

 

http://carsandauto.over-blog.com/

 

I have tried several different XPath expressions with the href attribute, using the ExBrowser scrape list elements attribute function,

 

e.g.

x://h2/a 

x://h2 

x://*[@id="content"]/article[2]/h2/a

x://article[@class='post']

 

but I don't get any results in the debugger.

 

 



#2 HelloInsomnia

HelloInsomnia

    Advanced Member

  • Moderators
  • 3128 posts
  • OS:Windows 10
  • Total Memory:More Than 9Gb
  • Framework:v4.5+, unsure
  • License:Developer Edition

Posted 07 July 2019 - 02:25 PM

This works for me:

add list to list(%urls,$plugin function("ExBrowser.dll", "$ExBrowser Generic Xpath Parser", $plugin function("ExBrowser.dll", "$ExBrowser Document Text"), "//h2/a", "href", ""),"Delete","Global")
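
For anyone reading along, here is a rough sketch in Python (not UBot script) of what that line appears to do: take the page source, run the XPath //h2/a over it, and collect the href attribute of every match. The sample markup and the `extract_attribute` helper are made up for illustration; the real page's HTML and ExBrowser's internals may well differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page source.
# The real blog's markup is an assumption and may differ.
SAMPLE_SOURCE = """
<html><body>
  <div id="content">
    <article class="post"><h2><a href="http://example.com/post-1">Post 1</a></h2></article>
    <article class="post"><h2><a href="http://example.com/post-2">Post 2</a></h2></article>
    <article class="post"><h2>No link here</h2></article>
  </div>
</body></html>
"""


def extract_attribute(source, xpath, attribute):
    """Return the given attribute from every element matching xpath.

    Illustrative helper only: parses the source as XML, so it needs
    well-formed markup, unlike a real HTML parser.
    """
    root = ET.fromstring(source.strip())
    values = []
    for element in root.iterfind(xpath):
        value = element.get(attribute)
        if value is not None:  # skip matches that lack the attribute
            values.append(value)
    return values


# ElementTree wants a leading "." for a document-wide search,
# so "//h2/a" is written as ".//h2/a" here.
urls = extract_attribute(SAMPLE_SOURCE, ".//h2/a", "href")
print(urls)

# Selecting the h2 itself yields nothing for href, because the
# attribute lives on the <a> inside it, not on the heading.
print(extract_attribute(SAMPLE_SOURCE, ".//h2", "href"))
```

In this sketch the expression has to end on the `a` element for href to come back, which is the same shape as the //h2/a call above.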


#3 gavner25

gavner25


Posted 07 July 2019 - 04:17 PM

OK, can you explain what the Document Text function is for?



#4 HelloInsomnia

HelloInsomnia


Posted 07 July 2019 - 05:45 PM

OK, can you explain what the Document Text function is for?

 

It grabs the source code of the page. That source is what gets passed to the Generic Xpath Parser as its first argument.



#5 gavner25

gavner25


Posted 08 July 2019 - 04:11 AM

OK, great, thanks for that.





