How To Scrape Only Certain Part Of A Text String


4 replies to this topic

#1 spa3212

spa3212

    Advanced Member

  • Members
  • PipPipPip
  • 70 posts
  • OS:Windows 8
  • Total Memory:2Gb
  • Framework:v3.5
  • License:Community Edition

Posted 02 December 2018 - 09:06 AM

I want to scrape dates from a webpage such as

https://www.etsy.com...ifeTees/reviews

Under each review is the date the review was posted. I want to scrape both the name and the date. I can scrape the name but not the date. Can someone please tell me how this can be done?

Please let me know of any regex or other way to do it.

Many thanks in advance.



#2 HelloInsomnia

HelloInsomnia

    Advanced Member

  • Moderators
  • 2977 posts
  • OS:Windows 10
  • Total Memory:More Than 9Gb
  • Framework:v4.5+, unsure
  • License:Developer Edition

Posted 02 December 2018 - 10:53 AM

In this case you could just scrape the attribution text and split it on " on ", which separates the name from the date:

set(#firstReviewer,$scrape attribute($element offset(<class="shop2-review-attribution">,0),"innertext"),"Global")
add list to list(%reviewerSplit,$list from text(#firstReviewer," on "),"Delete","Global")

But if you grab the whole attribution text and only want the date, you could also use this regex:

set(#date,$find regular expression(#firstReviewer,"[A-Z][a-z]+\\s\\d+,\\s20[0-9]+"),"Global")
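
For anyone who wants to check the pattern outside UBot, here is a minimal Python sketch of the two approaches above. It is only an illustration: the sample string "Jane Doe on Dec 1, 2018" and the helper names are assumptions, not text taken from the actual page, and the regex is written with single backslashes because Python raw strings do not need the doubled escaping used in UBot code view.

```python
import re

# Same pattern as the UBot regex above: capitalised month, day, comma, 20xx year.
DATE_PATTERN = re.compile(r"[A-Z][a-z]+\s\d+,\s20[0-9]+")


def split_attribution(text):
    """Split 'Name on Date' at the last ' on ' (hypothetical helper)."""
    name, sep, date = text.rpartition(" on ")
    if not sep:
        # No ' on ' found: treat the whole text as the name.
        return text.strip(), ""
    return name.strip(), date.strip()


def find_date(text):
    """Return the first date-like match, or an empty string if none."""
    match = DATE_PATTERN.search(text)
    return match.group(0) if match else ""


# Assumed sample of what the attribution element's inner text might look like.
sample = "Jane Doe on Dec 1, 2018"
print(split_attribution(sample))
print(find_date(sample))
```

The split is simpler when the text always has the "Name on Date" shape. The regex is more forgiving if extra text surrounds the date.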


#3 spa3212

spa3212


Posted 09 December 2018 - 11:25 PM

Thanks, this was really helpful. Thanks again.



#4 spa3212

spa3212


Posted 11 December 2018 - 01:12 AM

Oh, this worked only once, and now the regex is not working. Please help.


Edited by spa3212, 11 December 2018 - 01:12 AM.


#5 HelloInsomnia

HelloInsomnia


Posted 11 December 2018 - 02:06 PM

"Oh, this worked only once, and now the regex is not working. Please help."

The code only grabs the element at offset 0, which is the first review. Increment the offset to grab the others.
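
The idea of stepping through offsets can be sketched in Python like this. It is a rough illustration, not UBot code: `text_at_offset` is a hypothetical stand-in for scraping the element at a given offset, and the review strings are made-up samples.

```python
import re

# Same date pattern as earlier in the thread, in Python raw-string form.
DATE_PATTERN = re.compile(r"[A-Z][a-z]+\s\d+,\s20[0-9]+")


def text_at_offset(elements, offset):
    """Hypothetical stand-in for scraping the element at `offset`.

    Returns an empty string once the offset runs past the last element.
    """
    return elements[offset] if 0 <= offset < len(elements) else ""


def scrape_all_dates(elements):
    """Start at offset 0 and keep incrementing until nothing comes back."""
    dates = []
    offset = 0
    while True:
        text = text_at_offset(elements, offset)
        if not text:
            break  # no element at this offset, so we are done
        match = DATE_PATTERN.search(text)
        dates.append(match.group(0) if match else "")
        offset += 1  # move on to the next review
    return dates


# Assumed sample attribution texts, one per review.
reviews = ["Jane Doe on Dec 1, 2018", "Sam Lee on Nov 28, 2018"]
print(scrape_all_dates(reviews))
```

In a UBot script the same pattern would presumably be a loop that uses a variable in place of the fixed 0 and increments it on each pass.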





