UBot Underground

Missing XPath Element and Can't Find the Offset



I am scraping the helpful-vote text for the 10 reviews on the following page using XPath:


https://www.amazon.com/Girl-Wash-Your-Face-Believing/product-reviews/B077GZBL1Y/ref=cm_cr_arp_d_paging_btm_next_8?ie=UTF8&pageNumber=8&reviewerType=all_reviews&mediaType=media_reviews_only


set(#_httpRequestSingleReview,$plugin function("HeopasCustom.dll", "$Heopas HTTP Get", "https://www.amazon.com/Girl-Wash-Your-Face-Believing/product-reviews/B077GZBL1Y/ref=cm_cr_arp_d_paging_btm_next_8?ie=UTF8&pageNumber=8&reviewerType=all_reviews&mediaType=media_reviews_only","", "", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0", "", ""),"Global")


add item to list(%helpful1,$list from text($plugin function("HTTP post.dll", "$xpath parser", #_httpRequestSingleReview, "//div[@class=\"a-row a-spacing-small\"]/span[@class=\"a-size-base a-color-tertiary cr-vote-text\"]", "InnerText", "HTML"),$new line),"Don\'t Delete","Global")


As you can see, one of the reviews has no helpful-vote element, so the query returns only 9 items instead of 10 and silently skips the missing one.


So I think a possible solution is to find the offset for each item and scrape them separately, but how do I get the correct offset without losing the missing element? Whenever I added an offset, it returned no elements.
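To see why the count drops and why a positional offset cannot recover the gap, here is a minimal Python sketch (standard library only) run against a hypothetical, heavily simplified version of the markup that reuses the class names from the query above. Real Amazon pages are much larger and are not guaranteed to be well-formed XML, so this only illustrates the logic, not how to parse the live page. A page-wide query for the vote text has no idea which review each span came from, so once one span is absent, every later vote shifts up by one position.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: three reviews, and the second one has
# no vote span at all (like the review with no helpful votes in the thread).
SAMPLE = """<div>
  <div id="R1" class="a-section review aok-relative">
    <div class="a-row a-spacing-small">
      <span class="a-size-base a-color-tertiary cr-vote-text">12 people found this helpful</span>
    </div>
  </div>
  <div id="R2" class="a-section review aok-relative">
    <div class="a-row a-spacing-small"></div>
  </div>
  <div id="R3" class="a-section review aok-relative">
    <div class="a-row a-spacing-small">
      <span class="a-size-base a-color-tertiary cr-vote-text">One person found this helpful</span>
    </div>
  </div>
</div>"""

# Page-wide queries, relative to the root element (ElementTree needs the leading ".").
REVIEW = ".//div[@class='a-section review aok-relative']"
VOTE = (".//div[@class='a-row a-spacing-small']"
        "/span[@class='a-size-base a-color-tertiary cr-vote-text']")


def flat_query(html):
    """Run the review query and the vote query independently over the whole page."""
    root = ET.fromstring(html)
    ids = [review.get("id") for review in root.findall(REVIEW)]
    votes = [span.text for span in root.findall(VOTE)]
    return ids, votes


ids, votes = flat_query(SAMPLE)
print(len(ids), len(votes))   # 3 2: there is no guarantee of one vote per review
# Pairing by position misattributes: R2 is paired with R3's vote, and R3 gets nothing.
print(list(zip(ids, votes)))
```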



Each review has a unique identifier stored in its id attribute. If we collect those identifiers into a list and loop through it with a with each command, we can target each review's helpful-vote section individually with a complete XPath, since the same identifier also appears (prefixed with customer_review-) in the id of the element that contains the vote text you want to scrape.
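The same two-step idea (collect the ids first, then run one query per id) can be sketched in Python with the standard library's ElementTree, whose limited XPath subset does include the exact attribute matches used here. The markup is again a hypothetical simplification that assumes, as the UBot example in this thread does, that each review container carries the id and that the block holding its vote text has the id customer_review-<id>. The function name votes_by_review_id and the missing placeholder are illustrative only, not part of any UBot plugin.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup following the id scheme used in the UBot
# example: review container id="<ID>", vote block id="customer_review-<ID>".
# Review R2 has no vote span.
SAMPLE = """<div>
  <div id="R1" class="a-section review aok-relative">
    <div id="customer_review-R1">
      <span class="a-size-base a-color-tertiary cr-vote-text">12 people found this helpful</span>
    </div>
  </div>
  <div id="R2" class="a-section review aok-relative">
    <div id="customer_review-R2"></div>
  </div>
  <div id="R3" class="a-section review aok-relative">
    <div id="customer_review-R3">
      <span class="a-size-base a-color-tertiary cr-vote-text">One person found this helpful</span>
    </div>
  </div>
</div>"""

REVIEW = ".//div[@class='a-section review aok-relative']"
VOTE_CLASS = "a-size-base a-color-tertiary cr-vote-text"


def votes_by_review_id(html, missing=""):
    """Collect every review id, then run one query per id."""
    root = ET.fromstring(html)
    review_ids = [review.get("id") for review in root.findall(REVIEW)]
    results = []
    for review_id in review_ids:
        # Build the query from the id, like {#ReviewSections} in the with each loop.
        query = f".//*[@id='customer_review-{review_id}']//span[@class='{VOTE_CLASS}']"
        span = root.find(query)
        # A review without vote text gets a placeholder instead of vanishing.
        results.append(span.text if span is not None and span.text else missing)
    return list(zip(review_ids, results))


for review_id, vote in votes_by_review_id(SAMPLE, missing="(no votes)"):
    print(review_id, vote)
```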


It's not the prettiest explanation, so here's an example.


In the example below, 10 results are returned, including one for the review with no helpful votes, which appears in the list as the text "helpful vote".

set(#_httpRequestSingleReview,$plugin function("HeopasCustom.dll", "$Heopas HTTP Get", "https://www.amazon.com/Girl-Wash-Your-Face-Believing/product-reviews/B077GZBL1Y/ref=cm_cr_arp_d_paging_btm_next_8?ie=UTF8&pageNumber=8&reviewerType=all_reviews&mediaType=media_reviews_only#", "", "", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0", "Accept: */*", ""),"Global")
set(#Count,$plugin function("HeopasCustom.dll", "$Heopas Xpath Count", #_httpRequestSingleReview, "//*[@class=\"a-section review aok-relative\"]"),"Global")
set(#Count1,$plugin function("HeopasCustom.dll", "$Heopas Xpath Count", #_httpRequestSingleReview, "//div[@class=\"a-row a-spacing-small\"]/span[@class=\"a-size-base a-color-tertiary cr-vote-text\"]"),"Global")
set(#ReviewSections,$plugin function("HTTP post.dll", "$xpath parser", #_httpRequestSingleReview, "//*[@class=\"a-section review aok-relative\"]", "id", "HTML"),"Global")
add list to list(%ReviewSections,$list from text(#ReviewSections,$new line),"Delete","Global")
with each(%ReviewSections,#ReviewSections) {
    set(#Results,$plugin function("HTTP post.dll", "$xpath parser", #_httpRequestSingleReview, "//*[@id=\"customer_review-{#ReviewSections}\"]/div[7]/div/span[1]/div[1]/span", "InnerText", "HTML"),"Global")
    add item to list(%Results1,#Results,"Don\'t Delete","Global")
}
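A variation on the same approach, again sketched in Python under the same assumptions (hypothetical simplified markup, stdlib ElementTree): rather than building a new page-wide query from each id, search relative to each review element. This avoids positional steps such as div[7], which can break if the page layout shifts, and lets you pick the placeholder for a review with no votes explicitly (an empty string by default here). The function name votes_per_review is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; review R2 has no vote span.
SAMPLE = """<div>
  <div id="R1" class="a-section review aok-relative">
    <div class="a-row a-spacing-small">
      <span class="a-size-base a-color-tertiary cr-vote-text">12 people found this helpful</span>
    </div>
  </div>
  <div id="R2" class="a-section review aok-relative">
    <div class="a-row a-spacing-small"></div>
  </div>
  <div id="R3" class="a-section review aok-relative">
    <div class="a-row a-spacing-small">
      <span class="a-size-base a-color-tertiary cr-vote-text">One person found this helpful</span>
    </div>
  </div>
</div>"""

REVIEW = ".//div[@class='a-section review aok-relative']"
# Searched relative to one review element, never the whole page.
VOTE = ".//span[@class='a-size-base a-color-tertiary cr-vote-text']"


def votes_per_review(html, missing=""):
    """Return exactly one entry per review, using `missing` when there is no vote text."""
    root = ET.fromstring(html)
    votes = []
    for review in root.findall(REVIEW):
        span = review.find(VOTE)
        votes.append(span.text.strip() if span is not None and span.text else missing)
    return votes


print(votes_per_review(SAMPLE))
```

Because each review contributes exactly one entry, the list stays aligned with the reviews on the page, which is what the with each loop above is designed to achieve.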
