cob007, posted December 17, 2019

I am scraping the 10 "helpful" vote counts from the following page using XPath:

https://www.amazon.com/Girl-Wash-Your-Face-Believing/product-reviews/B077GZBL1Y/ref=cm_cr_arp_d_paging_btm_next_8?ie=UTF8&pageNumber=8&reviewerType=all_reviews&mediaType=media_reviews_only

set(#_httpRequestSingleReview,$plugin function("HeopasCustom.dll", "$Heopas HTTP Get", "https://www.amazon.com/Girl-Wash-Your-Face-Believing/product-reviews/B077GZBL1Y/ref=cm_cr_arp_d_paging_btm_next_8?ie=UTF8&pageNumber=8&reviewerType=all_reviews&mediaType=media_reviews_only","", "", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0", "", ""),"Global")
add item to list(%helpful1,$list from text($plugin function("HTTP post.dll", "$xpath parser", #_httpRequestSingleReview, "//div[@class=\"a-row a-spacing-small\"]/span[@class=\"a-size-base a-color-tertiary cr-vote-text\"]", "InnerText", "HTML"),$new line),"Don\'t Delete","Global")

One of the reviews on this page has no vote element, so the XPath returns only 9 items instead of 10: the review without votes is skipped, and the remaining results no longer line up with the reviews.

I think a possible solution is to use an offset and scrape each item separately. How do I get the correct offset without losing the missing element? Whenever I added an offset, no elements were returned.
cob007 (author), posted December 19, 2019

Here is more detail, with an image.
SourceUltra, posted December 20, 2019

Each review has a unique identifier, available in the id attribute of its review section. If we get those identifiers into a list and then go through that list with a with each command, we can target each helpful-vote section individually with a complete id, since the unique identifier is part of the id of the element you want to scrape.

Not the prettiest way of explaining it, so here's an example. In the example below, 10 results are returned, including the empty helpful vote, which is returned as the text "helpful vote" in the list.

set(#_httpRequestSingleReview,$plugin function("HeopasCustom.dll", "$Heopas HTTP Get", "https://www.amazon.com/Girl-Wash-Your-Face-Believing/product-reviews/B077GZBL1Y/ref=cm_cr_arp_d_paging_btm_next_8?ie=UTF8&pageNumber=8&reviewerType=all_reviews&mediaType=media_reviews_only#", "", "", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0", "Accept: */*", ""),"Global")
set(#Count,$plugin function("HeopasCustom.dll", "$Heopas Xpath Count", #_httpRequestSingleReview, "//*[@class=\"a-section review aok-relative\"]"),"Global")
set(#Count1,$plugin function("HeopasCustom.dll", "$Heopas Xpath Count", #_httpRequestSingleReview, "//div[@class=\"a-row a-spacing-small\"]/span[@class=\"a-size-base a-color-tertiary cr-vote-text\"]"),"Global")
set(#ReviewSections,$plugin function("HTTP post.dll", "$xpath parser", #_httpRequestSingleReview, "//*[@class=\"a-section review aok-relative\"]", "id", "HTML"),"Global")
add list to list(%ReviewSections,$list from text(#ReviewSections,$new line),"Delete","Global")
with each(%ReviewSections,#ReviewSections) {
    set(#Results,$plugin function("HTTP post.dll", "$xpath parser", #_httpRequestSingleReview, "//*[@id=\"customer_review-{#ReviewSections}\"]/div[7]/div/span[1]/div[1]/span", "InnerText", "HTML"),"Global")
    add item to list(%Results1,#Results,"Don\'t Delete","Global")
}

Here #Count is the number of review sections and #Count1 is the number of vote elements found by the original XPath, so comparing the two shows whether any review is missing its vote text.
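For anyone who wants to see the idea outside UBot, here is a minimal Python sketch of the same technique: query the vote text once per review instead of once per page, so a review without votes still produces an entry. The markup, the ids R1 to R3, the class name "review", and the function names are all made up for illustration and are much simpler than the real Amazon page. It uses the standard library's xml.etree.ElementTree, which only accepts well-formed XML, so real HTML would need a proper HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: three reviews, the second has no vote text.
PAGE = """
<div>
  <div id="R1" class="review"><span class="cr-vote-text">12 people found this helpful</span></div>
  <div id="R2" class="review"></div>
  <div id="R3" class="review"><span class="cr-vote-text">One person found this helpful</span></div>
</div>
"""


def flat_votes(root):
    """Page-wide query: reviews without a vote span simply disappear."""
    return [s.text for s in root.findall(".//span[@class='cr-vote-text']")]


def votes_per_review(root, placeholder=""):
    """Per-review query: one result per review, with a placeholder when missing."""
    results = []
    for review in root.findall(".//div[@class='review']"):
        span = review.find(".//span[@class='cr-vote-text']")
        text = span.text if span is not None else placeholder
        results.append((review.get("id"), text))
    return results


root = ET.fromstring(PAGE)
print(flat_votes(root))        # two entries, no way to tell which review is missing
print(votes_per_review(root))  # three entries, each paired with its review id
```

Because each result is paired with its review id, the list stays aligned with the reviews even when a vote element is absent, which is what the with each loop over the review ids achieves in the script above.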