charliefinale 5 Posted August 19, 2019

Hi, I like to load the HTML of a product from Amazon into a list and then extract the fields I want from that. This handles the case where fields are empty. So the HTML of a single product is held in a variable, say #eachsection. XPath is easy to use because I can set the input to be #eachsection.

Is there a way to use scrape element on #eachsection without loading it into the browser? It works fine if I load the variable as HTML into the browser, but that slows down the whole bot by a factor of about 5.

Here is the XPath example; I want to do the same with scrape attribute:

set(#price,$plugin function("XpathPlugin.dll", "$Generic Xpath Parser", #eachsection, "//span[1]/span[@class=\'a-offscreen\' and 1]", "innertext", "False"),"Global")
add item to list(%price,#price,"Don\'t Delete","Global")

This also works, but there is no way to supply an input other than the browser, so it is slow: each #eachsection must be loaded in a loop.

set(#link,$scrape attribute(<src=w"https://*">,"fullsrc"),"Global")
add item to list(%link,#link,"Don\'t Delete","Global")
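The per-section approach described above can be illustrated outside UBot. The following is a minimal Python sketch, not UBot script: the `sections` fragments, the two regular expressions, and the `first_group` helper are all made up for illustration. It shows why extracting from each product's HTML separately keeps the lists aligned when a field is missing.

```python
import re

# Hypothetical per-product HTML fragments standing in for #eachsection.
sections = [
    '<div><span class="a-offscreen">$12.99</span>'
    '<img src="https://example.com/a.jpg"></div>',
    '<div><img src="https://example.com/b.jpg"></div>',  # this one has no price
]

# Illustrative patterns, not taken from any real Amazon page layout.
PRICE_RE = re.compile(r'<span class="a-offscreen">([^<]*)</span>')
SRC_RE = re.compile(r'<img[^>]*\bsrc="(https://[^"]*)"')


def first_group(pattern, text):
    """Return the first capture group, or "" when the field is absent."""
    match = pattern.search(text)
    return match.group(1) if match else ""


# One entry per section, so prices[i] and links[i] describe the same product.
prices = [first_group(PRICE_RE, s) for s in sections]
links = [first_group(SRC_RE, s) for s in sections]
```

Because every section contributes exactly one entry to each list, a missing price becomes an empty string instead of shifting later prices up by one position.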
HelloInsomnia 1103 Posted August 19, 2019

Generally it's better to use XPath when you can, and in this situation it looks like you can, so why not just use that? Sometimes you can use element parent, sibling, or child, but typically it's much more productive to use XPath whenever possible.
Code Docta (Nick C.) 638 Posted August 20, 2019

I am not sure I understand you correctly. Just use XPath/Regex on a variable. $scrape attribute reads from the browser, so no.
charliefinale 5 Posted August 20, 2019 (edited)

OK, thanks for the info. My problem is that I have a quirky scrape that does not seem to respond to XPath. It is the Amazon deal of the day. I want to scrape the name of the product and the product link. I have isolated the individual items.

This works as a $scrape attribute, but nothing worked as an XPath: it always came up blank. Same for the link.

set(#product,$scrape attribute(<class="a-size-base a-link-normal dealTitleTwoLine singleCellTitle autoHeight">,"innertext"),"Global")
add item to list(%product,#product,"Don\'t Delete","Global")
set(#productlink,$scrape attribute(<class="a-size-base a-link-normal dealTitleTwoLine singleCellTitle autoHeight">,"href"),"Global")
add item to list(%productlink,#productlink,"Don\'t Delete","Global")

The isolated HTML is at the end of this post. These XPaths did not work for me:

//*[@id=dealTitle]/span
//a[@id=dealTitle]/span[@class='a-declarative' and 1]

Sorry for the layout below, but that is after formatting!! I am after "Kids Tablet 7 Android Kids Tablet Toddler Tablet Kids Edition Tablet..." and its link.

<div class="a-row a-spacing-mini unitLineHeight"><a id="dealTitle" class="a-size-base a-link-normal dealTitleTwoLine singleCellTitle autoHeight" href="https://www.amazon.com/Android-Toddler-Childrens-Parental-Control/dp/B07RV14G3R/ref=gbps_tit_s-5_b5d1_53611b29?smid=A2PZE9JX4CB8YU&pf_rd_p=473a0caf-eecb-4c73-92a7-1e27f89fb5d1&pf_rd_s=slot-5&pf_rd_t=701&pf_rd_i=gb_main&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=N37P7NE3CK9060DAK0Z8" style="width: 210px;"><span class="a-declarative" data-action="gbdeal-actionrecord" data-gbdeal-actionrecord="{"actionType":"TITLE","position":"23","widgetID":"101","dealID":"53611b29"}">Kids Tablet 7 Android Kids Tablet Toddler Tablet Kids Edition Tablet...</span></a></div
charliefinale 5 Posted August 21, 2019 (edited)

Trying regex now. I am wondering if find regular expression is buggy, or is it me? I constructed one as follows:

(?s)(?<=class=\"\"><a href=\")(.*?)(.+?(?=\"))

I built and tested it against the HTML below in the UBot Regex Editor, and it returned exactly what I wanted, which is:

/ip/Fried-Green-Tomatoes-Anniversary-Edition-Extended-Version-DVD/4694735

But used in UBot on the same HTML it turns up nothing:

add item to list(%productlink,"https://walmart.com{$plugin function("XpathPlugin.dll", "$Generic Xpath Parser", #eachblock, "(?s)(?<=class=\\\"\\\"><a href=\\\")(.*?)(.+?(?=\\\"))", "", "False")}","Don\'t Delete","Global")

The HTML:

style="height: 0px;"></div><div style="height: 200px;"><div class="search-result-productimage gridview"><span class="visuallyhidden">Product Image</span><div class=""><a href="/ip/Fried-Green-Tomatoes-Anniversary-Edition-Extended-Version-DVD/4694735" class="display-block"><img data-pnodetype="item-pimg" data-image-indicator="0" data-image-src="https://i5.walmartimages.com/asr/5dca2eaf-0b04-4a81-b1bf-c6135756a3e7_1.adbb4bc1cc8b4199d7b05c79b35e29ec.jpeg?odnWidth=200&odnHeight=200&odnBg=ffffff"src="https://i5.walmartimages.com/asr/5dca2eaf-0b04-4a81-b1bf-c6135756a3e7_1.adbb4bc1cc8b4199d7b05c79b35e29ec.jpeg?
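One thing worth checking: the UBot line above hands the pattern to `$Generic Xpath Parser`, which expects an XPath expression rather than a regular expression, and that may be why it returns nothing. The pattern itself looks sound. As a rough check outside UBot, here is a small Python sketch (Python's `re` engine, not UBot's, so behavior could differ in edge cases) run against a shortened copy of the Walmart fragment. The variable names are illustrative.

```python
import re

# Shortened copy of the Walmart fragment quoted in the post above.
html = (
    '<div class="search-result-productimage gridview">'
    '<span class="visuallyhidden">Product Image</span>'
    '<div class=""><a href="/ip/Fried-Green-Tomatoes-Anniversary-Edition-'
    'Extended-Version-DVD/4694735" class="display-block"></a></div></div>'
)

# The pattern from the post, with the UBot quote escaping removed.
original = re.compile(r'(?s)(?<=class=""><a href=")(.*?)(.+?(?="))')

# A shorter pattern that gives the same result on this fragment:
# after the same lookbehind, take everything up to the next double quote.
simpler = re.compile(r'(?<=class=""><a href=")[^"]+')

match = original.search(html)
path = match.group(0) if match else ""

simple_match = simpler.search(html)
simple_path = simple_match.group(0) if simple_match else ""

# Prefix the site root, as the add item to list line does.
product_link = "https://walmart.com" + path
```

If the pattern behaves the same way in UBot's regex function as it does in the Regex Editor, the remaining suspect is the function the pattern is passed to rather than the pattern.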
HelloInsomnia 1103 Posted August 21, 2019

Based on the code you gave, this should work. You probably don't even need the HTML decode, but I threw it in there anyway:

set(#html,"<div class=\"a-row a-spacing-mini unitLineHeight\"> <a id=\"dealTitle\" class=\"a-size-base a-link-normal dealTitleTwoLine singleCellTitle autoHeight\" href=\"https://www.amazon.com/Android-Toddler-Childrens-Parental-Control/dp/B07RV14G3R/ref=gbps_tit_s-5_b5d1_53611b29?smid=A2PZE9JX4CB8YU&pf_rd_p=473a0caf-eecb-4c73-92a7-1e27f89fb5d1&pf_rd_s=slot-5&pf_rd_t=701&pf_rd_i=gb_main&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=N37P7NE3CK9060DAK0Z8\" style=\"width: 210px;\"> <span class=\"a-declarative\" data-action=\"gbdeal-actionrecord\" data-gbdeal-actionrecord=\"\{"actionType":"TITLE","position":"23","widgetID":"101","dealID":"53611b29"\}\"> Kids Tablet 7 Android Kids Tablet Toddler Tablet Kids Edition Tablet...</span></a> </div>","Global")
set(#htmlDecode,$plugin function("HeopasCustom.dll", "$Heopas Text Encode/Decode", "HTML Decode", #html),"Global")
set(#title,$plugin function("HeopasCustom.dll", "$Heopas Xpath Parser", #htmlDecode, "//a[@id=\'dealTitle\']", "InnerText", ""),"Global")
set(#link,$plugin function("HeopasCustom.dll", "$Heopas Xpath Parser", #htmlDecode, "//a[@id=\'dealTitle\']", "href", ""),"Global")
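For readers who want to see the same idea without the plugin, here is a minimal Python sketch that pulls the inner text and href of the `<a id="dealTitle">` element straight from a string, with no browser involved. It uses the standard library's `html.parser` instead of XPath, so it is an analogue of the approach above, not the plugin's implementation. The `DealTitleParser` class is hypothetical, and the HTML is a shortened version of the fragment in this thread (trimmed URL, class list, and title).

```python
from html.parser import HTMLParser


class DealTitleParser(HTMLParser):
    """Collect the href and inner text of the first <a id="dealTitle">."""

    def __init__(self):
        super().__init__()
        self.href = ""
        self.title_parts = []
        self._inside = False  # True while between <a id="dealTitle"> and </a>
        self._done = False    # True once the first matching anchor has closed

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and not self._done and attributes.get("id") == "dealTitle":
            self._inside = True
            self.href = attributes.get("href") or ""

    def handle_endtag(self, tag):
        if tag == "a" and self._inside:
            self._inside = False
            self._done = True

    def handle_data(self, data):
        # Text inside nested tags such as <span> still arrives here.
        if self._inside:
            self.title_parts.append(data)

    @property
    def title(self):
        return "".join(self.title_parts).strip()


# Shortened version of the Amazon fragment from the thread.
html = (
    '<div class="a-row a-spacing-mini unitLineHeight">'
    '<a id="dealTitle" class="a-size-base a-link-normal" '
    'href="https://www.amazon.com/Android-Toddler-Childrens-Parental-Control/dp/B07RV14G3R">'
    '<span class="a-declarative">Kids Tablet 7 Android Kids Tablet...</span>'
    '</a></div>'
)

parser = DealTitleParser()
parser.feed(html)
parser.close()
```

After parsing, `parser.title` holds the product name and `parser.href` holds the link. Matching on the `id` attribute mirrors the quoted `@id='dealTitle'` predicate used in the XPath above.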