UBot Underground

Can I Use Scrape Attribute Without Loading Html?


Hi

 

I like to load the HTML of each Amazon product into a list and then extract the fields I want from it; that way empty fields are handled correctly. The HTML of a single product is held in a variable, say #eachsection. XPath is easy to use because I can set the parser's input to #eachsection. Is there a way to use $scrape attribute on #eachsection without loading it into the browser? It works fine if I load the variable as HTML into the browser, but that slows the whole bot down by a factor of about 5.

 

Here is the XPath example; I want to do the same with $scrape attribute:

 

    set(#price,$plugin function("XpathPlugin.dll", "$Generic Xpath Parser", #eachsection, "//span[1]/span[@class=\'a-offscreen\' and 1]", "innertext", "False"),"Global")
    add item to list(%price,#price,"Don\'t Delete","Global")
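
For readers who want to see the idea outside UBot, here is a minimal Python sketch of the same approach: each product's HTML stays in a string and is queried with XPath, with no browser involved, and a missing field becomes an empty string so the lists stay aligned. The sample fragments, the `first_text` helper, and the use of the standard-library `xml.etree.ElementTree` (which only accepts well-formed markup and a limited XPath subset) are assumptions for illustration, not part of the bot above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed product fragments standing in for #eachsection.
sections = [
    '<div><span><span class="a-offscreen">$19.99</span></span></div>',
    '<div><span>No price shown</span></div>',  # price field missing
]

def first_text(fragment, path):
    """Return the text of the first node matching path, or '' if none."""
    root = ET.fromstring(fragment)
    node = root.find(path)
    return "" if node is None else "".join(node.itertext()).strip()

# One entry per section, so an empty field keeps its position in the list.
prices = [first_text(s, ".//span[@class='a-offscreen']") for s in sections]
print(prices)
```

Because every section contributes exactly one entry, the price list can later be zipped with other per-product lists without the rows drifting apart.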

 

This also works, but there is no way to supply an input other than the browser, so it is slow: each #eachsection must be loaded into the browser inside the loop.

 

    set(#link,$scrape attribute(<src=w"https://*">,"fullsrc"),"Global")
    add item to list(%link,#link,"Don\'t Delete","Global")


Generally it's better to use XPath when you can, and in this situation it looks like you can, so why not just use that?

Sometimes you can use element parent, sibling, or child, but typically it's going to be much more productive to just use XPath whenever possible.


OK, thanks for the info. My problem is that I have a quirky scrape that does not seem to respond to XPath: the Amazon Deal of the Day page. I want to scrape the name of each product and its link. I have isolated the individual items.

 

This works as a $scrape attribute, but nothing I tried worked as an XPath; it always came up blank. The same goes for the link.

 

        set(#product,$scrape attribute(<class="a-size-base a-link-normal dealTitleTwoLine singleCellTitle autoHeight">,"innertext"),"Global")
        add item to list(%product,#product,"Don\'t Delete","Global")

 

        set(#productlink,$scrape attribute(<class="a-size-base a-link-normal dealTitleTwoLine singleCellTitle autoHeight">,"href"),"Global")
        add item to list(%productlink,#productlink,"Don\'t Delete","Global")

 

The isolated HTML is at the end of this post.

These XPath expressions did not work for me:

 

//*[@id=dealTitle]/span

//a[@id=dealTitle]/span[@class='a-declarative' and 1]
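
One thing worth checking in the expressions above is the quoting: in XPath 1.0, `@id=dealTitle` without quotes compares the attribute against child elements named `dealTitle` rather than against the string 'dealTitle', so it will typically match nothing. The Python sketch below shows the quoted form on a simplified, well-formed copy of the anchor (query string and data attributes dropped). Here `xml.etree.ElementTree` merely stands in for the UBot XPath plugin, and the shortened sample is an assumption.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the deal block (assumption: the real
# page carries extra attributes and a longer href with a query string).
deal_html = (
    '<div class="a-row a-spacing-mini unitLineHeight">'
    '<a id="dealTitle" href="https://www.amazon.com/Android-Toddler-'
    'Childrens-Parental-Control/dp/B07RV14G3R">'
    '<span class="a-declarative">\n'
    'Kids Tablet 7 Android Kids Tablet Toddler Tablet Kids Edition Tablet...'
    '</span></a></div>'
)

root = ET.fromstring(deal_html)

# The attribute value is written as a quoted string literal.
anchor = root.find(".//a[@id='dealTitle']")

title = "".join(anchor.itertext()).strip()  # text of the nested span
link = anchor.get("href")
print(title)
print(link)
```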

 

Sorry for the layout below; that is how it looks after formatting. I am after "Kids Tablet 7 Android Kids Tablet Toddler Tablet Kids Edition Tablet..." and its link.

 

<div class="a-row a-spacing-mini unitLineHeight">
<a id="dealTitle" class="a-size-base a-link-normal dealTitleTwoLine singleCellTitle autoHeight" href="https://www.amazon.com/Android-Toddler-Childrens-Parental-Control/dp/B07RV14G3R/ref=gbps_tit_s-5_b5d1_53611b29?smid=A2PZE9JX4CB8YU&pf_rd_p=473a0caf-eecb-4c73-92a7-1e27f89fb5d1&pf_rd_s=slot-5&pf_rd_t=701&pf_rd_i=gb_main&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=N37P7NE3CK9060DAK0Z8" style="width: 210px;">

<span class="a-declarative" data-action="gbdeal-actionrecord" data-gbdeal-actionrecord="{"actionType":"TITLE","position":"23","widgetID":"101","dealID":"53611b29"}">
Kids Tablet 7 Android Kids Tablet Toddler Tablet Kids Edition Tablet...</span></a>
</div>

Edited by charliefinale

I am trying regex now, and I am wondering whether $find regular expression is buggy, or is it me? I constructed one as follows:

 

(?s)(?<=class=\"\"><a href=\")(.*?)(.+?(?=\"))

 

I built and tested it against the HTML below in the UBot Regex Editor, and it returned exactly what I wanted, which is:

 

/ip/Fried-Green-Tomatoes-Anniversary-Edition-Extended-Version-DVD/4694735

 

But used in UBot on the same HTML, it turns up nothing:

 

     add item to list(%productlink,"https://walmart.com{$plugin function("XpathPlugin.dll", "$Generic Xpath Parser", #eachblock, "(?s)(?<=class=\\\"\\\"><a href=\\\")(.*?)(.+?(?=\\\"))", "", "False")}","Don\'t Delete","Global")

 

style="height: 0px;"></div><div style="height: 200px;"><div class="search-result-productimage gridview"><span class="visuallyhidden">Product Image</span><div class=""><a href="/ip/Fried-Green-Tomatoes-Anniversary-Edition-Extended-Version-DVD/4694735" class="display-block"><img data-pnodetype="item-pimg" data-image-indicator="0" data-image-src="https://i5.walmartimages.com/asr/5dca2eaf-0b04-4a81-b1bf-c6135756a3e7_1.adbb4bc1cc8b4199d7b05c79b35e29ec.jpeg?odnWidth=200&odnHeight=200&odnBg=ffffff"src="https://i5.walmartimages.com/asr/5dca2eaf-0b04-4a81-b1bf-c6135756a3e7_1.adbb4bc1cc8b4199d7b05c79b35e29ec.jpeg?
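
As a cross-check outside UBot, the pattern can be run against the relevant part of the HTML with Python's `re` module. This is only a sketch, and the shortened `sample` string is an assumption that keeps just the part of the HTML the pattern touches. Note that the UBot line above passes the pattern to `$Generic Xpath Parser`, which expects an XPath expression rather than a regex; that may be why it returns nothing.

```python
import re

# Shortened excerpt of the sample HTML (assumption: trimmed for readability).
sample = (
    '<div class=""><a href="/ip/Fried-Green-Tomatoes-Anniversary-Edition-'
    'Extended-Version-DVD/4694735" class="display-block">'
)

# The pattern from the post, with the UBot string escaping removed.
original = r'(?s)(?<=class=""><a href=")(.*?)(.+?(?="))'
# A simpler form: after the same lookbehind, take everything up to the
# next double quote.
simpler = r'(?<=class=""><a href=")[^"]+'

found_original = re.search(original, sample).group(0)
found_simpler = re.search(simpler, sample).group(0)
print(found_original)
```

Both patterns yield the same path on this excerpt, which suggests the expression itself is sound and the problem lies in how it is invoked.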

Edited by charliefinale

Based on the code you gave, this should work. You probably don't even need the HTML decode, but I threw it in there anyway:

set(#html,"<div class=\"a-row a-spacing-mini unitLineHeight\">
<a id=\"dealTitle\" class=\"a-size-base a-link-normal dealTitleTwoLine singleCellTitle autoHeight\" href=\"https://www.amazon.com/Android-Toddler-Childrens-Parental-Control/dp/B07RV14G3R/ref=gbps_tit_s-5_b5d1_53611b29?smid=A2PZE9JX4CB8YU&pf_rd_p=473a0caf-eecb-4c73-92a7-1e27f89fb5d1&pf_rd_s=slot-5&pf_rd_t=701&pf_rd_i=gb_main&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=N37P7NE3CK9060DAK0Z8\" style=\"width: 210px;\">

<span class=\"a-declarative\" data-action=\"gbdeal-actionrecord\" data-gbdeal-actionrecord=\"\{\"actionType\":\"TITLE\",\"position\":\"23\",\"widgetID\":\"101\",\"dealID\":\"53611b29\"\}\">
Kids Tablet 7 Android Kids Tablet Toddler Tablet Kids Edition Tablet...</span></a>
</div>","Global")
set(#htmlDecode,$plugin function("HeopasCustom.dll", "$Heopas Text Encode/Decode", "HTML Decode", #html),"Global")
set(#title,$plugin function("HeopasCustom.dll", "$Heopas Xpath Parser", #htmlDecode, "//a[@id=\'dealTitle\']", "InnerText", ""),"Global")
set(#link,$plugin function("HeopasCustom.dll", "$Heopas Xpath Parser", #htmlDecode, "//a[@id=\'dealTitle\']", "href", ""),"Global")
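
To illustrate what an HTML decode step buys you, here is a small Python sketch: if the stored HTML has been entity-encoded (for example `&lt;` for `<`), an XPath parser sees plain text rather than tags until the string is decoded. `html.unescape` and `xml.etree.ElementTree` stand in for the Heopas functions here, and the shortened `encoded` sample is an assumption.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical entity-encoded copy of a shortened deal anchor.
encoded = (
    "&lt;div&gt;&lt;a id=&quot;dealTitle&quot; href=&quot;https://www.amazon.com/"
    "Android-Toddler-Childrens-Parental-Control/dp/B07RV14G3R&quot;&gt;"
    "Kids Tablet 7 Android Kids Tablet Toddler Tablet Kids Edition Tablet..."
    "&lt;/a&gt;&lt;/div&gt;"
)

# Decoding turns &lt; &gt; &quot; back into < > " so the tags become real markup.
decoded = html.unescape(encoded)

root = ET.fromstring(decoded)
anchor = root.find(".//a[@id='dealTitle']")
title = anchor.text
link = anchor.get("href")
print(title, link)
```

If the HTML in the variable is already plain markup, the decode step changes nothing that matters for the tags, which matches the remark that it is probably not needed.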