UBot Underground

Problems With Amazon Scraping!? (Amazon Germany)


Recommended Posts

Hey Guys,

 

I am trying to build an Amazon scraper, but every time I try, UBot either crashes or displays very strange errors.

 

Problem 1:

 

When visiting Amazon and going to a product overview page like this one:

https://www.amazon.de/s/ref=sr_pg_6?rh=i%3Aaps%2Ck%3Awasserbett+konditionierer&page=6&keywords=wasserbett+konditionierer&ie=UTF8&qid=1476185127&spIA=B001JBYKB4,B00QQ8Q2XO,B002P55ZTE,B001RW5B7C,B00P94GZGK,B00OW7FBDS,B00PYEUIWM,B00S7Q67SI,B00MJ09CAI,B01GTS3FAI,B00P93REN4,B00P9598OK,B00P92PJDC,B00P965KD2,B01IPNV8P4,B00P80EMCO 

 

I try to scrape the URLs with "Scrape Element" using the selectors, but then UBot crashes.

 

Problem 2:

I hired someone to scrape the product links manually, so now I have a big list of Amazon product URLs.
What I want to do is navigate to each page and scrape a certain element.

After five or six product pages, UBot throws an error telling me it expected a Boolean value, or that the "JSON can't be deserialized".
If I retry the first five product pages (which worked at first), the error now pops up for them as well.

 

I am quite confused... could it be that Amazon is actively blocking scraping?

 


This should help you along:

define scrapeAmazonHrefs(#url) {
    navigate(#url,"Wait")
    wait for browser event("DOM Ready","")
    wait for browser event("Everything Loaded","")
    add list to list(%hrefs,$scrape attribute(<(href=w"https://www.amazon.de/*" AND class="a-link-normal s-access-detail-page  a-text-normal")>,"href"),"Delete","Local")
    loop($list total(%hrefs)) {
        navigate($next list item(%hrefs),"Wait")
        wait for browser event("DOM Ready","")
        wait for browser event("Everything Loaded","")
        comment("put your scrape code here")
    }
}
scrapeAmazonHrefs("https://www.amazon.de/s/ref=sr_pg_6?rh=i:aps%2Ck:wasserbett+konditionierer&page=6&keywords=wasserbett+konditionierer&ie=UTF8&qid=1476185127&spIA=B001JBYKB4,B00QQ8Q2XO,B002P55ZTE,B001RW5B7C,B00P94GZGK,B00OW7FBDS,B00PYEUIWM,B00S7Q67SI,B00MJ09CAI,B01GTS3FAI,B00P93REN4,B00P9598OK,B00P92PJDC,B00P965KD2,B01IPNV8P4,B00P80EMCO")


Hey Deliter,

 

I tested your code and the bot works perfectly: it opens the product overview page and then navigates to each product detail page.

One thing is very strange, though: when I look into the debugger, it says that the list %hrefs is empty.
But that can't be true, because the bot navigates to every scraped URL.
I don't understand this behaviour, and it would be great to save the scraped URLs as well.

 



Post your code.

 

Here is what I have so far. I load a list of URLs that were collected manually:

set(#Loopcounter,0,"Global")
loop($list total(%Produkt URLS)) {
    navigate($list item(%Produkt URLS,#Loopcounter),"Wait")
    wait for browser event("Page Loaded","")
    set(#ProduktURL,$scrape attribute(<id="merchant-info">,"innerhtml"),"Global")
    set(#TEST,$plugin function("File Management.dll", "$Find Regex First", #ProduktURL, "(?<=href=\").*?(?=\">)"),"Global")
    set(#TEST,$replace(#TEST,"amp;",""),"Global")
    navigate("https://www.amazon.de{#TEST}","Wait")
    wait for browser event("Page Loaded","")
    set(#DATEN,$page scrape("Verkäuferinformationen","Aktuelles Feedback:"),"Global")
    set(#DATEN,$replace regular expression(#DATEN,"<.*?>",""),"Global")
    set table cell(&Impressum Amazon,#Loopcounter,0,"https://www.amazon.de{#TEST}")
    set table cell(&Impressum Amazon,#Loopcounter,1,#DATEN)
    increment(#Loopcounter)
}

This code works for product pages, but after a while it stops working, and I even get an error on pages that worked before.
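If the failures are really Amazon throttling repeated requests (which is a guess, not something the errors prove), one small thing worth trying is a random pause between page loads. A minimal sketch of that idea, reusing the list name from the code above:

```
comment("Sketch: random 3-8 second pause between product pages,
assuming the errors are rate-based blocking - an unconfirmed assumption")
loop($list total(%Produkt URLS)) {
    navigate($next list item(%Produkt URLS),"Wait")
    wait for browser event("Page Loaded","")
    comment("put your scrape code here, then pause before the next page")
    wait($rand(3,8))
}
```

If the error still appears after the same number of pages, the cause is probably something other than request rate.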


A better way is to use the product ID instead of the full URL:

https://www.amazon.de/dp/Product ID

Sample:

https://www.amazon.de/dp/B01DPQ6JDM

Get the ID from the URL:

alert($find regular expression("https://www.amazon.de/dp/B01BNJKU3I?ref_=ams_ad_dp_asin_2","(?<=dp\\/)[0-9A-Z]+"))
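As a follow-on sketch, you could extract the ID and navigate straight to the short /dp/ URL. Here #FullURL is a made-up variable standing in for one scraped product link:

```
comment("Extract the ASIN from a full product link held in #FullURL")
set(#ASIN,$find regular expression(#FullURL,"(?<=dp\\/)[0-9A-Z]+"),"Global")
comment("Build the short canonical product URL from the extracted ID")
navigate("https://www.amazon.de/dp/{#ASIN}","Wait")
```

The short /dp/ URLs drop all the tracking parameters, so the same product always loads from the same address.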

This sample scrapes:

clear all data
navigate("https://www.amazon.de","Wait")
Wait()
change attribute(<name="field-keywords">,"value","lg uhd")
click(<value="Los">,"Left Click","No")
Wait()
add list to list(%URL,$scrape attribute(<class="a-link-normal s-access-detail-page  a-text-normal">,"fullhref"),"Delete","Global")
comment("add list to list(%Title,$scrape attribute(<class=\"a-link-normal s-access-detail-page  a-text-normal\">,\"innertext\"),\"Delete\",\"Global\")")
set(#ListPostion,0,"Global")
loop($subtract($list total(%URL),#ListPostion)) {
    set(#URL,$list item(%URL,#ListPostion),"Global")
    navigate(#URL,"Wait")
    Wait()
    set(#Title,$scrape attribute(<id="productTitle">,"innertext"),"Global")
    set(#Produktinformation,$scrape attribute(<(tagname="td" AND class="bucket")>,"innertext"),"Global")
    set table cell(&Impressum Amazon,#ListPostion,0,#Title)
    set table cell(&Impressum Amazon,#ListPostion,1,#URL)
    set table cell(&Impressum Amazon,#ListPostion,2,#Produktinformation)
    increment(#ListPostion)
}
define Wait {
    wait for browser event("Everything Loaded","")
    wait(1)
}


Hey pash,

 

thank you very much! It works just perfectly!

 

I didn't know until now that you could use "AND" in the scrape attribute function! Great to know.
I think I can get my project done now.

Many thanks!
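For anyone finding this thread later, the AND combinator in an element selector looks like this (the attribute values here are just illustrative, not tied to any particular page):

```
comment("Match only anchors that have BOTH this tag name and this exact class")
add list to list(%links,$scrape attribute(<(tagname="a" AND class="a-link-normal")>,"fullhref"),"Delete","Global")
```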



Those are local variables. If you edit the list, you will see its scope is Local rather than Global.

A variable has a scope of either Global or Local, and your debugger only shows global variables.

A local variable's scope is limited to the function/command it was created in. While you're new, stick to Global, but as you get better, try to use more custom commands/functions and localize the variables where possible. The advantage is that your debugger is smaller and cleaner, so debugging bigger bots is easier.

The best advantage is that it's easier to keep your bot, well, dynamic. If you build a bot with no functions and, in the middle of a loop nested inside five other loops, you spot a problem, say a captcha, you end up bolting on more code. You might want to solve the captcha and have the bot go back to where it was before it started scraping, and it quickly becomes a total mess. With functions, once each operation in the bot is defined, you can make it go backwards and forwards between them.
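A minimal illustration of the scope difference (the function and variable names are made up):

```
define $getPageTitle {
    comment("#t is Local: it exists only inside this function
    and will never show up in the debugger")
    set(#t,$scrape attribute(<tagname="title">,"innertext"),"Local")
    return(#t)
}
comment("#Title is Global, so it DOES appear in the debugger")
set(#Title,$getPageTitle(),"Global")
```

This is why %hrefs looked empty earlier: it was declared with "Local" scope inside the define, so the debugger never listed it even while the loop was using it.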


Hey Deliter,

 

thanks for that advice!
I still have some problems when building more complex bots.

I didn't know that local variables don't show up in the debugger, but it makes sense now.
Many thanks!

