Tibret 5 Posted October 11, 2016

Hey guys, I am trying to build an Amazon scraper, but every time I try, UBot either crashes or displays very strange errors.

First problem: when visiting an Amazon product overview page like this one:
https://www.amazon.de/s/ref=sr_pg_6?rh=i%3Aaps%2Ck%3Awasserbett+konditionierer&page=6&keywords=wasserbett+konditionierer&ie=UTF8&qid=1476185127&spIA=B001JBYKB4,B00QQ8Q2XO,B002P55ZTE,B001RW5B7C,B00P94GZGK,B00OW7FBDS,B00PYEUIWM,B00S7Q67SI,B00MJ09CAI,B01GTS3FAI,B00P93REN4,B00P9598OK,B00P92PJDC,B00P965KD2,B01IPNV8P4,B00P80EMCO
I try to scrape the URLs with "Scrape Element" using the selectors, but then UBot crashes.

Second problem: I hired someone to collect the product links manually, so I now have a big list of Amazon product URLs. What I want to do is navigate to each page and scrape a certain element. After about 5 or 6 product pages, UBot throws an error telling me it expected a Boolean value, or that "JSON can't be deserialized". If I then retry the first 5 product pages (which worked at first), the error now pops up for them as well. I am quite confused... could it be that Amazon is actively blocking scraping?
pash 504 Posted October 11, 2016

Post your code.
deliter 203 Posted October 12, 2016

This should help you along:

define scrapeAmazonHrefs(#url) {
    navigate(#url,"Wait")
    wait for browser event("DOM Ready","")
    wait for browser event("Everything Loaded","")
    add list to list(%hrefs,$scrape attribute(<(href=w"https://www.amazon.de/*" AND class="a-link-normal s-access-detail-page a-text-normal")>,"href"),"Delete","Local")
    loop($list total(%hrefs)) {
        navigate($next list item(%hrefs),"Wait")
        wait for browser event("DOM Ready","")
        wait for browser event("Everything Loaded","")
        comment("put your scrape code here")
    }
}
scrapeAmazonHrefs("https://www.amazon.de/s/ref=sr_pg_6?rh=i:aps%2Ck:wasserbett+konditionierer&page=6&keywords=wasserbett+konditionierer&ie=UTF8&qid=1476185127&spIA=B001JBYKB4,B00QQ8Q2XO,B002P55ZTE,B001RW5B7C,B00P94GZGK,B00OW7FBDS,B00PYEUIWM,B00S7Q67SI,B00MJ09CAI,B01GTS3FAI,B00P93REN4,B00P9598OK,B00P92PJDC,B00P965KD2,B01IPNV8P4,B00P80EMCO")
Tibret 5 Posted October 12, 2016 (Author)

Hey deliter, I tested your code and the bot works perfectly: it opens the product overview page and then navigates to each product detail page. One thing is very strange, though: when I look in the debugger, it says the list %hrefs is empty. But that can't be true, because it navigates to every scraped URL. I don't understand this behaviour, and it would also be great to save the scraped URLs.
Tibret 5 Posted October 12, 2016 (Author)

Here is what I have so far. I load a list of URLs that I collected manually:

set(#Loopcounter,0,"Global")
loop($list total(%Produkt URLS)) {
    navigate($list item(%Produkt URLS,#Loopcounter),"Wait")
    wait for browser event("Page Loaded","")
    set(#ProduktURL,$scrape attribute(<id="merchant-info">,"innerhtml"),"Global")
    set(#TEST,$plugin function("File Management.dll", "$Find Regex First", #ProduktURL, "(?<=href=\").*?(?=\">)"),"Global")
    set(#TEST,$replace(#TEST,"amp;",""),"Global")
    navigate("https://www.amazon.de{#TEST}","Wait")
    wait for browser event("Page Loaded","")
    set(#DATEN,$page scrape("Verkäuferinformationen","Aktuelles Feedback:"),"Global")
    set(#DATEN,$replace regular expression(#DATEN,"<.*?>",""),"Global")
    set table cell(&Impressum Amazon,#Loopcounter,0,"https://www.amazon.de{#TEST}")
    set table cell(&Impressum Amazon,#Loopcounter,1,#DATEN)
    increment(#Loopcounter)
}

This code works for product pages, but after a while it stops working, and I even get an error on pages that worked before.
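[Editor's aside: the lookbehind/lookahead regex in the code above grabs everything between href=" and ">, and the $replace call is a crude way to turn &amp; back into &. A minimal Python sketch of the same idea; the HTML snippet here is an invented example, not real Amazon markup:]

```python
import html
import re

# Hypothetical snippet of a "merchant-info" innerhtml (illustrative only)
merchant_html = '<a href="/gp/help/seller/home.html?seller=A1XYZ&amp;ref_=dp_merchant">Seller</a>'

# Same pattern as the UBot plugin call: text between href=" and ">
match = re.search(r'(?<=href=").*?(?=">)', merchant_html)
relative_url = match.group(0) if match else None

# html.unescape handles &amp; (and other entities) more safely than a plain string replace
full_url = "https://www.amazon.de" + html.unescape(relative_url)
print(full_url)
```

Note that unescaping entities is more robust than stripping the literal "amp;", which would also corrupt a URL that legitimately contains that substring.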
pash 504 Posted October 12, 2016

A better way is to use the product ID instead of the full URL: https://www.amazon.de/dp/Product ID
Sample: https://www.amazon.de/dp/B01DPQ6JDM

Get the ID from a URL:

alert($find regular expression("https://www.amazon.de/dp/B01BNJKU3I?ref_=ams_ad_dp_asin_2","(?<=dp\\/)[0-9A-Z]+"))

This sample scrapes:

clear all data
navigate("https://www.amazon.de","Wait")
Wait()
change attribute(<name="field-keywords">,"value","lg uhd")
click(<value="Los">,"Left Click","No")
Wait()
add list to list(%URL,$scrape attribute(<class="a-link-normal s-access-detail-page a-text-normal">,"fullhref"),"Delete","Global")
comment("add list to list(%Title,$scrape attribute(<class=\"a-link-normal s-access-detail-page a-text-normal\">,\"innertext\"),\"Delete\",\"Global\")")
set(#ListPostion,0,"Global")
loop($subtract($list total(%URL),#ListPostion)) {
    set(#URL,$list item(%URL,#ListPostion),"Global")
    navigate(#URL,"Wait")
    Wait()
    set(#Title,$scrape attribute(<id="productTitle">,"innertext"),"Global")
    set(#Produktinformation,$scrape attribute(<(tagname="td" AND class="bucket")>,"innertext"),"Global")
    set table cell(&Impressum Amazon,#ListPostion,0,#Title)
    set table cell(&Impressum Amazon,#ListPostion,1,#URL)
    set table cell(&Impressum Amazon,#ListPostion,2,#Produktinformation)
    increment(#ListPostion)
}
define Wait {
    wait for browser event("Everything Loaded","")
    wait(1)
}
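[Editor's aside: the ID-extraction regex pash shows is language-agnostic. A sketch of the same lookbehind pattern in Python, using the URL from the post; the function name extract_asin is my own, not part of any library:]

```python
import re

def extract_asin(url):
    """Pull the product ID (ASIN) out of an Amazon /dp/ URL."""
    # Lookbehind for "dp/", then the run of digits/uppercase letters that follows
    match = re.search(r"(?<=dp/)[0-9A-Z]+", url)
    return match.group(0) if match else None

print(extract_asin("https://www.amazon.de/dp/B01BNJKU3I?ref_=ams_ad_dp_asin_2"))
```

The character class stops at the "?" that begins the query string, so only the ID itself is captured. In Python the "/" needs no backslash escape, unlike the "dp\\/" form in the UBot snippet.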
Tibret 5 Posted October 12, 2016 (Author)

Hey pash, thank you very much! It works just perfectly! I didn't know until now that you could use "AND" in the scrape attribute function! Great to know. I think I can get my project done now. Many thanks!
deliter 203 Posted October 12, 2016

Those are local variables. If you edit the list, you will see the variable is local rather than global. A variable has a scope of either global or local, and your debugger only shows global variables; a local variable's scope is limited to the function/command it was created in.

While you're new, stick to global, but as you get better, try to use more custom commands/functions and localize the variables where possible. The advantage is that your debugger is smaller and cleaner, so debugging bigger bots is easier. The best advantage is that it's easier to keep your bot... well, I can't quite describe it: kind of dynamic. Say you build a bot with no functions, and in the middle of a loop nested inside another five loops you hit a problem, say a captcha. Then you're adding more code, and because of the captcha you might want to solve it and have the bot go back to where it was before it started scraping; it quickly becomes a total mess. With functions, once each operation within the bot is defined, you can make it go backwards and forwards, etc.
Tibret 5 Posted October 13, 2016 (Author)

Hey deliter, thanks for that advice! I still have some problems when building more complex bots. I didn't know that local variables don't show up in the debugger, but it makes sense now. Many thanks!