Josh (Posted October 6, 2012)

I'm trying to figure out if this is even possible. I'm trying to scrape a page like this one: http://www.majesticseo.com/reports/site-explorer/summary/google.com

I'm pretty much scraping the numbered results. The problem is that the only attribute available to scrape is <tagname="b">, which doesn't work because you would need to use the element offset, and the page doesn't always contain the same data. For example: http://www.majesticseo.com/reports/site-explorer/summary/joshmccann.com

Does anyone have any ideas?
Atd (Posted October 6, 2012)

Try with Scrape Page, example:

    navigate("http://www.majesticseo.com/reports/site-explorer/summary/joshmccann.com", "Wait")
    set(#referringdomains, $page scrape(" Referring Domains </p> <p style=\"font-size: 150%;\"> <b>", "</b>"), "Global")
    set(#referringipaddresses, $page scrape(" <p>Referring <b>IP</b> addresses: <b>", "</b> </p>"), "Global")
    set(#externalbacklinks, $page scrape(" External Backlinks </p> <p style=\"font-size: 150%;\"> <b>", "</b>"), "Global")
    load html("Referring Domains: {#referringdomains}<br> Referring IP Addresses: {#referringipaddresses}<br> External Backlinks: {#externalbacklinks}")

Attached: 017-majesticseoscrape.ubot
Josh (Author, Posted October 6, 2012)

"Try with Scrape Page,"

Thanks! I tried with page scrape and it didn't work. I didn't think to scrape the HTML. Thanks again.
Legend (Posted October 7, 2012)

Try:

    set(#data, $scrape attribute(<tagname="td">, "innertext"), "Global")
k1lv9h (Posted October 7, 2012)

Hi,

Sample code:

    set(#urls, "http://www.majesticseo.com/reports/site-explorer/summary/google.com
    http://www.majesticseo.com/reports/site-explorer/summary/joshmccann.com", "Global")
    clear list(%urls)
    add list to list(%urls, $list from text(#urls, $new line), "Delete", "Global")
    loop($list total(%urls)) {
        if($comparison($list position(%urls), "<", $list total(%urls))) {
            then {
                set(#urlfordata, $next list item(%urls), "Global")
                navigate(#urlfordata, "Wait")
                wait for browser event("Everything Loaded", 30)
                getmajesticseodata()
                wait(30)
            }
            else {
            }
        }
    }
    define getmajesticseodata {
        set(#referringdomainswonl, $replace($scrape attribute(<outerhtml=w"<td width=\"60%\"> <p> Referring Domains </p> <p style=\"font-size: 150%;\"> <b>*</b> </p> <p style=\"margin:20px 20px 0px;\"> <b><a href=\"/reports/site-explorer/summary/*?oq=*&IndexDataSource=H\"> *</a></b><a href=\"/reports/site-explorer/summary/*?oq=*&IndexDataSource=H\"></a> in the last 5 years. </p> </td>">, "innerhtml"), $new line, " "), "Global")
        set(#referringdomains, $replace regular expression($replace regular expression(#referringdomainswonl, "<\\/b>.*", $nothing), ".*<b>", $nothing), "Global")
        set(#referringipaddresses, $replace regular expression($replace regular expression($scrape attribute(<outerhtml=w"<p>Referring <b>IP</b> addresses: <b>*</b> </p>">, "innerhtml"), ".*<b>", $nothing), "<\\/b>.*", $nothing), "Global")
        set(#externalbacklinkswonl, $replace($scrape attribute(<outerhtml=w"<td width=\"40%\"> <p> External Backlinks </p> <p style=\"font-size: 150%;\"> <b>*</b> </p> <p style=\"margin:20px 20px 0px;\"> <b><a href=\"/reports/site-explorer/summary/*?oq=*&IndexDataSource=H\"> *</a></b><a href=\"/reports/site-explorer/summary/*?oq=*&IndexDataSource=H\"></a> in the last 5 years. </p> </td>">, "innerhtml"), $new line, " "), "Global")
        set(#externalbacklinks, $replace regular expression($replace regular expression(#externalbacklinkswonl, "<\\/b>.*", $nothing), ".*<b>", $nothing), "Global")
        load html("Data from url: {#urlfordata}<br><br> Referring Domains: {#referringdomains}<br> Referring IP Addresses: {#referringipaddresses}<br> External Backlinks: {#externalbacklinks}")
    }

Attached: sample-majesticseo-scrape-001.ubot

Kevin
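For readers who want to see what the nested regular-expression replaces in Kevin's sample are doing, here is a rough sketch of the same idea in Python rather than UBot script. The function name `value_in_first_bold` and the sample fragment (including the number 1,234) are hypothetical and only for illustration. The idea: flatten newlines to spaces, cut everything from the first `</b>` onward, then cut everything up to the last remaining `<b>`, which leaves the text of the first bold element.

```python
import re


def value_in_first_bold(fragment: str) -> str:
    """Return the text inside the first <b>...</b> pair of an HTML fragment."""
    # Flatten line breaks first so that '.' can match across the whole fragment.
    flat = fragment.replace("\r", " ").replace("\n", " ")
    # Drop everything from the first closing </b> to the end of the text.
    head = re.sub(r"</b>.*", "", flat)
    # Drop everything up to and including the last remaining opening <b>.
    return re.sub(r".*<b>", "", head).strip()


# Hypothetical fragment shaped like the table cell matched in the sample code.
sample = (
    '<p> Referring Domains </p>\n'
    '<p style="font-size: 150%;"> <b>1,234</b> </p>\n'
    '<p> <b><a href="#"> 56</a></b> in the last 5 years. </p>'
)
print(value_in_first_bold(sample))
```

The order of the two replaces matters here: cutting at the first `</b>` before stripping the prefix is what keeps the second bold value (the "last 5 years" figure) out of the result.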
Josh (Author, Posted October 7, 2012)

Thanks, Kevin. The page scrape worked perfectly, and I was able to use if/then statements so that if the variable didn't exist on the page, it simply added 0 to the list.
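The approach Josh describes (scrape between two markers, and fall back to 0 when the marker is not on the page) can be sketched roughly as follows. This is Python, not UBot script, and the helper name `scrape_between` and the sample HTML strings are assumptions made up for illustration; it is only meant to show the logic, not the exact bot that was built.

```python
def scrape_between(page: str, before: str, after: str, default: str = "0") -> str:
    """Return the text between two markers, or a default if either is missing."""
    start = page.find(before)
    if start == -1:
        # The leading marker is not on this page, so record the default.
        return default
    start += len(before)
    end = page.find(after, start)
    if end == -1:
        # The trailing marker never appears after the leading one.
        return default
    return page[start:end].strip()


# Hypothetical page fragments: one has the IP-address line, one does not.
page_with_data = "<p>Referring <b>IP</b> addresses: <b>42</b> </p>"
page_without_data = "<p>No data found for this site.</p>"

results = [
    scrape_between(page_with_data, "addresses: <b>", "</b>"),
    scrape_between(page_without_data, "addresses: <b>", "</b>"),
]
print(results)
```

Returning a default instead of an empty value keeps every list the same length, so the rows for each URL stay aligned even when a page is missing one of the numbers.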