UBot Underground

I'm trying to figure out if this is even possible.

I'm trying to scrape a page like this:

http://www.majesticseo.com/reports/site-explorer/summary/google.com

I'm basically scraping the numbered results. The problem is that the only attribute available to scrape is <tagname="b">, which doesn't work because you would need to use the element offset, and the page doesn't always contain the same data, so the offsets shift. For example: http://www.majesticseo.com/reports/site-explorer/summary/joshmccann.com

Does anyone have any ideas?


Try $page scrape against the page's HTML. Example:

navigate("http://www.majesticseo.com/reports/site-explorer/summary/joshmccann.com", "Wait")
set(#referringdomains, $page scrape(" Referring Domains
</p>
<p style=\"font-size: 150%;\">

<b>", "</b>"), "Global")
set(#referringipaddresses, $page scrape("		 <p>Referring <b>IP</b> addresses: <b>", "</b> </p>"), "Global")
set(#externalbacklinks, $page scrape(" External Backlinks
</p>
<p style=\"font-size: 150%;\">
<b>", "</b>"), "Global")
load html("Referring Domains: {#referringdomains}<br>
Referring IP Addresses: {#referringipaddresses}<br>
External Backlinks: {#externalbacklinks}")

017-majesticseoscrape.ubot

Thanks! I had tried page scrape and it didn't work, but I didn't think to scrape the HTML. Thanks again.


Hi,

 

Sample code:

set(#urls, "http://www.majesticseo.com/reports/site-explorer/summary/google.com
http://www.majesticseo.com/reports/site-explorer/summary/joshmccann.com", "Global")
clear list(%urls)
add list to list(%urls, $list from text(#urls, $new line), "Delete", "Global")
loop($list total(%urls)) {
    if($comparison($list position(%urls), "<", $list total(%urls))) {
        then {
            set(#urlfordata, $next list item(%urls), "Global")
            navigate(#urlfordata, "Wait")
            wait for browser event("Everything Loaded", 30)
            getmajesticseodata()
            wait(30)
        }
        else {
        }
    }
}
define getmajesticseodata {
set(#referringdomainswonl, $replace($scrape attribute(<outerhtml=w"<td width=\"60%\">
<p>
Referring Domains
</p>
<p style=\"font-size: 150%;\">

<b>*</b>

</p>

<p style=\"margin:20px 20px 0px;\">
<b><a href=\"/reports/site-explorer/summary/*?oq=*&IndexDataSource=H\">
*</a></b><a href=\"/reports/site-explorer/summary/*?oq=*&IndexDataSource=H\"></a> in the last 5 years.
</p>

</td>">, "innerhtml"), $new line, " "), "Global")
set(#referringdomains, $replace regular expression($replace regular expression(#referringdomainswonl, "<\\/b>.*", $nothing), ".*<b>", $nothing), "Global")
set(#referringipaddresses, $replace regular expression($replace regular expression($scrape attribute(<outerhtml=w"<p>Referring <b>IP</b> addresses: <b>*</b> </p>">, "innerhtml"), ".*<b>", $nothing), "<\\/b>.*", $nothing), "Global")
set(#externalbacklinkswonl, $replace($scrape attribute(<outerhtml=w"<td width=\"40%\">
<p>
External Backlinks
</p>
<p style=\"font-size: 150%;\">
<b>*</b>
</p>

<p style=\"margin:20px 20px 0px;\">
<b><a href=\"/reports/site-explorer/summary/*?oq=*&IndexDataSource=H\">
*</a></b><a href=\"/reports/site-explorer/summary/*?oq=*&IndexDataSource=H\"></a> in the last 5 years.
</p>

</td>">, "innerhtml"), $new line, " "), "Global")
set(#externalbacklinks, $replace regular expression($replace regular expression(#externalbacklinkswonl, "<\\/b>.*", $nothing), ".*<b>", $nothing), "Global")
load html("Data from url: {#urlfordata}<br><br>
Referring Domains: {#referringdomains}<br>
Referring IP Addresses: {#referringipaddresses}<br>
External Backlinks: {#externalbacklinks}")
}

 

sample-majesticseo-scrape-001.ubot

 

Kevin


Thanks, Kevin. The page scrape worked perfectly, and I was able to use if/then statements so that if the variable didn't exist on the page it simply added 0 to the list.
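
A rough sketch of what that if/then fallback might look like, reusing the single-line IP-address markers from the earlier example. The list name %ipcounts and the variable #ipcount are made up for illustration. It also assumes that $page scrape returns nothing when the markers are not found on the page, which is worth checking in your own version of UBot before relying on it.

```
set(#ipcount, $page scrape("<p>Referring <b>IP</b> addresses: <b>", "</b> </p>"), "Global")
if($comparison(#ipcount, "=", $nothing)) {
    then {
        add item to list(%ipcounts, "0", "Don't Delete", "Global")
    }
    else {
        add item to list(%ipcounts, #ipcount, "Don't Delete", "Global")
    }
}
```

"Don't Delete" is used so that repeated values, including several 0 entries, stay in the list and each row still lines up with the URL it came from.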
