UBot Underground

Not scraping the whole url?



I'm scraping URLs from Google search.

 

Choose by attribute

outerhtml

<A class=l onmousedown="returnclk*</A>

wildcards

 

But this gives me the whole URL, and I don't want the www in front of it. Does anyone have an idea how to scrape just the URL without the www?


UBot needs more string manipulation functions.

 

In the meantime, here's a solution:

 

Execute this JavaScript before scraping the page:

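// Strip a leading "www." from every link on the page (http and https).
var links = document.getElementsByTagName("a");
for (var i = 0; i < links.length; i++)
{
  // Only a "www." directly after the scheme is removed; the rest of the URL is untouched.
  links[i].href = links[i].href.replace(/^(https?:\/\/)www\./i, "$1");
}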
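
If you'd rather clean the URLs after scraping instead of rewriting the page first, the same idea works on plain strings. This is only a rough sketch: stripWww and the sample URLs are made-up names for illustration, and it assumes the scraped values are ordinary http or https URL strings.

```javascript
// Hypothetical helper: remove a leading "www." from the host part of a URL string.
function stripWww(url) {
  // Capture the scheme (http:// or https://), then drop a "www." that follows it directly.
  return url.replace(/^(https?:\/\/)www\./i, "$1");
}

// Sample inputs, invented for illustration only.
const samples = [
  "http://www.example.com/page",
  "https://www.example.com/",
  "http://example.com/www.html", // "www." is not at the start of the host, so this is left alone
];

const cleaned = samples.map(stripWww);
console.log(cleaned);
```

Anchoring the pattern to the start of the string means a "www." that appears later in the path or query is not touched.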

:)

