Alexpte (March 24, 2010): Hi, I'm trying to make a bot that works out how many different domains link to a particular site, rather than the total number of backlinks. I'm having a hard time figuring out how to scrape a URL from Yahoo Site Explorer and then trim it to its root (i.e. bbc.co.uk/sport would end up as bbc.co.uk/). I then need to remove any duplicates from the list. Is there any way to do this? I've been searching for a solution for a while, so sorry if this has already been posted. Thanks
alcr (March 24, 2010): Exact duplicates? Loop through them and, in each loop, use add to list -> $next list item with "Remove duplicates?" set to Yes. But a JavaScript eval function would be needed for that, and I don't know any JavaScript, so you'd have to wait for someone who does.
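For readers working outside uBot, the "add to a list while dropping duplicates" idea can be sketched in Python. This is only an illustration, not uBot code; the function name `unique_in_order` and the sample domains are hypothetical.

```python
def unique_in_order(items):
    """Return the items with exact duplicates removed, keeping first-seen order."""
    # dict keys preserve insertion order (Python 3.7+), so building a dict
    # from the items drops repeats without reordering anything.
    return list(dict.fromkeys(items))


# Hypothetical sample data: domains already trimmed to root.
domains = ["bbc.co.uk", "example.com", "bbc.co.uk", "example.org", "example.com"]
unique_domains = unique_in_order(domains)
print(len(unique_domains))  # prints 3
```

The length of the de-duplicated list is the number of distinct linking domains, which is the figure the original question asks for.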
Alexpte (March 24, 2010): Thanks, will try that. The main problem now is scraping just the domain name rather than the entire URL. Any suggestions for how to do that? I've tried using wildcards and replace to strip the rest of the URL and leave just the domain name, but unfortunately I can't get it to work.
bluegoat (March 25, 2010), replying to Alexpte: Take a look at the fix I used on turbolapp's bot here: http://ubotstudio.com/forum/index.php?/topic/3147-keyword-ranking-bot/page__view__findpost__p__10399 You should be able to modify the JavaScript to do what you want, but I think you would need a list of every TLD, since it only works for single-dot TLDs (.com, .net, .info, .ca, etc.) and not international TLDs with multiple dots, such as .co.uk.
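The point about needing a list of multi-dot TLDs can be illustrated with a small Python sketch. This is not the JavaScript from the linked post; the function name `registrable_domain` and the tiny `MULTI_PART_SUFFIXES` set are assumptions made for illustration, and the set is deliberately incomplete. A real tool would more likely rely on a maintained list such as the Public Suffix List.

```python
from urllib.parse import urlsplit

# Illustrative only: a handful of multi-dot suffixes, nowhere near complete.
MULTI_PART_SUFFIXES = {"co.uk", "org.uk", "com.au", "co.nz"}


def registrable_domain(url: str) -> str:
    """Reduce a URL to its domain, keeping three labels for known multi-dot suffixes."""
    # urlsplit only finds a hostname when the URL has a "//" authority marker,
    # so add one for scheme-less input such as "bbc.co.uk/sport".
    if not url.startswith(("http://", "https://")):
        url = "//" + url
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    # If the last two labels form a known multi-dot suffix, keep one more label.
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_PART_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


print(registrable_domain("http://www.bbc.co.uk/sport"))
print(registrable_domain("https://blog.example.com/post"))
```

Any suffix missing from the set falls back to the two-label rule, which is exactly the single-dot limitation described above.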
webautomationlab (March 26, 2010): When I trim to root, I do the following, using http://www.domain.co.au/monkeyfaces.cfm as an example:
1. Change // to ::
2. Strip /.* (the first slash and everything after it)
3. Change ::: back to ://
Done. That also works for both http and https, because it leaves that portion of the URL untouched if it exists. Then, once you have your domains stripped down, I think uBot has a delete dupes function.
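The three replace steps above can be sketched in Python to show what each one does. This is a translation for illustration, not the uBot implementation; the function name `trim_to_root` is hypothetical, and limiting each replacement to the first occurrence is an assumption the post does not state.

```python
import re


def trim_to_root(url: str) -> str:
    """Trim a URL to its root using the protect / strip / restore steps."""
    # Step 1: hide the "//" after the scheme so the next step cannot match it.
    # "http://host/path" becomes "http:::host/path".
    protected = url.replace("//", "::", 1)
    # Step 2: strip the first remaining slash and everything after it.
    stripped = re.sub(r"/.*", "", protected)
    # Step 3: restore the scheme separator. A URL with no scheme never
    # contained ":::" and so passes through unchanged.
    return stripped.replace(":::", "://", 1)


print(trim_to_root("http://www.domain.co.au/monkeyfaces.cfm"))  # prints http://www.domain.co.au
print(trim_to_root("bbc.co.uk/sport"))  # prints bbc.co.uk
```

Because the method never inspects the TLD, it handles multi-dot domains such as .co.au or .co.uk without a suffix list, though it keeps subdomains like www as part of the result.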
Alexpte (March 27, 2010): Great, thanks for the replies. I'll give it a go and let you know how it goes.
bluegoat (March 28, 2010), replying to webautomationlab: Thanks webautomationlab, I just made a bot with your example... worked great. Here it is: trim to root.ubot (attachment)
Abs* (May 8, 2010), replying to bluegoat: Hi Bluegoat, I've looked over your code and tried to implement it on a bot created and shared by Turbolapp. For the life of me I can't get it to work. The domain is a .co.uk, so I need the coding you used, but for some reason it doesn't recognize it, or I'm adding it incorrectly; I'm not too sure. I have attached the file below, hoping that you or someone else could take a look and advise. Thanks a lot. Attachment: googlerankchecker.ubot
Abs* (May 10, 2010): Can anyone help, please? Thanks
Abs* (May 11, 2010): I would appreciate any help, guys. I honestly can't get my head around it. Thanks
Abs* (May 16, 2010): Guys, I'm still trying to get this to work and have tried playing with it on a number of occasions. I could really do with some help if anyone wouldn't mind taking some time. Thanks, abs