zharfan Posted March 23, 2015

Hello everybody, I have a link like this: https://www.blogger.com/profile/03248869042676796945. There are many links on that page. How can I scrape all of them with the HTTP Post Plugin, Aymen? Thanks, please help me.
deliter Posted March 23, 2015

set(#aa,$replace regular expression($find regular expression($find regular expression($document text,"dir=\"ltr\"><a\\shref=\".+?\""),"href=\".+?\""),"href=\"|\"",""),"Global")

My regex is rough, but it is basic enough that you can learn from it. Replace $document text with your HTTP GET node.
zharfan (Author) Posted March 23, 2015

Thanks for your help, but I don't understand: what is an HTTP GET node?
deliter Posted March 23, 2015

Load the code above in uBot and open it up. Remove the $document text box, then go to your toolbox, type in "http get", and drag the HTTP GET function to where $document text was.

Edit: here is the full code below:

set(#aa,$replace regular expression($find regular expression($find regular expression($plugin function("HTTP post.dll", "$http get", "https://www.blogger.com/profile/03248869042676796945", "", "", "", ""),"dir=\"ltr\"><a\\shref=\".+?\""),"href=\".+?\""),"href=\"|\"",""),"Global")
zharfan 0 Posted March 23, 2015 Author Report Share Posted March 23, 2015 load the code above in ubot open it up and remove the box document text,go to your toolbox,type in http get,and drag the http get to where the document text was edit here is the full code below set(#aa,$replace regular expression($find regular expression($find regular expression($plugin function("HTTP post.dll", "$http get", "https://www.blogger.com/profile/03248869042676796945", "", "", "", ""),"dir=\"ltr\"><a\\shref=\".+?\""),"href=\".+?\""),"href=\"|\"",""),"Global")ok i will try it.. and 1 question , i have this script clear list(%Link Blogspot Profile)ui block text("Keyword", #Keyword)ui stat monitor("Total Link Profile", $list total(%Link Blogspot Profile))add list to list(%Keyword, $list from text(#Keyword, $new line), "Delete", "Global")ui save file("Save File Location (*.txt)", #Save File Location)clear list(%Increment Page)ui drop down("Footprint Blog", "site:blogger.com/profile,site:blogger.com/profile admin,site:blogger.com/profile keyword", #Footprint Blog)add list to list(%Increment Page, $list from text("0102030405060708090100110120130140150160170180190200210220230240250260270280290300310320330340350360370380390400410420430440450460470480490500510520530540550", $new line), "Delete", "Global")loop while($comparison($list total(%Keyword), ">", 0)) { set(#google_results, $plugin function("HTTP post.dll", "$http get", "https://www.google.com/search?q={#Footprint Blog} \"{#Keyword}\"&start={$list item(%Increment Page, 0)}", $plugin function("HTTP post.dll", "$http useragent string", "Firefox 27.0 Win7 64-bit"), "", "", ""), "Global") add list to list(%Link Blogspot Profile, $plugin function("HTTP post.dll", "$xpath parser", #google_results, "//div//h3//a[contains(@onmousedown,\'rwt\')]", "href", "HTML"), "Delete", "Global") remove from list(%Increment Page, 0) set(#google_results, $nothing, "Global") if($comparison($list total(%Increment Page), "=", 0)) { then { add list to 
list(%Increment Page, $list from text("0102030405060708090100110120130140150160170180190200210220230240250260270280290300310320330340350360370380390400410420430440450460470480490500510520530540550", $new line), "Delete", "Global") } else { } } save to file(#Save File Location, %Link Blogspot Profile)} I want to scrape link in all page in google but the script didn't work when the %increment page greater than the start=40 , any solution? Quote Link to post Share on other sites
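For reference, the pagination in that script amounts to requesting the same query repeatedly with an increasing start parameter (0, 10, 20, ...). A minimal Python sketch of just the URL construction, so the loop logic is easier to inspect (the function name and parameters are illustrative, not from the thread):

```python
from urllib.parse import quote_plus


def google_search_urls(footprint, keyword, pages=10, per_page=10):
    """Build paginated Google search URLs the way the uBot script does:
    the query is the footprint plus the quoted keyword, and each page
    advances the start parameter by per_page results."""
    query = quote_plus(f'{footprint} "{keyword}"')
    return [
        f"https://www.google.com/search?q={query}&start={page * per_page}"
        for page in range(pages)
    ]


urls = google_search_urls("site:blogger.com/profile", "fishing", pages=3)
```

Note that this only shows the URL construction; whether Google actually serves results past a given start offset (or starts rate-limiting the scraper) is a separate issue, which may be what the poster is running into.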