mdc101 15 Posted February 16, 2012

Hi guys,

I have done a search using the variable #SearchString = "content nation". I added a dash between the words to make the search easier using a filter: #SearchStringDashed = "content-nation".

I have scraped a bunch of question URLs into #QQulr:

/Why-has-Google-trouble-indexing-foreign-language-content-contained-at-a-national-top-level-domain-What-are-the-solutions?q=content+nation
/Where-do-I-find-content-nation?q=content+nation
/How-do-hyperlocal-sites-(e.g.-Craigslist)-architect-their-technology-to-support-hyperlocalities-as-well-as-sitewide-nation-wide-global-content?q=content+nation
/The-École-Nationale-dAdministration-does-not-provide-its-content-to-the-public-is-this-hindering-the-democratic-process-in-France?q=content+nation
...

Next I add them to the list "%QquestionUrls".

What I am trying to achieve is to look through the first part of each URL, before the "?q=content+nation" found at the end. It seems that "?q=content+nation" is added to every result returned, but I need to find the URLs that contain an exact match of #SearchStringDashed (content-nation) in the part before "?q=content+nation".

I have written the following but do not know how to ignore the "?q=content+nation" part of the string. Could someone help me refine my if statement to achieve this? I am sure an advanced regex could do this. Here is my code so far:
set(#SearchString, "{$next list item(%QSeedKW)} ", "Global")
set(#SearchString, $trim(#SearchString), "Global")
set(#SearchStringDashed, $replace(#SearchString, " ", "-"), "Global")
set(#SearchStringDashed, $trim(#SearchStringDashed), "Global")
set(#QQulr, $scrape attribute(<class="result_item">, "href"), "Global")
add list to list(%QquestionUrls, $list from text(#QQulr, $new line), "Delete", "Global")
set(#FirststLevelQ_total, $list total(%QquestionUrls), "Global")
loop(#FirststLevelQ_total) {
    if($contains($list item(%QquestionUrls, #FirststLevelQ), "#")) {
        then {
            remove from list(%QquestionUrls, #FirststLevelQ)
        }
        else {
        }
    }
    if($not($contains($list item(%QquestionUrls, #FirststLevelQ), $find regular expression(#SearchStringDashed, "\\.*{#SearchStringDashed}.\\*")))) {
        then {
            remove from list(%QquestionUrls, #FirststLevelQ)
        }
        else {
        }
    }
    if($comparison(#FirststLevelQ, "<", #FirststLevelQ_total)) {
        then {
            increment(#FirststLevelQ)
        }
        else {
        }
    }
}

I believe that if the "?q=content+nation" is ignored, the statement will work.
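A note on the regex in the second if, as posted: $find regular expression appears to be applied to #SearchStringDashed itself rather than to the URL, and the escaped \.* and \* look for literal dots and a literal asterisk instead of acting as wildcards, so it is unlikely to ever match. The general regex idea the question asks for can be sketched in Python (not UBot script): anchor at the start of the string and allow only non-? characters before the keyword, so a match can never reach into the ?q=... query. keyword_in_path is a hypothetical helper name, and the keyword is passed through re.escape so characters such as . or ( are matched literally.

```python
import re


def keyword_in_path(url: str, keyword: str) -> bool:
    """Return True if `keyword` occurs before the first '?' in `url`.

    ^[^?]* can only consume characters that are not '?', so the match
    can never extend into a query string such as '?q=content+nation'.
    """
    pattern = r"^[^?]*" + re.escape(keyword)
    return re.search(pattern, url) is not None


urls = [
    "/Where-do-I-find-content-nation?q=content+nation",
    "/How-do-hyperlocal-sites-(e.g.-Craigslist)-architect-their-technology-"
    "to-support-hyperlocalities-as-well-as-sitewide-nation-wide-global-content"
    "?q=content+nation",
]

for url in urls:
    print(keyword_in_path(url, "content-nation"))
```

Because the pattern stops at the first ?, a URL whose query alone contained "content-nation" would still be rejected, which is the "ignore the ?q= part" behaviour described above.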
ugakebet 4 Posted February 16, 2012

Can you post your bot here? It's a bit complex to understand. Why can't you use the $substring parameter to search for content-nation? It will match exactly in the URLs saved.
Kreatus (Ubot Ninja) 422 Posted February 16, 2012

Try this:

clear list(%list)
add list to list(%list, $list from text("/Why-has-Google-trouble-indexing-foreign-language-content-contained-at-a-national-top-level-domain-What-are-the-solutions?q=content+nation /Where-do-I-find-content-nation?q=content+nation /How-do-hyperlocal-sites-(e.g.-Craigslist)-architect-their-technology-to-support-hyperlocalities-as-well-as-sitewide-nation-wide-global-content?q=content+nation /The-École-Nationale-dAdministration-does-not-provide-its-content-to-the-public-is-this-hindering-the-democratic-process-in-France?q=content+nation", " "), "Delete", "Global")
loop($list total(%list)) {
    set(#question, $replace($find regular expression($next list item(%list), "\\/\\w.+\\?"), "/", ""), "Global")
    add item to list(%question, #question, "Delete", "Global")
    load html($replace($replace($list from text(%question, " "), " ", "<br>"), "-", " "))
}
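For readers following the pattern: \/\w.+\? matches from the leading slash up to the last ? (the .+ is greedy), and the two replace calls then drop the slash and turn dashes into spaces. So this snippet turns each URL into a readable question title rather than filtering the list. A rough Python equivalent, assuming Python's re module treats this simple pattern the same way UBot's regex engine does; question_title is a hypothetical name. As in the UBot version, the trailing ? stays in the title.

```python
import re


def question_title(url: str) -> str:
    # '/' + one word character + anything (greedy) + '?'
    # e.g. "/Where-do-I-find-content-nation?q=content+nation"
    #   -> "/Where-do-I-find-content-nation?"
    match = re.search(r"/\w.+\?", url)
    if match is None:
        return ""
    # Drop slashes, then turn dashes into spaces for display.
    return match.group(0).replace("/", "").replace("-", " ")


print(question_title("/Where-do-I-find-content-nation?q=content+nation"))
# prints: Where do I find content nation?
```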
mdc101 15 Posted February 16, 2012 Author

Unfortunately I can't post the bot.

To simplify the question: I need to check a URL. If the URL contains my keyword (content-nation), I keep it; if it does not, I delete it (remove from list). The complication is that each URL has the keyword appended as ?q=content+nation, though maybe this is not important. However, my if statement is not working. How would one write an if statement that looks for *content-nation* with wildcards on each side? I guess that is the simplified question: find the keyword within the URLs.

Out of this data set only one URL qualifies: number 2 is the only one to keep, and the rest are to be deleted. All I want is the URLs with "content-nation" in the string, but I am not able to get this to work. I am assuming my code is incorrect.

1) /Why-has-Google-trouble-indexing-foreign-language-content-contained-at-a-national-top-level-domain-What-are-the-solutions?q=content+nation
2) /Where-do-I-find-content-nation?q=content+nation
3) /How-do-hyperlocal-sites-(e.g.-Craigslist)-architect-their-technology-to-support-hyperlocalities-as-well-as-sitewide-nation-wide-global-content?q=content+nation
4) /The-École-Nationale-dAdministration-does-not-provide-its-content-to-the-public-is-this-hindering-the-democratic-process-in-France?q=content+nation
mdc101 15 Posted February 16, 2012 Author

Hi Kreatus,

Thanks for the code. I am not trying to remove the ?q=content+nation, as it is required to load the page when the URL is called. I am trying to find my keyword in the full URL string, e.g. /Where-do-I-find-content-nation?q=content+nation. If the keyword is there I want to keep the URL; if the phrase "content-nation" does not exist, I will remove it from the list.

Thanks again for the suggestion.
ugakebet 4 Posted February 16, 2012

In that case, why can't you do this and loop through the list?

if($contains($list item(%QquestionUrls, #FirststLevelQ), "content-nation")) {
    then {
    }
    else {
        remove from list(%QquestionUrls, #FirststLevelQ)
    }
}
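Why a plain substring test is enough here: the appended query separates the words with + (?q=content+nation), so the dashed keyword content-nation can only ever match in the path part. A small Python check of that idea against the four sample URLs from this thread; keep_matching is a hypothetical name, and it mirrors "keep if $contains, else remove" by building a new list instead of deleting in place.

```python
def keep_matching(urls, keyword):
    # Plain, case-sensitive substring test, like $contains(url, keyword).
    return [url for url in urls if keyword in url]


sample = [
    "/Why-has-Google-trouble-indexing-foreign-language-content-contained-at-a-"
    "national-top-level-domain-What-are-the-solutions?q=content+nation",
    "/Where-do-I-find-content-nation?q=content+nation",
    "/How-do-hyperlocal-sites-(e.g.-Craigslist)-architect-their-technology-to-"
    "support-hyperlocalities-as-well-as-sitewide-nation-wide-global-content"
    "?q=content+nation",
    "/The-École-Nationale-dAdministration-does-not-provide-its-content-to-the-"
    "public-is-this-hindering-the-democratic-process-in-France?q=content+nation",
]

full_url_hits = keep_matching(sample, "content-nation")
# Same test restricted to the part before '?', for comparison.
path_only_hits = [u for u in sample if "content-nation" in u.split("?", 1)[0]]

print(full_url_hits)
print(full_url_hits == path_only_hits)
```

Both checks keep only URL number 2, which matches the expected result stated in the thread.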
mdc101 15 Posted February 16, 2012 Author

Hi ugakebet,

Your suggestion worked. How do you manage the loop, given that the list shrinks every time a row is removed? I get an "exceeded range" error after a few runs.

set(#QQulr, $scrape attribute(<class="result_item">, "href"), "Global")
add list to list(%QquestionUrls, $list from text(#QQulr, $new line), "Delete", "Global")
set(#FirststLevelQ_total, $list total(%QquestionUrls), "Global")
set(#FirststLevelQ, 0, "Global")
loop(#FirststLevelQ_total) {
    if($contains($list item(%QquestionUrls, #FirststLevelQ), "#")) {
        then {
            remove from list(%QquestionUrls, #FirststLevelQ)
        }
        else {
        }
    }
    if($contains($list item(%QquestionUrls, #FirststLevelQ), #SearchStringDashed)) {
        then {
        }
        else {
            remove from list(%QquestionUrls, #FirststLevelQ)
        }
    }
    if($comparison(#FirststLevelQ, "<", #FirststLevelQ_total)) {
        then {
            increment(#FirststLevelQ)
        }
        else {
        }
    }
}
JohnB 255 Posted February 16, 2012

The best way to manage the list and avoid the error is to set a variable to 0 before the loop, and use list item with the position stored in that variable. When you REMOVE an item from the list (remove list item at position (variable)), you need to decrement the variable and then increment it again at the bottom of the loop.

This may sound weird, but think of it this way: if you remove, say, position 5, then position 6 BECOMES position 5. If you don't decrement to position 4 and then increment back to position 5, you will increment to position 6 and miss the list item that used to be at position 6... Make sense? lol

John

Another way is to iterate through the list backwards (from the end), but I don't do it that way, so I am not going to provide an example (I have never tested it... it was suggested to me by Eddie way back).
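Both approaches John describes can be sketched in Python (not UBot script) to show why they avoid skipped items and out-of-range reads. The function names are hypothetical. In the scripts posted in this thread, the "#" check and the keyword check appear to be separate ifs that can each remove an item in the same pass, with only one of them decrementing; folding both tests into one condition, as below, keeps it to exactly one removal per pass.

```python
def filter_by_index(urls, keyword):
    """Forward pass with a position counter, adjusted on every removal."""
    items = list(urls)
    total = len(items)      # fixed loop count, like loop(#FirststLevelQ_total)
    pos = 0
    for _ in range(total):
        url = items[pos]
        if "#" in url or keyword not in url:
            del items[pos]  # the next item slides down into this position
            pos -= 1        # step back so the increment below lands on it
        pos += 1
    return items


def filter_backwards(urls, keyword):
    """Backward pass: deleting only shifts items that were already checked."""
    items = list(urls)
    for pos in range(len(items) - 1, -1, -1):
        if "#" in items[pos] or keyword not in items[pos]:
            del items[pos]
    return items


urls = ["/a-content-nation", "/b-other", "/c-content-nation#answers",
        "/d-content-nation?q=x", "/e-other"]
print(filter_by_index(urls, "content-nation"))
print(filter_backwards(urls, "content-nation"))
# both print: ['/a-content-nation', '/d-content-nation?q=x']
```

With the forward version, the position after each pass equals the number of items kept so far, so it never runs past the end of the shrinking list even though the loop count is fixed at the original length.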
mdc101 15 Posted February 16, 2012 Author

Thanks. I am running the code, and all of a sudden the add list to list is not working when I press Run. It works when I press Run Node, but not when I run the script. Has this happened to anyone?
JohnB 255 Posted February 16, 2012

No... is it the code listed above, or has it changed?

John
mdc101 15 Posted February 16, 2012 Author

Hmm, watching the code run, it seems to hop over the add list to list, which then causes the exceeded-range error. Here is the code:

comment("Clear all lists")
clear list(%QquestionList)
clear list(%QquestionUrls)
clear list(%QSeedKW)
clear list(%QISDNA_categories)
clear list(%QISDNAList)
clear list(%QTopAnswersBy)
clear list(%QISDNAList)
clear list(%QISDNAListViewed)
clear list(%QISDNAmonitor)
clear list(%QISDNAQuestionFollowers)
clear list(%QISDNAQuestionStatFolloweringQuestion)
comment("Define variables - set lookup KW and list positions")
add list to list(%QSeedKW, $list from text(#QSeedKW, $new line), "Delete", "Global")
set list position(%QSeedKW, 0)
set list position(%QquestionUrls, 0)
comment("start url collection process")
loop($list total(%QSeedKW)) {
    comment("Take seed keywords and build up list of urls")
    set(#SearchString, "{$next list item(%QSeedKW)} ", "Global")
    set(#SearchString, $trim(#SearchString), "Global")
    set(#SearchStringDashed, $replace(#SearchString, " ", "-"), "Global")
    set(#SearchStringDashed, $trim(#SearchStringDashed), "Global")
    wait for browser event("Everything Loaded", "")
    click(<id=w"__*_*_input">, "Left Click", "No")
    change attribute(<id=w"__*_*_input">, "value", "")
    type text(<id=w"__*_*_input">, #SearchString, "Standard")
    wait for browser event("Everything Loaded", "")
    wait for element(<class="lil_action_button submit_button col">, "", "Appear")
    wait(1)
    comment("grab urls for questions")
    set(#QQulr, $scrape attribute(<class="result_item">, "href"), "Global")
    wait for element(<class="lil_action_button submit_button col">, "", "Appear")
    wait(1)
    comment("Start ---------- not setting list")
    add list to list(%QquestionUrls, $list from text(#QQulr, $new line), "Delete", "Global")
    set(#FirststLevelQ_total, $list total(%QquestionUrls), "Global")
    set(#FirststLevelQ, 0, "Global")
    comment("End ---------- not setting list")
    loop(#FirststLevelQ_total) {
        if($contains($list item(%QquestionUrls, #FirststLevelQ), "#")) {
            then {
                remove from list(%QquestionUrls, #FirststLevelQ)
            }
            else {
            }
        }
        if($contains($list item(%QquestionUrls, #FirststLevelQ), #SearchStringDashed)) {
            then {
            }
            else {
                remove from list(%QquestionUrls, #FirststLevelQ)
                decrement(#FirststLevelQ)
            }
        }
        if($comparison(#FirststLevelQ, "<", #FirststLevelQ_total)) {
            then {
                increment(#FirststLevelQ)
            }
            else {
            }
        }
    }
    if($comparison($list position(%QquestionUrls), "=", 0)) {
        then {
            save to file("{$special folder("Desktop")}\\{#DWSProjectFolder}\\{#ProjectFolder}\\ISDNA-Q-questions\\{#SearchString}-Q-questions-do-not-exist-{#profileid}.txt", "Big Opportunity! No questions have been found for the phrase \"{#SearchString}\". See training on what to do next!")
        }
        else {
        }
    }
mdc101 15 Posted February 16, 2012 Author

Is there a way to check for the different case variations of a keyword? At the moment I have the following challenges:

1) add list to list(%QquestionUrls, $list from text(#QQulr, $new line), "Delete", "Global") is not adding the scraped data to the list when I press Run. It only adds when I press Run Node.
2) I am still getting the exceeded-range error, even after applying John's suggestion.
3) I have noticed that if the keyword is in a URL in the list but is in CAPS, in Proper Case, or has one of its words capitalized, the row is removed from the list because it does not exactly match the keyword "curation-nation". It should accept all of these versions: "curation-nation", "Curation-nation", "Curation-Nation", "curation-Nation". How is this done?
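The usual answer to point 3 is to normalize the case of both the URL and the keyword before comparing, so all four variants compare equal. A Python sketch of that logic (not UBot script; it does not claim anything about which text-casing commands a given UBot version offers). contains_ignore_case is a hypothetical name; str.casefold() is Python's built-in lowercasing intended for caseless matching.

```python
def contains_ignore_case(url: str, keyword: str) -> bool:
    # Normalize both sides, then do an ordinary substring test.
    return keyword.casefold() in url.casefold()


variants = ["curation-nation", "Curation-nation", "Curation-Nation", "curation-Nation"]
for v in variants:
    url = "/What-is-" + v + "?q=curation+nation"
    print(v, contains_ignore_case(url, "curation-nation"))
```

Every variant prints True, while a URL that never contains the dashed phrase in any casing still fails the test.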
JohnB 255 Posted February 17, 2012

Maybe it's just me, but you are using a variable to add content to your list, and I don't see where that variable is set anywhere in that code. It's loading nothing because there's nothing to load... unless you've left out some of the code.

John
mdc101 15 Posted February 17, 2012 Author

Hi John,

Here is the code that grabs the data and is supposed to set it. #QQulr holds the scrape, and add list to list(%QquestionUrls... is the part that adds the data to the list. This block is in the middle of the code:

wait(1)
comment("grab urls for questions")
set(#QQulr, $scrape attribute(<class="result_item">, "href"), "Global")
wait for element(<class="lil_action_button submit_button col">, "", "Appear")
wait(1)
comment("Start ---------- not setting list")
add list to list(%QquestionUrls, $list from text(#QQulr, $new line), "Delete", "Global")
set(#FirststLevelQ_total, $list total(%QquestionUrls), "Global")
set(#FirststLevelQ, 0, "Global")
comment("End ---------- not setting list")
mdc101 15 Posted February 17, 2012 Author

Any ideas, guys?
JohnB 255 Posted February 17, 2012

Well, I am still struggling with your code... I have loaded all your code into UBot. Take a look at this short video and see if you see what I do. You are trying to add a list to list using a variable that has not yet been set. There is no way your list will contain anything when the variable doesn't exist yet...

http://screencast.com/t/n1VwjLi9q

John
mdc101 15 Posted February 21, 2012 Author

Hi John, I got the issue fixed.