UBot Underground

Expert regex help needed



Hi guys,

I have done a search using the variable

#SearchString = "content nation"

I added a dash between the words to make filtering the results easier:

#SearchStringDashed = "content-nation"

I have scraped a bunch of question URLs.

#QQulr:

 

/Why-has-Google-trouble-indexing-foreign-language-content-contained-at-a-national-top-level-domain-What-are-the-solutions?q=content+nation

/Where-do-I-find-content-nation?q=content+nation

/How-do-hyperlocal-sites-(e.g.-Craigslist)-architect-their-technology-to-support-hyperlocalities-as-well-as-sitewide-nation-wide-global-content?q=content+nation

/The-École-Nationale-dAdministration-does-not-provide-its-content-to-the-public-is-this-hindering-the-democratic-process-in-France?q=content+nation

...

#

#

 

Next I add them to the list "%QquestionUrls".

What I am trying to achieve is to look only at the first part of each question URL, before the "?q=content+nation" at the end. It seems that "?q=content+nation" is appended to every result returned, so I need to find the URLs that contain an exact match of the variable #SearchStringDashed (content-nation) in the part of the string before "?q=content+nation".
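A minimal sketch of that check, written in Python only to illustrate the logic (it is not UBot code). It assumes the query string always begins at the first `?` in the URL; `path_contains_keyword` is a hypothetical helper name.

```python
def path_contains_keyword(url: str, keyword_dashed: str) -> bool:
    """Return True if keyword_dashed appears in the part of url before the first '?'."""
    # partition() splits at the first '?' only; with no '?', path is the whole url
    path, _sep, _query = url.partition("?")
    return keyword_dashed in path


urls = [
    "/Where-do-I-find-content-nation?q=content+nation",
    "/How-do-hyperlocal-sites-(e.g.-Craigslist)-architect-their-technology-to-support-hyperlocalities-as-well-as-sitewide-nation-wide-global-content?q=content+nation",
]
for u in urls:
    print(path_contains_keyword(u, "content-nation"))  # True, then False
```

The URL itself is left untouched. Only the test looks at the path, so the "?q=..." part is still there when the page is loaded later.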

 

I have written the following, but I do not know how to ignore the "?q=content+nation" part of the string.

Could someone help me refine my if statement to achieve this?

I am sure advanced regex could do this.
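A regex can express "the keyword appears before the first `?`" directly. The sketch below uses Python's `re` module. Whether UBot's `$find regular expression` accepts the same pattern syntax is an assumption that has not been checked here, and `make_path_pattern` is a hypothetical name.

```python
import re


def make_path_pattern(keyword_dashed: str) -> re.Pattern:
    # ^[^?]* consumes only characters before the first '?', so the keyword
    # has to occur in the path part, never in the query string.
    # re.escape() protects against regex metacharacters in the keyword.
    return re.compile(r"^[^?]*" + re.escape(keyword_dashed))


pattern = make_path_pattern("content-nation")
print(bool(pattern.search("/Where-do-I-find-content-nation?q=content+nation")))  # True
print(bool(pattern.search("/Where-do-I-find-content?q=content-nation")))          # False
```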

 

Here is my code so far:

 

set(#SearchString, "{$next list item(%QSeedKW)} ", "Global")
set(#SearchString, $trim(#SearchString), "Global")
set(#SearchStringDashed, $replace(#SearchString, " ", "-"), "Global")
set(#SearchStringDashed, $trim(#SearchStringDashed), "Global")

set(#QQulr, $scrape attribute(<class="result_item">, "href"), "Global")
add list to list(%QquestionUrls, $list from text(#QQulr, $new line), "Delete", "Global")
set(#FirststLevelQ_total, $list total(%QquestionUrls), "Global")
loop(#FirststLevelQ_total) {
    if($contains($list item(%QquestionUrls, #FirststLevelQ), "#")) {
        then {
            remove from list(%QquestionUrls, #FirststLevelQ)
        }
        else {
        }
    }
    if($not($contains($list item(%QquestionUrls, #FirststLevelQ), $find regular expression(#SearchStringDashed, "\\.*{#SearchStringDashed}.\\*")))) {
        then {
            remove from list(%QquestionUrls, #FirststLevelQ)
        }
        else {
        }
    }
    if($comparison(#FirststLevelQ, "<", #FirststLevelQ_total)) {
        then {
            increment(#FirststLevelQ)
        }
        else {
        }
    }
}

 

I believe that if the "?q=content+nation" part is ignored, the statement will work.


Try this

clear list(%list)
add list to list(%list, $list from text("/Why-has-Google-trouble-indexing-foreign-language-content-contained-at-a-national-top-level-domain-What-are-the-solutions?q=content+nation
/Where-do-I-find-content-nation?q=content+nation
/How-do-hyperlocal-sites-(e.g.-Craigslist)-architect-their-technology-to-support-hyperlocalities-as-well-as-sitewide-nation-wide-global-content?q=content+nation
/The-École-Nationale-dAdministration-does-not-provide-its-content-to-the-public-is-this-hindering-the-democratic-process-in-France?q=content+nation", "
"), "Delete", "Global")
loop($list total(%list)) {
   set(#question, $replace($find regular expression($next list item(%list), "\\/\\w.+\\?"), "/", ""), "Global")
   add item to list(%question, #question, "Delete", "Global")
   load html($replace($replace($list from text(%question, "
"), "
", "<br>"), "-", " "))
}
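The idea in the script above (take the part of the URL after the leading "/", then turn dashes back into spaces) can be sketched in Python as follows. This is an approximation, not a line-by-line translation. It drops the query string entirely, so its output may differ slightly from the regex in the post. `url_to_question` is a hypothetical name.

```python
def url_to_question(url: str) -> str:
    """Turn a scraped question URL into readable question text."""
    path = url.partition("?")[0]           # drop "?q=..." if present
    return path.lstrip("/").replace("-", " ")


print(url_to_question("/Where-do-I-find-content-nation?q=content+nation"))
# Where do I find content nation
```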


Unfortunately I can't post the bot.

To simplify the question: I need to check a URL.

If the URL contains my keyword (content-nation), I keep it; if it does not, I delete it (remove from list).

The challenge I have is that each URL has the keyword appended as ?q=content+nation. Maybe this is not important.

However, my if statement is not working.

How would one write an if statement that looks for *content-nation*, with wildcards on each side?

I guess that is the simplified question: finding the keyword within the URLs.

Out of this data set only one URL qualifies.

See number 2: this would be the only one to keep, and the rest are to be deleted.

All I want is the URLs with "content-nation" in the string, but I am not able to get this to work.

I am assuming my code is incorrect.

 

1) /Why-has-Google-trouble-indexing-foreign-language-content-contained-at-a-national-top-level-domain-What-are-the-solutions?q=content+nation

2) /Where-do-I-find-content-nation?q=content+nation

3) /How-do-hyperlocal-sites-(e.g.-Craigslist)-architect-their-technology-to-support-hyperlocalities-as-well-as-sitewide-nation-wide-global-content?q=content+nation

4) /The-École-Nationale-dAdministration-does-not-provide-its-content-to-the-public-is-this-hindering-the-democratic-process-in-France?q=content+nation
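For the "wildcards on each side" question, Python's standard `fnmatch.fnmatchcase` accepts exactly this kind of `*content-nation*` pattern. This is a Python sketch of the logic, not UBot code, and it runs against the four URLs listed above. The query string uses `+` rather than `-`, so "?q=content+nation" can never match the dashed keyword. Testing the full URL is therefore enough here.

```python
from fnmatch import fnmatchcase

urls = [
    "/Why-has-Google-trouble-indexing-foreign-language-content-contained-at-a-national-top-level-domain-What-are-the-solutions?q=content+nation",
    "/Where-do-I-find-content-nation?q=content+nation",
    "/How-do-hyperlocal-sites-(e.g.-Craigslist)-architect-their-technology-to-support-hyperlocalities-as-well-as-sitewide-nation-wide-global-content?q=content+nation",
    "/The-École-Nationale-dAdministration-does-not-provide-its-content-to-the-public-is-this-hindering-the-democratic-process-in-France?q=content+nation",
]

# '*' matches any run of characters, so this is "contains content-nation"
keep = [u for u in urls if fnmatchcase(u, "*content-nation*")]
print(keep)  # ['/Where-do-I-find-content-nation?q=content+nation']
```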


Hi Kreatus,

Thanks for the code.

I am not trying to remove the ?q=content+nation, as it is required to load the page when the URL is called.

I am trying to find my keyword in the full URL string:

/Where-do-I-find-content-nation?q=content+nation

If the keyword is there, I want to keep the URL; if the phrase "content-nation" does not exist in it, I will remove it from the list.

Thanks again for the suggestion.


In that case, why can't you do

if($contains($list item(%QquestionUrls, #FirststLevelQ), "content-nation")) {
    then {
    }
    else {
        remove from list(%QquestionUrls, #FirststLevelQ)
    }
}

and loop through this?
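Applied to the whole list, the same "contains" test can also be sketched in Python as a filter that builds a new list instead of deleting from the one being read. This is one way to sidestep the shrinking-list problem raised in the next post. It is an illustration in Python, not a UBot translation. `keep_matching` is a hypothetical name, and the sample URLs in the usage line are made up.

```python
from typing import List


def keep_matching(urls: List[str], keyword_dashed: str) -> List[str]:
    """Return a new list holding only the URLs that contain keyword_dashed.

    Nothing is removed from the list being read, so no index can run past its end.
    """
    return [u for u in urls if keyword_dashed in u]


print(keep_matching(["#", "/Where-do-I-find-content-nation?q=content+nation", "/other?q=content+nation"], "content-nation"))
```

Entries such as "#" drop out as well, because they do not contain the keyword.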


Hi gakebet,

Your suggestion worked.

How do you manage the loop, given that the list shrinks every time a row is removed?

I get an "exceeded range..." error after a few runs.

 

set(#QQulr, $scrape attribute(<class="result_item">, "href"), "Global")
add list to list(%QquestionUrls, $list from text(#QQulr, $new line), "Delete", "Global")
set(#FirststLevelQ_total, $list total(%QquestionUrls), "Global")
set(#FirststLevelQ, 0, "Global")
loop(#FirststLevelQ_total) {
    if($contains($list item(%QquestionUrls, #FirststLevelQ), "#")) {
        then {
            remove from list(%QquestionUrls, #FirststLevelQ)
        }
        else {
        }
    }
    if($contains($list item(%QquestionUrls, #FirststLevelQ), #SearchStringDashed)) {
        then {
        }
        else {
            remove from list(%QquestionUrls, #FirststLevelQ)
        }
    }
    if($comparison(#FirststLevelQ, "<", #FirststLevelQ_total)) {
        then {
            increment(#FirststLevelQ)
        }
        else {
        }
    }
}
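The range error can be reproduced with a simplified Python model of the loop above. The model assumes three things: the pass count is fixed up front, the index only moves forward, and items are deleted in place. It leaves out the "#" check and the "<" guard, and the sample URLs are made up. After a deletion the next item slides into the current position, and the increment then skips it. With enough deletions the index runs past the end of the shortened list.

```python
def fixed_count_filter(urls, keyword_dashed):
    """Simplified model of the posted loop (not UBot code)."""
    urls = list(urls)            # work on a copy
    total = len(urls)            # like #FirststLevelQ_total, computed once
    i = 0                        # like set(#FirststLevelQ, 0, "Global")
    for _ in range(total):       # like loop(#FirststLevelQ_total)
        if keyword_dashed not in urls[i]:
            del urls[i]          # the next item slides into position i...
        i += 1                   # ...and is skipped by this increment
    return urls


sample = [  # made-up URLs for illustration
    "/a-question?q=content+nation",
    "/Where-do-I-find-content-nation?q=content+nation",
    "/another-question?q=content+nation",
    "/yet-another-question?q=content+nation",
]
try:
    fixed_count_filter(sample, "content-nation")
except IndexError as err:
    print("IndexError:", err)
```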


The best way to manage the list and avoid the error is to set a variable to 0 before the loop.

Use list item (with the position stored in the variable you set).

When you REMOVE an item from the list (remove list item at position (variable)), you need to decrement the variable and then, at the bottom of the loop, increment it again.

This may sound weird, but think of it this way:

If you remove, say, position 5, then position 6 BECOMES position 5. If you don't decrement to position 4 and then increment back to position 5, you will increment to position 6 and miss the list item that used to be at position 6...
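John's decrement-then-increment pattern can be sketched in Python (an illustration, not tested UBot code; `filter_with_decrement` is a hypothetical name). Any removal that is not followed by a decrement shifts the index again. For that reason this sketch combines the "#" test and the keyword test into one condition, so each pass removes at most one item. Under that assumption the fixed pass count visits every item exactly once, and the index stays in range.

```python
def filter_with_decrement(urls, keyword_dashed):
    """Decrement after a removal, increment at the bottom of the loop."""
    total = len(urls)            # pass count fixed up front, as in the UBot loop
    i = 0                        # like set(#FirststLevelQ, 0, "Global")
    for _ in range(total):
        item = urls[i]
        # Both tests in ONE condition, so each pass removes at most one item.
        if "#" in item or keyword_dashed not in item:
            del urls[i]
            i -= 1               # the next item now sits at position i
        i += 1                   # bottom of the loop
    return urls


print(filter_with_decrement(["#", "/a-question?q=content+nation", "/Where-do-I-find-content-nation?q=content+nation", "#"], "content-nation"))
```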

 

Make sense? lol

 

 

John

 

 

Another way is to iterate through the list backwards (from the end), but I don't do it that way, so I am not going to provide an example (I have never tested it... it was suggested to me by Eddie way back).
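For reference, the backwards variant mentioned above looks like this as a Python sketch. This is the general technique, not a tested UBot script; `filter_backwards` is a hypothetical name. Walking from the last index down to 0 means that deleting item i only shifts items that have already been checked, so no decrement is needed.

```python
def filter_backwards(urls, keyword_dashed):
    """Delete non-matching entries in place, walking from the end of the list."""
    for i in range(len(urls) - 1, -1, -1):   # last index down to 0
        if "#" in urls[i] or keyword_dashed not in urls[i]:
            del urls[i]                      # only later (already checked) items move
    return urls


print(filter_backwards(["#", "/Where-do-I-find-content-nation?q=content+nation", "/other?q=content+nation"], "content-nation"))
```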


Thanks,

I am running the code, and all of a sudden the add list to list is not working when I press Run.

It works when I press Run Node, but not when I run the script???

Has this happened to anyone?


Hmm, watching the code run, it seems to hop over the add list to list, which then causes the exceed error.

Here is the code:

 

comment("Clear all lists")
clear list(%QquestionList)
clear list(%QquestionUrls)
clear list(%QSeedKW)
clear list(%QISDNA_categories)
clear list(%QISDNAList)
clear list(%QTopAnswersBy)
clear list(%QISDNAList)
clear list(%QISDNAListViewed)
clear list(%QISDNAmonitor)
clear list(%QISDNAQuestionFollowers)
clear list(%QISDNAQuestionStatFolloweringQuestion)
comment("Define variables - set lookup KW and list positions")
add list to list(%QSeedKW, $list from text(#QSeedKW, $new line), "Delete", "Global")
set list position(%QSeedKW, 0)
set list position(%QquestionUrls, 0)
comment("start url collection process")
loop($list total(%QSeedKW)) {
    comment("Take seed keywords and build up list of urls")
    set(#SearchString, "{$next list item(%QSeedKW)} ", "Global")
    set(#SearchString, $trim(#SearchString), "Global")
    set(#SearchStringDashed, $replace(#SearchString, " ", "-"), "Global")
    set(#SearchStringDashed, $trim(#SearchStringDashed), "Global")
    wait for browser event("Everything Loaded", "")
    click(<id=w"__*_*_input">, "Left Click", "No")
    change attribute(<id=w"__*_*_input">, "value", "")
    type text(<id=w"__*_*_input">, #SearchString, "Standard")
    wait for browser event("Everything Loaded", "")
    wait for element(<class="lil_action_button submit_button col">, "", "Appear")
    wait(1)
    comment("grab urls for questions")
    set(#QQulr, $scrape attribute(<class="result_item">, "href"), "Global")
    wait for element(<class="lil_action_button submit_button col">, "", "Appear")
    wait(1)
    comment("Start ---------- not setting list")
    add list to list(%QquestionUrls, $list from text(#QQulr, $new line), "Delete", "Global")
    set(#FirststLevelQ_total, $list total(%QquestionUrls), "Global")
    set(#FirststLevelQ, 0, "Global")
    comment("End ---------- not setting list")
    loop(#FirststLevelQ_total) {
        if($contains($list item(%QquestionUrls, #FirststLevelQ), "#")) {
            then {
                remove from list(%QquestionUrls, #FirststLevelQ)
            }
            else {
            }
        }
        if($contains($list item(%QquestionUrls, #FirststLevelQ), #SearchStringDashed)) {
            then {
            }
            else {
                remove from list(%QquestionUrls, #FirststLevelQ)
                decrement(#FirststLevelQ)
            }
        }
        if($comparison(#FirststLevelQ, "<", #FirststLevelQ_total)) {
            then {
                increment(#FirststLevelQ)
            }
            else {
            }
        }
    }
    if($comparison($list position(%QquestionUrls), "=", 0)) {
        then {
            save to file("{$special folder("Desktop")}\\{#DWSProjectFolder}\\{#ProjectFolder}\\ISDNA-Q-questions\\{#SearchString}-Q-questions-do-not-exist-{#profileid}.txt", "Big Opportunity! No questions have been found for the phrase \"{#SearchString}\". See training on what to do next!")
        }
        else {
        }
    }
}


Is there a way to check for the different case variations of a keyword?

At the moment I have the following challenges:

1) add list to list(%QquestionUrls, $list from text(#QQulr, $new line), "Delete", "Global") is not adding the scraped data to the list when I press Run. It only adds it when I press Run Node.

2) I am still getting the "exceed range" list error, even after applying John's suggestion.

3) If the keyword is in the URL but is in CAPS or Proper Case, or one of its words starts with a capital letter, the row is removed from the list because it does not match the keyword "curation-nation" exactly.

It should match all of the following versions: "curation-nation", "Curation-nation", "Curation-Nation", "curation-Nation".

How is this done?
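One common way to handle every capitalisation at once is to lower-case (or case-fold) both the keyword and the URL before the contains test. The sketch below shows that idea in Python's standard `str.casefold()`. Whether UBot offers an equivalent text-case function is not covered here, and `contains_ignore_case` is a hypothetical name.

```python
def contains_ignore_case(url: str, keyword_dashed: str) -> bool:
    # casefold() is a more aggressive lower() intended for caseless matching
    return keyword_dashed.casefold() in url.casefold()


variants = ["/curation-nation", "/Curation-nation", "/Curation-Nation", "/curation-Nation", "/CURATION-NATION"]
print(all(contains_ignore_case(v, "curation-nation") for v in variants))  # True
```

Only the comparison is case-insensitive. The URL kept in the list keeps its original spelling.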


Maybe it's just me, but you are using a variable to add content to your list, and I don't see that variable being set anywhere in that code. It's loading nothing because there's nothing to load... unless you've left out some of the code.

 

 

John


Hi John,

Here is the code that grabs the data and is supposed to set it.

#QQulr is the scrape.

add list to list(%QquestionUrls... is the part that adds the data to the list.

This block is in the middle of the code:

wait(1)
comment("grab urls for questions")
set(#QQulr, $scrape attribute(<class="result_item">, "href"), "Global")
wait for element(<class="lil_action_button submit_button col">, "", "Appear")
wait(1)
comment("Start ---------- not setting list")
add list to list(%QquestionUrls, $list from text(#QQulr, $new line), "Delete", "Global")
set(#FirststLevelQ_total, $list total(%QquestionUrls), "Global")
set(#FirststLevelQ, 0, "Global")
comment("End ---------- not setting list")


Well, I am still struggling with your code... I have loaded all of it into UBot. Take a look at this short video and see whether you spot what I do. You are trying to add a list to a list using a variable that has not been set yet. There is no way your list will contain anything when the variable doesn't exist yet...

 

http://screencast.com/t/n1VwjLi9q

 

 

 

John

