UBot Underground

Exceeding the range of the list problem...


Recommended Posts

I can't see where the problem is.

 

Basically, I created a variable called #itempos and I increment it every time I want to go to the next list item, so each time one article is saved the bot can move on to the next URL, and so on.
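In isolation, the pattern I mean is just this (a bare-bones sketch using a throwaway %items list and #current variable, not my real script):

clear list(%items)
add list to list(%items, $list from text("first,second,third", ","), "Delete", "Global")
set(#itempos, 0, "Global")
loop($list total(%items)) {
    comment("read the item at the current index, then advance the index by one")
    set(#current, $list item(%items, #itempos), "Global")
    increment(#itempos)
}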

 

I am still a real beginner in UBot and this is just part of my learning experience, but I have some (basic) programming experience in VB.NET and C++, so I know the logic well.

There are some wait statements scattered around, but they shouldn't even matter.

 

A really strange thing is that it throws the error when #itempos is at 15 or so. (I tested with 3 keywords, so there were 30 URLs to scrape: 10 results per keyword × 3 keywords.)

 

Anyway, here is my code. If anybody knows why it is doing this, please help me if you can.

navigate("ezinearticles.com", "Wait")
clear list(%search_terms)
ui open file("Keyword list", #kws)
add list to list(%search_terms, $list from file(#kws), "Delete", "Global")
set(#itempos, $list position(%search_terms), "Global")
wait(1)
wait(5)
clear list(%urls)
loop($list total(%search_terms)) {
    type text(<id="gcse-head-input">, $list item(%search_terms, #itempos), "Standard")
    increment(#itempos)
    click(<name="sa">, "Left Click", "No")
    wait for browser event("Everything Loaded", "")
    wait(1)
    add list to list(%urls, $scrape attribute(<data-ctorig=w"*">, "fullhref"), "Delete", "Global")
}
set(#itempos, 0, "Global")
loop($list total(%urls)) {
    navigate($list item(%urls, #itempos), "Wait")
    increment(#itempos)
    wait(1)
    clear list(%articles)
    add list to list(%articles, $scrape attribute(<id="article-content">, "innertext"), "Delete", "Global")
    save to file("C:\\Users\\user\\Desktop\\ubot\\articles\\{$random text(10)}.txt", $list item(%articles, 0))
}

See if this helps. It uses $next list item, because you don't really need #itempos at all; if you do use it, start it at 0.
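The idiom in isolation looks like this (hypothetical %demo list, just to show the node by itself; the full version of your script follows below):

clear list(%demo)
add list to list(%demo, $list from text("one,two,three", ","), "Delete", "Global")
loop($list total(%demo)) {
    comment("$next list item returns the current item and advances the list position on its own")
    set(#current_item, $next list item(%demo), "Global")
}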

 

By the way, EzineArticles blocks bots now, so if you are trying to scrape from there you may run into trouble pretty quickly.

navigate("ezinearticles.com", "Wait")
clear list(%search_terms)
ui open file("Keyword list", #kws)
add list to list(%search_terms, $list from file(#kws), "Delete", "Global")
wait(1)
wait(5)
clear list(%urls)
loop($list total(%search_terms)) {
    type text(<id="gcse-head-input">, $next list item(%search_terms), "Standard")
    click(<name="sa">, "Left Click", "No")
    wait for browser event("Everything Loaded", "")
    wait(1)
    add list to list(%urls, $scrape attribute(<data-ctorig=w"*">, "fullhref"), "Delete", "Global")
}
loop($list total(%urls)) {
    navigate($next list item(%urls), "Wait")
    wait(1)
    clear list(%articles)
    add list to list(%articles, $scrape attribute(<id="article-content">, "innertext"), "Delete", "Global")
    save to file("C:\\Users\\user\\Desktop\\ubot\\articles\\{$random text(10)}.txt", $list item(%articles, 0))
}

Thanks for taking your time, bro, but it still doesn't work and exceeds the list.

It goes until article 15 (I counted how many files were in the folder).

 

Any idea?

 

As for the bot protection they have, I don't care; it's just a learning exercise :D


I haven't actually run into the bot detection, maybe it's gone now :D

Try this:

navigate("ezinearticles.com", "Wait")
clear list(%search_terms)
ui open file("Keyword list", #kws)
add list to list(%search_terms, $list from file(#kws), "Delete", "Global")
wait(1)
wait(5)
clear list(%urls)
loop($list total(%search_terms)) {
    type text(<id="gcse-head-input">, $next list item(%search_terms), "Standard")
    click(<name="sa">, "Left Click", "No")
    wait for browser event("Everything Loaded", "")
    wait(1)
    add list to list(%urls, $scrape attribute(<class="gs-per-result-labels">, "url"), "Delete", "Global")
}
loop($list total(%urls)) {
    comment("set this variable so we can use it over and over")
    set(#next_url, $next list item(%urls), "Global")
    comment("make sure there is actually a url there")
    if($comparison($trim(#next_url), "!=", $nothing)) {
        then {
            comment("make sure the url isnt a category")
            if($contains(#next_url, "?cat=")) {
                then {
                }
                else {
                    navigate(#next_url, "Wait")
                    wait(1)
                    clear list(%articles)
                    add list to list(%articles, $scrape attribute(<id="article-content">, "innertext"), "Delete", "Global")
                    comment("only try to save if there is something to save")
                    if($comparison($list total(%articles), ">", 0)) {
                        then {
                            save to file("C:\\Users\\Nick\\Desktop\\TEST\\{$random text(10)}.txt", $list item(%articles, 0))
                        }
                        else {
                        }
                    }
                }
            }
        }
        else {
        }
    }
}


Wow, you are fast!

 

This worked really well. Even though it uses more advanced things than I can handle right now, I will study your code.
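If I'm reading it right, the part that fixes my "exceeding the range" error is the guard before the save: when the article scrape finds nothing, %articles stays empty and $list item(%articles, 0) points past the end of the list. Sketching just that piece (with my own save path):

clear list(%articles)
add list to list(%articles, $scrape attribute(<id="article-content">, "innertext"), "Delete", "Global")
comment("only index into %articles if the scrape actually returned something")
if($comparison($list total(%articles), ">", 0)) {
    then {
        save to file("C:\\Users\\user\\Desktop\\ubot\\articles\\{$random text(10)}.txt", $list item(%articles, 0))
    }
    else {
    }
}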

Many thanks, HelloInsomnia. Have a good day, man.

 

 

I haven't actually run into the bot detection, maybe it's gone now :D

 

Yeah, I didn't run into it either. But even if it's there, it's nothing impossible to bypass :D

