okuma31, posted March 18, 2013 (edited):
I have a list of 100k+ URLs in a text file, and I'd like to pull a URL from that file by matching a specific string (the root URL). There are likely 15-20 URLs matching the root, and I'd like to select one of them at random. I'm new at this, and the only thing I've come up with so far is to run the URLs through a text field until I find one, but obviously that would destroy any speed the program has. Does anyone have any ideas on how to do this? I'm pulling my hair out trying to figure it out.
bestmacros, posted March 18, 2013:
Loop through the whole list, compare each item, and build a new list containing only the links you need. Then use the new list for your activity. A list of 100k items might be too big to handle, so split it into lists of 50k links each.
okuma31, posted March 18, 2013:
Quote (bestmacros): "Loop through the whole list, compare each item, and build a new list containing only the links you need. Then use the new list for your activity. A list of 100k items might be too big to handle, so split it into lists of 50k links each."
That's what I was planning on doing, but I see two issues with it:
1) Let's say it takes 15 seconds to find the URL. That's going to kill my links per minute.
2) In theory it will always select the first target, because that's the first link it will uncover.
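Editorial sketch (not part of the original thread): the second concern goes away if the loop collects every match before choosing, instead of stopping at the first hit. The Python below is a minimal, language-neutral illustration of that idea, not UBot script; the function name `pick_random_match` and the non-aol.com sample URLs are hypothetical.

```python
import random

def pick_random_match(urls, root):
    """Return one randomly chosen URL that starts with root, or None."""
    # Collect every URL that begins with the root, not just the first hit.
    matches = [u for u in urls if u.startswith(root)]
    # random.choice picks uniformly, so position in the list does not matter.
    return random.choice(matches) if matches else None

# Hypothetical sample data standing in for the 100k-line text file.
urls = [
    "http://example.org/about",
    "http://aol.com/311234",
    "http://aol.com/33424",
    "http://example.net/x",
    "http://aol.com/76876",
]

picked = pick_random_match(urls, "http://aol.com/")
print(picked)
```

Because the whole list is scanned once and the choice is made afterwards, the first match in the file is no more likely to be picked than any other.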
wilriv21, posted March 18, 2013 (edited):
Okuma, have you used the variable function $common list items? It returns a new list containing the items common to the first list (root URL) and the second list (the 100k+ URL text file). Perhaps using this UBot variable function can help with efficiency? Once you have the new list of common items, you can select from it at random.
okuma31, posted March 18, 2013:
Quote (wilriv21): "Okuma have you used the variable function $common list items? This variable function returns a new list containing the common items between the first list (100+K urls text file) and the second list (root url). Possibly using this UBot variable function can help with efficiency? Once you have the new list containing common items you can then select randomly."
This is a great idea, and I'm playing with it now, but it appears to pick up only exact-match URLs. As an example, the item I'll use is a root domain:
http://aol.com/
My list of URLs is filled with all kinds of crazy stuff, but the specific 15 or so I'm looking for have extra strings appended to the root, such as:
http://aol.com/311234
http://aol.com/33424
http://aol.com/76876
http://aol.com/12321
http://aol.com/8978978
Regardless, I feel like this is getting closer. If this ends up working, be sure to send your PayPal and I'll send you a fiver for a beer on me. =]
wilriv21, posted March 18, 2013:
I edited my post so that the first list is the much smaller list of root URLs and the second list is the much larger 100k+ list. This change should make the process more efficient.
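Editorial sketch (not part of the original thread): if many root URLs need to be looked up against the same master list, one way to keep each lookup cheap is to group the master list by root once and reuse that grouping. The Python below is a hedged illustration of the idea rather than UBot script; `root_of` and `build_index` are hypothetical helper names, and the sketch assumes a "root" means scheme plus host, so `www.aol.com` and `aol.com` would count as different roots.

```python
import random
from collections import defaultdict
from urllib.parse import urlsplit

def root_of(url):
    """Reduce a URL to scheme://host/ (assumed definition of 'root')."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"

def build_index(urls):
    """Group URLs by root in a single pass over the master list."""
    index = defaultdict(list)
    for url in urls:
        index[root_of(url)].append(url)
    return index

# Hypothetical sample data standing in for the master list.
master = [
    "http://aol.com/311234",
    "http://example.org/page",
    "http://aol.com/33424",
    "http://aol.com/76876",
]

index = build_index(master)                      # done once
candidates = index.get("http://aol.com/", [])    # per-root lookup
choice = random.choice(candidates) if candidates else None
print(choice)
```

The single pass over the master list is paid once; each later lookup is a dictionary access followed by a random pick among that root's URLs.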
okuma31, posted March 18, 2013:
    clear list(%root domain urls)
    clear list(%master url list)
    add list to list(%master url list, $list from file("C:\\Users\\Tdub\\Desktop\\extracted from thoughthappiness.txt"), "Delete", "Global")
    add item to list(%root domain urls, "http://7minutegarden.com/", "Delete", "Global")
    add list to list(%final output list, $common list items(%root domain urls, %master url list), "Delete", "Global")
    save to file("C:\\Users\\Tdub\\Desktop\\the final output list.txt", %final output list)
The root domain list contains one URL, as you can see in the code above, but when I open the final output list.txt it contains 0 items. When I add an exact-match URL to both lists, the output does contain that single matching URL. It's a bit bizarre, seeing as how in my mind the root domain would be contained in all of the items in the master list.
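Editorial sketch (not part of the original thread): the empty output is what one would expect if $common list items compares whole items for equality, which is what the behavior reported above suggests (this is an inference from the thread, not a statement of the documented UBot behavior). The Python analogy below contrasts whole-item matching with prefix matching; the page paths are hypothetical.

```python
roots = ["http://7minutegarden.com/"]

# Hypothetical master list entries; the paths are made up for illustration.
master = [
    "http://7minutegarden.com/some-page",
    "http://7minutegarden.com/another-page",
    "http://example.com/",
]

# Whole-item comparison: a URL matches only if it equals a root exactly.
root_set = set(roots)
exact = [u for u in master if u in root_set]

# Prefix comparison: a URL matches if it starts with any root.
prefix = [u for u in master if any(u.startswith(r) for r in roots)]

print(len(exact), len(prefix))  # prints "0 2"
```

The root string is a substring of the longer URLs, but it is not equal to any of them, so an equality-based intersection finds nothing. Getting the 15 or so target URLs needs a "starts with" or "contains" comparison against each item instead.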