Grey Hat 4 Posted June 5, 2015

Hi,

So, I've figured out how to exclude a bunch of characters and junk I don't want in my output using $replace regular expression. The issue is that I'm replacing a lot of junk code with a ",". When I replaced it with nothing, it mushed all the data together with no commas at all, and that defeated my add to list. Now my data looks like this (sticking with the replace-junk-with-comma idea):

,,results,,,,,query,,,baby,girl,clothes,,,search_types,,,,h

Every time I run the bot it will have different data, but the commas will always be the same. So I am attempting to accomplish two things:

1) Come up with a way to trim the extra commas out, leaving only one comma after each word.
2) Every time I run the bot it gives me the words "results", "query", and "search_types" over and over, and I wish to remove them completely and leave all the other data.

So I'm wondering, for the second item, whether there is a generic regex that basically says "Hey, when you see this word, exclude it and move on. Do that every time you see this word in the results, as it repeats." I'm thinking this has to do with the regex look-ahead construct, but at that point I'm a noob to regex and am stuck. I've used Rubular.com and Regexone.com and I still can't figure it out.

Any help here would be greatly appreciated. Thank you!
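A rough sketch of the two goals in the question (drop the repeated field names, then collapse runs of commas), written in Python rather than UBot script. The `UNWANTED` list and the `clean` helper are hypothetical names used only for illustration; the same two patterns should carry over to any regex replace that supports `\b` and `{2,}`.

```python
import re

# Sample string copied from the post above (it is truncated there as well).
raw = ",,results,,,,,query,,,baby,girl,clothes,,,search_types,,,,h"

# Hypothetical stop-word list: the field names the poster wants dropped.
UNWANTED = ["results", "query", "search_types"]


def clean(text, unwanted=UNWANTED):
    # \b...\b matches whole words only. "_" counts as a word character,
    # so "query" inside a token like "search_query" is left alone.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in unwanted) + r")\b"
    text = re.sub(pattern, "", text)
    # Collapse every run of two or more commas into a single comma.
    text = re.sub(r",{2,}", ",", text)
    # Drop any comma left at the very start or end.
    return text.strip(",")


print(clean(raw))  # prints: baby,girl,clothes,h
```

No look-ahead is needed for this part: an alternation of the unwanted words wrapped in word boundaries is enough, and the comma cleanup is a separate second pass.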
Code Docta (Nick C.) 638 Posted June 5, 2015

I think this is what you want:

set(#data, "This is example text I want to search and replace real cool like a champ.", "Global")
set(#replace words, "text|search|cool", "Global")
alert($replace regular expression(#data, #replace words, $new line))
comment("a function example")
define $Replace with New Line(#input text, #New delimiter, #Items to replace) {
    return($replace regular expression(#input text, #Items to replace, #New delimiter))
}
alert($Replace with New Line(#data, $new line, #replace words))
alert($Replace with New Line(#data, $new line, $text from list($list from text("text search cool", " "), "|")))
comment("you can incorporate the \"items to replace\" into the function so it can just take the list. so all you would need to supply is a list")

If not, this should at least point you in the right direction.

CD
Grey Hat 4 Posted June 5, 2015

Hi CD,

Thank you! I'm working with the code you provided above and getting very close. Your reply is very much appreciated and has me going in the right direction. Thank you again!
HelloInsomnia 1103 Posted June 5, 2015

If you can post an example of the text you are applying this to, that would be helpful, and we can more than likely post a solution for you.
Grey Hat 4 Posted June 5, 2015

Thanks! So when I use the GET request the data comes in like this:

{"results":[{"query":"baby girl clothes","search_types":["handmade","category_tags_vintage"],"search_type_names":["in Handmade<\/b>","in Vintage<\/b>"]},{"query":"baby clothes","search_types":["handmade","category_tags_vintage"],"search_type_names":["in Handmade<\/b>","inVintage<\/b>"]},{"query":"baby boy clothes","search_types":["handmade","category_tags_vintage"],"search_type_names":["in Handmade<\/b>","inVintage<\/b>"]},{"query":"american girl doll clothes","search_types":["handmade"],"search_type_names":["in Handmade<\/b>"]},{"query":"clothes","search_types":["handmade"],"search_type_names":["in Handmade<\/b>"]},{"query":"dog clothes","search_types":["handmade"],"search_type_names":["in Handmade<\/b>"]},{"query":"hippie clothes","search_types":["handmade"],"search_type_names":["inHandmade<\/b>"]},{"query":"doll clothes","search_types":["handmade","category_tags_vintage"],"search_type_names":["in Handmade<\/b>","inVintage<\/b>"]},{"query":"barbie clothes","search_types":["handmade","category_tags_vintage"],"search_type_names":["in Handmade<\/b>","inVintage<\/b>"]},{"query":"kids clothes","search_types":["handmade","category_tags_vintage"],"search_type_names":["in Handmade<\/b>","inVintage<\/b>"]},{"query":"workout clothes","search_types":["handmade"],"search_type_names":["in Handmade<\/b>"]},{"query":"find shop names containing <\/span>clothes<\/span>","link":"\/search\/shops?search_query=clothes","search_types":[],"search_type_names":[]}],"count":12,"experiment":"off"}

So, that's when I used Regex and used "\W" and replaced the misc characters with $nothing. That gets me 95% of the way. It's pretty cool, Regex is cool!
Then I'm left with too many commas and some non-keyword words I want to remove (e.g., results, query, category_tags_vintage, and the standalone letter "b"):

,,results,,,,,query,,,baby,girl,clothes,,,search_types,,,,handmade,,,category_tags_vintage,,,,search_type_names,,,,in,,b,Handmade,,,b,,,,in,,b,Vintage,,,b,,,,,,,query,,,baby,clothes,,,search_types,,,,handmade,,,category_tags_vintage,,,,search_type_names,,,,in,,b,Handmade,,,b,,,,in,,b,Vintage,,,b,,,,,,,query,,,baby,boy,clothes,,,search_types,,,,handmade,,,category_tags_vintage,,,,search_type_names,,,,in,,b,Handmade,,,b,,,,in,,b,Vintage,,,b,,,,,,,query,,,american,girl,doll,clothes,,,search_types,,,,handmade,,,,search_type_names,,,,in,,b,Handmade,,,b,,,,,,,query,,,clothes,,,search_types,,,,handmade,,,,search_type_names,,,,in,,b,Handmade,,,b,,,,,,,query,,,dog,clothes,,,search_types,,,,handmade,,,,search_type_names,,,,in,,b,Handmade,,,b,,,,,,,query,,,hippie,clothes,,,search_ etc...

So that's where I left off. The keywords will always change, but the "junk" from the GET will be constant, so I have to figure out the best possible filter. Maybe my initial regex "\W" could have been more elaborate? That's where my lack of knowledge comes in.

Thanks!
Code Docta (Nick C.) 638 Posted June 5, 2015

I think this is what you want...

set(#json, "\{\"results\":[\{\"query\":\"baby girl clothes\",\"search_types\":[\"handmade\",\"category_tags_vintage\"],\"search_type_names\":[\"in Handmade<\\/b>\",\"in Vintage<\\/b>\"]\},\{\"query\":\"baby clothes\",\"search_types\":[\"handmade\",\"category_tags_vintage\"],\"search_type_names\":[\"in Handmade<\\/b>\",\"inVintage<\\/b>\"]\},\{\"query\":\"baby boy clothes\",\"search_types\":[\"handmade\",\"category_tags_vintage\"],\"search_type_names\":[\"in Handmade<\\/b>\",\"inVintage<\\/b>\"]\},\{\"query\":\"american girl doll clothes\",\"search_types\":[\"handmade\"],\"search_type_names\":[\"in Handmade<\\/b>\"]\},\{\"query\":\"clothes\",\"search_types\":[\"handmade\"],\"search_type_names\":[\"in Handmade<\\/b>\"]\},\{\"query\":\"dog clothes\",\"search_types\":[\"handmade\"],\"search_type_names\":[\"in Handmade<\\/b>\"]\},\{\"query\":\"hippie clothes\",\"search_types\":[\"handmade\"],\"search_type_names\":[\"inHandmade<\\/b>\"]\},\{\"query\":\"doll clothes\",\"search_types\":[\"handmade\",\"category_tags_vintage\"],\"search_type_names\":[\"in Handmade<\\/b>\",\"inVintage<\\/b>\"]\},\{\"query\":\"barbie clothes\",\"search_types\":[\"handmade\",\"category_tags_vintage\"],\"search_type_names\":[\"in Handmade<\\/b>\",\"inVintage<\\/b>\"]\},\{\"query\":\"kids clothes\",\"search_types\":[\"handmade\",\"category_tags_vintage\"],\"search_type_names\":[\"in Handmade<\\/b>\",\"inVintage<\\/b>\"]\},\{\"query\":\"workout clothes\",\"search_types\":[\"handmade\"],\"search_type_names\":[\"in Handmade<\\/b>\"]\},\{\"query\":\"find shop names containing <\\/span>clothes<\\/span>\",\"link\":\"\\/search\\/shops?search_query=clothes\",\"search_types\":[],\"search_type_names\":[]\}],\"count\":12,\"experiment\":\"off\"\} ", "Global")
set(#find, $find regular expression(#json, "(?<=query\":\").*?(?=\",\")"), "Global")

This is a JSON response from your GET, so ideally you want
a JSON parser. There is a JSON plugin that is free to use, but it is not reliable. Alternatively, you can use UBot 5's Python for that, or use the regex above. Not sure if you want the last line, but I think you can handle that.

CD
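For the JSON-parser route, here is a minimal sketch in plain Python using the standard `json` module. It is not the UBot 5 integration itself, and how the response text gets passed into Python from UBot is left out. The `response_text` sample is a shortened stand-in for the GET response quoted earlier in the thread, and `extract_queries` is a hypothetical helper name.

```python
import json

# Shortened stand-in for the GET response quoted earlier in the thread
# (two entries only; the real response has more entries and fields).
response_text = (
    '{"results":['
    '{"query":"baby girl clothes","search_types":["handmade"]},'
    '{"query":"baby clothes","search_types":["handmade"]}'
    '],"count":2,"experiment":"off"}'
)


def extract_queries(text):
    # Parse the whole response, then keep only the "query" value of each
    # entry under "results"; entries without a "query" key are skipped.
    data = json.loads(text)
    return [item["query"] for item in data.get("results", []) if "query" in item]


queries = extract_queries(response_text)
print(",".join(queries))  # prints: baby girl clothes,baby clothes
```

Parsing the structure this way avoids the comma cleanup entirely, since the field names ("results", "query", "search_types") never end up in the output in the first place.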
Code Docta (Nick C.) 638 Posted June 5, 2015

I will make an example of the U5 JSON parser in Python here in a minute.
Grey Hat 4 Posted June 5, 2015

Thank you Code Docta! Amazing! I appreciate the time you've spent on this and the solutions presented. Thank you again!
Grey Hat 4 Posted June 6, 2015

Okay, so almost there. Sorry to keep being a pest, but anyone struggling with regex is going to learn a lot from this thread. So here's my code:

plugin command("SocketCommands.dll", "socket container") {
    plugin command("HTTP post.dll", "http max redirects", 15)
    set(#get,$plugin function("HTTP post.dll", "$http get", "https://etsy.com/suggestions_ajax.php?extras=\{\"autosuggest_expt\":\"off\",\"autosuggest_lang\":\"en-US\"\}&version=10_12672349415_2&search_query=clothes&search_type=all", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0", "etsy.com", "", ""),"Global")
}
set(#strip,$find regular expression(#get,"(?<=query\\\":\\\").*?(?=\\\",\\\")"),"Global")
load html(#strip)

And I used your regex (which does the job 99.9%). The output is this:

baby girl clothes
baby clothes
baby boy clothes
etc...
and then <\/span>clothes<\/span>

The search terms will always be different, so is there a way in the regex to take out that last <\/span> and <\/span>?

Thanks Code Docta!
Code Docta (Nick C.) 638 Posted June 6, 2015

Tada!

set(#strip, $replace regular expression($find regular expression(#get, "(?<=query\\\":\\\").*?(?=\\\",\\\")"), "<\\\\/span>.*<\\\\/span>", $nothing), "Global")
alert($replace regular expression("<\\/span>clothes<\\/span>", "<\\\\/span>.*<\\\\/span>", $nothing))

Here is what I use to test: http://regexhero.net/tester/

My regex bible:

(?=ABC) - Positive lookahead. Matches a group after your main expression without including it in the result.
(?!ABC) - Negative lookahead. Specifies a group that cannot match after your main expression (i.e. if it matches, the result is discarded).
(?<=ABC) - Positive lookbehind. Matches a group before your main expression without including it in the result.
(?<!ABC) - Negative lookbehind. Specifies a group that cannot match before your main expression (i.e. if it matches, the result is discarded).
(?<=ABC).*?(?=ABC) - Extracts the text between specified groups.

This was found on the forum long ago; I can't remember who it was from. HelloInsomnia has some great regex stuff too.

CD
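To see the lookbehind/lookahead pair from the list above in action outside UBot, here is a small Python sketch. The `text` fragment is a made-up, shortened sample modelled on the response in this thread, and Python's `re` module stands in for UBot's regex engine as an assumption; escaping differs between the two, so the patterns are not copy-paste equivalents of the UBot strings.

```python
import re

# Made-up fragment modelled on the response in this thread. In Python source,
# '<\\/span>' is the five-character sequence <\/ followed by span>.
text = (
    '{"query":"dog clothes","search_types":["handmade"]},'
    '{"query":"find shop names containing <\\/span>clothes<\\/span>","link":"x"}'
)

# (?<=query":")  lookbehind: the match must be preceded by query":"
# .*?            lazy: take as little text as possible
# (?=",")        lookahead: the match must be followed by ","
between = re.compile(r'(?<=query":").*?(?=",")')
matches = between.findall(text)

# Same idea as the replace in the post above: remove everything from the
# first <\/span> through the last one, including the text between them.
span_block = re.compile(r'<\\/span>.*<\\/span>')
whole_block_removed = [span_block.sub("", m) for m in matches]

# Alternative: remove only the <\/span> markers and keep the word between.
markers_removed = [m.replace('<\\/span>', "") for m in matches]

print(matches)
print(whole_block_removed)
print(markers_removed)
```

Note the difference between the last two lists: the `.*` version also removes the word between the two markers, while the plain replace keeps it. Which one is wanted depends on whether that last suggestion should survive at all.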
Grey Hat 4 Posted June 6, 2015

Thank You!!!
Code Docta (Nick C.) 638 Posted June 6, 2015

No problem, glad I can help.