Brutal 164 Posted August 29, 2014

Not sure how to go about this... I have a #var filled with text:

info1
info2
info3
info-i-don't-want
info1
info2
info3
info-i-don't-want
info1
info2
info3
info-i-don't-want

The problem is that there is no way for me to know in advance if that unwanted line of text will be in the #var or not, and if it is there, I have no way to know what the actual text might be. I was looking for a regex solution to basically delete all lines after info3 and before the next info1, but couldn't find anything that would work for me.

Any ideas? Am I overcomplicating this?

arunner26 51 Posted August 29, 2014

Software, I'm not too clear on what information you have that you can zero in on to test. It also seems like your info is in a list and not a var. Given your example, you could use a $substring function to get position 5 of each row, and if position 5 of the row is not 1-3, then delete the row.

Hope that helps.

Andy (Arunner26)

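
Andy's position-5 check could look roughly like the following. This is only a Python sketch, not UBot script: the helper name `keep_rows` is made up for illustration, and "position 5" is assumed to be 1-based (the fifth character of each row).

```python
def keep_rows(rows):
    """Keep only rows whose fifth character is 1, 2, or 3."""
    kept = []
    for row in rows:
        # row[4] is the fifth character; the length check skips rows
        # that are too short to have one.
        if len(row) >= 5 and row[4] in "123":
            kept.append(row)
    return kept


rows = ["info1", "info2", "info3", "info-i-don't-want", "info1"]
print(keep_rows(rows))
```

This only works when the data is already split into rows and the wanted rows really do carry 1, 2, or 3 at that position, which is the assumption Andy states.
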
stanf 43 Posted August 29, 2014

If I understand correctly, this is what you've got:

----------------------------------------
info1 eferojeerjrmolwwkr
info2 rwrjwjwje w ekhwb
info3 kke nfiejrrjoeerkrkr
info4 -i-don't-want
info5 -i-don't-want
info6 -i-don't-want
info1 eferojeerjrmolwwkr
info2 rwrjwjwje w ekhwb
info3 kke nfiejrrjoeerkrkr
info4 -i-don't-want
info5 -i-don't-want
info6 -i-don't-want
info7 -i-don't-want555
info8 -i-don't-want4444
info5 -i-don't-wantyyy
info6 -i-don't-want4444
info1 555
info2 rwrjwjwje w ekhwb6666
info3 kke nfiejrrjoeerkrkr88888
----------------------------------------

This is what you want:

info1 eferojeerjrmolwwkr
info2 rwrjwjwje w ekhwb
info3 kke nfiejrrjoeerkrkr
info1 eferojeerjrmolwwkr
info2 rwrjwjwje w ekhwb
info3 kke nfiejrrjoeerkrkr
info1 555
info2 rwrjwjwje w ekhwb6666
info3 kke nfiejrrjoeerkrkr88888
----------------------------------------

Use find regex:

info[123].*

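
For anyone who wants to try stanf's pattern outside UBot, here is a small Python sketch. The sample text is a shortened copy of his example, and `find_wanted` is a made-up helper name; UBot's own find regex command may behave differently in details.

```python
import re

# Shortened sample in the shape of stanf's example data.
text = "\n".join([
    "info1 eferojeerjrmolwwkr",
    "info2 rwrjwjwje w ekhwb",
    "info3 kke nfiejrrjoeerkrkr",
    "info4 -i-don't-want",
    "info1 555",
])


def find_wanted(text):
    # Without re.DOTALL, "." never matches a newline, so every match
    # stops at the end of its own line.
    return re.findall(r"info[123].*", text)


kept = find_wanted(text)
print("\n".join(kept))
```

The matches come back as a list, one per wanted line, so they can be joined back into a single string or added to a list from there.
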
Brutal 164 Posted August 29, 2014 (Author)

Thanks for jumping in, Andy. No, it's not in a list, it's in a variable at that point. I need to clear out the unwanted stuff before I add it to a list, because the list isn't a one-per-line type of list.

Brutal 164 Posted August 29, 2014 (Author)

Yeah man, that's basically what I want to achieve, but I don't understand your regex.

UBotDev 276 Posted August 29, 2014

I think he is assuming that your text starts with "info", but as you said, the text changes, so I suppose he misunderstood.

I believe your best bet is to scrape the HTML that surrounds your text as well, and then use regex on that (usually, if some text is of a different type than the text you want to scrape, the HTML surrounding it differs too). That's why you should first post the HTML (or give us a URL) so we can help you.

In case I'm wrong and you get that as a string (no HTML), you'll have to find some differences between the "infoX" and "infoX-i-don't-want" texts, and then use those findings in a regex to recognize the bad lines. If you can't even do that, then I think you'll have a hard time. However, for this approach you would need to post the actual text that's there, else it's impossible to help you.

stanf 43 Posted August 29, 2014

I'm guessing that the line starts with info, so that is a constant (it isn't going to change).

The [123] means that it will look for any one of the characters found between the brackets.
The . is any character; in this case it's any character after info[123].
The * after the . means that there are 0 or more of those characters, up to the end of the line.

Keep in mind that this is not a perfect solution. The [123] itself only matches a single 1, 2, or 3, but because .* follows it, info23 or info31 or any other variation that starts with 1, 2, or 3 will match too.

If what follows info[123] is a sentence that has letters, numbers, and spaces, I would replace the [123] with this:

[\d\w\s\'\"\.\-\,\;\:\&\!\?]*

It will match any string made of those characters. Note that \s also matches line breaks, so it won't stop at the end of the line by itself.

I struggle with regex too. Search the users for HelloInsomnia; he has a tool for regex that helps me out a lot.

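
The "not a perfect solution" caveat is easy to check. The Python sketch below compares the pattern from the thread with a tightened variant; the `\b` version is only a suggestion under the assumption that the number is always followed by a space or the end of the line, and it was not proposed by anyone in the thread.

```python
import re

# Pattern from the thread: one of 1, 2, 3, then anything to end of line.
loose = re.compile(r"info[123].*")

# Hypothetical tightened variant: \b requires a word boundary right
# after the digit, so "info23" no longer slips through.
strict = re.compile(r"info[123]\b.*")

for line in ["info1 abc", "info23 abc", "info4 abc"]:
    print(line, bool(loose.fullmatch(line)), bool(strict.fullmatch(line)))
```

With `loose`, "info23 abc" matches because `[123]` takes the 2 and `.*` swallows the rest; with `strict` it does not.
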
Brutal 164 Posted August 29, 2014 (Author)

Let's go at it differently... Let's say I have a bunch of random text in the #var, but 2 constants:

Sales: Confirmed ... this is constant 1
<modified> ... this is constant 2

There are several sets of the 2 constants scattered among the text in the #var, like this:

<modified> bunch of text for me to keep Sales: Confirmed and another bunch of crap i dont need <modified>

I tried using this regex, but it didn't help:

(?<=Sales: Confirmed).*?(?=<modified>)

Basically I'm just trying to delete everything AFTER constant #1 and BEFORE constant #2.

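
Since the real data was never posted, the following is only a guess at one way this pattern can appear to do nothing: if the unwanted text spans several lines, `.` will not cross the line breaks unless a "dot matches newline" option is on. The Python sketch below uses made-up sample text in the shape Brutal describes; UBot's regex options may differ.

```python
import re

# Made-up sample: the junk between the two constants spans line breaks.
text = (
    "<modified> bunch of text for me to keep Sales: Confirmed\n"
    "and another bunch of crap i dont need\n"
    "<modified> more text to keep Sales: Confirmed trailing junk"
)

pattern = r"(?<=Sales: Confirmed).*?(?=<modified>)"

# Default flags: "." stops at a newline, so nothing is matched here
# and the text comes back unchanged.
no_flag = re.sub(pattern, "", text)

# re.DOTALL lets "." cross newlines, so the junk is removed.
with_flag = re.sub(pattern, "", text, flags=re.DOTALL)

print(no_flag == text)
print(with_flag)
```

Even with the flag, anything after the last "Sales: Confirmed" stays, because no `<modified>` follows it for the lookahead to find.
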
stanf 43 Posted August 29, 2014

Can you post a couple of lines of the data?

UBotDev 276 Posted August 29, 2014

Yep, as I also said, provide the data or URL, else we can ping around for days...

Code Docta (Nick C.) 638 Posted August 31, 2014

Your regex is backwards:

(?<=<modified>).*?(?=Sales: Confirmed)

TC

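
Note that the reversed pattern changes the approach: instead of deleting the unwanted text, it pulls out the text to keep (everything between `<modified>` and `Sales: Confirmed`). A Python sketch with made-up sample text in the shape Brutal described; `re.DOTALL` is an assumption added so that kept text may span line breaks.

```python
import re

# Made-up sample in the shape described in the thread.
text = (
    "<modified> bunch of text for me to keep Sales: Confirmed\n"
    "and another bunch of crap i dont need\n"
    "<modified> more text to keep Sales: Confirmed trailing junk"
)

# Lazy match of everything after <modified> and before Sales: Confirmed.
pattern = r"(?<=<modified>).*?(?=Sales: Confirmed)"

kept = [m.strip() for m in re.findall(pattern, text, flags=re.DOTALL)]
print(kept)
```

Both constants are excluded from the matches because they sit inside lookarounds; if they need to stay in the output, they would have to be part of the match itself.
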
Brutal 164 Posted August 31, 2014 (Author)

Thanks guys - VERY appreciated.