Frank, posted June 24, 2011
There are many times when I want to capture text between known characters in some text. Here's how it's done. Let's say I have this: <td>I want this text here.</td>
I only want the text in between the HTML td tags, but not the tags themselves. You can do a pre and post search for the text around it like this: (?<=<td>).*?(?=</td>)
The (?<=...) is the pre-searcher (a lookbehind) and the (?=...) is the post-searcher (a lookahead). Both check for the tags without including them in the match. The stuff in the middle, .*?, tells the regex to grab everything BUT not to be greedy: once it hits the first end tag, it's done.
Frank
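For readers who want to try the pattern outside uBot, here is a minimal sketch in Python's `re` module. The function name `text_between_td` is made up for illustration, and Python's regex flavour is only assumed to behave like uBot's for this particular pattern.

```python
import re
from typing import Optional

# Lookbehind (?<=<td>) and lookahead (?=</td>) require the tags to be present
# but keep them out of the match; .*? stops at the first closing tag.
TD_PATTERN = re.compile(r"(?<=<td>).*?(?=</td>)")


def text_between_td(html: str) -> Optional[str]:
    """Return the text inside the first <td>...</td> pair, or None if absent."""
    match = TD_PATTERN.search(html)
    return match.group(0) if match else None


sample = "<td>I want this text here.</td>"
print(text_between_td(sample))
```

Because the quantifier is lazy, a string with two cells yields only the first cell's text, not everything up to the last closing tag.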
JohnB, posted June 24, 2011
Hey Frank, thank you! Is there a way to capture multiple instances of text between like tags?
John
LoWrIdErTJ - BotGuru, posted June 24, 2011
JohnB wrote: Is there a way to capture multiple instances of text between like tags?
Couldn't you just use a while or loop, run the node search on the page (with the regex) inside it, and capture the result on each pass of the while or loop?
TJ
Kreatus (Ubot Ninja), posted June 24, 2011
Nice, Frank! This code will be very useful in the future.
AutoIM, posted June 24, 2011
Thanks Frank, another useful regex snippet to put in the scrapbook. I did a bit of research on Google for guides and tutorials on regex a while ago, and was surprised to find out that there isn't just one 'standard' regex format: it comes in several flavours, a bit like the subtle differences between versions of a programming language from different companies. No wonder it's not always possible to copy and paste a piece of regex code and expect it to work first time out of the box.
I also found a website where you could paste in the text from, say, a page scrape, highlight the part of it you want, and the regex code was generated automatically as you did that. I'll have a rummage around and see if I can find it and post the link here.
Phil
Frank (thread author), posted June 24, 2011
JohnB wrote: Is there a way to capture multiple instances of text between like tags?
I'm pretty sure that you can use 'find regular expression' and save the results to a list. Just use 'find regular expression', select the list option, and make sure you are adding to a list.
Frank
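As a rough analogue of saving every match to a list, here is a sketch in Python rather than uBot. `all_cells` is a hypothetical helper name; `re.findall` plays the role that the list option of 'find regular expression' is described as playing above.

```python
import re

# Same pattern as in the first post: text between <td> and </td>, non-greedy.
TD_PATTERN = re.compile(r"(?<=<td>).*?(?=</td>)")


def all_cells(html: str) -> list:
    """Return the text of every <td>...</td> pair, in document order."""
    # With no capture groups in the pattern, findall returns the full matches.
    return TD_PATTERN.findall(html)


row = "<tr><td>first</td><td>second</td><td>third</td></tr>"
for cell in all_cells(row):
    print(cell)
```

The non-greedy `.*?` is what keeps each match inside its own pair of tags; a greedy `.*` would return one long match running from the first cell to the last.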
Frank (thread author), posted June 24, 2011
AutoIM wrote: No wonder it's not always possible to copy and paste a piece of regex code and expect it to work first time out of the box.
You hit the nail on the head, Phil. If there's one way to accomplish a task, then there are 10 ways, lol.
rumen, posted June 24, 2011
Frank, thanks for the expression. Until now I used 'find regular expression' and 'replace regular expression' for this. Yours is much quicker. Thanks again.
LoWrIdErTJ - BotGuru, posted June 25, 2011
I actually used this on a link like http://www.fiverr.com/users/wpbuzz/gigs/invite
Regex: (?<=http://www.fiverr.com/users/).*?(?=/gigs/)
I wanted to pull the username "wpbuzz". However, on the saved list it actually removed the username and saved everything to the left and right. Am I doing something wrong here?
TJ
Pete, posted June 25, 2011
You are using http://www.fiverr; try http://fiverr
LoWrIdErTJ - BotGuru, posted June 25, 2011
Pete wrote: You are using http://www.fiverr; try http://fiverr
That's just the first portion of the regex, when the purpose is to pull only the username out of the entire string.
Pete, posted June 25, 2011
LoWrIdErTJ wrote: That's just the first portion of the regex.
And it's also the reason your regex is failing: htaccess settings.
(?<=http://fiverr.com/users/).*?(?=/gigs/)
JohnB, posted June 25, 2011
Pete wrote: And it's also the reason your regex is failing: htaccess settings. (?<=http://fiverr.com/users/).*?(?=/gigs/)
That wasn't the reason at all. The names of the gigs have non-word characters that were not accounted for in the regex. Once adjusted, the regex worked fine.
John
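The adjusted regex John mentions is not shown in the thread, so the following is only a sketch of one way to make the username pattern more robust, written in Python. Two things here are assumptions rather than the actual fix: the dots in the host name are escaped so they match literally, and the lazy `.*?` is swapped for `[^/]+` so that a match can never run across a slash into another part of the text. `usernames` is a hypothetical helper name.

```python
import re

# Assumed hardening, not the fix used in the thread:
#  - \. matches a literal dot instead of "any character"
#  - [^/]+ accepts any run of non-slash characters (hyphens, underscores, ...)
USER_PATTERN = re.compile(r"(?<=http://www\.fiverr\.com/users/)[^/]+(?=/gigs/)")


def usernames(text: str) -> list:
    """Return every username found between /users/ and /gigs/ in the text."""
    return USER_PATTERN.findall(text)


line = "http://www.fiverr.com/users/wpbuzz/gigs/invite"
print(usernames(line))
```

On the single URL from TJ's post, the original pattern also matches "wpbuzz" in Python, so the difference only shows up when the surrounding file contains other text between the URLs.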
LoWrIdErTJ - BotGuru, posted June 26, 2011
John helped me out on it. Thank you...
Pete, posted June 26, 2011
JohnB wrote: That wasn't the reason at all.
lol, then go to http://www.fiverr.com and tell me what URL you land on. So you have two options: change it or remove it.
JohnB, posted June 26, 2011
That would be great if the idea was to navigate to a URL. We were parsing text from within a file... As far as I know, .htaccess can't block text parsing in a file.
Pete, posted June 26, 2011
LoWrIdErTJ wrote: regex (?<=http://www.fiverr.com/users/).*?(?=/gigs/) Wanting to pull the username "wpbuzz". However on the saved list it actually removed the username, and saved everything to the left and right.. Am I doing something wrong here?
Sorry, I'm not so good at mind reading. I was going by the information posted.
LoWrIdErTJ - BotGuru, posted June 26, 2011
Pete wrote: Sorry, I'm not so good at mind reading. I was going by the information posted.
Guess you wouldn't need to be, with my statement: "However on the saved list it actually removed the username, and saved everything to the left and right.."
Nevertheless, no big deal; it's been taken care of.
Kreatus (Ubot Ninja), posted June 30, 2011
Hi Frank, I've got a question. How do I make this code work if I want to scrape code that contains line breaks? For example:
<div class="content">I wrote a really good college application essay. It's under a page and a half, yet it manages to use 407 more words than the suggested maximum. I'm attempting to trim fat, but there isn't much. I fear that if I become to obsessed with length, the content will suffer. <br> <br> Any advice?</div>
I want to get the content between <div class="content"> and </div>, that is, this:
I wrote a really good college application essay. It's under a page and a half, yet it manages to use 407 more words than the suggested maximum. I'm attempting to trim fat, but there isn't much. I fear that if I become to obsessed with length, the content will suffer. <br> <br> Any advice?
Thanks!
Pete, posted June 30, 2011
The only way I know is by removing them (the line breaks), Kreatus, but I have never tested the regex I use for that in uBot.
Kreatus (Ubot Ninja), posted June 30, 2011
Pete wrote: The only way I know is by removing them, Kreatus, but I have never tested the regex I use for that in uBot.
That won't work, zap, since I need to scrape from a page. This page, to be specific: http://answers.yahoo.com/question/index;_ylt=AswTHU808AuRWqCqhKCoK5cjzKIX;_ylv=3?qid=20080407124721AAA2kow
Thanks
Pete, posted June 30, 2011
Is this what you are after? If not, it may give you an idea of how to get what you're after.
Attachment: Yahoo.ubot
Kreatus (Ubot Ninja), posted June 30, 2011
Pete wrote: Is this what you are after? If not, it may give you an idea of how to get what you're after.
That's right, zap! Thanks for the workaround! +1 for you.
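The contents of Pete's Yahoo.ubot workaround are not shown in the thread. As a separate illustration of the line-break problem, here is a sketch in Python: by default `.` does not match a newline, so `.*?` cannot cross a line break, and the `re.DOTALL` flag lifts that restriction. Whether uBot's regex engine accepts an equivalent option is an assumption not confirmed here; `content_of` and the shortened sample text are made up for the example.

```python
import re
from typing import Optional

OPEN_TAG = '<div class="content">'

# re.DOTALL lets "." match newlines, so the lazy .*? can span several lines.
CONTENT = re.compile(r"(?<=" + re.escape(OPEN_TAG) + r").*?(?=</div>)", re.DOTALL)
# Same pattern without the flag, kept only to show the difference.
CONTENT_NO_FLAG = re.compile(r"(?<=" + re.escape(OPEN_TAG) + r").*?(?=</div>)")


def content_of(html: str) -> Optional[str]:
    """Return the text inside the first content div, line breaks included."""
    match = CONTENT.search(html)
    return match.group(0) if match else None


html = '<div class="content">I wrote a really good essay.\n<br>\n<br>\nAny advice?</div>'
print(content_of(html))
print(CONTENT_NO_FLAG.search(html))  # None: "." stops at the first newline
```

This keeps the line breaks in the captured text, which is the alternative to stripping them from the page source before matching.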
Bob The Builder, posted August 21, 2011
Frank wrote: You should be able to do a pre and post search for information around the tags like this: (?<=<td>).*?(?=</td>)
How do you go about making the search for <td> case insensitive? I couldn't figure out where to put the 'i', as it doesn't seem to be accepted anywhere inside the ().
I think I have it figured out: (?i) at the beginning seems to do the trick.
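The same inline flag can be checked in Python, whose `re` module also accepts `(?i)` at the very start of a pattern. This is a sketch in a different regex flavour from uBot's, and `cells_any_case` is a hypothetical helper name.

```python
import re

# (?i) at the start switches the whole pattern, lookarounds included,
# to case-insensitive matching.
TD_ANY_CASE = re.compile(r"(?i)(?<=<td>).*?(?=</td>)")


def cells_any_case(html: str) -> list:
    """Return cell text regardless of how the td tags are capitalised."""
    return TD_ANY_CASE.findall(html)


print(cells_any_case("<TD>upper</TD><td>lower</td><Td>mixed</tD>"))
```

In Python the flag can equally be passed as `re.IGNORECASE` to `re.compile`, which avoids having to remember that the inline form must come first.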
itexspert, posted December 10, 2014
Uhh, how many times I was stuck on the <td> tags! Thanks for the solution, mate!