What's the Regex code for capturing text between two words?

daveconor · February 11, 2014

Trying to scrape text that appears between <address> and </address>, but when I use <address>.*</address> it finds the text and duplicates it onto another line below. I tested it with another tag, "</a>", and it does the same thing. It seems to duplicate only when I use tags. Is it the angle brackets that throw it off?

Thanks in advance.

Edited February 11, 2014 by daveconor

Steve · February 11, 2014

It's probably displayed twice in the text you are scraping. You can use regex to grab each match, add the matches to a list, and have the list delete the duplicates. That way you are left with one unique item.
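A minimal Python sketch of the "scrape into a list, then drop duplicates" idea, assuming the page source is already in a string. The function name `unique_addresses` and the sample HTML are hypothetical. It uses a capture group with a non-greedy `.*?` so each tag pair is matched separately, and `dict.fromkeys` to remove repeats while keeping the first-seen order.

```python
import re

def unique_addresses(html: str) -> list[str]:
    # Hypothetical helper: collect the body of every <address>...</address> pair.
    # findall with one capture group returns just the group's text for each match.
    matches = re.findall(r"<address>(.*?)</address>", html)
    # dict keys are unique and keep insertion order, so this drops duplicates
    # while preserving the order in which addresses first appeared.
    return list(dict.fromkeys(matches))

# Hypothetical page text where the same address appears twice.
page = (
    "<address>12 Main St</address>"
    "<p>...</p>"
    "<address>12 Main St</address>"
    "<address>34 Oak Ave</address>"
)
print(unique_addresses(page))  # ['12 Main St', '34 Oak Ave']
```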

Bot-Factory · February 11, 2014

(?<=\<address\>).*(?=\<\/address\>)

Use a lookbehind and a lookahead if you want the match to contain only the text in between, without the tags themselves.
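A quick check of Dan's pattern, sketched in Python's `re` module (an assumption: the original thread does not say which regex engine is in use). The pattern is used verbatim; Python treats escaped punctuation such as `\<` and `\/` as the literal character, and its lookbehind must be fixed-width, which `<address>` is. The sample input is hypothetical.

```python
import re

# Dan's pattern, unchanged: (?<=...) and (?=...) require the tags to be present
# but leave them out of the match, so group(0) is only the inner text.
PATTERN = re.compile(r"(?<=\<address\>).*(?=\<\/address\>)")

m = PATTERN.search("<div><address>12 Main St</address></div>")
print(m.group(0))  # 12 Main St
```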

Dan

UBotDev · February 11, 2014

Also use .*? (non-greedy) instead of .*. The greedy .* matches from the first <address> to the last </address> in the string, swallowing everything in between.

That said, the greedy version works fine if you always have only one address in the input string.
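To illustrate the greedy vs. non-greedy difference, here is a small Python sketch (assuming Python's `re`; the sample HTML is hypothetical) that runs the lookaround pattern both ways against a string containing two addresses, then the greedy one against a single address.

```python
import re

html = "<address>12 Main St</address><p>x</p><address>34 Oak Ave</address>"

# Greedy .* runs to the LAST </address>, so both addresses collapse into one match.
greedy = re.compile(r"(?<=<address>).*(?=</address>)")
# Non-greedy .*? stops at the FIRST </address> after each <address>.
lazy = re.compile(r"(?<=<address>).*?(?=</address>)")

print(greedy.findall(html))  # ['12 Main St</address><p>x</p><address>34 Oak Ave']
print(lazy.findall(html))    # ['12 Main St', '34 Oak Ave']

# With only one address in the input, greedy and non-greedy give the same result.
print(greedy.findall("<address>12 Main St</address>"))  # ['12 Main St']
```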
