daveconor, Posted February 11, 2014 (edited): I'm trying to scrape the text that appears between <address> and </address>, but when I use <address>.*</address> it finds the text and duplicates it onto another line below. I tested it with another tag, "</a>", and it does the same thing. It seems to duplicate only when I use tags. Is it the brackets that throw it off? Thanks in advance.
Steve, Posted February 11, 2014: It's probably displayed twice in the text that you are scraping. You can use regex to add the scraped items to a list and have the list delete the duplicates, so you are left with one unique item.
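The list-and-dedupe idea can be sketched outside UBot like this. It is only an illustration in Python, not UBot script: the sample page string and variable names are made up, and the capture-group pattern is one assumed way of collecting the items.

```python
import re

# Hypothetical page source in which the same address appears twice.
page = (
    "<address>5 Elm Rd</address>"
    "<div><address>5 Elm Rd</address></div>"
    "<address>9 Oak Ln</address>"
)

# Collect every address; the parentheses capture only the text between the tags.
matches = re.findall(r"<address>(.*?)</address>", page)

# Drop duplicates while keeping first-seen order (dict keys are unique and ordered).
unique = list(dict.fromkeys(matches))

print(matches)
print(unique)
```

Using dict.fromkeys rather than set() keeps the items in the order they were scraped, which usually matters when the list is written out later.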
Bot-Factory, Posted February 11, 2014: (?<=\<address\>).*(?=\<\/address\>) You have to use a lookahead and a lookbehind if you want to get only the text in between two delimiters, without the delimiters themselves. Dan
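To see what the lookbehind and lookahead change, here is a small sketch in Python (an assumption for illustration, since UBot's regex engine may differ in details). The sample HTML is invented, and the backslashes before the angle brackets are left out because Python does not need them.

```python
import re

# Hypothetical snippet of scraped HTML.
html = "<p>Office:</p><address>12 Main St, Springfield</address>"

# Plain pattern: the tags are part of the match.
with_tags = re.search(r"<address>.*</address>", html).group(0)

# Lookbehind/lookahead: the tags must be present, but they are not consumed,
# so only the text between them is returned.
inner = re.search(r"(?<=<address>).*(?=</address>)", html).group(0)

print(with_tags)
print(inner)
```

Note that Python only accepts fixed-width lookbehinds; a literal tag like <address> qualifies.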
UBotDev, Posted February 11, 2014: Also use .*? (non-greedy) instead of .*, because the .* operator is greedy: it currently matches from the first <address> to the last </address> in the string. The greedy version still works if you always have only one address in the input string.
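The greedy versus non-greedy difference shows up as soon as the input holds two addresses. A minimal sketch in Python, with an invented input string (UBot itself is not Python, so treat this as an illustration of the regex behavior only):

```python
import re

# Hypothetical input containing two separate address blocks.
text = "<address>1 First Ave</address><p>x</p><address>2 Second St</address>"

# Greedy: .* runs to the LAST </address>, swallowing everything in between.
greedy = re.findall(r"(?<=<address>).*(?=</address>)", text)

# Non-greedy: .*? stops at the FIRST </address> after each opening tag.
lazy = re.findall(r"(?<=<address>).*?(?=</address>)", text)

print(greedy)
print(lazy)
```

Here the greedy pattern returns one long match spanning both blocks, while the non-greedy one returns each address separately.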