dyvel (posted January 2, 2015):

Hi,

I'm looking for advice on solving this little problem. I want to extract "Firstname Lastname" from the HTML example below. I'm no regex expert. I tried this:

    (?<=<h3>).*?(?=</h3>)

I also tried using a wildcard, but I think it's the line breaks/whitespace that's throwing me off...

    <h3>
    <a href='/person/Firstname+Lastname/1234-City+A?what=12345678&n=1&page=1&sid=a*%5D%5DJEM%25%5CT0%22'>
    Firstname Lastname
    </a>
    </h3>

Thanks

Bill (posted January 2, 2015):

This worked:

    set(#a,"<h3> <a href=\'/person/Firstname+Lastname/1234-City+A?what=12345678&n=1&page=1&sid=a*%5D%5DJEM%25%5CT0%22\'> Firstname Lastname </a> </h3>","Global")
    alert($replace($find regular expression(#a,"(?<=\\w+\\/).*?(?=\\/\\d)"),"+",$new line))

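For anyone who wants to try the idea outside UBot, here is a minimal Python sketch of the same approach: take the name from the href path rather than from the link text. It assumes the href always has the form /person/<name>/<digits...>, and the function name name_from_href is made up for illustration. Python's re module rejects the variable-length lookbehind (?<=\w+\/) used above, so the sketch uses a capture group instead.

```python
import re
from urllib.parse import unquote_plus

# Sample markup from the first post.
html = """<h3>
<a href='/person/Firstname+Lastname/1234-City+A?what=12345678&n=1&page=1&sid=a*%5D%5DJEM%25%5CT0%22'>
Firstname Lastname
</a>
</h3>"""


def name_from_href(markup):
    """Return the name encoded in the first /person/<name>/<digit> href, or None."""
    # Capture the path segment between "/person/" and the next "/" followed by a digit.
    match = re.search(r"/person/([^/]+)/\d", markup)
    if match is None:
        return None
    # "+" stands for a space in this URL segment; unquote_plus also decodes %XX escapes.
    return unquote_plus(match.group(1))


print(name_from_href(html))
```

Decoding with unquote_plus turns the "+" into a space instead of a line break, which keeps the name on one line.
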
dyvel (posted January 2, 2015):

Hi Bill,

Thank you! I appreciate your help. It works if I just grab the H3 tag with an offset and use that to extract the name.

I'm trying to make it more robust now by using an ID tag and scraping the inner HTML from that. But that results in 3 instances of an href that match your regex, so I get the name 4 times in return. I can't wrap my head around how to narrow it to only look within the H3 tag. Is it possible?

If it's any help, here's the code I'm working with. The first set(#phone,20302030,"Global") is just to have a valid number to work with.

    set(#phone,20302030,"Global")
    ui text box("Phone",#phone)
    ui stat monitor("Name: ",#name)
    ui stat monitor("Address: ",#address)
    ui button("Check") {
        navigate("http://118.tdc.dk/search/go?what={#phone}","Wait")
        wait for browser event("DOM Ready","")
        set(#listing0,$scrape attribute(<id="listing0">,"innerhtml"),"Global")
        set(#name,$replace($find regular expression(#listing0,"(?<=\\w+\\/).*?(?=\\/\\d)"),"+",$new line),"Global")
        set(#address,$scrape attribute(<tagname=r"address">,"innertext"),"Global")
    }

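One way to narrow the search to the H3 tag is to do it in two passes: first cut out the <h3>...</h3> block, then run the href pattern only on that block. The Python sketch below illustrates the idea. The listing markup is invented (the real innerHTML of listing0 is not shown in this thread), and names_in_h3 is a hypothetical helper name.

```python
import re

# Invented listing markup with three matching hrefs, only one of them inside <h3>.
listing_html = """<div class='photo'><a href='/person/Firstname+Lastname/1234-City+A'>Photo</a></div>
<h3>
<a href='/person/Firstname+Lastname/1234-City+A?what=12345678'>
Firstname Lastname
</a>
</h3>
<a href='/person/Firstname+Lastname/1234-City+A/map'>Map</a>"""

# Captures the name segment of a /person/<name>/<digit...> href.
HREF_NAME = re.compile(r"/person/([^/]+)/\d")


def names_in_h3(markup):
    """Return names found in hrefs inside the first <h3> block only."""
    # DOTALL lets "." cross the line breaks between <h3> and </h3>.
    h3 = re.search(r"<h3>(.*?)</h3>", markup, re.DOTALL)
    if h3 is None:
        return []
    return [name.replace("+", " ") for name in HREF_NAME.findall(h3.group(1))]


print(len(HREF_NAME.findall(listing_html)))  # matches in the whole listing
print(names_in_h3(listing_html))             # matches restricted to the <h3> block
```

The same two-pass idea should carry over to UBot by storing the H3 match in a variable and running the second regex on that variable.
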
Bill (posted January 3, 2015):

See if this works for you:

    ui text box("Phone",#phone)
    ui stat monitor("Name: ","{%first} {%lsname}")
    ui stat monitor("Address: ",%address)
    ui button("Check") {
        navigate("http://118.tdc.dk/search/go?what={#phone}","Wait")
        wait for browser event("DOM Ready","")
        clear list(%first)
        clear list(%lsname)
        clear list(%address)
        set(#listing0,$scrape attribute(<href=w"/person/*">,"fullhref"),"Global")
        set(#lastname,$find regular expression(#listing0,"(?<=\\+).*?(?=\\/)"),"Global")
        add list to list(%first,$find regular expression(#listing0,"(?<=person\\/).*?(?=\\+)"),"Delete","Global")
        add list to list(%lsname,$list from text($replace(#lastname,"+"," "),""),"Delete","Global")
        add list to list(%address,$scrape attribute(<tagname="address">,"innertext"),"Delete","Global")
    }

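The first-name/last-name split above can also be sketched without regex by splitting the URL path. This Python version is only an illustration: the full URL used here is an assumption about what "fullhref" returns, and split_name is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def split_name(full_href):
    """Split a /person/<First+Last...>/... URL into (first, rest), or return None."""
    # For a person URL the path splits into ["", "person", "<name>", ...].
    parts = urlsplit(full_href).path.split("/")
    if len(parts) < 3 or parts[1] != "person" or not parts[2]:
        return None
    # Everything before the first "+" is the first name; the remainder is the last name(s).
    first, _, rest = parts[2].partition("+")
    return first, rest.replace("+", " ")


# Assumed shape of the scraped full href.
print(split_name("http://118.tdc.dk/person/Firstname+Lastname/1234-City+A?what=12345678"))
```

Returning None for any path that does not start with /person/ makes it easy to detect listings, such as companies, whose href has a different shape.
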
dyvel (posted January 3, 2015):

Thank you, Bill, for taking the time to help me out. I really appreciate it!

Your code actually worked, but I had some borderline cases that gave me problems, e.g. if the number belongs to a company, the href is different, or the result list has more than 1 entry. I ended up with this code, which seems to do the trick in every case:

    ui text box("Phone",#phone)
    ui stat monitor("Name: ",#name)
    ui stat monitor("Address: ",#address)
    ui button("Check") {
        navigate("http://118.tdc.dk/search/go?what={#phone}","Wait")
        wait for browser event("Page Loaded","")
        set(#listing0,$scrape attribute(<id="listing0">,"innerhtml"),"Global")
        set(#name,$find regular expression(#listing0,"(?<=<h3>)(?s).*?(?=</h3>)"),"Global")
        set(#name,$trim($find regular expression(#name,"(?<=<a .*?>)(?s).*?(?=</a>)")),"Global")
        set(#address,$find regular expression(#listing0,"(?<=<address .*?>)(?s).*?(?=</address>)"),"Global")
    }

So my solution is to scrape the element with id listing0, run regex on its inner HTML to get the information out, and trim the name. I'm a total rookie when it comes to regex, so it's like learning Latin to me.

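The final approach (grab the H3 block, then the link text inside it, then trim) can be sketched in Python as follows. This is an illustration under assumptions, not a port of the UBot code: the sample markup and its address element are invented, inner and extract are hypothetical helper names, and capture groups replace the variable-length lookbehinds because Python's re module does not support them.

```python
import re

# Invented stand-in for the innerHTML of the listing0 element.
listing_html = """<h3>
<a href='/person/Firstname+Lastname/1234-City+A?what=12345678'>
Firstname Lastname
</a>
</h3>
<address class='adr'>
Street 1, 1234 City A
</address>"""


def inner(tag, markup):
    """Return the trimmed text between the first <tag ...> and </tag>, or None."""
    # \b stops "<a" from also matching "<address"; DOTALL lets "." span line breaks.
    pattern = rf"<{tag}\b[^>]*>(.*?)</{tag}>"
    match = re.search(pattern, markup, re.DOTALL | re.IGNORECASE)
    return match.group(1).strip() if match else None


def extract(markup):
    """Return (name, address); either value is None when its tag is missing."""
    h3 = inner("h3", markup)
    name = inner("a", h3) if h3 is not None else None
    return name, inner("address", markup)


print(extract(listing_html))
```

Because the name comes from the link text rather than the href, this does not depend on the href format, which is what differed for company listings.
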
deliter (posted February 1, 2015):

Put (?s) in your regex. It switches on single-line ("dot matches all") mode, so the dot also matches line breaks and a pattern can match across multiple lines.

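A quick way to see the effect of (?s) is to run the original pattern with and without it. The Python sketch below uses a shortened version of the sample markup. It also shows that (?m), the multiline flag, is a different option: it only changes how ^ and $ behave and does not help here.

```python
import re

# Shortened sample: the text sits on its own line inside <h3>.
html = "<h3>\nFirstname Lastname\n</h3>"

# Without (?s) the dot stops at the first line break, so nothing matches.
without_flag = re.search(r"(?<=<h3>).*?(?=</h3>)", html)

# With (?s) the dot also matches line breaks.
with_flag = re.search(r"(?s)(?<=<h3>).*?(?=</h3>)", html)

# (?m) only affects the anchors ^ and $, so the dot still stops at line breaks.
multiline_only = re.search(r"(?m)(?<=<h3>).*?(?=</h3>)", html)

print(without_flag)
print(with_flag.group(0).strip())
print(multiline_only)
```

Python requires the inline flag at the very start of the pattern, which is why (?s) is placed first here.
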