vader, posted August 30, 2010:

I'm trying to scrape some addresses, and I'm running into problems because the target site doesn't have any detailed labeling in the code. For example:

</strong></a></div>
<div>
100 Broadway<br />
Everett, MA 02149<br />
(617)381-9000<br /><strong> 2.7 miles away</strong>
</div>
</div>

I can get the street address (100 Broadway), but I can't figure out how to isolate the city, state, ZIP, and phone number. I really want to pull each of these elements separately, rather than dumping the entire address into one column of my CSV. Any tips are greatly appreciated! Thanks.
JohnB, posted August 30, 2010:

I've run across this numerous times. Your first step is to change the attribute type and see where the different values are isolated. For example, the default type might be A, but if you change it to Table, you will get different attribute selections in the parameters popup. I hope that makes sense.

When you click an element in the browser, you'll see its tag (A, DIV, TD, etc.) just above "Choose by attribute". That is what you want to change in order to see different variations of the attribute. Nine times out of ten I have found a solution this way.

John
IRobot, posted August 30, 2010:

Hi vader,

AFAIK the only way to do this (if the data is always in the same format) is to use JavaScript string handling.
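IRobot's string-handling idea can be sketched as follows, written in Python rather than JavaScript (the same split-on-comma, then split-on-whitespace logic carries over to JavaScript's `split`). This minimal sketch assumes every city/state/ZIP line looks exactly like `Everett, MA 02149`: one comma after the city, then the state and ZIP separated by whitespace. The function name `split_city_state_zip` is hypothetical.

```python
def split_city_state_zip(line: str) -> tuple[str, str, str]:
    """Split 'City, ST 12345' into (city, state, zip).

    Assumption: exactly one comma follows the city, and the state and
    ZIP are separated by whitespace, as in 'Everett, MA 02149'.
    """
    city, rest = line.split(",", 1)   # 'Everett' | ' MA 02149'
    state, zip_code = rest.split()    # 'MA' | '02149'
    return city.strip(), state, zip_code


print(split_city_state_zip("Everett, MA 02149"))
# ('Everett', 'MA', '02149')
```

Because this relies on a fixed layout, a line without a comma or with extra words after the ZIP will raise a `ValueError` instead of returning bad columns.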
vader (author), posted August 30, 2010:

Thanks for the reply, John. I stepped back a couple of DIVs, and this is what I get:

<DIV class=dealer><DIV class=dealerorder>1</DIV>
<DIV class=dealerinfo>
<DIV class=dealerdetail><A onmouseover="window.status=''; return true;" onmouseout="window.status=''; return true;" href="results.aspx?cs=2&dealer=206944"><STRONG>Acme Cars of Boston </STRONG></A></DIV>
<DIV>100 Broadway<BR>Everett, MA 02149<BR>(617)381-9000<BR><STRONG>2.7 miles away</STRONG> </DIV></DIV>
<DIV class=action>
<DIV class=dealerlinklist><A id=dealerinfolink onmouseover="window.status=''; return true;" onmouseout="window.status=''; return true;" href="javascript:gotolink('results.aspx?cs=2&dealer=206944&position=1');"><IMG id=btn_dealerinfo206944 onmouseover="javascript:document.getElementById('btn_dealerinfo206944').src='/images/tools/dealer-locator/btn_info_on.gif';" onmouseout="javascript:document.getElementById('btn_dealerinfo206944').src='/images/tools/dealer-locator/btn_info.gif';" border=0 src="/images/tools/dealer-locator/btn_info.gif"></A></DIV>
<DIV class=dealerlinklist><A onmouseover="window.status=''; return true;" onmouseout="window.status=''; return true;" onclick="TrackDealerResultRAQClick('206944')" href="/tools/price-quote.aspx?Dealernumber=206944"><IMG id=btn_request206944 onmouseover="javascript:document.getElementById('btn_request206944').src='/images/tools/dealer-locator/btn_requestquote_on.gif';" onmouseout="javascript:document.getElementById('btn_request206944').src='/images/tools/dealer-locator/btn_requestquote.gif';" border=0 src="/images/tools/dealer-locator/btn_requestquote.gif"></A></DIV></DIV>
<DIV class=attribute>
<DIV class=Attroff>Express Service</DIV>
<DIV class=Attroff>Certified Used Dealer</DIV>
<DIV class=Attroff>Internet Certified</DIV></DIV></DIV>

...however, I'm still not sure what I can leverage to isolate the city, state, ZIP, and phone. Thanks.
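One way to reach the unlabeled block outside UBot is to anchor on its neighbors: in the markup above, the address is the class-less `<DIV>` that sits directly inside `div.dealerinfo`, right after `div.dealerdetail`. The sketch below uses Python's standard `html.parser` module and assumes that nesting holds for every dealer on the page. `DealerAddressParser` is a hypothetical name, and the sample is a trimmed copy of the markup above.

```python
from html.parser import HTMLParser


class DealerAddressParser(HTMLParser):
    """Sketch: collect the <br>-separated lines of the class-less <div>
    sitting directly inside each div.dealerinfo (assumed page layout)."""

    def __init__(self):
        super().__init__()
        self.depth = 0             # current <div> nesting depth
        self.info_depth = None     # depth of the open div.dealerinfo, if any
        self.address_depth = None  # depth of the open address <div>, if any
        self.fragments = []        # text pieces of the line being built
        self.current = []          # finished lines of the current block
        self.blocks = []           # one list of lines per dealer

    def _end_line(self):
        # Join the text seen since the last <br>, collapsing whitespace.
        line = " ".join("".join(self.fragments).split())
        if line:
            self.current.append(line)
        self.fragments = []

    def handle_starttag(self, tag, attrs):
        if tag == "br" and self.address_depth is not None:
            self._end_line()
        if tag != "div":
            return
        self.depth += 1
        cls = dict(attrs).get("class")
        if cls == "dealerinfo":
            self.info_depth = self.depth
        elif (self.info_depth is not None
              and self.depth == self.info_depth + 1
              and cls is None):
            # Direct child of dealerinfo with no class: the address block.
            self.address_depth = self.depth
            self.current = []
            self.fragments = []

    def handle_endtag(self, tag):
        if tag != "div":
            return
        if self.depth == self.address_depth:
            self._end_line()
            self.blocks.append(self.current)
            self.address_depth = None
        elif self.depth == self.info_depth:
            self.info_depth = None
        self.depth -= 1

    def handle_data(self, data):
        if self.address_depth is not None:
            self.fragments.append(data)


sample = """<DIV class=dealer><DIV class=dealerorder>1</DIV>
<DIV class=dealerinfo>
<DIV class=dealerdetail><A href="results.aspx?cs=2&dealer=206944"><STRONG>Acme Cars of Boston </STRONG></A></DIV>
<DIV>100 Broadway<BR>Everett, MA 02149<BR>(617)381-9000<BR><STRONG>2.7 miles away</STRONG> </DIV></DIV>
<DIV class=attribute><DIV class=Attroff>Express Service</DIV></DIV></DIV>"""

parser = DealerAddressParser()
parser.feed(sample)
print(parser.blocks)
# [['100 Broadway', 'Everett, MA 02149', '(617)381-9000', '2.7 miles away']]
```

Keying on position rather than on a class is fragile if the site changes its layout, but it is often the only handle available when the target element has no label of its own.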
Seth Turin, posted August 30, 2010:

For those parts (city, state, ZIP, and phone), try using $replace to swap <BR> for $new line. Then you can do $list from text, with $new line as the delimiter, and you'll have each item as a separate list item.
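Seth's two steps (replace `<BR>` with a newline, then split on newlines) look roughly like this in Python. The sketch assumes you already have the inner HTML of the address `<DIV>`. It also strips leftover tags such as `<STRONG>`, an extra step not mentioned in the thread. `address_lines` is a hypothetical helper name.

```python
import re


def address_lines(block_html: str) -> list[str]:
    # Step 1: replace every <br>, <BR> or <br /> with a newline.
    text = re.sub(r"<br\s*/?>", "\n", block_html, flags=re.IGNORECASE)
    # Extra step: drop any remaining tags, e.g. <STRONG>...</STRONG>.
    text = re.sub(r"<[^>]+>", "", text)
    # Step 2: split on newlines, trimming whitespace and skipping blanks.
    return [line.strip() for line in text.split("\n") if line.strip()]


block = "100 Broadway<BR>Everett, MA 02149<BR>(617)381-9000<BR><STRONG>2.7 miles away</STRONG> "
print(address_lines(block))
# ['100 Broadway', 'Everett, MA 02149', '(617)381-9000', '2.7 miles away']
```

Each list item then maps to one field, except the city/state/ZIP line, which still holds three values.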
vader (author), posted August 31, 2010:

Seth, thanks for the tips. I "think" I follow what you're saying. I'm still a big newbie, but I'll play around with what you suggested.

One question, though: how can I separate the city, state, and ZIP (e.g., Everett, MA 02149), since there aren't any <BR>s between them? I really need that data in separate columns of my final database. If worst comes to worst, I could do some creative cleanup in Excel afterward (Text to Columns), but I was hoping to avoid that. Thanks.
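For the city/state/ZIP question, a regular expression can pull the three parts out of a line like `Everett, MA 02149` and reject lines that don't fit, and Python's `csv` module can then write each part to its own column. This sketch assumes US-style addresses (two-letter state, five-digit ZIP with an optional `-1234` suffix) and the line order from the sample (street, city/state/ZIP, phone). `to_row` and the column names are hypothetical.

```python
import csv
import io
import re

# Assumed format: "City, ST 12345" or "City, ST 12345-6789".
CITY_STATE_ZIP = re.compile(
    r"^(?P<city>.+?),\s*(?P<state>[A-Z]{2})\s+(?P<zip>\d{5}(?:-\d{4})?)$"
)


def to_row(lines):
    """Turn ['street', 'City, ST 12345', 'phone', ...] into CSV columns.

    Assumes street is line 0, city/state/ZIP line 1 and phone line 2,
    mirroring the sample listing; adjust the indexes if the site differs.
    """
    street, csz, phone = lines[0], lines[1], lines[2]
    m = CITY_STATE_ZIP.match(csz)
    if m is None:
        raise ValueError(f"unexpected city/state/ZIP line: {csz!r}")
    return [street, m["city"], m["state"], m["zip"], phone]


out = io.StringIO()  # stand-in for a real output file
writer = csv.writer(out)
writer.writerow(["street", "city", "state", "zip", "phone"])
writer.writerow(to_row(["100 Broadway", "Everett, MA 02149",
                        "(617)381-9000", "2.7 miles away"]))
print(out.getvalue())
```

Writing to `io.StringIO` keeps the example self-contained; for a real file, open it with `newline=""`, as the `csv` documentation recommends.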