Ghoast (posted May 29, 2012):

I need to scrape a URL out of a large block of text, so I thought the easiest way would be to find it using regular expressions, since the block of text is huge and I can't find the URL as its own element.

The first part of the URL, which always stays the same, is:

https://jigsy.com/account/activate?u=

Then there is a part after it which might change in length and is made up of random numbers and letters, for example:

1124290&k=8d8b6f77c0855c6936544174d77098a0a601c6b2

What would the regex be for that URL? Or maybe there's an easier way to scrape it.

The text that surrounds it is:

Return-Path: srs0=efi1eu=eb=jigsy.com=feedback@XXXXXXXXXXXXXXXX.com
Received: from mip.hushmail.com (LHLO smtp5.hushmail.com) (65.39.178.78) by server with LMTP; Tue, 29 May 2012 16:51:17 +0000 (UTC)
Received: from smtp5.hushmail.com (localhost.localdomain [127.0.0.1]) by smtp5.hushmail.com (Postfix) with SMTP id 51EBB50149 for <XXXXXXXXXXXXXXXX@hush.com>; Tue, 29 May 2012 16:51:17 +0000 (UTC)
Received: from m1.dnsix.com (m1.dnsix.com [66.11.225.176]) by smtp5.hushmail.com (Postfix) with ESMTP for <XXXXXXXXXXXXXXXX@hush.com>; Tue, 29 May 2012 16:51:08 +0000 (UTC)
Received: from [65.39.176.60] (helo=viviti.com) by m1.dnsix.com with esmtp (Exim 4.72) (envelope-from <feedback@jigsy.com>) id 1SZPdU-0005CN-HT for preciousverse41@XXXXXXXXXXXXXXXXXX.com; Tue, 29 May 2012 09:51:08 -0700
Date: Tue, 29 May 2012 09:51:07 -0700
From: feedback@jigsy.com
To: preciousverse41@XXXXXXXXXXXXXXXXXXX.com
Message-Id: <4fc4fe7bd6dcf_2531159386d6bf0881a@electra.vc.bravenet.com.tmail>
Subject: Welcome to Jigsy
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=mimepart_4fc4fe7bd7628_2531159386d6bf08983

Welcome to Jigsy! Thanks for choosing to build your website with us! We hope you enjoy the experience, and would love to hear any feedback you might have. You can get in touch with us as well as other members on our message forums at https://forums.jigsy.com.

---------------------------------------------------------------------

In order to activate your account, please follow the link below:

https: //jigsy.com/account/activate?u=1124290&k=8d8b6f77c0855c6936544174d77098a0a601c6b2

If you did not request this, somebody else did using your email address. If so, we apologize for the mailing!

------------

I've blanked out some addresses with XXXXXXXXXXXX for privacy reasons. Also, I've added a space after https: because otherwise the forum was shortening the URL once it hyperlinked it.
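For anyone wanting to see the regex idea from the question outside of uBot, here is a minimal Python sketch. It assumes the part after u= and the part after k= contain only letters and digits, as in the example; the function name find_activation_url is made up for illustration and is not part of any uBot feature.

```python
import re

# Fixed prefix taken from the post; the u and k values are assumed
# to be made up of letters and digits only.
ACTIVATION_URL = re.compile(
    r"https://jigsy\.com/account/activate\?u=[0-9A-Za-z]+&k=[0-9A-Za-z]+"
)


def find_activation_url(text):
    """Return the first activation URL found in text, or None."""
    match = ACTIVATION_URL.search(text)
    return match.group(0) if match else None


sample = (
    "please follow the link below: "
    "https://jigsy.com/account/activate?u=1124290"
    "&k=8d8b6f77c0855c6936544174d77098a0a601c6b2 "
    "If you did not request this"
)
print(find_activation_url(sample))
```

The dot and the question mark are escaped because both are special characters in a regex; left unescaped, the pattern would not match the literal ?u= part of the URL.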
LoWrIdErTJ - BotGuru (posted May 29, 2012):

Why use regex? Just use a regular scrape instead:

<innertext=w"https://jigsy.com/account/activate?u=*&k=*">
Ghoast (posted May 29, 2012):

Hey, that sounds like an awesome solution. Sorry to sound ignorant, but what is a 'regular scrape'? I'd already looked at page scrape and scrape attribute and couldn't seem to get them to work with that.
LoWrIdErTJ - BotGuru (posted May 29, 2012):

scrape attribute

It would be easier if I had the original HTML page displayed that you're scraping from.
Pete (posted May 29, 2012):

This is Hushmail, so try an add to list page scrape:

clear list: xxx
add to list: xxx
Left side: open?http://
Right side: "

That should pull all the outbound links on the page. Then loop over the list total and use a regex to find the one you need:

http(.*)account(.*)\d

Then clear the list for the next one.

Also remember that Hushmail reformats some links, like http://whateverdomain/&uid=whatevertext when the real address is http://whateverdomain/uid=whatevertext, so you may have to use a replace text as well.

EDIT: Your post below is not showing the body text code, as it's inside an iframe.

Last edit: easier to make it than explain it.

clear list(%temp)
add list to list(%temp, $page scrape("open?http://", "\""), "Delete", "Global")
loop($list total(%temp)) {
    add item to list(%accounts, $find regular expression($next list item(%temp), "http(.*)account(.*)\\d"), "Delete", "Global")
}
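The loop-and-filter steps Pete describes can be sketched in Python for comparison. This is only an illustration under assumptions: the list of links is taken to be already scraped, and the names pick_account_links and fix_rewritten_link are hypothetical helpers, not uBot commands. The regex is Pete's pattern unchanged.

```python
import re

# Pete's pattern from the post, unchanged.
LINK_PATTERN = re.compile(r"http(.*)account(.*)\d")


def pick_account_links(links):
    """Return the matched portion of every link that fits the pattern."""
    picked = []
    for link in links:
        match = LINK_PATTERN.search(link)
        if match:
            picked.append(match.group(0))
    return picked


def fix_rewritten_link(link):
    """Undo the rewrite Pete describes, where '/&uid=' should be '/uid='."""
    return link.replace("/&uid=", "/uid=")


scraped = [
    "https://forums.jigsy.com",
    "https://jigsy.com/account/activate?u=1124290"
    "&k=8d8b6f77c0855c6936544174d77098a0a601c6b2",
]
print(pick_account_links(scraped))
print(fix_rewritten_link("http://whateverdomain/&uid=whatevertext"))
```

One thing to watch: because the pattern ends in \d, a match stops at the last digit in the link. The example key happens to end in a digit, but a key ending in a letter would come back cut short.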
Ghoast (posted May 29, 2012):

Well, this is what I've got so far:

$scrape attribute(<readonly=1>, "<value=w\"https://jigsy.com/account/activate?u=*&k=*\">")

I changed innertext to value, as the innertext of the element is blank. The outerhtml and innerhtml return all the text, as does value, but none of these seemed to work when I tried them.

Here's the whole thing in HTML:

<textarea readonly="1" style="width:100%;" rows="37"">Return-Path: srs0=efi1eu=eb=jigsy.com=feedback@XXXXXXXXXXXX.com
Received: from mip.hushmail.com (LHLO smtp5.hushmail.com) (65.39.178.78) by server with LMTP; Tue, 29 May 2012 16:51:17 +0000 (UTC)
Received: from smtp5.hushmail.com (localhost.localdomain [127.0.0.1]) by smtp5.hushmail.com (Postfix) with SMTP id 51EBB50149 for <XXXXXXXXXXXXXXX@hush.com>; Tue, 29 May 2012 16:51:17 +0000 (UTC)
Received: from m1.dnsix.com (m1.dnsix.com [66.11.225.176]) by smtp5.hushmail.com (Postfix) with ESMTP for <XXXXXXXXXXXXXX@hush.com>; Tue, 29 May 2012 16:51:08 +0000 (UTC)
Received: from [65.39.176.60] (helo=viviti.com) by m1.dnsix.com with esmtp (Exim 4.72) (envelope-from <feedback@jigsy.com>) id 1SZPdU-0005CN-HT for preciousverse41@XXXXXXXXXXXXXXX.com; Tue, 29 May 2012 09:51:08 -0700
Date: Tue, 29 May 2012 09:51:07 -0700
From: feedback@jigsy.com
To: preciousverse41@XXXXXXXXXXXXXX.com
Message-Id: <4fc4fe7bd6dcf_2531159386d6bf0881a@electra.vc.bravenet.com.tmail>
Subject: Welcome to Jigsy
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=mimepart_4fc4fe7bd7628_2531159386d6bf08983

Welcome to Jigsy! Thanks for choosing to build your website with us! We hope you enjoy the experience, and would love to hear any feedback you might have. You can get in touch with us as well as other members on our message forums at https://forums.jigsy.com.

---------------------------------------------------------------------

In order to activate your account, please follow the link below:

https://jigsy.com/account/activate?u=1124290&k=8d8b6f77c0855c6936544174d77098a0a601c6b2

If you did not request this, somebody else did using your email address. If so, we apologize for the mailing!</textarea>

Hope that helps.
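As a rough illustration of pulling the link out of a textarea like the one above, here is a Python sketch that works on the raw page source. It assumes the page has a single read-only textarea and that the source may carry HTML entities (for example &amp; in place of &), which html.unescape turns back into plain characters. The function name activation_url_from_textarea is hypothetical.

```python
import html
import re

# Assumed layout: one <textarea ... readonly ...> holding the raw email.
TEXTAREA = re.compile(
    r"<textarea[^>]*readonly[^>]*>(.*?)</textarea>",
    re.DOTALL | re.IGNORECASE,
)
URL = re.compile(r"https://jigsy\.com/account/activate\?u=\w+&k=\w+")


def activation_url_from_textarea(page_html):
    """Return the activation URL inside the read-only textarea, or None."""
    area = TEXTAREA.search(page_html)
    if area is None:
        return None
    # Decode entities such as &amp; and &lt; before searching for the URL.
    text = html.unescape(area.group(1))
    match = URL.search(text)
    return match.group(0) if match else None


page = (
    '<textarea readonly="1" rows="37">please follow the link below: '
    "https://jigsy.com/account/activate?u=1124290"
    "&amp;k=8d8b6f77c0855c6936544174d77098a0a601c6b2 "
    "If you did not request this</textarea>"
)
print(activation_url_from_textarea(page))
```

Decoding first means the same pattern works whether the source shows & or &amp; between the two parameters.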
LoWrIdErTJ - BotGuru (posted May 29, 2012):

You do realize you can scrape it with the &: and that in the URL, and when navigated to, it decodes the URL encoding?
Ghoast (posted May 29, 2012):

Quoting LoWrIdErTJ: "you do realize you can scrape it with the &: and that in the url and when navigated to, it decodes the url encoding?"

I don't know what you mean by that. I'm not sure if it's a suggestion or a question.

Why wouldn't this be giving me results?

$scrape attribute(<readonly=1>, "<value=w\"https://jigsy.com/account/activate?u=*&k=*\">")
LoWrIdErTJ - BotGuru (posted May 29, 2012):

Feel free to private message me and let me know the URL so I can see it directly while in uBot, and I'll work out an easy way to scrape what you need.
Pete (posted May 29, 2012):

Quoting LoWrIdErTJ: "you do realize you can scrape it with the &: and that in the url and when navigated to, it decodes the url encoding?"

That’s correct most times, TJ, but I’ve had many fail as well due to poor hosts or badly installed scripts.
Ghoast (posted May 29, 2012):

Quoting Pete: "That’s correct TJ most times But I’ve had many fail also due to poor hosts or badly installed scripts"

Thanks for the help, Zap, but I'm not quite getting what you're saying. I'm sending you a PM, TJ. Thanks!
LoWrIdErTJ - BotGuru (posted May 29, 2012):

Quoting Pete: "That’s correct TJ most times But I’ve had many fail also due to poor hosts or badly installed scripts"

Zap, I have a uBot script here in the tips and tutorials area on how to encode or decode URLs as well, with a function I built in uBot with JavaScript:

http://ubotstudio.com/forum/index.php?/topic/9828-url-encoding-decoding-quick-code-sample
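TJ's linked function is written in JavaScript inside uBot and is not reproduced here. As a stand-in, this short Python sketch shows what URL encoding and decoding do, using the standard library's urllib.parse. The helper names encode_url and decode_url are made up for illustration, and the shortened k value is a placeholder, not a real key.

```python
from urllib.parse import quote, unquote


def encode_url(url):
    """Percent-encode every reserved character, including ':' '/' '?' '&'."""
    return quote(url, safe="")


def decode_url(text):
    """Turn percent-encoded text back into the plain URL."""
    return unquote(text)


# The k value is a shortened placeholder for illustration.
plain = "https://jigsy.com/account/activate?u=1124290&k=abc"
encoded = encode_url(plain)
print(encoded)
print(decode_url(encoded))
```

Passing safe="" forces even the slashes to be encoded, which is the form a URL usually takes when it is carried inside another URL's query string.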
Ghoast (posted May 30, 2012):

TJ got this sorted for me in about 5 minutes. Thanks!