Ptrick125, posted October 17, 2013

I am working on a URL scraper. The website tends to include the referral source in the URL, but I cannot include that part when saving the URL to a file later on. I was successful in getting it to grab everything after the "/".

Test string: testwebsite.com/ubotstudio?ref=referral

This is what I have so far: [^/]*$

How should I have it match everything before the "?" sign?
HelloInsomnia, posted October 17, 2013

The simplest way to do it is this:

.*(?=\?)

But it would be nice to know if the format will always be like that, so you can come up with something better. For example, if the format is always like that (URL without www or http), then you can also use something more specific:

[a-zA-Z0-9]+\.[a-zA-Z]{2,4}[a-zA-Z0-9\/\.-_+%!]+(?=\?)
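For anyone who wants to try the simple pattern outside UBot Studio, here is a minimal sketch in Python. The thread does not use Python, so treat it only as an illustration: the helper names `strip_query` and `strip_query_first` are hypothetical, and the fallback for a URL with no "?" is an assumption about what a scraper would want.

```python
import re

# Sample input taken from the question above.
url = "testwebsite.com/ubotstudio?ref=referral"

# Pattern from the reply: a greedy match followed by a lookahead that
# requires a literal "?" right after the match without consuming it.
before_query = re.compile(r".*(?=\?)")


def strip_query(text):
    """Return the text before the last "?", or the text unchanged if
    it contains no "?" (hypothetical helper, not from the thread)."""
    match = before_query.match(text)
    return match.group(0) if match else text


def strip_query_first(text):
    """Variant that stops at the first "?" by excluding "?" from the
    matched characters instead of using a lookahead (an assumption
    about the desired behaviour, not something the thread specifies)."""
    return re.match(r"[^?]*", text).group(0)


print(strip_query(url))
print(strip_query_first("a?b?c"))
```

Because `.*` is greedy, the lookahead version keeps everything up to the last "?" in the string, which only matters when a URL contains more than one. Separately, in the longer pattern the `.-_` sequence inside the character class is typically read as a range from "." to "_" rather than three literal characters, so placing the hyphen last in the class is a common way to keep it literal.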
Ptrick125 (Author), posted October 17, 2013

http://i.imgur.com/EeVAAOq.png

It works!