Big Jay, posted November 6, 2011
How can I discover the other pages of a replicated website? I'm referring to the websites used by MLM companies, where each member is given a webpage identical to everyone else's, except that the member's picture and email address are displayed on it. I am trying to create a bot that will visit these pages and grab specific data. I know how to scrape a page; I'm just stumped about how to discover the other replicated pages. Any ideas?
Bob The Builder, posted November 6, 2011
How about a Google search for something extremely unique to that page, something that isn't likely to change from one member's site to the next?
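A minimal Python sketch of this suggestion, assuming you only need the search URL (to open in a browser or hand to your bot): wrapping the distinctive text in double quotes asks Google for an exact-phrase match. The helper name `footprint_query_url` and the sample phrase are illustrative, not from the thread.

```python
from urllib.parse import urlencode


def footprint_query_url(phrase):
    """Return a Google search URL for an exact-phrase match on `phrase`.

    Hypothetical helper: the double quotes request the exact phrase, which
    only helps if the text is identical on every replicated page.
    """
    return "https://www.google.com/search?" + urlencode({"q": f'"{phrase}"'})


if __name__ == "__main__":
    print(footprint_query_url("Please enter your email address to continue"))
```

The more distinctive the phrase, the fewer unrelated sites will show up; generic wording will match pages far beyond the MLM company's replicated sites.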
Big Jay (thread author), posted November 6, 2011
Good idea, except that apart from the following text, everything on the page is an image: "Returning Visitors - Welcome Back! Please enter your email address to continue."
Bob The Builder, posted November 6, 2011
You can search for an image filename if it is distinctive enough. You could also programmatically enter a bogus email address to get past that screen to something more unique, if possible. Google has probably indexed content behind it that you can use to find the pages. I would have to see the site, but there is probably a way.
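Both ideas can be sketched in Python with only the standard library, under some assumptions: the images are plain `<img src="...">` tags, and the email gate is an ordinary HTML form. The helper names, the `email` field name, and the example URLs below are hypothetical; read the real field name and form action from the page's HTML.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urlencode, urlparse
from urllib.request import Request


class ImageFilenameParser(HTMLParser):
    """Collect the file name of every <img src="..."> on a page."""

    def __init__(self):
        super().__init__()
        self.filenames = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if not src:
            return
        # Keep only the last path component: "/img/hdr_2011.jpg?v=3" -> "hdr_2011.jpg"
        name = PurePosixPath(urlparse(src).path).name
        if name:
            self.filenames.append(name)


def image_filenames(html):
    """Return image file names in page order (duplicates kept)."""
    parser = ImageFilenameParser()
    parser.feed(html)
    parser.close()
    return parser.filenames


def build_email_gate_request(form_action, email, field_name="email"):
    """Build (but do not send) a POST that submits a throwaway address.

    `field_name` is a guess: use the name of the <input> inside the page's
    <form>, and the form's action URL as `form_action`.
    """
    body = urlencode({field_name: email}).encode("ascii")
    return Request(form_action, data=body, method="POST")


if __name__ == "__main__":
    sample = ('<img src="/assets/hdr_bg_2011.jpg">'
              '<img src="https://cdn.example.com/p/welcome.png?v=3">')
    for name in image_filenames(sample):
        print(f'"{name}"')  # candidate exact-phrase search terms
    req = build_email_gate_request("https://example.com/continue", "test@example.com")
    print(req.get_method(), req.full_url, req.data)
    # To actually submit it: urllib.request.urlopen(req)
```

Filenames like `hdr_bg_2011.jpg` are more useful as search terms than generic ones like `logo.png`. The real form may also require hidden fields or cookies, which this sketch ignores.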
celavey, posted November 6, 2011
A Google search with the site: operator will show all the replicated pages, but only the subpages of a single site.
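A rough Python sketch of the `site:` approach, assuming the replicated pages all live under one company domain, either as subdomains (`jane.example.com`) or as paths (`example.com/jane`); `example.com` and both helper names are hypothetical. The first function builds the search URL; the second keeps only result URLs on that domain, de-duplicated, however you collected them.

```python
from urllib.parse import urlencode, urlparse


def site_query_url(domain):
    """Google search URL restricted to `domain` via the site: operator."""
    return "https://www.google.com/search?" + urlencode({"q": f"site:{domain}"})


def member_pages(result_urls, domain):
    """Keep URLs on `domain` or its subdomains, de-duplicated, in order."""
    domain = domain.lower()
    seen = set()
    pages = []
    for url in result_urls:
        host = (urlparse(url).hostname or "").lower()
        # Exact host match, or a subdomain such as "jane.example.com"
        if host == domain or host.endswith("." + domain):
            if url not in seen:
                seen.add(url)
                pages.append(url)
    return pages


if __name__ == "__main__":
    print(site_query_url("example.com"))
    print(member_pages(["https://jane.example.com/", "https://other.org/x"], "example.com"))
```

The suffix check uses `"." + domain` so that an unrelated host such as `notexample.com` is not mistaken for a subdomain.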