awesome sauce 0 Posted February 16, 2016
I was wondering how you would combine multiple data sets and make sure the records actually match. For example, say you were scraping retail websites for prices and wanted to make a price comparison website. How would you combine the data so that "Hasbro Hulk Action Figure" and "6-inch Hulk Figure" match if they are the same product, just named differently on the different websites? I don't really need specific code; I'm more wondering how one would go about combining multiple data sets without doing it manually.
hipman87 1 Posted February 16, 2016
I have been trying to get my head around this idea for the last few days. I do a lot of data manipulation as a day job, and this requires a mapping table of sorts. I would love to know if someone more experienced has a way of doing this. I know a complete match is very unlikely, but getting even the lion's share would save massive amounts of time.
nichewebstrategies 12 Posted February 16, 2016
Are the UPC or item stock numbers available and consistent across all sites? That would be one way that you could link the products together without having to match on product title, which is going to be extremely hard to do.
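For the case where a shared identifier does exist, linking the products comes down to a keyed join. A minimal sketch in Python, assuming each scraped row carries a `upc` field; the field names, UPC values, and prices are made up for illustration and do not come from any real site.

```python
# Hypothetical scraped rows; field names and values are invented for illustration.
site_a = [
    {"upc": "000000000001", "title": "Hasbro Hulk Action Figure", "price": 19.99},
    {"upc": "000000000002", "title": "Thor Hammer Toy", "price": 14.50},
]
site_b = [
    {"upc": "000000000001", "title": "6-inch Hulk Figure", "price": 17.49},
    {"upc": "000000000003", "title": "Iron Man Mask", "price": 24.00},
]


def join_on_key(left, right, key="upc"):
    """Pair up rows from two sources that share the same identifier."""
    # Index one side by the key so each lookup is a dictionary access.
    right_by_key = {row[key]: row for row in right}
    matches = []
    for row in left:
        other = right_by_key.get(row[key])
        if other is not None:
            matches.append((row, other))
    return matches


pairs = join_on_key(site_a, site_b)
for a, b in pairs:
    print(a["title"], "<->", b["title"], a["price"], b["price"])
```

Only rows whose key appears on both sides are paired; everything else is left unmatched and would need one of the fuzzier approaches.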
awesome sauce 0 Posted February 17, 2016 Author
No. There is nothing unique like that on each product that is the same across all sites.
deliter 203 Posted February 17, 2016
Post the HTML code.
HelloInsomnia 1103 Posted February 17, 2016
If there is other data available on each site that you can match up, it would help a lot. In your example, both titles share the words "hulk" and "figure". If the products are also by the same manufacturer and have the same dimensions, you can guess that they are probably the same thing. Sometimes websites have this info in scrapeable fields. I guess what I'm saying is that you probably need more info than just the title, and then you match things up as best you can. Assuming this job is too large to do manually, you can always have a button on your site to report an issue if the products are not the same.
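One rough way to turn that idea into a score is to count shared title words and add credit for each extra field that agrees. The sketch below is in Python and is only an illustration: the field names `manufacturer` and `height_in`, the sample products, and the 0.5 weight per matching field are all assumptions, not something from the sites in question.

```python
import re


def tokens(title):
    """Lowercase a title and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def title_overlap(a, b):
    """Jaccard overlap: shared words divided by all distinct words."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match_score(p, q, fields=("manufacturer", "height_in")):
    """Title overlap plus an arbitrary 0.5 bonus per extra field that agrees."""
    score = title_overlap(p["title"], q["title"])
    for f in fields:
        if p.get(f) is not None and p.get(f) == q.get(f):
            score += 0.5
    return score


# Hypothetical products for illustration only.
hulk_a = {"title": "Hasbro Hulk Action Figure", "manufacturer": "Hasbro", "height_in": 6}
hulk_b = {"title": "6-inch Hulk Figure", "manufacturer": "Hasbro", "height_in": 6}
thor = {"title": "Thor Hammer Toy", "manufacturer": "Hasbro", "height_in": 10}

print(match_score(hulk_a, hulk_b), match_score(hulk_a, thor))
```

With these sample rows the two Hulk listings score higher than the Hulk and Thor pair, because they share two title words and both extra fields. Where to put the cutoff for "probably the same product" would have to be tuned on real data, and borderline pairs are the ones worth sending to manual review or the report button.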
Code Docta (Nick C.) 638 Posted February 18, 2016
Scrape a website that already does this? This gives me an idea, though: there is a way to see if titles are similar (my idea), but as mentioned above, the more info the better. Use $find regex or $contains to see if the title contains the words or phrase, then dig deeper with like attributes.
CD
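The same two steps (keyword check first, then a similarity comparison) can be sketched outside of a bot script. The Python below swaps in `re.search` and the standard library's `difflib.SequenceMatcher` for the `$find regex` / `$contains` functions mentioned in the post; the function names, sample titles, and keyword list are hypothetical.

```python
import re
from difflib import SequenceMatcher


def contains_all(title, words):
    """True if every word appears in the title as a whole word, ignoring case."""
    return all(
        re.search(r"\b" + re.escape(w) + r"\b", title, re.IGNORECASE)
        for w in words
    )


def similarity(a, b):
    """Character-level similarity ratio between 0 and 1, case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidates(target, titles, words):
    """Keep titles containing every keyword, most similar to the target first."""
    kept = [t for t in titles if contains_all(t, words)]
    return sorted(kept, key=lambda t: similarity(target, t), reverse=True)


# Hypothetical titles for illustration only.
titles = ["6-inch Hulk Figure", "Hulk Hogan Poster", "Thor Action Figure"]
result = candidates("Hasbro Hulk Action Figure", titles, ["hulk", "figure"])
print(result)
```

Here only "6-inch Hulk Figure" survives the keyword filter, since the other two titles are each missing one of the words. The surviving candidates are the ones to "dig deeper" on by comparing attributes such as manufacturer or size.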