UBot Underground

Combining Multiple Data Sets



I was wondering how you would combine multiple data sets to ensure that the data actually matches? For example, say you were scraping retail websites for prices and wanted to make a price comparison website. How would you combine the data so that "Hasbro Hulk Action Figure" and "6-inch Hulk Figure" match if they are the same product, just named differently on the different websites?

 

I don't really need specific code. I was just wondering how one would go about combining multiple data sets without doing it manually.


I have been really trying to get my head around this idea for the last few days. I do a lot of data manipulation as a day job, and this requires a mapping table of sorts. I would love to know if someone more experienced has a way of doing this.

I know a complete match is very unlikely, but getting at least the lion's share would save a massive amount of time.


If there is other data available on each site that you can match up, it would help a lot. In your example, both titles share the words "hulk" and "figure". If the two products also have the same manufacturer and the same dimensions, you can guess that they are probably the same thing. Sometimes websites have this info in scrapeable fields. I guess what I'm saying is that you probably need more info than just the title, and then you match things up as best you can. Assuming this job is too large to check manually, you can always put a button on your site so visitors can report products that were matched incorrectly.
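
For anyone who wants to see the idea above in code, here is a rough Python sketch. It assumes each site gives you a title plus a manufacturer and a height; the `Product` fields, the `probably_same` helper, and the two-shared-words threshold are all made up for illustration, so adjust them to whatever you can actually scrape.

```python
import re
from dataclasses import dataclass


@dataclass
class Product:
    title: str
    manufacturer: str  # hypothetical scraped field
    height_in: float   # hypothetical scraped field, in inches


def title_words(title):
    """Lowercase the title and split it into a set of words and numbers."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def probably_same(a, b, min_shared_words=2):
    """Guess that two listings match when the titles share enough words
    AND the manufacturer and dimensions agree. The threshold is an assumption."""
    shared = title_words(a.title) & title_words(b.title)
    same_maker = a.manufacturer.strip().lower() == b.manufacturer.strip().lower()
    same_size = abs(a.height_in - b.height_in) < 0.01
    return len(shared) >= min_shared_words and same_maker and same_size


site_a = Product("Hasbro Hulk Action Figure", "Hasbro", 6.0)
site_b = Product("6-inch Hulk Figure", "hasbro", 6.0)
other = Product("Hulk Figure Deluxe", "Mattel", 12.0)

print(probably_same(site_a, site_b))  # titles share "hulk" and "figure", other fields agree
print(probably_same(site_a, other))   # same shared words, but a different manufacturer
```

This is only a guess at a match, so anything it pairs up is still worth spot-checking or exposing to a "report an issue" button.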


Scrape a website that already does this?

But this gives me an idea: there is a way to see if titles are similar. As mentioned above, the more info the better.

Use $find regex or $contains to see if the title contains the words or phrase, then dig deeper with like attributes.
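
Outside of UBot, the same "does the title contain these words, then look closer" approach could be sketched in Python roughly like this. The helper names (`contains_all`, `similarity`) and the sample titles are made up for illustration, and the word-overlap score is just one simple way to rank candidates, not the only one.

```python
import re


def words(title):
    """Lowercase the title and split it into a set of words and numbers."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def contains_all(title, keywords):
    """First pass: whole-word, case-insensitive check that every keyword is in the title."""
    return all(
        re.search(rf"\b{re.escape(k)}\b", title, re.IGNORECASE) for k in keywords
    )


def similarity(a, b):
    """Second pass: share of words the two titles have in common (Jaccard overlap)."""
    wa, wb = words(a), words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


query = "Hasbro Hulk Action Figure"
titles = ["6-inch Hulk Figure", "Hulk Hands Toy", "Thor Action Figure"]

# Keep only titles containing both keywords, then score what is left.
candidates = [t for t in titles if contains_all(t, ["hulk", "figure"])]
for t in candidates:
    print(t, round(similarity(query, t), 2))
```

Titles that survive the keyword check and score well are the ones worth comparing on other attributes.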

 

 

CD

