PROJECT TITLE

    vertexmedia

    Website Scraper

    • 1 Comment
    • 2 bids
    • $600.00 Avg Bid (USD)
    • Open 3 months ago

    Project description:

    I have a great ubotter, but we are running into issues scraping a site: font errors, and problems scraping legal symbols and even apostrophes within the text. I need modular, multi-bot tooling to help with speed and efficiency. The first bot:

    1: Scrape articles from HG.org: https://www.hg.org/articles-for-260-areas-of-law.asp

    2: User input: Articles By Area of Practice: Estate Planning etc…

    3: All scraped data goes into an XAMPP DB.

    4: When scraping, the components of the article must be stripped and separated. (The website has <script> tags mixed in for ads and whatnot; they need to be removed.)
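
    Step 4 could be sketched roughly as follows, using only Python's standard-library html.parser. The class name ArticleTextExtractor and the helper extract are hypothetical, and the real page layout on HG.org has not been checked here, so this is an illustration under those assumptions rather than a finished scraper. Because the parser converts character references, entities such as &sect; or &#8217; should arrive as real characters, which may help with the legal-symbol and apostrophe problems.

```python
from html.parser import HTMLParser


class ArticleTextExtractor(HTMLParser):
    """Collects the <title> and visible text, skipping <script>/<style>.

    Hypothetical helper for illustration; not tied to any real page layout.
    """

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &sect; or &#8217;
        # into real characters before handle_data receives them.
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.text_parts = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            if self._skip_depth > 0:
                self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # inside <script> or <style>: drop ad/tracking code
        if self._in_title:
            self.title += data
        elif data.strip():
            self.text_parts.append(data.strip())


def extract(html):
    """Return (title, list of visible text chunks) for one HTML page."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.title.strip(), parser.text_parts
```

    Calling extract(page_html) returns the page title and the text chunks in page order, with script contents left out.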

    Components for DB:
    A: Scraped URL
    B: <Title>
    C: Description
    D: H2
    E: H2 article body
    F: H2 (bot needs to make this an H3)
    G: H3 article body
    H: H2 (bot needs to make this an H4)
    I: H4 article body
    J: H2 (bot needs to make this an H5)
    K: H5 article body
    L: H2 (bot needs to make this an H6)
    M: H6 article body

    ** For any articles that go past this, just keep the headings at H6.
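
    The heading rule above (each successive H2 drops one level, capped at H6) might look like this in Python. The function name retag_sections and the (heading, body) input shape are assumptions made for illustration.

```python
def retag_sections(sections, first_level=2, max_level=6):
    """Assign h2, h3, ... to successive sections, capping at h6.

    `sections` is a list of (heading_text, body_text) pairs in page order.
    Hypothetical helper; the input shape is an assumption.
    """
    tagged = []
    for index, (heading, body) in enumerate(sections):
        level = min(first_level + index, max_level)
        tagged.append({"tag": f"h{level}", "heading": heading, "body": body})
    return tagged
```

    With seven sections, the tags come out as h2, h3, h4, h5, h6, h6, h6.
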
    ***** If possible, handle articles that do not have any H2 tags, etc.

    Make the bot smart enough that if a piece of text is 8 words or fewer and is followed by a double <br><br>, it becomes an H tag in the corresponding order.
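
    One possible reading of this heuristic, sketched in Python with the standard re module: split the article body on double <br> breaks, and treat any chunk of 8 words or fewer that is followed by such a break as a heading. The function name split_into_sections and the returned dictionary layout are hypothetical, and real pages may need a more careful rule.

```python
import re

# Two or more consecutive <br> tags, in any case, with optional "/" or spaces.
DOUBLE_BR = re.compile(r"(?:<br\s*/?>\s*){2,}", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")


def split_into_sections(body_html, max_heading_words=8, max_level=6):
    """Group text chunks under inferred headings (hypothetical helper)."""
    parts = DOUBLE_BR.split(body_html)
    sections = []
    for index, raw in enumerate(parts):
        # Drop leftover tags and collapse whitespace.
        text = " ".join(ANY_TAG.sub(" ", raw).split())
        if not text:
            continue
        followed_by_break = index < len(parts) - 1
        if followed_by_break and len(text.split()) <= max_heading_words:
            # Headings get h2, h3, ... in order of appearance, capped at h6.
            level = min(2 + sum(1 for s in sections if s["heading"]), max_level)
            sections.append({"tag": f"h{level}", "heading": text, "body": []})
        elif sections:
            sections[-1]["body"].append(text)
        else:
            # Body text before any heading was found.
            sections.append({"tag": None, "heading": None, "body": [text]})
    return sections
```

    A short body paragraph of 8 words or fewer would be misread as a heading under this rule, so the threshold is best treated as a starting point to tune.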

    Other known issues: This website blocks an IP address after 100 page views, so we need to swap proxies by then. A blocked IP will be dead for 30 days. I have plenty of proxies to use.
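
    The proxy swap could be handled by a small counter like the one below. ProxyPool is a hypothetical class, and the default limit of 90 views is an assumed safety margin below the 100-view block described above, not a figure from the site. A used proxy is retired rather than reused, since a blocked IP would stay dead for 30 days.

```python
class ProxyPool:
    """Hands out one proxy at a time and retires it after `max_views` uses.

    Hypothetical helper; the default of 90 is an assumed safety margin.
    """

    def __init__(self, proxies, max_views=90):
        self._remaining = list(proxies)
        self.max_views = max_views
        self._views = 0

    def next_proxy(self):
        """Return the proxy to use for the next page view."""
        if self._views >= self.max_views:
            # Current proxy has reached its budget: retire it for good.
            self._remaining.pop(0)
            self._views = 0
        if not self._remaining:
            raise RuntimeError("proxy pool exhausted")
        self._views += 1
        return self._remaining[0]
```

    Each page fetch would call next_proxy() first. With the requests library, for example, the returned address can be passed as proxies={"http": proxy, "https": proxy}.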

    ** GETTING THE DATA PROPERLY INTO THE DB IS THE HIGHEST PRIORITY **
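
    A rough sketch of the storage step follows. XAMPP ships MySQL/MariaDB, but so that the example runs without a server it uses Python's built-in sqlite3 as a stand-in; that swap, the two-table layout, and the names create_schema and save_article are all assumptions. The point it illustrates is that parameterized queries store apostrophes and legal symbols unchanged, with no manual escaping.

```python
import sqlite3


def create_schema(conn):
    """Create the (assumed) article and section tables if they are missing."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS articles (
            id INTEGER PRIMARY KEY,
            url TEXT UNIQUE NOT NULL,
            title TEXT,
            description TEXT
        );
        CREATE TABLE IF NOT EXISTS sections (
            article_id INTEGER NOT NULL REFERENCES articles(id),
            position INTEGER NOT NULL,
            tag TEXT NOT NULL,
            heading TEXT,
            body TEXT
        );
        """
    )


def save_article(conn, url, title, description, sections):
    """Insert one article; `sections` is a list of (tag, heading, body)."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO articles (url, title, description) VALUES (?, ?, ?)",
            (url, title, description),
        )
        article_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO sections (article_id, position, tag, heading, body) "
            "VALUES (?, ?, ?, ?, ?)",
            [
                (article_id, position, tag, heading, body)
                for position, (tag, heading, body) in enumerate(sections)
            ],
        )
    return article_id
```

    On MySQL the same pattern applies, though common Python drivers use %s placeholders instead of ?, and the tables and connection would need the utf8mb4 character set to hold the full Unicode range.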

    1 Comment

    1. Whiztech

      I can make it. Please contact me on Skype: Sharp Botmaker

    FREELANCER BIDDING (2)

    • Charles Fairbairn, Developer and CEO: In Process
    • Peter Kim, Programmer: In Process

    ABOUT EMPLOYER

    vertexmedia

    Member Since December 5, 2016
    • Location: Orange County
    • Earning: $0.00
    • Project complete: 0