How to Research Your Competition by Scraping Yelp Pages

Being one of the largest business catalogs on the Internet, Yelp offers a lot of information about your competition and how people see your business. If you’ve never heard of Yelp, just know that you can find a lot of information about restaurants, hotels, dentists, gyms, landscapers, even plumbers on the platform. It’s a service-driven amusement park.

The benefits of web scraping on Yelp
Web scraping in action
1. Find the information you need
2. Inspect the source code
3. Setting up the project
4. Making the request
5. Formatting the collected data
The Internet has changed the way we view competitors

The benefits of web scraping on Yelp

Yelp data can be precious if used the right way. But how can we quickly gather it if Yelp does not offer a data export feature? You probably already know the answer to this question: web scraping. I am going to show you an example of how a coffee place can extract data from Yelp to learn more about its competitors. Beyond that, there are many other benefits to gathering intelligence from this website:

  • Scraping reviews to improve user experience
  • Marketing communication verification
  • Monitoring consumer sentiment
  • Competition research
  • Lead generation

Web scraping in action

We are used to believing that the Internet is this simple, magical place, and we fully understand it. In reality, the web is much more complex. Taking this into consideration, when trying to gather large chunks of information, we have to put in sustained effort, especially if we are going to repeat the process.

Web scraping is all about collecting data snippets from web pages and exporting them to a readable format. If you are one of the people who make decisions based on large amounts of data, you will find great value in scraping the web.

This can be done by building your own scraper or by using existing software. Either way, you can get the same results. I will use WebScrapingAPI for this tutorial because I’m already pretty used to it, and it’s going to save us a lot of time.

1. Find the information you need

Let’s assume we represent a coffee place in Amsterdam, Netherlands, and we are interested in building a list of competitors that contains information like the business’s name, address, reviews, pricing rank, and phone number.

First, we will have to access the Yelp platform and search for coffee in Amsterdam, Netherlands.

Yelp will show us all the places that serve coffee and their location on the map. The URL we will be scraping is the following: https://www.yelp.ie/search?find_desc=Coffee&find_loc=Amsterdam.
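
The search URL above is just a base address plus two query parameters, so it can also be assembled in code when you want to research a different term or city. Here is a minimal sketch using Node's built-in URL class. The function name buildYelpSearchUrl is made up for this example and is not part of any library.

```javascript
// Hypothetical helper: assembles a Yelp search URL from a search term and a location.
// find_desc and find_loc are the query parameters visible in the URL above.
function buildYelpSearchUrl(term, location) {
  const url = new URL("https://www.yelp.ie/search");
  url.searchParams.set("find_desc", term);
  url.searchParams.set("find_loc", location);
  return url.toString();
}

console.log(buildYelpSearchUrl("Coffee", "Amsterdam"));
// prints "https://www.yelp.ie/search?find_desc=Coffee&find_loc=Amsterdam"
```

Because the parameters go through searchParams, spaces and commas in a term or location are encoded for you.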

After we create a new WebScrapingAPI account, we are redirected to the application’s dashboard. WebScrapingAPI offers a free plan with 1000 requests to test the application. That is more than enough for the process we are going to go through.

Navigate to the “Use API Playground” page, where you will see a page that looks like this:

This is the command center we will use to extract the data. We can make a test request by providing the URL and choosing the proper parameters.

You can do it too by including this URL in the input: https://www.yelp.ie/search?find_desc=Coffee&find_loc=Amsterdam, selecting “Yes” for the Render JS parameter, “Desktop” as the device, “Datacenter” for the Proxy Type, and “Networkidle2” for the Wait Until parameter. After clicking on the “Send API Request” button, your page will look something like this:
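
The same playground settings can also be written down as request parameters for later use in code. The sketch below assumes the parameter names render_js, device, proxy_type, and wait_until, which mirror the playground labels. Treat both the names and the values as assumptions and check them against the WebScrapingAPI documentation before relying on them.

```javascript
// Assumed parameter names and values, mirroring the playground options described above.
const playgroundParams = {
  api_key: "YOUR_API_KEY",
  url: "https://www.yelp.ie/search?find_desc=Coffee&find_loc=Amsterdam",
  render_js: "1", // "Yes" for Render JS (assumed encoding)
  device: "desktop",
  proxy_type: "datacenter",
  wait_until: "networkidle2",
};

// Hypothetical helper: builds the full request URL.
// URLSearchParams percent-encodes the nested Yelp URL so it survives as one value.
function buildApiUrl(params) {
  const apiUrl = new URL("https://api.webscrapingapi.com/v1");
  apiUrl.search = new URLSearchParams(params).toString();
  return apiUrl.toString();
}

console.log(buildApiUrl(playgroundParams));
```

Keeping the settings in one object makes it easy to try a different device or proxy type without touching the rest of the code.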

Now let’s get to the following steps, where we will inspect the source page and write the actual code that will gather the data.

2. Inspect the source code

For this step, just return to the Yelp page we have previously visited and right-click on the first company’s name, then click “Inspect.”

The page will show you another window that looks something like this:

We will get all the information we need by selecting the DOM elements that contain the data we are looking for. In the picture presented above, we can easily see that the element containing the name of the coffee place has a CSS class of css-166la90. To get all the names of our competitors, we have to select all the elements on the page with that class, and we will do the same with all the details on the page that pique our interest.
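
Class names such as css-166la90 look auto-generated and are likely to change when Yelp updates its site, so it can help to keep them in one place. The following is a sketch of that idea. SELECTORS and readText are names made up for illustration, and the class values are the ones used in this tutorial, which may no longer match the live page.

```javascript
// Selectors used in this tutorial, collected in one object so they are easy to update.
const SELECTORS = {
  name: ".css-166la90",
  reviewScore: ".reviewCount__09f24__EUXPN",
  phone: ".css-8jxw1i",
  address: ".raw__09f24__3Obuy",
  priceRange: ".priceRange__09f24__2O6le",
};

// Returns the trimmed text of the first match inside `root`, or null if nothing matches.
// `root` can be any object with a querySelector method, such as a DOM document or element.
function readText(root, selector) {
  const node = root.querySelector(selector);
  return node ? node.textContent.trim() : null;
}
```

Given a parsed document, readText(document, SELECTORS.name) would return the first business name on the page, or null once the class name stops matching, which is a useful signal that the selectors need another look.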

3. Setting up the project

After you have created a folder for the project, run the following commands:

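npm init -y
npm install got@11 jsdom

To make the requests, we will use the got module, and for our HTML parsing needs, we will use the jsdom package. Version 11 of got is pinned here because later major versions are published as ES modules only, which does not work with the require() calls used in the code below.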

Create a new file called “index.js” and open it up.

4. Making the request

Let’s set the parameters, make the request, and parse the HTML. Write the following lines in the previously created file:

const { JSDOM } = require("jsdom");
const got = require("got");

(async () => {
  // Parameters for WebScrapingAPI: your key and the page to scrape
  const params = {
    api_key: "YOUR_API_KEY",
    url: "https://www.yelp.ie/search?find_desc=Coffee&find_loc=Amsterdam",
  };
  // Fetch the page HTML through the API and parse it into a DOM
  const response = await got("https://api.webscrapingapi.com/v1", { searchParams: params });
  const { document } = new JSDOM(response.body).window;
  const competitors = document.querySelectorAll("li");
  console.log(competitors);
})();

By running these lines of code, I ask WebScrapingAPI for the page HTML. I then collect all the list items on the page, which include the ones holding information about my competitors, by using the querySelectorAll() method and log them to the screen using the console.log() function. Be careful to replace the API key string with the one you get from the WebScrapingAPI dashboard.

5. Formatting the collected data

From here on, we will dig deeper to get the specific elements containing the address, reviews, pricing rank, and phone numbers.

Inside the async function, right after the console.log(competitors) line, add the following:

// Plain objects with the extracted details, one per competitor
const results = [];
competitors.forEach((competitor) => {
  const entry = {};
  const name = competitor.querySelector(".css-166la90");
  if (name) entry.name = name.innerHTML;
  const reviewScore = competitor.querySelector(".reviewCount__09f24__EUXPN");
  if (reviewScore) entry.review_score = `${reviewScore.innerHTML}/100`;
  const detailsContainer = competitor.querySelector(".container__09f24__1fWZl");
  if (detailsContainer) {
    // Guard each lookup so a missing element does not throw
    const phone = competitor.querySelector(".css-8jxw1i");
    if (phone) entry.phone = phone.innerHTML;
    const address = competitor.querySelector(".raw__09f24__3Obuy");
    if (address) entry.address = address.innerHTML;
  }
  const priceRange = competitor.querySelector(".priceRange__09f24__2O6le");
  if (priceRange) entry.price_range = priceRange.innerHTML;
  // Skip list items that are not business listings (menus, filters, and so on)
  if (entry.name) results.push(entry);
});
console.log(results);

As you can see, for each competitor on the first page, we collect its name, price range, address, phone number, and review score. In the end, we will have an array of objects, each holding whichever of these details were found on the page.

To recap, scraping Yelp data using WebScrapingAPI is pretty simple:

  1. Make a request to WebScrapingAPI using the vital parameters: the API key and the URL you need to scrape data from.
  2. Load the DOM using JSDOM.
  3. Select all the list items on the page that may hold a competitor.
  4. For each competitor, get the name, pricing range, address, phone number, and review score.
  5. Add every competitor to a new array.
  6. Log the newly created results array to the screen.
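
The last step above only logs the results to the screen. To keep them for later analysis, the array could be exported to a readable format such as CSV. Here is a sketch with no dependencies. The names toCsv and toCsvField are illustrative, and the sample rows are made-up values shaped like the entries in the results array.

```javascript
// Escape a single value for CSV: wrap it in quotes and double any embedded quotes.
function toCsvField(value) {
  const text = value == null ? "" : String(value);
  return `"${text.replace(/"/g, '""')}"`;
}

// Turn an array of plain objects into CSV text with a header row.
// Missing properties become empty fields.
function toCsv(rows, columns) {
  const header = columns.map(toCsvField).join(",");
  const lines = rows.map((row) => columns.map((col) => toCsvField(row[col])).join(","));
  return [header, ...lines].join("\n");
}

// Made-up sample data, not real scraped values.
const sample = [
  { name: "Example Cafe", price_range: "€€", phone: "020 000 0000" },
  { name: 'The "Best" Beans', price_range: "€" },
];

console.log(toCsv(sample, ["name", "price_range", "phone"]));
```

In the scraper itself, the same function could be called on the results array and its output written to a file with Node's fs.writeFileSync.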

One of the limitations that we are currently facing is that we can only scrape the information from our search page. Using some type of headless browser, like Puppeteer, can fix this issue. This solution will help us do most of the things we can manually do in a web browser, like completing a form, clicking a button, or getting to the following pages.
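
For the narrower goal of reaching the following pages of results, a lighter option may be enough in some cases. The sketch below assumes that Yelp search URLs accept a start offset parameter and show ten results per page. Both are assumptions that should be checked against the live site. Under those assumptions, the URLs of the first few pages could be generated like this and passed to the API one by one:

```javascript
// Assumption: Yelp search pages accept a `start` offset and list `pageSize` results per page.
// buildPageUrls is a hypothetical helper written for this example.
function buildPageUrls(baseUrl, pageCount, pageSize = 10) {
  const urls = [];
  for (let page = 0; page < pageCount; page += 1) {
    const url = new URL(baseUrl);
    // The first page needs no offset; later pages start at 10, 20, and so on.
    if (page > 0) url.searchParams.set("start", String(page * pageSize));
    urls.push(url.toString());
  }
  return urls;
}

console.log(buildPageUrls("https://www.yelp.ie/search?find_desc=Coffee&find_loc=Amsterdam", 3));
```

Anything that requires real interaction, such as filling in a form or clicking a button, would still call for a headless browser.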

The Internet has changed the way we view competitors

At this point, just about any company worth its salt has some form of online presence. Staying off the Internet is not really an option, since you’d be turning your back to a huge group of possible customers.

The side effect is that your competitors have a much easier time trying to learn more about you, or you about them. Web scraping is just the next step in a natural progression: adding automation to the data gathering process. The beauty of web scraping is how versatile it can be. Plenty of business activities become easier with the right tool.

CEO & Co-Founder @Knoxon, Full Stack Developer