Jobs

Settings

In the Jobs section, when creating a new job, you can configure the controller used, email notifications, and the Scheduler, a convenient scheduling automation tool. You can change these settings later in the Settings section.

Additionally, you can completely delete a job if the need arises.
More details can be found in the Jobs section.
The Scheduler within a Job in ScrapeSuite is a feature that gives you precise control over when your parsing tasks run. It is designed to make scheduling your parsers flexible and accurate.

In the Scheduler, you can:
  • Set specific times and start dates for parsing tasks.
  • Establish periodic intervals for the automatic execution of tasks at specific times.
  • View the preview of the next start dates.
To set up the Scheduler:
  • Enable the Scheduler.
  • Set the start time (hours and minutes).
  • Choose the month, or leave the default, which is every month.
  • Choose the days of the month on which the Job runs, either a single date or multiple dates. By default, the Job runs every day.
  • Optionally, choose one or more days of the week on which the Job runs.
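The way these settings combine into a list of upcoming start dates can be sketched in Python. This is only an illustration, not ScrapeSuite's implementation: the function name `next_starts` and its parameters are hypothetical, weekdays use Python's numbering (Monday is 0), and the sketch assumes that a run happens only when every configured filter (month, day of month, day of week) matches.

```python
from datetime import datetime, timedelta

def next_starts(after, hour, minute, months=None, month_days=None,
                weekdays=None, count=3):
    """Return up to `count` start times later than `after`.

    A filter left as None means "no restriction" (every month, every day).
    Assumption: all configured filters must match on the same date.
    """
    candidate = after.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= after:
        candidate += timedelta(days=1)

    starts = []
    # Bounded scan (about 8 years) so an impossible rule such as
    # 31 February ends with an empty or short list instead of looping.
    for _ in range(366 * 8):
        if ((months is None or candidate.month in months)
                and (month_days is None or candidate.day in month_days)
                and (weekdays is None or candidate.weekday() in weekdays)):
            starts.append(candidate)
            if len(starts) == count:
                break
        candidate += timedelta(days=1)
    return starts

# Preview the next three Monday runs at 09:30.
for start in next_starts(datetime(2024, 1, 1, 10, 0), 9, 30, weekdays={0}):
    print(start.isoformat())
```

Scanning day by day keeps the logic easy to follow; a production scheduler would more likely compute the next match directly.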
After configuring the Scheduler, you can preview when parsing will take place.
Example:

Configuration

We need to configure our Job, which means specifying exactly which pages to process with the parser we built. Opening an unconfigured Job shows a warning.

We recommend clicking Yes and proceeding with this configuration.

This tab has two fields.
In the URLs field, you can enter all the URLs you want to process.
In the Preview URLs field, you can see all the URLs you added.
There are several ways to add URLs to a Job.

Method 1
Import a file with URLs that need to be processed.
To do this, click

You will see a window prompting you to choose a file with your URLs.
Note that unless you uncheck the box, the first row of the file is treated as a header and is not processed.

As indicated, the maximum file size is 20 MB, and the maximum number of rows is 500,000.
After adding the file, click Create.
If no file was selected, you will receive the following notification: “Select the CSV file!”
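Before uploading, it can help to check a file against these limits locally. The following is a minimal sketch, not part of ScrapeSuite: the function name `load_urls` is hypothetical, and it assumes that URLs sit in the first CSV column, that "20 MB" means 20 × 1024 × 1024 bytes, and that the header row counts toward the row limit.

```python
import csv
import os

MAX_FILE_BYTES = 20 * 1024 * 1024  # assumption: 20 MB counted in binary units
MAX_ROWS = 500_000                 # assumption: the header row counts as a row

def load_urls(path, first_row_is_header=True):
    """Read URLs from the first column of a CSV file, enforcing the limits."""
    if not path:
        raise ValueError("Select the CSV file!")
    if os.path.getsize(path) > MAX_FILE_BYTES:
        raise ValueError("File is larger than 20 MB")

    urls = []
    with open(path, newline="", encoding="utf-8") as handle:
        for index, row in enumerate(csv.reader(handle)):
            if index >= MAX_ROWS:
                raise ValueError("File has more than 500,000 rows")
            if index == 0 and first_row_is_header:
                continue  # the header row is skipped, not processed
            if row and row[0].strip():
                urls.append(row[0].strip())
    return urls
```

Calling `load_urls("urls.csv", first_row_is_header=False)` mirrors unchecking the header box, so the first row is processed like any other.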

Method 2
Add the URLs you want to process in the empty URLs field, and they will appear in the Preview URLs field.
Example:

When you click, a form will open where you can do this.
Enter the name you want to give your list of URLs:
After adding it, you will see the following:
Insert the Product field into the URL as a placeholder:
https://us.ecoflow.com/products/${Product}
“Product” is the name of the field that we specified.
The result we should get:
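The substitution can be pictured with Python's `string.Template`, which happens to use the same `${name}` placeholder syntax. This is a sketch of the idea rather than ScrapeSuite's own logic: the helper `expand_field` is hypothetical, and the second product value is invented for illustration.

```python
from string import Template

def expand_field(url_template, field_name, values):
    """Substitute each value for ${field_name} in the URL template."""
    template = Template(url_template)
    return [template.substitute({field_name: value}) for value in values]

urls = expand_field(
    "https://us.ecoflow.com/products/${Product}",
    "Product",
    # The second value is hypothetical, added only to show a list of two.
    ["converter-tips-for-laptops", "example-product"],
)
for url in urls:
    print(url)
```

Each value in the named list produces one URL, so a list of N values yields N URLs to process.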
You can also add a Range field.
Go to this tab, enter the name you want to give the field, and choose the start and end values of the range.
Example:
The result will be:
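A Range field can be thought of as the same placeholder substitution repeated over consecutive numbers. Here is a minimal sketch under stated assumptions: the helper `expand_range`, the field name `Page`, and the example.com URL are all hypothetical, and the sketch assumes the range includes both its start and end values and advances in steps of 1.

```python
from string import Template

def expand_range(url_template, field_name, start, end):
    """Build one URL per number from start to end (both included)."""
    template = Template(url_template)
    return [template.substitute({field_name: number})
            for number in range(start, end + 1)]

pages = expand_range("https://example.com/catalog?page=${Page}", "Page", 1, 3)
for page in pages:
    print(page)
```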

Method 3
This method is suitable for users who do not have a list of URLs.
We can obtain such a list directly from the site. To do this, click

Next, a form will open that you need to fill out.
In the Website field, enter the absolute URL, which is the main address of the site.

Then, in the Pattern field, enter the path pattern for the product pages you want, for example, */product/*. The “*” symbol stands for the part of the link that precedes the “/” delimiter, and “product” is a literal part of the link.

An example of a URL we can get:
https://us.ecoflow.com/products/converter-tips-for-laptops
The Limits field is where you specify the number of pages you want to get with this pattern.
We recommend first checking on a small number of pages, for example, 10.
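The combined effect of a pattern and a limit can be sketched with Python's `fnmatch` module, used here as a stand-in because ScrapeSuite's exact matching rules are not described in this guide. The helper `select_urls` is hypothetical, as are the second and third URLs. The pattern is written as `*/products/*` so that it matches the sample URL under plain glob rules, where `*` matches any run of characters.

```python
from fnmatch import fnmatchcase

def select_urls(found_urls, pattern, limit):
    """Keep URLs matching the glob pattern, up to `limit` of them."""
    matching = [url for url in found_urls if fnmatchcase(url, pattern)]
    return matching[:limit]

found = [
    "https://us.ecoflow.com/products/converter-tips-for-laptops",
    "https://us.ecoflow.com/pages/about",                # hypothetical
    "https://us.ecoflow.com/products/example-product",   # hypothetical
]
for url in select_urls(found, "*/products/*", limit=10):
    print(url)
```

Starting with a small limit, as recommended, makes it quick to see whether the pattern selects the intended pages before importing a large list.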
Once filled out, the form should look like this:
Then click Import.
Configuration of your Job is complete, and the result will look like this:
Click Save, and you will be redirected to the list of your Jobs.

Parsing status

In this window, you can change the configuration or modify parameters as needed. You can also delete the Job and choose whether to receive email notifications when a task completes.

You can track the status of your tasks.
The following details are shown for each task:
  • Status
  • URL
  • Status Code
  • Date and Time
  • Execution Time
  • Cost
  • Error Message

Possible statuses:

  • Complete: Your configured parser successfully processed and retrieved data from the page.
  • Not valid result: Your configured parser could not gather data from this page. In this case, we recommend going to View Result and using the option to add this URL to the project’s HTMLs with parser adjustments.
  • Bad Status Code: The server returned an error status code. For example, the 404 code means that the server could not find the requested webpage.

Under the Completed status for your Job, you will find the task start time, the number of URL rescans, and the Rescan button, which becomes available when the status is Not valid result.
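If you keep your own copy of these task details, a short script can summarize the statuses and list the URLs worth rescanning. This is only a sketch: the record layout (dictionaries with `url` and `status` keys), the helper `summarize`, and the sample records are assumptions, not a ScrapeSuite export format.

```python
from collections import Counter

def summarize(tasks):
    """Count tasks per status and collect URLs eligible for a rescan."""
    counts = Counter(task["status"] for task in tasks)
    # Rescan applies to tasks whose status is "Not valid result".
    rescan = [task["url"] for task in tasks
              if task["status"] == "Not valid result"]
    return counts, rescan

# Hypothetical sample records for illustration.
tasks = [
    {"url": "https://example.com/a", "status": "Complete"},
    {"url": "https://example.com/b", "status": "Not valid result"},
    {"url": "https://example.com/c", "status": "Bad Status Code"},
]
counts, rescan = summarize(tasks)
print(dict(counts))
print(rescan)
```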

Using the Rescan button

In the URL table, you can select View Result, which allows you to view the results by URL. You can find more detailed information in the description of the Storage.