Dashboard configuration

This section lets you view and modify project settings directly in the dashboard, giving you full control over your projects. To make changes, use the settings button.

HTMLs

HTMLs is a key section of the ScrapeSuite dashboard: it is where you specify the starting point for data collection. When creating a new project, open the HTMLs section to configure its fields, including the URL of the page where data collection will begin.

The list of added HTMLs shows every HTML that has been added to the project, along with its name, settings, and the date and time it was added. Here you can add a new URL, modify the settings of existing URLs, or delete URLs as needed. Initially your parser is configured for a single URL, but as work progresses you may find it needs to be adjusted against other URLs; this list lets you manage those URLs and fine-tune the parser accordingly.
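Purely as an illustration (not ScrapeSuite's actual API), each added HTML can be thought of as a record with a name, a starting URL, and an added-at timestamp; the sketch below models that list in plain Python with hypothetical field names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class HtmlEntry:
    # Hypothetical fields mirroring what the dashboard shows for each added HTML:
    # a display name, the starting URL, and when it was added.
    name: str
    url: str
    added_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# Start with a single URL, then add more as the parser is fine-tuned.
project_htmls = [HtmlEntry(name="catalog-start", url="https://example.com/catalog")]
project_htmls.append(HtmlEntry(name="sale-items", url="https://example.com/catalog?filter=sale"))

for entry in project_htmls:
    print(entry.name, entry.url, entry.added_at.isoformat())
```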

Controllers

A controller in ScrapeSuite determines how to interact with a website or a specific page. It is the foundation on which a Job is built: it describes how requests to the website will be made and also serves as the basis for the Processing API.
When you create a project, a default controller is generated automatically and can be used as is. For more precise customization and to account for specific requirements, you can create a new controller and configure its parameters to your needs, giving you greater control over how interactions with the website or page are conducted.
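The exact parameters a controller exposes are product-specific; purely as a sketch, with hypothetical option names, a controller can be pictured as the set of request settings that a Job later inherits.

```python
from dataclasses import dataclass, field

@dataclass
class Controller:
    # Hypothetical settings describing how requests to the site are made.
    name: str = "default"
    method: str = "GET"
    headers: dict = field(default_factory=lambda: {"User-Agent": "ScrapeSuite-demo"})
    timeout_s: float = 30.0
    max_retries: int = 2

# The default controller is created with the project; a custom one overrides
# only the parameters that need to differ for a particular site.
default_controller = Controller()
custom_controller = Controller(name="slow-site", timeout_s=90.0, max_retries=5)
print(custom_controller)
```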

Jobs

Jobs is the dashboard section for parsing one or more pages: here you configure a specific list of URLs with defined parameters, start a Job, and then analyze the results of the completed work in Storage. A Job interacts with the website according to the settings established in its controller, so the controller determines the steps the Job takes when parsing data; the Job is built on the controller, which defines how each page is retrieved.
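To make the relationship concrete, here is a hypothetical sketch (not the real ScrapeSuite interface) in which a Job pairs a controller with a list of URLs and collects results that would then appear in Storage.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Job:
    # A Job is built on a controller and runs over a configured list of URLs.
    controller_name: str
    urls: List[str]
    results: List[dict] = field(default_factory=list)

    def run(self) -> None:
        # Placeholder for fetching each page the way the controller describes;
        # in the product, the finished results land in Storage.
        for url in self.urls:
            self.results.append({"url": url, "status": "fetched"})

job = Job(controller_name="slow-site", urls=["https://example.com/p/1", "https://example.com/p/2"])
job.run()
print(len(job.results), "pages processed")
```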

Storage

Storage is where the results of your Jobs and Controllers are kept. Here you can analyze parsing results and export data for further analysis, processing, and report generation. It serves as a central repository for the output of your scraping and parsing activities, letting you manage and use the data efficiently for various purposes.
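For example, once a Job's results have been exported, they can be analyzed with ordinary tooling. The records below are invented for illustration; real field names depend on how the parser was configured.

```python
from collections import Counter

# Hypothetical records as they might look after exporting a Job's results from Storage.
exported_rows = [
    {"url": "https://example.com/p/1", "title": "Item 1", "price": "10.00"},
    {"url": "https://example.com/p/2", "title": "Item 2", "price": "12.50"},
    {"url": "https://shop.example.org/x", "title": "Item X", "price": "7.99"},
]

# Simple report: how many scraped records came from each domain.
domains = Counter(row["url"].split("/")[2] for row in exported_rows)
for domain, count in domains.most_common():
    print(f"{domain}: {count} records")
```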

Timeline

The Timeline in ScrapeSuite provides a comprehensive overview of all stages of working with a project. It displays key events such as project creation, configuration of the base controller, addition of HTML, creation of a new Job, and deployment of the project. This tool allows you to easily track all the steps in your work and understand the current status of your project.
If you use automated data scraping, this window also makes it easy to see and track the schedule of running processes. The Timeline keeps you organized and informed about your project's progress at every stage.
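Purely as an illustration of the kind of information the Timeline surfaces (not an actual ScrapeSuite data model), the project's stages can be pictured as a time-ordered event log.

```python
from datetime import datetime, timezone

# Hypothetical event log; the dashboard presents these stages visually.
timeline = [
    (datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc), "Project created"),
    (datetime(2024, 5, 1, 9, 5, tzinfo=timezone.utc), "Base controller configured"),
    (datetime(2024, 5, 1, 9, 10, tzinfo=timezone.utc), "HTML added"),
    (datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc), "New Job created"),
    (datetime(2024, 5, 1, 10, 30, tzinfo=timezone.utc), "Project deployed"),
]

for ts, event in sorted(timeline):
    print(ts.isoformat(), event)
```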

Deployment

The Deployment section in ScrapeSuite provides crucial information about the status of your project. When building or modifying a parser, changes do not take effect in the data collection process until they are deployed. Therefore, once you finish building or modifying the parser, you must deploy the changes.

The Deployment section displays the time of the last deployment, the time of the last modifications, the current status of the project, and labels such as:
  • Need Deploy, indicating that changes have been made and have not yet been deployed.
  • Usage actual version, indicating that the deployed parser is up to date and has no pending changes.
This feature allows you to easily manage updates and maintain the current version of your project, ensuring that the latest changes are applied for data collection.
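The labels above amount to comparing the last modification against the last deployment. The sketch below is only a hypothetical illustration of that logic, not code from the product.

```python
from datetime import datetime, timezone
from typing import Optional

def deployment_label(last_deployed_at: Optional[datetime], last_modified_at: datetime) -> str:
    # Hypothetical logic mirroring the dashboard labels: changes made after the
    # last deployment mean the project still needs to be deployed.
    if last_deployed_at is None or last_modified_at > last_deployed_at:
        return "Need Deploy"
    return "Usage actual version"

modified = datetime(2024, 5, 2, 10, 15, tzinfo=timezone.utc)
deployed = datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc)
print(deployment_label(deployed, modified))  # prints "Need Deploy"
```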