The HTML Visual Code Representation section in ScrapeSuite offers an intuitive interface for identifying and selecting the elements on a webpage that you want to parse. Its dual-pane view lets you interact with both the rendered web page and its underlying HTML code, so you can target data precisely.
The JSON Result section in ScrapeSuite displays the data extracted from a webpage in structured JSON format. Use it to review and verify the parsed data before exporting it or using it in other applications.
Note:
JSON is a universal data exchange format that lets programs written in different languages communicate seamlessly. It gives you flexibility in defining the structure and format of your data, which makes parsed results easier to work with. With ScrapeSuite, you can use JSON to efficiently extract and manipulate data from web pages.
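Because the result is plain JSON, any language with a JSON parser can consume it. The sketch below assumes a hypothetical result in which a container named "products" holds a list of objects keyed by Content block names ("title", "price"); the actual shape of a ScrapeSuite result may differ.

```python
import json

# Hypothetical JSON Result: the container name "products" and the
# content names "title" and "price" are illustrative assumptions.
raw = """
{
  "products": [
    {"title": "Desk Lamp", "price": "19.99"},
    {"title": "Office Chair", "price": "129.00"}
  ]
}
"""

result = json.loads(raw)

# Walk the parsed structure and convert prices to numbers for further use.
for item in result["products"]:
    print(item["title"], float(item["price"]))

# Re-serialize, e.g. before handing the data to another application.
exported = json.dumps(result, indent=2)
```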
In ScrapeSuite, the Settings Tree is an essential component for configuring your web scraping tasks. It allows you to define the structure of the elements you want to scrape, how they are selected, and how they are processed. The Settings Tree consists of several key components: the main container, containers, content, selectors, and container types.
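To make the relationships between these components concrete, here is a minimal sketch of a Settings Tree as nested Python dataclasses. The class and field names (Container, ContainerType, Content, Selector, extract, children) are hypothetical and do not describe ScrapeSuite's internal format; the sketch only shows how a main container can hold containers, which in turn hold content blocks and their selectors.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical model of a Settings Tree; names are illustrative only.

@dataclass
class Selector:
    css: str                      # CSS code that locates the element
    extract: str = "text"         # "text" or an attribute name such as "href"

@dataclass
class Content:
    name: str                     # key used in the JSON Result
    selectors: List[Selector] = field(default_factory=list)

@dataclass
class ContainerType:
    name: str
    selectors: List[Selector] = field(default_factory=list)

@dataclass
class Container:
    name: str
    selectors: List[Selector] = field(default_factory=list)
    container_types: List[ContainerType] = field(default_factory=list)
    contents: List[Content] = field(default_factory=list)
    children: List["Container"] = field(default_factory=list)

# The main container is the root of the tree.
main = Container(
    name="main",
    children=[
        Container(
            name="products",
            selectors=[Selector(css="div.product")],
            contents=[
                Content("title", [Selector(css="h2.title")]),
                Content("link", [Selector(css="a", extract="href")]),
            ],
        )
    ],
)

def content_names(node: Container) -> List[str]:
    """Collect every Content name in the tree, depth-first."""
    names = [c.name for c in node.contents]
    for child in node.children:
        names.extend(content_names(child))
    return names

print(content_names(main))
```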
Creating elements in ScrapeSuite is an intuitive process that lets you define the specific parts of a webpage to parse. Whether you select elements directly in the HTML Preview, in the HTML Code, or through the element tree, you can tailor your parsing tasks to your needs.
The Hidden Elements feature in ScrapeSuite allows you to hide specific elements from the HTML preview while keeping them in the code. This is useful for decluttering the preview and focusing on the elements you want to parse.
The “Content” block is used to define specific elements within a container that you want to scrape. Clicking on an area of interest in the HTML visual interface creates a Content block by default.
Assign a name to this content block for easy identification in the JSON Result.
Choose whether to parse all occurrences of the element, only the first, or only the last. This is useful when the element appears multiple times on a page.
Indicate whether the element might be missing from the page. If enabled, ScrapeSuite will not fail when the element is missing.
Specify how data will be extracted from the selected element. Options include extracting the text content or a specific attribute value (e.g., href, src).
The CSS code is used to identify elements on the webpage, allowing precise targeting of elements based on their attributes and structure. Detailed information about supported selectors can be found here.
Options for processing data after parsing. Detailed information about post-processing features can be found here.
Define actions for subordinate operators (AND, OR, PLUS).
When multiple selectors are added within a Content block, the subordinate operator defines how they interact.
Example: you add two selectors to a Content block to capture different product title formats. The subordinate operator determines how the two selectors work together.
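The occurrence and optional settings of a Content block can be pictured as a small filter applied to the list of matches a selector finds. The sketch below is an assumption, not ScrapeSuite's implementation: the function apply_content_options, the MissingElementError exception, and the choice to return None for a missing optional element are all hypothetical.

```python
from typing import List, Union

class MissingElementError(Exception):
    """Hypothetical error raised when a required element is not found."""

def apply_content_options(
    matches: List[str], occurrence: str = "all", optional: bool = False
) -> Union[List[str], str, None]:
    # Assumed semantics of the Content block options (illustrative only).
    if not matches:
        if optional:
            return None  # a missing optional element does not fail the run
        raise MissingElementError("required element not found")
    if occurrence == "first":
        return matches[0]
    if occurrence == "last":
        return matches[-1]
    if occurrence == "all":
        return list(matches)
    raise ValueError(f"unknown occurrence: {occurrence!r}")

titles = ["Lamp", "Chair", "Desk"]
print(apply_content_options(titles, "first"))
print(apply_content_options(titles, "last"))
print(apply_content_options([], optional=True))
```

A required element that is absent raises an error, which mirrors the idea that only optional elements are allowed to be missing.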
Content Selectors allow precise data extraction by using CSS code to identify elements on the webpage.
Specify how data will be extracted from the selected element. Options include extracting the text content or a specific attribute value (e.g., href, src).
The CSS code is used to identify elements on the webpage, allowing precise targeting of elements based on their attributes and structure. Detailed information about supported selectors can be found here.
Options for processing data after parsing. Detailed information about post-processing features can be found here.
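ScrapeSuite performs this extraction for you; the sketch below only illustrates the difference between extracting an element's text content and one of its attribute values. It uses Python's standard html.parser module and, as a simplification, matches elements by tag name rather than by full CSS selectors. The TagExtractor class and extract helper are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Optional

class TagExtractor(HTMLParser):
    """Collect the text or an attribute of every element with a given tag.

    Simplification: matches by tag name only, not full CSS selectors.
    In text mode, nested matches are merged into the outermost one.
    """

    def __init__(self, tag: str, attribute: Optional[str] = None):
        super().__init__()
        self.tag = tag
        self.attribute = attribute  # None means "extract text content"
        self.results: List[str] = []
        self._depth = 0             # > 0 while inside a matching element
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        if self.attribute is not None:
            value = dict(attrs).get(self.attribute)
            if value is not None:
                self.results.append(value)
        else:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.results.append("".join(self._buffer).strip())
                self._buffer = []

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)

def extract(html: str, tag: str, attribute: Optional[str] = None) -> List[str]:
    parser = TagExtractor(tag, attribute)
    parser.feed(html)
    parser.close()
    return parser.results

html = ('<ul><li><a href="/lamp">Desk <b>Lamp</b></a></li>'
        '<li><a href="/chair">Chair</a></li></ul>')
print(extract(html, "a"))          # text content of each link
print(extract(html, "a", "href"))  # value of each link's href attribute
```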
Containers help organize elements into logical groups. Their settings have been updated for improved usability.
When multiple selectors are added within a Container, the subordinate operator defines how they interact.
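The exact behavior of each subordinate operator is not spelled out here, so the following sketch assumes a common interpretation: OR falls back to the first selector that finds anything, AND returns results only when every selector matches, and PLUS concatenates the results of all selectors. The combine function and the two example selectors are hypothetical; treat this as an illustration of how such operators could work, not as ScrapeSuite's confirmed behavior.

```python
from typing import Callable, List

# Each "selector" is modeled as a function that returns its matches.
SelectorFn = Callable[[], List[str]]

def combine(operator: str, selectors: List[SelectorFn]) -> List[str]:
    # ASSUMED semantics, not confirmed ScrapeSuite behavior:
    #   OR   -> results of the first selector that matches anything
    #   AND  -> all results, but only if every selector matches
    #   PLUS -> concatenated results of all selectors
    results = [select() for select in selectors]
    op = operator.upper()
    if op == "OR":
        for matches in results:
            if matches:
                return matches
        return []
    if op == "AND":
        if all(results):
            return [m for matches in results for m in matches]
        return []
    if op == "PLUS":
        return [m for matches in results for m in matches]
    raise ValueError(f"unknown operator: {operator!r}")

# Two hypothetical selectors for different product-title formats.
def old_layout() -> List[str]:
    return []  # this format is not present on the page

def new_layout() -> List[str]:
    return ["Desk Lamp"]

print(combine("OR", [old_layout, new_layout]))
print(combine("AND", [old_layout, new_layout]))
print(combine("PLUS", [old_layout, new_layout]))
```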
A “Container Type” lets you define multiple areas of interest within a container, adding flexibility when describing complex page structures.
Assign a name to the container type for easy identification in the JSON Result.
Define actions for subordinate operators (AND, OR, PLUS).
When multiple selectors are added within a Container, the subordinate operator defines how they interact.
The CSS code is used to identify elements on the webpage, allowing precise targeting of elements based on their attributes and structure. Detailed information about supported selectors can be found here.
Post-processing is a crucial step in the scraping process: it lets you clean, format, or transform data after it has been scraped but before it is stored or used. This additional processing before saving or exporting ensures that the data meets your specific requirements, improves the accuracy and usability of the parsing result, and makes the data ready for immediate use.
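Conceptually, post-processing is a chain of small transformations applied in order to each scraped value. The sketch below assumes two hypothetical steps, strip_whitespace and extract_number, and a post_process helper that runs them in sequence; ScrapeSuite's actual post-processing options may differ.

```python
import re
from typing import Callable, List

# Hypothetical post-processing steps; ScrapeSuite's real options may differ.
def strip_whitespace(value: str) -> str:
    # Collapse runs of whitespace and trim both ends.
    return " ".join(value.split())

def extract_number(value: str) -> str:
    # Keep the first number found, e.g. "$1,299.00" -> "1299.00".
    match = re.search(r"\d[\d,]*(?:\.\d+)?", value)
    return match.group(0).replace(",", "") if match else ""

def post_process(value: str, steps: List[Callable[[str], str]]) -> str:
    # Apply each step in order, feeding one step's output into the next.
    for step in steps:
        value = step(value)
    return value

raw_price = "  Price:\n  $1,299.00  "
print(post_process(raw_price, [strip_whitespace, extract_number]))
```

Keeping each step as a separate function makes it easy to reorder steps or reuse them across different Content blocks.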
In ScrapeSuite, selectors play a crucial role in identifying and extracting specific elements from a webpage. You can create selectors automatically by clicking on an area of interest in the HTML preview or manually by entering the CSS code in the “Selector” field.
Selectors allow you to pinpoint the exact elements you want to parse. Below is a list of supported selector methods, along with their descriptions and examples of how to construct them in ScrapeSuite.
| Selector | Description |
|---|---|
| `.class` | Selects all elements with the specified class. |
| `.class1.class2` | Selects all elements that have both class names. |
| `.class1 .class2` | Selects all elements with class2 that are descendants of an element with class1. |
| `#id` | Selects the element with the specified ID. |
| `*` | Selects all elements. |
| `element` | Selects all elements of the specified type. |
| `element.class` | Selects all elements of the specified type with the specified class. |
| `element, element` | Selects all elements of the specified types. |
| `element element` | Selects all elements of the second type that are descendants of an element of the first type. |
| `element > element` | Selects all elements of the second type that are direct children of an element of the first type. |
| `element + element` | Selects an element of the second type that immediately follows an element of the first type. |
| `element1 ~ element2` | Selects all elements of type element2 that are preceded by a sibling of type element1. |
| `[attribute]` | Selects all elements with the specified attribute. |
| `[attribute=value]` | Selects all elements whose attribute equals the specified value. |
| `[attribute~=value]` | Selects all elements whose attribute contains the specified value as a whole word. |
| `[attribute^=value]` | Selects all elements whose attribute value starts with the specified value. |
| `[attribute$=value]` | Selects all elements whose attribute value ends with the specified value. |
| `[attribute*=value]` | Selects all elements whose attribute value contains the specified substring. |
| `:first-child` | Selects every element that is the first child of its parent. |
| `:last-child` | Selects every element that is the last child of its parent. |
| `:not(selector)` | Selects every element that does not match the specified selector. |
| `:nth-child(n)` | Selects every element that is the nth child of its parent. |
Using these selectors, you can fine-tune your data extraction to capture exactly the elements you need from a webpage. Whether you are selecting elements by class, ID, attribute, or position, these methods provide a powerful way to define your parsing targets in ScrapeSuite.
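To make the attribute operators concrete, the sketch below reimplements their matching rules for a single attribute value, following the standard CSS Selectors semantics. It is an illustration rather than ScrapeSuite code, and the function name attribute_matches is hypothetical; a value of None stands for an absent attribute.

```python
from typing import Optional

# Standard CSS attribute-operator semantics (illustrative sketch).
# `actual` is the element's attribute value, or None if it is absent.
def attribute_matches(actual: Optional[str], operator: str, expected: str = "") -> bool:
    if actual is None:
        return False                      # attribute not present at all
    if operator == "":                    # [attribute]
        return True
    if operator == "=":                   # [attribute=value]
        return actual == expected
    if operator == "~=":                  # whole word in a whitespace-separated list
        return expected != "" and expected in actual.split()
    # In CSS, ^=, $= and *= never match an empty expected value.
    if expected == "":
        return False
    if operator == "^=":
        return actual.startswith(expected)
    if operator == "$=":
        return actual.endswith(expected)
    if operator == "*=":
        return expected in actual
    raise ValueError(f"unsupported operator: {operator!r}")

href = "https://example.com/files/report.pdf"
print(attribute_matches(href, "^=", "https"))              # a[href^="https"]
print(attribute_matches(href, "$=", ".pdf"))               # a[href$=".pdf"]
print(attribute_matches("btn btn-primary", "~=", "btn"))   # [class~="btn"]
```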
The Elements section in ScrapeSuite simplifies the process of selecting and managing elements for parsing on web pages. It offers an intuitive interface for precisely defining the data to be extracted. Here’s a detailed overview of its functionalities:
The Elements section provides a user-friendly approach to selecting elements of interest directly from the HTML preview or code view. It empowers users to define their data extraction criteria with precision.
Elements included in the JSON Result are marked with the “In result” indicator, allowing users to easily identify which elements have been successfully included in parsing results.
Selected elements are included in the JSON Result through the “Elements” section of the Content and Selector settings.
Timeline is a feature that displays the sequence of events and actions related to setting up the parser. It gives you an overview of all key moments, such as the creation of containers, properties, text selectors, attribute selectors, and CSS selectors. By selecting a specific point on the timeline, you can go back one or more steps and reconfigure the parser as needed.
IMPORTANT! If you go back one or more steps and change your settings, all changes made on the timeline after that point will be permanently removed. Changes that would be removed by such a modification are displayed in faded gray.
Example: the timeline before changes are made and later steps are removed.
The timeline after changes are made and later steps are removed.
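The removal rule above behaves like a typical undo history: going back only moves a cursor, and the first new change after going back discards every later event. The Timeline class below is a hypothetical model of that behavior, not ScrapeSuite code; its pending_removal method corresponds to the events that would be shown in faded gray.

```python
from typing import List

class Timeline:
    """Hypothetical model of the setup timeline as an ordered event list."""

    def __init__(self) -> None:
        self.events: List[str] = []
        self.position = 0  # number of events currently in effect

    def record(self, event: str) -> None:
        # A new change after going back discards every later event.
        del self.events[self.position:]
        self.events.append(event)
        self.position = len(self.events)

    def go_back(self, steps: int = 1) -> None:
        # Move the cursor back without deleting anything yet.
        self.position = max(0, self.position - steps)

    def pending_removal(self) -> List[str]:
        # Events that the next change would remove (shown in faded gray).
        return self.events[self.position:]

timeline = Timeline()
for event in ["create container", "add text selector", "add attribute selector"]:
    timeline.record(event)

timeline.go_back(2)
print(timeline.pending_removal())
timeline.record("add CSS selector")
print(timeline.events)
```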
© ScrapeSuite 2025. All rights reserved.