Filters are rules to include or exclude web content for machine learning.
You can set up three groups of rules.
Pages to learn
This is the white list of web pages that the AI should learn. You can further define how the AI uses the white list: it can learn only the pages on the list, or learn those pages at a higher priority than the rest.
Pages to exclude
This is the black list of web pages. The AI ignores any web page whose path matches any pattern in this list.
Note that Pages to exclude overrides Pages to learn. In other words, if a web page matches rules in both Pages to learn and Pages to exclude, it is ignored.
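The precedence between the two lists can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the function name classify_page, the learn_only_listed flag, the returned labels, and the use of shell-style wildcard patterns (via fnmatch) are all assumptions, since the actual pattern syntax is not described here.

```python
from fnmatch import fnmatchcase


def classify_page(path, learn_patterns, exclude_patterns, learn_only_listed=False):
    """Return 'ignore', 'learn_first', or 'learn' for a page path.

    Hypothetical helper: patterns are assumed to be shell-style wildcards.
    """
    # Pages to exclude is checked first, so it overrides Pages to learn.
    if any(fnmatchcase(path, pattern) for pattern in exclude_patterns):
        return "ignore"

    on_learn_list = any(fnmatchcase(path, pattern) for pattern in learn_patterns)

    if learn_only_listed:
        # Mode 1: only pages on the white list are learned.
        return "learn" if on_learn_list else "ignore"

    # Mode 2: every non-excluded page is learned, listed pages first.
    return "learn_first" if on_learn_list else "learn"
```

For example, with Pages to learn set to /docs/* and Pages to exclude set to /docs/legacy/*, the path /docs/legacy/v1 matches both lists and is ignored, while /docs/intro is learned.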
Sections to exclude
You can further exclude certain content from web pages. Examples of such content include:
- Customer testimonials
- Quoted paragraphs
- User comments
To exclude such content, provide XPath expressions that match it, based on the HTML structure of the page.
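As a rough illustration of how XPath expressions select sections for removal, the Python sketch below strips matching elements from a small, well-formed page and returns the remaining text. Everything in it is an assumption made for the example: the sample markup, the class and id names, and the strip_sections helper are hypothetical, and the standard-library ElementTree module supports only a subset of XPath, so the expressions accepted by the product may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical page; real pages will have their own structure.
PAGE = (
    "<html><body>"
    "<p>Our product syncs files.</p>"
    "<div class='testimonial'>Best tool ever!</div>"
    "<blockquote>Quoted text</blockquote>"
    "<div id='comments'><p>First!</p></div>"
    "</body></html>"
)

# One expression per kind of section to exclude.
XPATHS = [
    ".//div[@class='testimonial']",  # customer testimonials
    ".//blockquote",                 # quoted paragraphs
    ".//div[@id='comments']",        # user comments
]


def strip_sections(xhtml, xpaths):
    """Remove every element matched by the XPath expressions, return the text left."""
    root = ET.fromstring(xhtml)
    # ElementTree has no parent pointers, so build a child-to-parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for xpath in xpaths:
        for node in root.findall(xpath):
            parent = parent_of.get(node)
            if parent is not None:
                # Note: this also drops any text that directly follows the element.
                parent.remove(node)
    return "".join(root.itertext())


print(strip_sections(PAGE, XPATHS))
```

To write expressions for a real site, inspect the page in the browser's developer tools and look for a stable tag, class, or id that wraps the section you want to exclude.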