SERPINFO (experimental)

Warning: SERPINFO is an experimental feature introduced in v8.11.0 for Chrome and Firefox. You can enable it manually on the extension's options page.

SERPINFO is a system for defining and parsing search engine results pages (SERPs) using YAML definition files. It extracts structured information from search results across different search engines.

File Format

SERPINFO files use YAML format with the following structure:

name: SearchEngineName
version: 1.0.0
homepage: https://example.com/
license: MIT
lastModified: 2023-09-01T00:00:00Z

pages:
  - name: PageType
    matches:
      - https://search.example.com/search?*
    results:
      - root: .result-selector
        url: a.link
        props:
          title: .title
    commonProps:
      $site: example
      $category: web

Top-Level Properties

| Property | Type | Description | Required |
| --- | --- | --- | --- |
| name | string | Name of the search engine | Required |
| version | string | Version of the definition file | Optional |
| description | string | Description of the definition file | Optional |
| homepage | string | Homepage URL of the definition | Optional |
| lastModified | string | Last modification date in ISO format | Optional |
| pages | array | Array of page definitions | Required |

Page Definition

Each page in the pages array represents a specific type of search result page (Web, Images, News, etc.):

| Property | Type | Description | Required |
| --- | --- | --- | --- |
| name | string | Name of the page type | Required |
| matches | array | Array of match patterns to include pages | Required |
| excludeMatches | array | Array of match patterns to exclude pages | Optional |
| includeRegex | string | Regular expression pattern to include pages | Optional |
| excludeRegex | string | Regular expression pattern to exclude pages | Optional |
| userAgent | string | Can be "any", "desktop", or "mobile" | Optional |
| results | array | Array of result extractors | Required |
| commonProps | object | Common properties applied to all results on this page | Optional |
| delay | number or boolean | Delay in milliseconds after page load, or a boolean to enable or disable the delay | Optional |
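As a rough illustration of how these page-level properties combine, the fragment below sketches a page definition for mobile web results. The host search.example.com, the type=images query parameter, the .result selector, and the 500 ms delay are all made-up placeholder values, not taken from any real definition.

```yaml
pages:
  - name: Web (mobile)
    # Apply this definition to the search URL...
    matches:
      - https://search.example.com/search?*
    # ...but skip URLs that carry a (hypothetical) image-search parameter.
    excludeMatches:
      - https://search.example.com/search?*type=images*
    userAgent: mobile
    # Delay of 500 ms after page load (placeholder value).
    delay: 500
    results:
      - root: .result
        url: a
    commonProps:
      $site: example
      $category: web
```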

Result Extractor

Each item in the results array defines how to extract data from a single search result:

| Property | Type | Description | Required |
| --- | --- | --- | --- |
| root | string or array | Root command to locate result elements | Required |
| url | string or array | Property command to extract the URL from the result | Required |
| props | object | Key-value pairs where keys are property names and values are property commands for extraction | Optional |
| button | array | Button command to add block buttons to results | Optional |
| preserveSpace | boolean | Determines whether to retain the layout space of blocked results, preventing layout shifts | Optional |
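Because results is an array, a page can carry one extractor per kind of result block. The sketch below assumes a hypothetical page with separate markup for organic and video results; every selector is invented for illustration.

```yaml
results:
  # Organic results.
  - root: .organic-result
    url: a.result-link
    props:
      title: h3
    # Retain the layout space of blocked results to prevent layout shifts.
    preserveSpace: true
  # Video results use different markup, so they get their own extractor.
  - root: .video-result
    url: a.video-link
    props:
      title: .video-title
```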

Commands

SERPINFO supports various commands for complex extraction scenarios, organized into four categories:

Root Commands

  • <css-selector> or [selector, <css-selector>]: Find elements matching the CSS selector
  • [upward, <level>, <root-command>]: Navigate up the DOM tree by <level> steps from each element found by the root command
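The root command forms can be compared side by side. In this sketch the three entries are alternatives shown together for comparison, not a list meant to be used as is, and the selectors are hypothetical.

```yaml
results:
  # Shorthand: a bare CSS selector.
  - root: .result
    url: a
  # Explicit form of the same command.
  - root: [selector, ".result"]
    url: a
  # Find each a.result-link, then go two steps up the DOM tree
  # and use that ancestor as the result root.
  - root: [upward, 2, "a.result-link"]
    url: a.result-link
```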

Element Commands

  • <css-selector> or [selector, <css-selector>, <element-command>?]: Get element by CSS selector
  • [upward, <level>, <element-command>?]: Navigate up in the DOM tree by <level> steps

Note: When <element-command> is omitted in any of the above commands, the current root element is implicitly used.

Property Commands

  • <css-selector>: Get the textContent property of the element matching the CSS selector, or its href attribute when the command is used for url
  • [attribute, <name>, <element-command>?]: Get attribute value
  • [property, <name>, <element-command>?]: Get property value
  • [const, <value>]: Return a constant string
  • [domainToURL, <property-command>]: Convert a domain to a URL
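  • [regexInclude, <pattern>, <property-command>]: Keep the value only if it matches the regular expression
  • [regexExclude, <pattern>, <property-command>]: Discard the value if it matches the regular expression
  • [regexSubstitute, <pattern>, <replacement>, <property-command>]: Replace the part of the value that matches the regular expression with <replacement>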
  • [or, [<property-command>*], <element-command>?]: Try multiple commands in sequence

Note: When <element-command> is omitted in any of the above commands, the current root element is implicitly used.
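To make the property commands more concrete, here is a sketch of two extractors that use several of them. The selectors, the data-target and aria-label attributes, and the " - Example" suffix are hypothetical, and the comments describe the behavior as it reads from the list above rather than from a verified run.

```yaml
results:
  - root: .result
    # Read the link target from a (hypothetical) data attribute instead of href.
    url: [attribute, "data-target", "a.result-link"]
    props:
      # Try the text of .title first, then the root element's aria-label attribute.
      title: [or, [".title", [attribute, "aria-label"]]]
  - root: .compact-result
    # Assumes the result only shows a bare domain such as example.com.
    url: [domainToURL, ".result-domain"]
    props:
      # Strip a trailing " - Example" suffix from the heading text.
      title: [regexSubstitute, " - Example$", "", "h3"]
```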

Button Commands

  • [inset, <options>?, <element-command>?]: Add a button to a specific position within the element. Options include:
    • top: CSS length or percentage
    • right: CSS length or percentage
    • bottom: CSS length or percentage
    • left: CSS length or percentage
    • zIndex: z-index value (default: 1)

Note: When <element-command> is omitted, the current root element is used.
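A button command might look like the following sketch, which mirrors the shape of the options listed above. The .image-result and .thumbnail selectors and the offset values are placeholders chosen for illustration.

```yaml
results:
  - root: .image-result
    url: a
    # Position the block button 8px from the top and right edges of the
    # thumbnail element, with a z-index raised above the default of 1.
    button: [inset, { top: "8px", right: "8px", zIndex: 2 }, ".thumbnail"]
```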

Example

Here's an example for Bing Image Search:

name: Bing
version: 0.1.0
homepage: https://github.com/ublacklist/builtin#readme
license: MIT
lastModified: 2023-04-05T11:11:20Z

pages:
  - name: Images (desktop)
    matches:
      - https://www.bing.com/images/search?*
    userAgent: desktop
    results:
      - root: [upward, 1, ".iuscp"]
        url:
          - regexSubstitute
          - '"purl":"([^"]+)'
          - "\\1"
          - [attribute, "m", ".iusc"]
        props:
          title: [attribute, "title", "li > a"]
        button: [inset, { top: "32px", right: 0 }, ".iuscp"]
    commonProps:
      $site: bing
      $category: images

Subscription

Publishing Your SERPINFO

You can share your SERPINFO definitions with others by hosting your YAML file and providing a subscription link.

  1. Create your SERPINFO YAML file following the format described above
  2. Host the file at a publicly accessible URL (e.g., GitHub, your own website)
  3. Share the subscription link with users using this format:
https://ublacklist.github.io/serpinfo/subscribe?url=<url-encoded-url>

Here, <url-encoded-url> is the percent-encoded URL of your hosted YAML file. For example, https://example.com/serpinfo.yml is encoded as https%3A%2F%2Fexample.com%2Fserpinfo.yml.

Subscribing to SERPINFO

To subscribe to a SERPINFO definition:

  1. Click on a subscription link provided by the publisher
  2. The extension will open its options page and prompt you to add the subscription
  3. Confirm to add the SERPINFO to your extension

Note: To use subscription links, you need to enable them manually. Go to the extension's options page and turn on "Enable SERPINFO subscription links".