Recently, a spam wave exploiting Shopify's vendor search made the rounds on LinkedIn.
The Problem
Shopify’s built-in store search, as well as the vendor search, utilize GET parameters for the input queries.
This makes it possible, depending on the theme, to create semi-persistent search result pages containing user input simply by linking to them. While this will not allow the attacker to create backlinks in 99.99% of cases, it might allow them to hijack the ranking of the store page for their own message.
Right now, the major examples of this kind of attack seem to advertise dubious Instagram followers or "FIFA Coins" (the currency of a popular video game) for sale. By submitting these search result pages to Google, the spammers can create a highly ranked message for a keyword of their choice, advertising their domains or social media handles.
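To illustrate why the theme matters, here is a minimal sketch of a hypothetical collection template excerpt. The markup is an assumption for illustration and is not taken from any particular theme. It only shows how the text of the q parameter can end up as visible page content, since on the vendor pages collection.title holds the requested vendor name:

```liquid
{%- comment -%}
  Hypothetical template excerpt, for illustration only.
  For a request such as /collections/vendors?q=Some+text,
  collection.title contains the submitted text.
{%- endcomment -%}
<h1>{{ collection.title }}</h1>
```

Any theme that prints the title this way will display whatever text the linked URL carries.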
The Easy Solution
Shopify by default already blocks the normal search results from indexing.
The most straightforward solution would be to disallow indexing of all vendor search results as well.
This can be done via the robots.txt file, by creating a file robots.txt.liquid in your theme's templates directory and adding this code:
{% for group in robots.default_groups %}
  {{- group.user_agent }}
  {%- for rule in group.rules -%}
    {{ rule }}
  {%- endfor -%}
  {%- comment -%} Extra rule for the catch-all user agent group {%- endcomment -%}
  {%- if group.user_agent.value == '*' -%}
    {{ 'Disallow: /collections/vendors' }}
  {%- endif -%}
  {%- if group.sitemap != blank -%}
    {{ group.sitemap }}
  {%- endif -%}
{% endfor %}
Bear in mind that this keeps crawlers away from all vendor search and overview pages, including those of legitimate vendors.
The Better Solution
A better approach is to use whitelisting, so that the pages of the vendors we want (i.e. those that actually exist in the system) stay indexable. This code must be added to the head of the theme.liquid file:
{%- if request.path == '/collections/vendors' -%}
  {%- assign lowercase_vendors = shop.vendors | join: ',' | downcase | split: ',' -%}
  {%- assign lowercase_input = collection.title | downcase -%}
  {%- comment -%} Print noindex only if the input matches no existing vendor {%- endcomment -%}
  {%- unless lowercase_vendors contains lowercase_input -%}
    <meta name="robots" content="noindex">
  {%- endunless -%}
{%- endif -%}
We check the request path against the vendors pseudo-collection. If it matches, we compare the lowercased search input (which is the title of a virtual collection inside the global collection object) against the lowercased names of all vendors in our product database. If there is no match, a noindex robots meta tag is printed, which keeps the user experience exactly the same while safeguarding against the SEO impact of such attacks.
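One caveat: because the vendor list is joined and split on a comma, a vendor whose name itself contains a comma would never match. A possible variant, sketched here under the assumption that such names exist in your store, compares the vendors one by one instead. The variable names lowercase_vendor and vendor_known are made up for this example:

```liquid
{%- if request.path == '/collections/vendors' -%}
  {%- assign lowercase_input = collection.title | downcase -%}
  {%- assign vendor_known = false -%}
  {%- comment -%} Compare the input against each vendor individually {%- endcomment -%}
  {%- for vendor in shop.vendors -%}
    {%- assign lowercase_vendor = vendor | downcase -%}
    {%- if lowercase_vendor == lowercase_input -%}
      {%- assign vendor_known = true -%}
      {%- break -%}
    {%- endif -%}
  {%- endfor -%}
  {%- comment -%} Same outcome as before: noindex for unknown vendors {%- endcomment -%}
  {%- unless vendor_known -%}
    <meta name="robots" content="noindex">
  {%- endunless -%}
{%- endif -%}
```

This is longer than the join-and-split version but makes no assumption about which characters appear in vendor names. Use one of the two snippets, not both.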
For malicious pages already in the index, please refer to each search engine’s guide on requesting their deletion.