Tag Index Backlog

Tag pages (/tag/{slug}/) are auto-generated by _data/process_data.py. A tag page with fewer than 3 conferences gets both googlebot_noindex: true and sitemap: false written into its front matter.

  • googlebot_noindex: true keeps the page out of Google (we intentionally block thin/low-quality pages from Google to protect site quality signals; Bing ignores this directive).
  • sitemap: false removes thin tag pages from the sitemap, which also drops them from the IndexNow submission (submit_indexnow.py reads the sitemap) — so thin tags are hidden from Bing too until they have real content.

This is unlike hotels and workshops, which carry googlebot_noindex but stay in the sitemap on purpose: Bing indexes them well and they are valuable long-tail pages, so we keep feeding Bing while blocking Google. Thin tag pages (1-2 conferences) aren’t worth indexing anywhere yet, so they get sitemap: false as well.
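
The difference between the two policies can be sketched as a small function that maps the two front matter flags to which search engine ends up seeing a page. This is only an illustration of the reasoning above: the function name search_visibility is hypothetical, and the model of Bing (discovering pages only through the sitemap and the IndexNow submission built from it) is a simplifying assumption, not code from the repo.

```python
def search_visibility(googlebot_noindex: bool = False, sitemap: bool = True) -> dict:
    """Illustrative model only (hypothetical helper, not part of the repo)."""
    return {
        # Google honors the googlebot_noindex directive.
        "google": not googlebot_noindex,
        # Bing ignores googlebot_noindex. Assumption: Bing only learns about
        # a page through the sitemap and the IndexNow submission built from it.
        "bing": sitemap,
    }


# Hotels and workshops: blocked from Google, still fed to Bing.
print(search_visibility(googlebot_noindex=True, sitemap=True))
# Thin tag page (1-2 conferences): hidden from both.
print(search_visibility(googlebot_noindex=True, sitemap=False))
# Tag page with 3 or more conferences: neither flag is written.
print(search_visibility())
```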

This re-indexes itself — no manual step

process_data.py rewrites every tag page from scratch on each run (around lines 580-616). The googlebot_noindex: true and sitemap: false lines are emitted only when tag_conf_counts.get(tag, 0) < 3. So as soon as a tag reaches 3 confirmed conferences and you re-run cd _data && python process_data.py, neither line is written: the page re-enters the sitemap (so Bing picks it up through IndexNow) and becomes indexable by Google again. You never edit tag/*.md by hand.

The threshold lives in one place, the check if tag_conf_counts.get(tag, 0) < 3: in process_data.py. Change the 3 there if the policy changes.
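
The self-healing behavior described above can be sketched as follows. This is a minimal illustration, not the actual code in process_data.py: the helper name tag_front_matter_flags and the THRESHOLD constant are hypothetical, and only the < 3 comparison and the two front matter lines come from this document.

```python
THRESHOLD = 3  # mirrors the "< 3" check described above


def tag_front_matter_flags(tag: str, tag_conf_counts: dict, threshold: int = THRESHOLD) -> list:
    """Return the extra front matter lines for a tag page (hypothetical helper)."""
    if tag_conf_counts.get(tag, 0) < threshold:
        # Thin tag: keep it out of Google and out of the sitemap.
        return ["googlebot_noindex: true", "sitemap: false"]
    # At or above the threshold nothing extra is written.
    return []


counts = {"Edge AI": 2}
print(tag_front_matter_flags("Edge AI", counts))  # ['googlebot_noindex: true', 'sitemap: false']

counts["Edge AI"] += 1  # a third conference is added
print(tag_front_matter_flags("Edge AI", counts))  # []
```

Because the page is rebuilt from the counts on every run, nothing has to be removed by hand once a tag crosses the threshold.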

How to use this list when adding conferences

When the task is “add conferences to flesh out the thin tags,” prefer real, confirmed conferences (official source required; see CLAUDE.md “Researching Conference Data”) that legitimately carry one of the tags below. Reuse the tag exactly as it is spelled on existing entries so the new conference counts toward that tag and tag usage stays consistent.

Priority order: tags 1 conference away from indexing come first (lowest effort, highest SEO payoff), then tags 2 away.
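
The priority order can be derived from per-tag counts by grouping tags by how many conferences they still need. The sketch below assumes a threshold of 3 as described earlier. The function name group_by_distance is hypothetical, and the sample counts are illustrative (the "Example Tag" entry is made up to show a tag that is already indexed).

```python
from collections import defaultdict


def group_by_distance(tag_counts: dict, threshold: int = 3) -> dict:
    """Group under-threshold tags by how many conferences they still need."""
    groups = defaultdict(list)
    for tag, count in tag_counts.items():
        if 0 < count < threshold:
            groups[threshold - count].append(tag)
    # Smallest distance first: those tags are the cheapest to get indexed.
    return {distance: sorted(tags) for distance, tags in sorted(groups.items())}


# Illustrative counts; "Example Tag" is a made-up, already-indexed tag.
sample = {"Audio": 2, "Cloud": 1, "Film": 2, "Linux": 1, "Example Tag": 5}
for distance, tags in group_by_distance(sample).items():
    print(f"{distance} away: {', '.join(tags)}")
# 1 away: Audio, Film
# 2 away: Cloud, Linux
```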

1 conference away (2 conferences now → add 1 to index)

  • Audio
  • Autonomous Systems
  • Collective Intelligence
  • Crowdsourcing
  • Edge AI
  • Film
  • Game Development
  • Knowledge Graphs
  • Knowledge Representation
  • Middleware
  • Personalization
  • Recommender Systems
  • Scheduling
  • Search
  • Security Research
  • Semantic Web
  • Social Networks
  • Supercomputing
  • Telecommunications
  • Urban Computing

2 conferences away (1 conference now → add 2 to index)

  • Autonomous Agents
  • Cloud
  • Document Analysis
  • Fog Computing
  • Human-Computer Interaction
  • Humanities
  • Language Resources
  • Linguistics
  • Linux
  • Mechatronics
  • Metrics
  • Mobile Security
  • Multi-Agent Systems
  • Network Security
  • Neuroscience
  • Performance
  • Quality of Service
  • Wireless
  • Wireless Security

Refreshing this list

Counts drift as conferences are added. To regenerate the under-3 list:

cd _data && python -c "
import csv, re
from collections import Counter

# Count how many conference rows carry each tag.
counts = Counter()
with open('conferences.csv', encoding='utf-8', newline='') as f:
    for row in csv.DictReader(f):
        for tag in re.split(r'\s*,\s*', (row.get('tags') or '').strip()):
            if tag.strip():
                counts[tag.strip()] += 1

# Print tags below the threshold of 3, lowest count first.
for c, t in sorted((c, t) for t, c in counts.items() if c < 3):
    print(c, '|', t)
"

Last refreshed: 2026-06-07 (39 tags under the threshold).