At Lexica, we use automated crawlers to monitor local news sources and provide our subscribers with comprehensive intelligence about what's happening in the countries we cover. This page explains how our crawlers work and provides information for website administrators.
Crawler Identification
Our crawler identifies itself as:
User-Agent: Lexica News Bot/1.0 (+https://lexica.news/crawlers; [email protected])
How We Crawl
We've designed our crawlers to be respectful and efficient:
- Frequency: We check news sources once per hour
- Smart Fetching: We first fetch only the RSS feed or homepage to identify new articles
- Minimal Load: We then fetch only the articles that are new since our last visit
- Rate Limiting: We implement delays between requests to avoid overwhelming servers
- Article Cap: Maximum of 50 articles per site per hour
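The discovery step described above can be sketched roughly as follows. This is a minimal illustration and not Lexica's actual implementation: it assumes an RSS 2.0 feed, and the function name `new_article_urls`, the `seen` set, and the sample feed are hypothetical names introduced here for the example.

```python
import xml.etree.ElementTree as ET

ARTICLE_CAP = 50  # per site per hour, as stated in the list above


def new_article_urls(feed_xml: str, seen: set[str], cap: int = ARTICLE_CAP) -> list[str]:
    """Return article links from an RSS 2.0 feed that are not in `seen`, up to `cap`."""
    root = ET.fromstring(feed_xml)
    fresh: list[str] = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        # Skip empty links, links seen on an earlier visit, and repeats in this feed.
        if link and link not in seen and link not in fresh:
            fresh.append(link)
            if len(fresh) >= cap:
                break
    return fresh


# Hypothetical sample feed, used only to exercise the function.
FEED = """<rss version="2.0"><channel><title>Example</title>
<item><title>A</title><link>https://example.com/a</link></item>
<item><title>B</title><link>https://example.com/b</link></item>
<item><title>C</title><link>https://example.com/c</link></item>
</channel></rss>"""

already_seen = {"https://example.com/a"}
print(new_article_urls(FEED, already_seen))
```

Only the links returned by such a step would then be requested, which is why a site that has published nothing new since the last visit should see a single feed or homepage request.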
What We Collect
Our crawlers collect publicly available news articles including:
- Article headlines
- Publication dates
- Article body text
- Source attribution
We do not collect:
- User comments or personal data
- Images or multimedia content
- Subscription-only content
- Advertisement data
Respecting Your Website
We understand the importance of server resources and have implemented several measures:
- Polite Crawling: Random delays of 0.5-1.5 seconds between requests
- Smart Caching: We track which articles we've already seen to avoid re-fetching
- Deduplication: URL-based checks prevent duplicate scraping
- Distributed Timing: We spread requests throughout the hour to minimize load
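As a rough illustration of the polite crawling and deduplication measures above, the loop below waits a random 0.5 to 1.5 seconds between requests and skips URLs it has already seen. It is a sketch under assumptions, not our production code: `crawl`, `polite_delay`, and the injected `fetch` and `sleep` callables are hypothetical names chosen for this example.

```python
import random
import time

MIN_DELAY, MAX_DELAY = 0.5, 1.5  # seconds, the range stated in the list above


def polite_delay() -> float:
    """Pick a random pause between two consecutive requests."""
    return random.uniform(MIN_DELAY, MAX_DELAY)


def crawl(urls, seen, fetch, sleep=time.sleep):
    """Fetch each URL not already in `seen`, pausing between requests.

    `fetch` is any callable taking a URL; `sleep` is injectable so the
    loop can be exercised without actually waiting.
    """
    fetched = []
    for url in urls:
        if url in seen:  # URL-based deduplication
            continue
        if fetched:  # no pause needed before the first request
            sleep(polite_delay())
        fetch(url)
        seen.add(url)
        fetched.append(url)
    return fetched


# Usage with stand-in callables that record what would have happened.
pauses = []
result = crawl(
    ["https://example.com/a", "https://example.com/a", "https://example.com/b"],
    seen=set(),
    fetch=lambda url: None,
    sleep=pauses.append,
)
print(result, len(pauses))
```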
Robots.txt Compliance
We respect robots.txt directives. If you want to control how our crawler accesses your site, please update your robots.txt file:
# Allow Lexica crawler
User-agent: Lexica News Bot
Crawl-delay: 2
Allow: /
# Or block specific sections
User-agent: Lexica News Bot
Disallow: /private/
Disallow: /admin/
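Website administrators who want to check how such rules are read can test them locally. The sketch below uses Python's standard-library `urllib.robotparser`, which is only one possible parser and is not necessarily the one our crawler uses; the combined `ROBOTS_TXT` text and the example.com URLs are assumptions made for illustration. Note that `Crawl-delay` is a widely used extension rather than part of the core robots.txt standard.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt combining the directives shown above.
ROBOTS_TXT = """\
User-agent: Lexica News Bot
Crawl-delay: 2
Disallow: /private/
Disallow: /admin/
"""

AGENT = "Lexica News Bot"


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
print(rules.can_fetch(AGENT, "https://example.com/news/story"))  # path not disallowed
print(rules.can_fetch(AGENT, "https://example.com/private/x"))   # path under /private/
print(rules.crawl_delay(AGENT))                                   # delay in seconds
```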
Why We Crawl
Our mission is to raise awareness of local news coverage and help international professionals access and understand local perspectives. We serve:
- Diplomatic and government personnel
- International business professionals
- NGO and development workers
By highlighting local news in native languages, we help these professionals understand countries the way insiders do, while driving high-quality, engaged readership back to local news sources. Our analytical briefs always include clear attribution, encouraging our readers to visit the original sources for full coverage.
We believe in amplifying local voices, not replacing them. Every story we feature directs attention and traffic back to the original publishers.
Opting Out
If you prefer that we don't crawl your website, you have two options:
- robots.txt: Block our crawler using the User-agent string above
- Contact us: Email [email protected] with your domain
We'll respect your preferences immediately upon notification.
Benefits of Being Crawled
When Lexica features your content:
- Clear Attribution: Every article includes "Local Coverage: [Your Publication]"
- Quality Traffic: Your journalism reaches engaged international decision-makers who value in-depth local reporting
- Expanded Reach: Your stories gain visibility among diplomats, business leaders, and development professionals
- Direct Referrals: Our readers regularly click through to read full articles on your site
- International Recognition: Your local reporting gains global relevance and impact
Technical Details
- Crawler Type: Mixed approach for static HTML and JavaScript-rendered content
- Languages Supported: Multiple languages in the countries we cover
- Response Format: We accept HTML and RSS/Atom feeds
- Encoding: UTF-8 preferred, but we handle various encodings
Contact Us
For questions, concerns, or to report issues with our crawler:
Email: [email protected]
Technical Support: [email protected]
General Inquiries: [email protected]
We typically respond within 24 hours and are happy to work with publishers to ensure mutually beneficial crawling practices.
Updates
This page was last updated: August 2025
We'll update this page whenever we make significant changes to our crawling behavior. Website administrators can subscribe to updates at [email protected].
Lexica - Transforming local news into global intelligence