XPath Contains Function: A Complete Guide for Web Scraping and Automation (2025)

published 12 days ago
by Robert Wilson

Key Takeaways

  • XPath contains() is a versatile function for flexible element selection in web scraping and automation, supporting both text content and attribute matching with case-sensitive partial string comparison
  • Contains() behavior differs between XPath 1.0 and 2.0+ when it receives multiple text nodes: XPath 1.0 (the version browsers implement) silently uses only the first node, while 2.0+ raises a type error, so the same expression can behave differently across tools
  • Best practices include combining contains() with other XPath functions, using relative paths, and implementing proper error handling for robust selectors
  • Modern automation frameworks like Selenium, Playwright, and Puppeteer fully support XPath contains() with enhanced debugging capabilities
  • Performance optimization techniques such as caching results and narrowing search scope can significantly improve scraping efficiency

Introduction to XPath Contains

Finding and interacting with the right elements on a page is one of the harder parts of web scraping and automation. Modern web applications often use dynamic IDs, complex class hierarchies, or constantly changing text content. This is where XPath's contains() function becomes an essential tool: it matches on a stable fragment of a value instead of the whole value. XPath selectors remain widely used in automation projects, and contains() is among the most frequently used functions.

The rise of dynamic web applications and single-page applications (SPAs) has made traditional exact-match selectors less reliable. Modern frameworks like React, Vue, and Angular often generate dynamic class names and IDs, making contains() particularly valuable for robust element selection strategies.

Understanding XPath Contains

The contains() function is a built-in XPath function that tests whether one string occurs inside another, which makes it useful for flexible element selection. Its syntax follows a simple pattern:

contains(string1, string2)

Where:

  • string1: The text to search within (haystack) - can be element text or attribute value
  • string2: The text to search for (needle) - the substring you're trying to match

The function performs a case-sensitive comparison and returns true if string2 is found anywhere within string1, making it particularly useful for partial matches. This flexibility addresses many common web scraping challenges, such as dealing with dynamic content or varying text patterns.
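
One detail that often surprises people is what happens when the first argument is a set of nodes instead of a single string. In XPath 1.0 the node-set is converted to the string-value of its first node only. The sketch below is a plain-Python model of that rule, not a real XPath engine; the function name xpath1_contains and the sample markup are made up for illustration.

```python
def xpath1_contains(haystack_nodes, needle):
    """Model XPath 1.0 contains() when the first argument is a node-set.

    XPath 1.0 converts a node-set to the string-value of its first node
    in document order (or '' for an empty set) before comparing.
    """
    first = haystack_nodes[0] if haystack_nodes else ""
    return needle in first


# Hypothetical markup: <div>Total <b>Price</b>: 10 USD</div>
# text() selects only the div's own text nodes:
text_nodes = ["Total ", ": 10 USD"]
# "." is the element itself; its string-value joins all descendant text:
string_value = "Total Price: 10 USD"

print(xpath1_contains(text_nodes, "USD"))      # False: only "Total " is checked
print(xpath1_contains([string_value], "USD"))  # True: "." sees the full text
print(xpath1_contains([string_value], "usd"))  # False: matching is case-sensitive
```

This is why //div[contains(., 'USD')] can match where //div[contains(text(), 'USD')] does not.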

Common Use Cases and Implementation

1. Dynamic Text Content

Modern web applications often generate dynamic content that may include timestamps, user-specific data, or changing prices. Contains() excels in these scenarios:

# Example: Finding price elements regardless of the actual value
//div[contains(text(), 'Price')]//span[contains(@class, 'amount')]

# Example: Matching elements with partial text
//button[contains(text(), 'Subscribe')]  

# Example: Finding elements with dynamic data attributes
//div[contains(@data-testid, 'user-profile')]

2. Class Name Variations

With modern CSS frameworks and component libraries, class names often combine multiple values or include dynamic suffixes. The contains() function provides flexibility in handling these scenarios:

# Finding elements with specific class patterns
//div[contains(@class, 'btn-') and contains(@class, '-primary')]

# Matching Bootstrap utility classes
//div[contains(@class, 'mt-') and contains(@class, 'px-')]
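
Note that contains(@class, 'btn') is a plain substring test, so it also matches class names such as 'btn-group' or 'subtn'. When you need a whole class name, a common XPath 1.0 idiom is to pad the attribute with spaces before testing. Here is a small helper that builds that predicate; the name has_class is hypothetical and the helper only produces a string for use in your own selectors.

```python
def has_class(token: str) -> str:
    """Build an XPath 1.0 predicate matching `token` as a whole class name."""
    if not token or any(ch.isspace() for ch in token) or "'" in token:
        raise ValueError("token must be one class name without spaces or quotes")
    # Pad the normalized class list with spaces so every name is delimited,
    # then look for the name surrounded by spaces.
    return f"contains(concat(' ', normalize-space(@class), ' '), ' {token} ')"


xpath = f"//button[{has_class('btn')}]"
print(xpath)
# //button[contains(concat(' ', normalize-space(@class), ' '), ' btn ')]
```

The substring form stays the right choice when the class carries a dynamic suffix; the padded form is for names you know exactly.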

Advanced Techniques and Patterns

Error Handling and Validation

Implementing robust error handling is crucial for production-grade web scraping. Here's a Python example that combines Selenium's explicit waits with the tenacity library for retries:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, StaleElementReferenceException
from tenacity import retry, stop_after_attempt, wait_exponential

class ElementNotFoundError(Exception):
    """Raised when no element matches the XPath within the timeout."""

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=10),
    reraise=True,  # after the last attempt, raise the original exception
)
def find_element_safely(driver, xpath, timeout=10):
    try:
        # Wait until at least one matching element is present in the DOM
        return WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.XPATH, xpath))
        )
    except TimeoutException as exc:
        print(f"Element not found with xpath: {xpath}")
        raise ElementNotFoundError(f"Failed to find element: {xpath}") from exc
    except StaleElementReferenceException:
        print("Element became stale, retrying...")
        raise  # Re-raised so tenacity retries the lookup

# Usage example (assumes `driver` is an already-initialized WebDriver)
try:
    element = find_element_safely(
        driver,
        "//div[contains(@class, 'product-card')]//h2[contains(text(), 'Limited Edition')]"
    )
    print("Element found successfully")
except ElementNotFoundError:
    print("All retry attempts failed")

Performance Optimization

To improve scraping efficiency, consider these advanced optimization techniques:

  1. Cache XPath results when performing repeated operations
  2. Use more specific parent elements to narrow the search scope
  3. Combine contains() with other XPath functions for precise selection
  4. Implement proper wait strategies to handle dynamic content
  5. Use indexing when possible to limit the search space
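
As a rough illustration of the first point, the sketch below memoizes lookups by XPath string. It is a minimal sketch under assumptions: XPathCache and fake_finder are hypothetical names, and the finder is any callable you supply (with Selenium it could be something like lambda xp: driver.find_elements(By.XPATH, xp)). A stub finder is used here so the example runs without a browser.

```python
from typing import Any, Callable, Dict


class XPathCache:
    """Memoize the result of a finder callable, keyed by XPath string."""

    def __init__(self, finder: Callable[[str], Any]) -> None:
        self._finder = finder
        self._results: Dict[str, Any] = {}

    def get(self, xpath: str) -> Any:
        # Only call the (potentially slow) finder on a cache miss.
        if xpath not in self._results:
            self._results[xpath] = self._finder(xpath)
        return self._results[xpath]

    def invalidate(self) -> None:
        # Call after navigation or DOM updates; cached elements may be stale.
        self._results.clear()


# Stub finder standing in for a real browser lookup.
calls = []

def fake_finder(xpath: str):
    calls.append(xpath)
    return [f"element for {xpath}"]


cache = XPathCache(fake_finder)
xp = "//div[contains(@class, 'product-card')]"
first = cache.get(xp)
second = cache.get(xp)
print(len(calls))  # 1: the second lookup was served from the cache
```

Caching only pays off while the page is stable. On dynamic pages, cached element references can go stale, so clear the cache whenever the DOM may have changed.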

Cross-browser Compatibility

Browsers expose XPath 1.0 engines, but their behavior can still differ in edge cases, and the same page may split text across nodes or pad it with whitespace differently. A defensive approach combines several conditions:

# Cross-browser compatible XPath with multiple conditions
//div[
  contains(@class, 'card') and 
  not(contains(@class, 'hidden')) and
  normalize-space(text()[contains(., 'target')]) and
  not(ancestor::*[contains(@class, 'template')])
]

# Handling different text node structures
//div[
  (.//text()[contains(., 'target')] or @*[contains(., 'target')]) and
  not(ancestor::*[@hidden or contains(@style, 'display: none')])
]

Debugging and Troubleshooting

Address common issues with these proven solutions:

1. Case Sensitivity

# Solution: Using translate() for case-insensitive matching
//div[contains(
    translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),
    'target'
)]

# Alternative: Using multiple contains for different cases
//div[contains(text(), 'Target') or contains(text(), 'target')]
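
Writing the translate() pattern by hand is error-prone, so it can help to generate it. The following sketch builds a case-insensitive contains() expression and also quotes the search text safely. The helper names xpath_literal and icontains are hypothetical, and the approach only folds ASCII letters, because translate() maps exactly the characters it is given.

```python
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"


def xpath_literal(value: str) -> str:
    """Quote a Python string as an XPath 1.0 string literal."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # XPath 1.0 has no escape character, so when both quote types occur
    # the pieces are joined with concat().
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


def icontains(target: str, needle: str) -> str:
    """Build a contains() call that ignores ASCII letter case.

    `target` is an XPath expression such as "." or "@title".
    """
    lowered = f"translate({target}, '{UPPER}', '{LOWER}')"
    return f"contains({lowered}, {xpath_literal(needle.lower())})"


print(f"//div[{icontains('.', 'Target')}]")
print(xpath_literal("it's"))
```

Passing "." instead of "text()" as the target makes the test cover all of the element's text, not only its first text node.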

2. Whitespace Handling

# Solution: Using normalize-space()
//div[contains(normalize-space(.), 'target')]

# Combining with text node handling
//div[normalize-space(./text()[contains(., 'target')])]
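
To see why normalize-space() matters, here is a plain-Python model of what it does to a string: it trims leading and trailing whitespace and collapses internal runs into one space. This is a sketch for reasoning about selectors, not a replacement for an XPath engine, and the sample text is invented. Only the four XML whitespace characters (space, tab, carriage return, line feed) are treated as whitespace.

```python
import re

XML_WHITESPACE = " \t\r\n"


def normalize_space(value: str) -> str:
    """Model XPath normalize-space(): trim, then collapse whitespace runs."""
    trimmed = value.strip(XML_WHITESPACE)
    return re.sub(r"[ \t\r\n]+", " ", trimmed)


# Text as it might appear in indented HTML source
raw = "\n    Add to\n    cart  "

print("Add to cart" in raw)                   # False: a newline splits the phrase
print("Add to cart" in normalize_space(raw))  # True after normalization
```

A non-breaking space (U+00A0, often produced by &nbsp;) is not collapsed by normalize-space(), which is a frequent reason a selector that looks correct still fails to match.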

Best Practices Summary

Follow these comprehensive guidelines for maintainable and efficient XPath expressions:

  • Use relative paths whenever possible to improve maintainability
  • Combine contains() with other XPath functions for precise selection
  • Implement proper error handling and wait strategies
  • Consider cross-browser compatibility in your selectors
  • Cache results when performing repeated operations
  • Use meaningful variable names and comments in your automation code
  • Document any browser-specific workarounds or special handling

Future Developments

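Browser engines still implement XPath 1.0, so the items below remain wish-list features there, even though XPath 2.0+ processors already offer some of them (for example matches() with the i flag and lower-case()). Improvements that may eventually reach browser-based tooling include: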

  • Native case-insensitive matching options
  • Enhanced regular expression support
  • Improved whitespace handling mechanisms
  • New string manipulation functions
  • Better integration with modern web components

Community Perspectives on XPath Usage

Discussions across Reddit, Stack Overflow, and various technical forums reveal a divided opinion on XPath's role in modern web automation. Many experienced QA engineers advocate for using data-testid attributes as the primary selector strategy, arguing that working with development teams to implement these attributes leads to more maintainable test suites. Some teams have even implemented processes where automation pull requests using XPath are automatically rejected in favor of more specific selectors.

However, seasoned automation engineers point out that while data-testid attributes are ideal, this approach isn't always feasible in real-world scenarios. Particularly when working with legacy applications or in environments where QA teams have limited influence over development practices, XPath remains a valuable tool. The ability to traverse the DOM bidirectionally and create complex conditional selectors makes XPath irreplaceable in certain scenarios, especially when dealing with dynamic content or complex hierarchical structures.

Interestingly, the performance argument against XPath (that it's slower than CSS selectors) appears to be outdated. Community members note that while XPath was significantly slower in the Internet Explorer era, modern browsers have largely eliminated this performance gap. The choice between CSS selectors and XPath now primarily depends on specific use cases and team preferences rather than performance considerations.

A pragmatic middle-ground approach has emerged among many practitioners: using data-testid attributes as the first choice, falling back to accessibility attributes (aria-*) for user-facing elements, and reserving XPath for complex scenarios where other approaches fall short. Some teams have also adopted the practice of having QA engineers add their own test attributes to the frontend codebase, bridging the gap between ideal and practical approaches to element selection.

Conclusion

The XPath contains() function remains a cornerstone of modern web scraping and automation strategies. Its flexibility in handling dynamic content and complex DOM structures makes it an invaluable tool for developers. By understanding its version-specific behaviors, implementing proper error handling, and following best practices, you can build robust and maintainable web scraping solutions that stand the test of time.

Author
Robert Wilson
Senior Content Manager
Robert brings 6 years of digital storytelling experience to his role as Senior Content Manager. He's crafted strategies for both Fortune 500 companies and startups. When not working, Robert enjoys hiking the PNW trails and cooking. He holds a Master's in Digital Communication from University of Washington and is passionate about mentoring new content creators.