kushal

Finding Elements in the Ever-Changing DOM

The core of hyphenbox is the ability to identify and highlight an HTML element based on previously recorded information.

Here are some details of how I tackled this. It's not just about improving the element-finding strategy, but also about improving the type of information we record, so the finder has all the freedom it needs.

Crisis 1: The "Now You See Me" Stability Problem

The Issue: How do you recognise that the component you are looking for hasn't loaded yet, so that your strategies don't fail for no reason?

What Didn't Work: Littering the code with setTimeout(), which Claude loves to do. (I have a new ick now.)

Our Solution: Wait for structural stability, not just element presence. Before searching for an element, we take a snapshot of the "important" parts of the page (modals, dialogs, portals). We then rapidly take more snapshots until the structure stops changing for a few consecutive checks. Only then do we proceed.

// A more accurate, illustrative version of our stability check
const waitForModalStability = (stabilityThreshold = 3) => {
  const stabilitySelectors = `
    [data-portal="true"], [role="dialog"], .modal-content,
    .MuiDialog-paper, .mantine-Modal-content
  `;

  const getStructureSignature = (elements) => {
    return Array.from(elements).map(el => {
      const r = el.getBoundingClientRect();
      // The signature includes tag, ID, class count, size, and position
      // for a much more robust stability check.
      const id = el.id ? '#' + el.id : '';
      const size = `${Math.round(r.width)}x${Math.round(r.height)}`;
      const position = `${Math.round(r.left)},${Math.round(r.top)}`;
      return `${el.tagName}${id}:${el.classList.length}:${size}@${position}`;
    }).join('|');
  };

  // The polling loop is omitted here: it re-computes the signature of
  // document.querySelectorAll(stabilitySelectors) and proceeds once it has
  // stayed identical for stabilityThreshold consecutive checks.
};
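
The loop that drives those snapshots isn't shown in the snippet. Here is a minimal sketch of how it could look. Everything in it is an assumption for illustration, not hyphenbox's actual code: the createStabilityTracker and waitUntilStable names, the intervalMs and maxChecks defaults, and the choice to inject the snapshot function so the logic can be exercised without a DOM.

```javascript
// Hypothetical helper: counts how many times in a row the signature
// has matched the one before it.
const createStabilityTracker = (stabilityThreshold = 3) => {
  let previous = null;
  let stableChecks = 0;
  // Feed it one signature per check; returns true once the page is stable.
  return (signature) => {
    stableChecks = signature === previous ? stableChecks + 1 : 0;
    previous = signature;
    return stableChecks >= stabilityThreshold;
  };
};

// Hypothetical driver: takeSnapshot would wrap getStructureSignature(...).
// intervalMs and maxChecks are made-up defaults for this sketch.
const waitUntilStable = async (
  takeSnapshot,
  { stabilityThreshold = 3, intervalMs = 50, maxChecks = 40 } = {}
) => {
  const isStable = createStabilityTracker(stabilityThreshold);
  for (let check = 0; check < maxChecks; check++) {
    if (isStable(takeSnapshot())) return true;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false; // gave up: the structure never settled
};
```

The one setTimeout here only paces the polling. The decision to proceed comes from the snapshots agreeing with each other, not from a guessed delay.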

Crisis 2: The "Big Container" Problem

The Issue: How do you reliably identify a large structural element, like an entire navigation bar or a specific dashboard card? These containers often lack unique IDs.

What Didn't Work: Attribute-based selectors are useless here. They are either too generic (div[class*="navbar"]) and match multiple elements, or too specific and break with the slightest design change.

Our Solution: We use the recorded XPath to locate potential candidates for these large containers. On its own, though, the XPath strategy is highly prone to false positives, so we validate every candidate. The real magic is in our XPathValidator. When we record a flow, we don't just store the container's XPath; we also store a "fingerprint" of its children: the icons, links, and text labels inside it. During playback, the validator inspects any element found by XPath and checks whether it contains the children we expect. And we match those children by their attributes!

This approach gives us the best of both worlds. We use a broad tool (XPath) to find the neighborhood, then a precise tool (XPathValidator) to confirm we're at the right place.

// A conceptual look at our XPathValidator
const validateByChildren = (candidateElement, recordedChildren) => {
  let matchedChildren = 0;
  
  for (const recordedChild of recordedChildren) {
    let bestMatchScore = 0;
    
    // Find the best matching actual child for the recorded child
    for (const actualChild of candidateElement.children) {
      let currentScore = 0;
      // The validator checks tag name, attributes, classes, and text content
      // to calculate a match score for each child.
      if (actualChild.tagName === recordedChild.tagName) currentScore += 0.4;
      // ... more scoring logic ...
      
      bestMatchScore = Math.max(bestMatchScore, currentScore);
    }
    
    // If a reasonably good match was found for this child, count it.
    if (bestMatchScore > 0.6) {
      matchedChildren++;
    }
  }
  
  // Calculate final confidence based on how many children matched.
  // (With nothing recorded, 0 / 0 is NaN and the comparison below is false.)
  const confidence = matchedChildren / recordedChildren.length;
  return confidence > 0.5; // Only accept if >50% of children match
};
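
To tie the two steps together, here is a rough sketch of how an XPath lookup could feed a validator like the one above. It is illustrative only: xpathCandidates and findContainer are hypothetical names, the validator is passed in as a function, and the document is injected so the snippet can run outside a browser. The only real API it relies on is document.evaluate with an ordered snapshot result.

```javascript
// Numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE in the DOM spec.
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Step 1 (broad): collect every element the recorded XPath points at.
const xpathCandidates = (xpath, doc) => {
  const result = doc.evaluate(xpath, doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  const candidates = [];
  for (let i = 0; i < result.snapshotLength; i++) {
    candidates.push(result.snapshotItem(i));
  }
  return candidates;
};

// Step 2 (precise): keep the first candidate the validator accepts.
// `validate` stands in for a check like validateByChildren.
const findContainer = (xpath, recordedChildren, validate, doc = globalThis.document) => {
  const match = xpathCandidates(xpath, doc).find((el) => validate(el, recordedChildren));
  return match || null;
};
```

In the browser this would be called as findContainer(recordedXPath, recordedChildren, validateByChildren), where recordedXPath is whatever was stored at recording time.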

Crisis 3: The Modal Maze

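The Issue: The button you need is inside a modal rendered in a React portal. It's technically on the page, but a portal mounts its content outside the parent component's DOM hierarchy, so a query scoped to the surrounding component tree never reaches it.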

Our Solution: Smart Search Root Detection. Instead of always searching the entire document, we first identify the "active" part of the page.

const getSearchRoots = () => {
  const roots = [];
  
  // 1. Look for portals first, a common pattern for modals.
  const portals = document.querySelectorAll('[data-portal="true"]');
  if (portals.length > 0) roots.push(...portals);
  
  // 2. Find specific modal *content* containers from popular libraries.
  // Note: The actual implementation has a much longer list of selectors.
  const modalContent = document.querySelectorAll(
    '.mantine-Modal-content, .modal-body, .MuiDialog-paper'
  );
  
  // 3. Sort by z-index to search the top-most modal first.
  // A z-index of "auto" parses to NaN, so it is treated as 0.
  const sortedModals = Array.from(modalContent).sort((a, b) => {
    const zIndexA = parseInt(getComputedStyle(a).zIndex, 10) || 0;
    const zIndexB = parseInt(getComputedStyle(b).zIndex, 10) || 0;
    return zIndexB - zIndexA;
  });
  
  roots.push(...sortedModals);
  
  // 4. Always fallback to the document as the final search root.
  roots.push(document); 
  return roots;
};
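
With the roots in priority order, the finder can try each one in turn and stop at the first hit. A small sketch, assuming a hypothetical findInRoots helper and a plain CSS selector as the lookup (the real finder's strategies aren't shown in this post):

```javascript
// Hypothetical helper: query each search root in priority order.
// Any object with a querySelector method (Element, Document) works as a root.
const findInRoots = (roots, selector) => {
  for (const root of roots) {
    const match = root.querySelector(selector);
    if (match) return { element: match, root };
  }
  return null; // not found in any root, including the document fallback
};

// In the browser: findInRoots(getSearchRoots(), 'button[type="submit"]')
```

Returning the root alongside the element makes it easy to log whether a match came from a modal or from the document fallback.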

Crisis 4: The False Positive Trap

The Issue: You're looking for a "Submit" button. Your finder returns a button with textContent="Submit". Success! Except it's the wrong one—there are three on the page.

What Didn't Work: Making selectors more specific. This just trades false positives for brittleness.

Our Solution: Validate elements by their children and context. When an XPath strategy returns a candidate, we don't just trust it. We validate it with our XPathValidator, which inspects its children.

During recording, we capture not just the target element, but also a profile of its immediate children (tag, attributes, classes, text).
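
For illustration, recording such a profile could look roughly like this. recordChildren is a hypothetical name, the output shape is a guess at a reasonable format rather than hyphenbox's actual schema, and the 50-character text cap is arbitrary.

```javascript
// Hypothetical recorder: capture a small profile for each immediate child.
// The 50-character text cap is an arbitrary choice for this sketch.
const recordChildren = (element) =>
  Array.from(element.children).map((child) => ({
    tagName: child.tagName,
    attributes: Object.fromEntries(
      Array.from(child.attributes).map((attr) => [attr.name, attr.value])
    ),
    classes: Array.from(child.classList),
    text: (child.textContent || '').trim().slice(0, 50),
  }));
```

Only immediate children are captured, which keeps the stored fingerprint small and independent of how deeply the container is nested.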

During playback, our validator calculates a confidence score for the candidate, using the same validateByChildren check shown under Crisis 2.
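
When several candidates look alike, such as the three "Submit" buttons, one way to use that score is to compute it for every candidate and keep the highest. The sketch below is an assumption about how that could be wired up, not the actual implementation: childConfidence and pickBestCandidate are hypothetical names, and scoreChild is an injected per-child scorer returning a value between 0 and 1. The 0.6 and 0.5 thresholds are the ones used in this post's validator.

```javascript
// Fraction of recorded children that have a good-enough match in the candidate.
// scoreChild(actual, recorded) is a hypothetical scorer returning 0..1.
const childConfidence = (candidate, recordedChildren, scoreChild, childThreshold = 0.6) => {
  if (recordedChildren.length === 0) return 0;
  const actualChildren = Array.from(candidate.children);
  const matched = recordedChildren.filter((recorded) =>
    actualChildren.some((actual) => scoreChild(actual, recorded) > childThreshold)
  ).length;
  return matched / recordedChildren.length;
};

// Score every candidate and keep the most confident one, if it clears the bar.
const pickBestCandidate = (candidates, recordedChildren, scoreChild, minConfidence = 0.5) => {
  let best = null;
  for (const element of candidates) {
    const confidence = childConfidence(element, recordedChildren, scoreChild);
    if (confidence > minConfidence && (best === null || confidence > best.confidence)) {
      best = { element, confidence };
    }
  }
  return best; // null when no candidate is convincing
};
```

Returning null instead of the least-bad candidate means the caller can fall through to another strategy rather than highlight the wrong element.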
