Understanding createTreeWalker method in the context of replacing strings (or parts of them)

How does storing replaced strings in the node variable make a change in the text appearing to the end user?

In your case, you're changing the textContent property. When accessed, it returns the text content of a node, concatenated with the text content of its descendants, and changing its value will change the element's contents.

For example, let's suppose I have this HTML:

<p id="content">
  A man <span>walks <b>into a <a href="#">bar</a></b> and gets a beer</span>
</p>

This is displayed as:

A man walks into a bar and gets a beer

[Image: the sentence "A man walks into a bar and gets a beer" as rendered by my browser (Chrome) from the HTML above]

Note that inside the paragraph there are other tags (span, b and a). But if I get the paragraph's textContent, only the text is returned:

const p = document.querySelector('#content');
console.log(p.textContent); // A man walks into a bar and gets a beer

That's because textContent is a string containing only the text of the paragraph and its descendants. It doesn't include any of the element's child tags, only their text contents.

Being a string, you can manipulate it as you'd do with any other string. So calling replace returns another string with the result:

console.log(p.textContent.replace('a', 'b'));

This prints A mbn walks into a bar and gets a beer (because replace('a', 'b') only replaces the first occurrence of "a" with "b").

But note that this doesn't change the paragraph, because replace returns a new string and leaves the original untouched. The DOM is updated only when you assign that new string back to textContent:

// changes DOM, page is updated
p.textContent = p.textContent.replace('a', 'b');
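
The fact that replace leaves its input untouched can be checked on a plain string, with no DOM involved. This is a minimal sketch; the variable names original and replaced are only illustrative:

```javascript
// A plain string standing in for the paragraph's textContent
const original = 'A man walks into a bar and gets a beer';

// replace builds and returns a new string
const replaced = original.replace('a', 'b');

console.log(original); // A man walks into a bar and gets a beer
console.log(replaced); // A mbn walks into a bar and gets a beer
```

Nothing is updated until the new string is assigned somewhere, which is what the assignment to textContent does for the page.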

When you set textContent, the paragraph's contents are changed to the modified text. One important detail is that all the paragraph's descendant nodes (span, b and a) are removed, and now the paragraph contains only the text returned by replace:

A mbn walks into a bar and gets a beer

In short, when you set an element's textContent to some text, the element ends up containing only that text, and all the child tags it had are removed.

Regarding createTreeWalker, we can check in the docs that it creates a TreeWalker, which is an object that represents a document subtree (the document is a tree that contains all the page's elements; think of TreeWalker as a subset of it: a subtree that contains only some elements).

The first argument is the starting point, the node from which the search starts: in your case you used document.body, so the walker will visit all nodes inside document.body that satisfy the criteria.

The criteria are determined by the second argument. In your case, you used NodeFilter.SHOW_TEXT, which tells the walker to consider only text nodes. Therefore, the result is a TreeWalker that contains only the text nodes inside document.body (which is basically "all text nodes of the document").

To understand what text nodes are, let's consider the same HTML:

<p id="content">
  A man <span>walks <b>into a <a href="#">bar</a></b> and gets a beer</span>
</p>

If I get all the text nodes from this paragraph:

const p = document.querySelector('#content');
const walker = document.createTreeWalker(
  p, // start at p only, instead of the whole document.body
  NodeFilter.SHOW_TEXT
);

let node;
while ((node = walker.nextNode())) {
  console.log(`node=${node.textContent}`);
}

It'll print:

node=
  A man 
node=walks 
node=into a 
node=bar
node= and gets a beer
node=

Note that each "independent" chunk of text is a separate text node (including the line breaks between tags). "A man" is the text between the opening tag <p> and the opening tag <span>, "walks" is the text between <span> and <b>, and so on.

One important detail is that text nodes don't have child nodes, so changing their textContent doesn't cause the problem we saw above (it won't remove any child nodes, because there aren't any).

Hence, if you run your code:

while ((node = walker.nextNode())) {
  node.textContent = node.textContent.replace('a', 'b');
}

It will perform the replace in all text nodes and preserve the paragraph's child nodes. The result will be:

A mbn wblks into b bbr bnd gets a beer
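
To see where that result comes from, the per-node replacement can be simulated with plain strings. This is only a sketch: the array below is copied by hand from the text node contents printed earlier, and trim() merely approximates how the browser drops the leading and trailing whitespace when rendering.

```javascript
// Text of each text node inside the paragraph, in document order
// (copied by hand from the output of the loop above)
const textNodes = [
  '\n  A man ',
  'walks ',
  'into a ',
  'bar',
  ' and gets a beer',
  '\n',
];

// The same operation the loop applies to each node's textContent:
// only the first "a" of each chunk is replaced
const replacedNodes = textNodes.map((text) => text.replace('a', 'b'));

// Join the chunks back together, roughly as the page would display them
const rendered = replacedNodes.join('').trim();
console.log(rendered); // A mbn wblks into b bbr bnd gets a beer
```

The last "a" in " and gets a beer" survives because it is the second "a" in its own text node.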

So your code is basically getting all text nodes in the document and, for each of those nodes, replacing the first "a" with "b". I also ran the code on Codidact's page to see the result there.

As a final note, replace('a', 'b') changes only the first occurrence of "a". If you want to change all occurrences, you could use replaceAll('a', 'b') or replace(/a/g, 'b').
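
The difference between the three calls can be compared side by side on a plain string. This is a small sketch with illustrative variable names, assuming a runtime that supports String.prototype.replaceAll:

```javascript
const sentence = 'A man walks into a bar and gets a beer';

// A string pattern with replace changes only the first match
const firstOnly = sentence.replace('a', 'b');

// replaceAll with a string pattern changes every match
const allWithReplaceAll = sentence.replaceAll('a', 'b');

// replace with a regex carrying the g (global) flag also changes every match
const allWithRegex = sentence.replace(/a/g, 'b');

console.log(firstOnly);         // A mbn walks into a bar and gets a beer
console.log(allWithReplaceAll); // A mbn wblks into b bbr bnd gets b beer
console.log(allWithRegex);      // A mbn wblks into b bbr bnd gets b beer
```

All three are case-sensitive, which is why the initial uppercase "A" is never replaced.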

posted over 3 years ago

CC BY-SA 4.0

hkotsubo‭
