Welcome to Software Development on Codidact!

Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.

Post History

66%

+2 −0

Q&A Delete all occurrences of a character in a webpage with vanilla JavaScript

Some comments and your answer are doing this: document.body.innerHTML = document.body.innerHTML.replace(/x/g, '*'); But as explained in the comments, this is not a good solution, because it can...

posted 3y ago by hkotsubo‭

Answer

#1: Initial revision by

hkotsubo‭ · 2022-04-27T13:06:44Z (about 3 years ago)

Copy Link

Raw

Markdown

Some comments and [your answer](https://software.codidact.com/posts/286304/286307#answer-286307) are doing this:

```javascript
document.body.innerHTML = document.body.innerHTML.replace(/x/g, '*');
```

But as explained in the comments, this is not a good solution, because it can break the page's HTML. Actually, even if you restrict it to a specific element (instead of the whole `document.body`), it can still destroy the HTML of that element.

Let's suppose I have this HTML:

```html
<div>
  text, text, some text
  <p id="xyz">whatever</p>
  <textarea>text, text, more text</textarea>
</div>
```

And this CSS:

```css
#xyz::first-letter {
  text-transform: uppercase;
}
```

This will be rendered like this:

![HTML's rendered](https://software.codidact.com/uploads/w3T4M3gsbSe2rY6VNVdTL6JU)

Note that the letter "W" in the word "Whatever" is uppercased, due to CSS `text-transform` rule.

If I run `document.body.innerHTML = document.body.innerHTML.replace(/x/g, '*');` on that page, it becomes this:

![broken HTML](https://software.codidact.com/uploads/jnG9VyWEEnASfMFoy6ane1io)

That's because we replaced all letters `x` by `*`, **in the whole HTML**. If we print it (call `console.log(document.body.innerHTML)` after `replace`), we'll see that the HTML is like this:

```html
<div>
  te*t, te*t, some te*t
  <p id="*yz">whatever</p>
  <te*tarea>te*t, te*t, more te*t</te*tarea>
</div>
```

Note that the paragraph's `id` has changed from `xyz` to `*yz`, so the CSS rule is no longer applied (hence, the word "whatever" doesn't have an uppercase "W" anymore), and the `textarea` simply disappeared, because the tag name was changed to `te*tarea`. The text inside it, though, was still displayed, because browsers are lenient and tries their best to display everything they can, even if the HTML is invalid.

Anyway, replacing everything in the `innerHTML` property is dangerous and very error-prone, because its value is the whole element's HTML. And HTML contains lots of different structures that must not be changed (such as tag names and attributes), so a full `replace` is not the way to go.

---

In a [comment](https://software.codidact.com/comments/thread/6166#comment-17191), you said that the replacement should be made only in "_parsed and rendered document_", therefore I guess this shouldn't include HTML tags and attributes.

In that case, [@meriton's answer](https://software.codidact.com/posts/286304/286317#answer-286317) is a good solution, as it changes only the text nodes (the ones that contains "rendered text" - the characters you actually see on the page). With that, you keep the HTML structure (tags and attributes) untouched.

Just to provide another alternative, you could use a `TreeWalker` (explained in more details [here](https://software.codidact.com/posts/284935/284941#answer-284941)) to achieve the same result:

```javascript
const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
let node;
while ((node = walker.nextNode())) {
    node.textContent = node.textContent.replace(/x/g, '*');
}
```

According to the [documentation](https://developer.mozilla.org/en-US/docs/Web/API/Document/createTreeWalker), `createTreeWalker` returns a [`TreeWalker` object](https://developer.mozilla.org/en-US/docs/Web/API/TreeWalker), that can be traversed to perform whatever you need with the elements.

In this case, I'm searching for all the text nodes (as indicated by `NodeFilter.SHOW_TEXT`), so there's no risk of breaking the HTML structure. Now the page is rendered like this (after I run the JavaScript code):

![HTML structure preserved](https://software.codidact.com/uploads/Zk9aumsz469ELx5TQF46v9vE)

---

# Caveat: pseudo-elements

The solutions described so far (@meriton's recursive function and the `TreeWalker` code above) don't handle pseudo-elements.

Let's suppose I change my CSS to this:

```css
#xyz::first-letter {
  text-transform: uppercase;
}

p::before {
  content: "xbefore ";
}

p::after {
  content: " xafter";
}
```

Now the HTML will be rendered like this:

![HTML with pseudo elements](https://software.codidact.com/uploads/dkuvnwwrNQSqF4XEx1mGFuEm)


If I run the `TreeWalker` code (or @meriton's recursive function), the pseudo-elements `after` and `before` are not changed:

![pseudo elements did not change](https://software.codidact.com/uploads/7fH12iKYY9o7bJs5TNT97bU6)

Note that the letter "x" in the before and after texts are not changed.

In that case, we could adapt [this solution](https://software.codidact.com/posts/284842/284853#answer-284853) to include the pseudo-elements:

```javascript
// first I replace all text nodes
const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
let node;
while ((node = walker.nextNode())) {
    node.textContent = node.textContent.replace(/x/g, '*');
}

// then I replace all pseudo elements
const pseudoElements = ['after', 'before']; // include here all pseudo elements you want to check
// search for body and all its descendants
for (const e of [ document.body, ...document.body.querySelectorAll("*") ]) {
    for (const psEl of pseudoElements) { // for each pseudo element
        const style = window.getComputedStyle(e, `::${psEl}`);
        let content = style.getPropertyValue('content');
        if (content && content !== 'none') { // if there's something to be replaced
            const sheet = document.styleSheets[0];
            content = content.replace(/x/g, '*');
            // insert rule at the end, so it overwrites any others
            sheet.insertRule(`${e.tagName.toLowerCase()}::${psEl} { content: ${content} }`, sheet.cssRules.length);
        }
    }
}
```

And now the pseudo elements will also be changed:

![pseudo elements also changed](https://software.codidact.com/uploads/CXfgekSRHPBpk6EjQvmXCSh9)

---

As a side note, we could generalize this solution to a function like this:

```javascript
function replaceAllText(element, oldContent, newContent, ...pseudoElements) {
    // first I replace all text nodes
    const walker = document.createTreeWalker(element, NodeFilter.SHOW_TEXT);
    let node;
    while ((node = walker.nextNode())) {
        node.textContent = node.textContent.replace(oldContent, newContent);
    }

    // if there are no pseudo elements, exit the function
    if (pseudoElements.length === 0) return;

    // then I replace all pseudo elements
    // search for the element and all its descendants
    for (const e of [ element, ...element.querySelectorAll("*") ]) {
        for (const psEl of pseudoElements) { // for each pseudo element
            const style = window.getComputedStyle(e, `::${psEl}`);
            let content = style.getPropertyValue('content');
            if (content && content !== 'none') { // if there's something to be replaced
                const sheet = document.styleSheets[0];
                content = content.replace(oldContent, newContent);
                // insert rule at the end, so it overwrites any others
                sheet.insertRule(`${e.tagName.toLowerCase()}::${psEl} { content: ${content} }`, sheet.cssRules.length);
            }
        }
    }
}
```

The function receives:

- the element in which the replacements will be made: so you can pass `document.body` to change the whole page, of any other element, to change just part of the page
- the old and new contents: those will be simply passed to `replace`, so anything this method accepts can be used here
- a list of pseudo elements. This is optional: if none is passed, the pseudo elements won't be changed

Examples:

- change the whole page, but not the pseudo elements:
    ```javascript
    replaceAllText(document.body, /x/g, '*')
    ```
- change the whole page, and the `after` pseudo element:
    ```javascript
    replaceAllText(document.body, /x/g, '*', 'after')
    ```
- change the whole page, and the `after` and `before` pseudo elements:
    ```javascript
    replaceAllText(document.body, /x/g, '*', 'after', 'before')
    ```
- change only a specific element (for instance, the `textarea`), but not the pseudo elements:
    ```javascript
    replaceAllText(document.querySelector('textarea'), /x/g, '*')
    ```
- and so on...

Communities

Post History