Welcome to Software Development on Codidact!

Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.

Delete all occurrences of a character in a webpage with vanilla JavaScript

−1

Primarily for learning, I would like to try to delete a specific character (letter or number or special character), wherever it is in a webpage, with vanilla JavaScript.

The webpage won't change at all; it will stay almost the same but just without that character.

It doesn't matter where that character would appear:

In the start of a line
In the end of a line
Inside a word
Outside a word
Between two field separators (whitespaces/tabulations, etc.)

Wherever it will appear, it will be deleted.

What would be a vanilla JavaScript "brute force" tool to do so and how would you prefer to do so if asked by a client?

If I am not mistaken "Tree walker" is a JavaScript concept which should be useful here.

javascript replace

posted about 3 years ago

CC BY-SA 4.0

3y ago

deleted user

Raw

Markdown

History

is a duplicate

This question has been asked before and has already been answered. It should be marked as a duplicate.

Please enter the URL of the proposed duplicate in the details field below.

not constructive

This question cannot be answered in a way that is helpful to anyone. It's not possible to learn something from possible answers, except for the solution for the specific problem of the asker.

2 comment threads

replace() on innerHTML (5 comments)

Some musings... (3 comments)

3 answers

Score Active Age

You are accessing this answer with a direct link, so it's being shown above all other answers regardless of its score. You can return to the normal view.

−3

Found dangerous by hkotsubo‭

The following users marked this post as Dangerous:

User	Comment	Date
hkotsubo‭	Thread: Dangerous As explained in other answer and in [another comments thread](https://software.codid...	May 20, 2022 at 13:26

Credit to user Zakk which exampled a solution here.

document.body.innerHTML = document.body.innerHTML.replace(/x/g, '*');

Which I've adjusted to my particular need (a particular scope inside the body scope) this way:

document.querySelector('.new_pages').innerHTML = document.querySelector('.new_pages').innerHTML.replace(/x/g, '');

posted about 3 years ago

CC BY-SA 4.0

3y ago by Zakk‭

deleted user

Copy Link

Raw

Markdown

History

2 comment threads

Dangerous (1 comment)

"The webpage won't change at all; it will stay almost the same but just without that character." (18 comments)

−0

With the help of a little recursion, it's straightforward to go through all text nodes and replace their contents:

function replaceIn(e) {
  if (e.nodeType == Node.TEXT_NODE) {
    e.nodeValue = e.nodeValue.replaceAll("a", "");
  } else {
    for (const child of e.childNodes) {
      replaceIn(child);
    }
  }
}

replaceIn(document.body);

posted about 3 years ago

CC BY-SA 4.0

3y ago

meriton‭

2375 reputation 7 56 220 35

Copy Link

Raw

Markdown

History

1 comment thread

Hello Meriton ! I think about deleting my own answer so only yours would be kept but I am having hard... (2 comments)

−0

Some comments and your answer are doing this:

document.body.innerHTML = document.body.innerHTML.replace(/x/g, '*');

But as explained in the comments, this is not a good solution, because it can break the page's HTML. Actually, even if you restrict it to a specific element (instead of the whole document.body), it can still destroy the HTML of that element.

Let's suppose I have this HTML:

<div>
  text, text, some text
  <p id="xyz">whatever</p>
  <textarea>text, text, more text</textarea>
</div>

And this CSS:

#xyz::first-letter {
  text-transform: uppercase;
}

This will be rendered like this:

HTML's rendered

Note that the letter "W" in the word "Whatever" is uppercased, due to CSS text-transform rule.

If I run document.body.innerHTML = document.body.innerHTML.replace(/x/g, '*'); on that page, it becomes this:

broken HTML

That's because we replaced all letters x by *, in the whole HTML. If we print it (call console.log(document.body.innerHTML) after replace), we'll see that the HTML is like this:

<div>
  te*t, te*t, some te*t
  <p id="*yz">whatever</p>
  <te*tarea>te*t, te*t, more te*t</te*tarea>
</div>

Note that the paragraph's id has changed from xyz to *yz, so the CSS rule is no longer applied (hence, the word "whatever" doesn't have an uppercase "W" anymore), and the textarea simply disappeared, because the tag name was changed to te*tarea. The text inside it, though, was still displayed, because browsers are lenient and tries their best to display everything they can, even if the HTML is invalid.

Anyway, replacing everything in the innerHTML property is dangerous and very error-prone, because its value is the whole element's HTML. And HTML contains lots of different structures that must not be changed (such as tag names and attributes), so a full replace is not the way to go.

In a comment, you said that the replacement should be made only in "parsed and rendered document", therefore I guess this shouldn't include HTML tags and attributes.

In that case, @meriton's answer is a good solution, as it changes only the text nodes (the ones that contains "rendered text" - the characters you actually see on the page). With that, you keep the HTML structure (tags and attributes) untouched.

Just to provide another alternative, you could use a TreeWalker (explained in more details here) to achieve the same result:

const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
let node;
while ((node = walker.nextNode())) {
    node.textContent = node.textContent.replace(/x/g, '*');
}

According to the documentation, createTreeWalker returns a TreeWalker object, that can be traversed to perform whatever you need with the elements.

In this case, I'm searching for all the text nodes (as indicated by NodeFilter.SHOW_TEXT), so there's no risk of breaking the HTML structure. Now the page is rendered like this (after I run the JavaScript code):

HTML structure preserved

Caveat: pseudo-elements

The solutions described so far (@meriton's recursive function and the TreeWalker code above) don't handle pseudo-elements.

Let's suppose I change my CSS to this:

#xyz::first-letter {
  text-transform: uppercase;
}

p::before {
  content: "xbefore ";
}

p::after {
  content: " xafter";
}

Now the HTML will be rendered like this:

HTML with pseudo elements

If I run the TreeWalker code (or @meriton's recursive function), the pseudo-elements after and before are not changed:

pseudo elements did not change

Note that the letter "x" in the before and after texts are not changed.

In that case, we could adapt this solution to include the pseudo-elements:

// first I replace all text nodes
const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
let node;
while ((node = walker.nextNode())) {
    node.textContent = node.textContent.replace(/x/g, '*');
}

// then I replace all pseudo elements
const pseudoElements = ['after', 'before']; // include here all pseudo elements you want to check
// search for body and all its descendants
for (const e of [ document.body, ...document.body.querySelectorAll("*") ]) {
    for (const psEl of pseudoElements) { // for each pseudo element
        const style = window.getComputedStyle(e, `::${psEl}`);
        let content = style.getPropertyValue('content');
        if (content && content !== 'none') { // if there's something to be replaced
            const sheet = document.styleSheets[0];
            content = content.replace(/x/g, '*');
            // insert rule at the end, so it overwrites any others
            sheet.insertRule(`${e.tagName.toLowerCase()}::${psEl} { content: ${content} }`, sheet.cssRules.length);
        }
    }
}

And now the pseudo elements will also be changed:

pseudo elements also changed

As a side note, we could generalize this solution to a function like this:

function replaceAllText(element, oldContent, newContent, ...pseudoElements) {
    // first I replace all text nodes
    const walker = document.createTreeWalker(element, NodeFilter.SHOW_TEXT);
    let node;
    while ((node = walker.nextNode())) {
        node.textContent = node.textContent.replace(oldContent, newContent);
    }

    // if there are no pseudo elements, exit the function
    if (pseudoElements.length === 0) return;

    // then I replace all pseudo elements
    // search for the element and all its descendants
    for (const e of [ element, ...element.querySelectorAll("*") ]) {
        for (const psEl of pseudoElements) { // for each pseudo element
            const style = window.getComputedStyle(e, `::${psEl}`);
            let content = style.getPropertyValue('content');
            if (content && content !== 'none') { // if there's something to be replaced
                const sheet = document.styleSheets[0];
                content = content.replace(oldContent, newContent);
                // insert rule at the end, so it overwrites any others
                sheet.insertRule(`${e.tagName.toLowerCase()}::${psEl} { content: ${content} }`, sheet.cssRules.length);
            }
        }
    }
}

The function receives:

the element in which the replacements will be made: so you can pass document.body to change the whole page, of any other element, to change just part of the page
the old and new contents: those will be simply passed to replace, so anything this method accepts can be used here
a list of pseudo elements. This is optional: if none is passed, the pseudo elements won't be changed

Examples:

change the whole page, but not the pseudo elements:
```
replaceAllText(document.body, /x/g, '*')
```

change the whole page, and the after pseudo element:

replaceAllText(document.body, /x/g, '*', 'after')

change the whole page, and the after and before pseudo elements:

replaceAllText(document.body, /x/g, '*', 'after', 'before')

change only a specific element (for instance, the textarea), but not the pseudo elements:
```
replaceAllText(document.querySelector('textarea'), /x/g, '*')
```
and so on...

posted about 3 years ago

CC BY-SA 4.0

hkotsubo‭

5235 reputation 21 70 590 239

Copy Link

Raw

Markdown

History

1 comment thread

General comment broadening on the entire thread (2 comments)

Communities

Delete all occurrences of a character in a webpage with vanilla JavaScript

2 comment threads

3 answers

2 comment threads

1 comment thread

Caveat: pseudo-elements

1 comment thread