How to programmatically click through a list of elements if one has to wait for a click to load a set of predefined new elements?

+4
−0

I would like to download the old-time radio show I Love a Mystery from the OTRR website. I figured out how to construct the list of the right URLs,

const urlStub = "https://otrr.org/OTRRLibrary/jukebox/";
copy(Array.from(
    document.getElementsByTagName('a')).
        map( a => `"${urlStub}${a.href.split('/').pop()}"` ).
        join(' ')
)

but when I call it with curl,

for url in <copied-list>; do curl -k -O "$url"; done

I get a lot of 404 Not Found errors. I realized that clicking on a title loads a player on the page, "pre-loading" the audio file, and only then does curl work.[1] Going through hundreds of files this way is quite a chore, though, so I figured I would make it a bit easier by collecting the "title" elements and clicking them one by one until it errors out,

[1] TODO: Why so? It bothers me that I don't understand this part.

const t = Array.from(document.querySelectorAll('td.colTitle'));

t.pop().click()
...
t.pop().click()

but this is still not the best.

The solution thus far

I cobbled together the snippet below that uses MutationObserver recursively, based on the SO threads below, but I assume that there is a simpler solution (and one that doesn't potentially run out of memory on a page with thousands of episodes).

It seems to get the job done, but the Promise will always remain pending and it feels very hacky:

function waitForElement(querySelector, elemArr) {
    return new Promise((resolve, reject) => {
    
        if (elemArr.length) {
            console.log(elemArr);
            elemArr.pop().click();
        } else {
            return resolve();
        }
    
        if (document.querySelectorAll(querySelector).length) {
            waitForElement(querySelector, elemArr);
        }
    
        const observer = new MutationObserver( () => {
            if (document.querySelectorAll(querySelector).length) {
                observer.disconnect();
                waitForElement(querySelector, elemArr);
            }
        });
      
        observer.observe(document.body, {
            childList: true, 
            subtree: true
        });
    });
}

t = Array.from(document.querySelectorAll('td.colTitle'))

waitForElement("audio", t)

Post
+4
−0

It seems to get the job done, but the Promise will always remain pending


You must call resolve() or reject() (or throw an error) inside the executor function of the Promise; otherwise the Promise remains pending forever. The MDN documentation at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/Promise#description says:

The executor's completion state has limited effect on the promise's state:

  • The executor return value is ignored. return statements within the executor merely impact control flow and alter whether a part of the function is executed, but do not have any impact on the promise's fulfillment value. If executor exits and it's impossible for resolveFunc or rejectFunc to be called in the future (for example, there are no async tasks scheduled), then the promise remains pending forever.
  • If an error is thrown in the executor, the promise is rejected, unless resolveFunc or rejectFunc has already been called.
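
The quoted behaviour can be checked with a small sketch that runs outside the page. The helper name stateOf and the 50 ms timeout are made up for this illustration; it only assumes a runtime with Promise.race and setTimeout (a modern browser console or Node.js):

```javascript
// Hypothetical helper: reports 'pending' if the promise has not settled
// within `ms` milliseconds, otherwise 'fulfilled' or 'rejected'.
function stateOf(promise, ms = 50) {
    const timeout = new Promise(resolve => setTimeout(() => resolve('pending'), ms));
    const settled = promise.then(() => 'fulfilled', () => 'rejected');
    return Promise.race([settled, timeout]);
}

// The executor returns a value but never calls resolve(): stays pending.
const neverResolved = new Promise(() => { return 'ignored'; });

// The executor calls resolve(): fulfilled.
const resolved = new Promise(resolve => { resolve(); });

// The executor throws before resolve() or reject() is called: rejected.
const thrown = new Promise(() => { throw new Error('boom'); });

const states = Promise.all([stateOf(neverResolved), stateOf(resolved), stateOf(thrown)]);
states.then(result => console.log(result)); // logs ['pending', 'fulfilled', 'rejected']
```

The first case mirrors the question's code: the outer executor finishes without ever calling resolve(), so nothing can settle that Promise later.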

Your code has to look like this:

function waitForElement(querySelector, elemArr) {
    return new Promise((resolve, reject) => {
    
        if (elemArr.length) {
            console.log(elemArr);
            elemArr.pop().click();
        } else {
            resolve();
            return;
        }
    
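        if (document.querySelectorAll(querySelector).length) {
            waitForElement(querySelector, elemArr).then(() => resolve());
            // The recursive call handles the remaining elements,
            // so no additional observer is registered at this level
            return;
        }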
    
        const observer = new MutationObserver(() => {
            if (document.querySelectorAll(querySelector).length) {
                observer.disconnect();
                waitForElement(querySelector, elemArr).then(() => resolve());
            }
        });
      
        observer.observe(document.body, {
            childList: true, 
            subtree: true
        });
    });
}

const t = Array.from(document.querySelectorAll('td.colTitle'));

const initialPromise = waitForElement("audio", t);
initialPromise.then(() => console.log('ready'));

Your original code only ever resolves the innermost Promise, the one created when elemArr is empty; none of the Promises created before it has resolve() called. By chaining then(() => resolve()) onto each Promise returned by the recursive call, every parent Promise is resolved in turn until initialPromise is resolved.
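
The same chaining can be tried without the page. This is only a sketch: processNext is a made-up name, and a 10 ms timer stands in for the MutationObserver callback, so it runs in any modern browser console or in Node.js:

```javascript
// Hypothetical stand-in for "click an element and wait for the player":
// a short timer replaces the MutationObserver callback.
function processNext(items, done = []) {
    return new Promise(resolve => {
        if (items.length === 0) {
            // Innermost call: nothing is left, so settle this Promise.
            resolve(done);
            return;
        }
        done.push(items.pop());
        setTimeout(() => {
            // Passing `resolve` to then() settles this (outer) Promise
            // as soon as the inner one settles.
            processNext(items, done).then(resolve);
        }, 10);
    });
}

const chain = processNext(['a', 'b', 'c']);
chain.then(order => console.log('ready', order)); // logs: ready ['c', 'b', 'a']
```

Calling resolve(processNext(items, done)) would have the same effect, because resolving a Promise with another Promise makes it adopt that Promise's state.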


I assume that there is a simpler solution (and one that doesn't potentially run out of memory on a page with thousands of episodes)


If you want to keep it simple you can click the "title" elements as you are already doing. Instead of a recursive function you could use something like this:

async function collectUrls() {
    // This div gets filled with the audio player (audio element, title text, ...) if you click on a radio show
    const audioContainer = document.querySelector('#mainAudio');

    // This array keeps track of all audio urls
    const urlList = [];

    const observer = new MutationObserver(() => {
        const audioElement = audioContainer.querySelector('audio');
        // If an audio element exists, the source is added to the url list
        if (audioElement) {
            urlList.push(audioElement.querySelector('source').src);
            document.dispatchEvent(new Event('playerSourceChanged'));
        }
    });
    observer.observe(audioContainer, { subtree: true, childList: true });

    const titleElements = Array.from(document.querySelectorAll('td.colTitle'));
    for (let i = 0; i < titleElements.length; i++) {
        const titleElement = titleElements[i];
        const changePromise = new Promise(resolve => document.addEventListener('playerSourceChanged', () => resolve(), { once: true }));
        titleElement.click();
        // Wait until the mutation observer detects a change
        await changePromise;
        console.log(`${i+1} / ${titleElements.length}`);
        // delay to prevent server errors
        await new Promise(resolve => setTimeout(() => resolve(), 200))
    }

    // Stop observing once every title has been processed
    observer.disconnect();

    console.log(urlList);
    copy(urlList.map(x => `"${x}"`).join(' '));
}

collectUrls();

An event is used to pause the for loop after a click until the server provides the audio file. See Event and EventTarget for more information about events.
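
The "wrap a one-time listener in a Promise" step can be isolated as follows. This is a sketch under assumptions: waitForEvent is a made-up helper name, a plain EventTarget stands in for document, and a timer stands in for the observer, so it needs a runtime where EventTarget and Event are globals (browsers, Node.js 15 or later):

```javascript
// Hypothetical helper: a Promise that settles the next time `name` fires on `target`.
function waitForEvent(target, name) {
    return new Promise(resolve => {
        // { once: true } removes the listener after the first event.
        target.addEventListener(name, resolve, { once: true });
    });
}

// Stand-in for `document` in the code above.
const bus = new EventTarget();
const log = [];

async function run() {
    for (let i = 0; i < 3; i++) {
        // Register the listener before triggering the work, so the event cannot be missed.
        const changed = waitForEvent(bus, 'playerSourceChanged');
        // Simulates the observer reporting a new audio source a little later.
        setTimeout(() => bus.dispatchEvent(new Event('playerSourceChanged')), 10);
        await changed;
        log.push(i);
    }
    return log;
}

const finished = run();
finished.then(result => console.log(result)); // logs [0, 1, 2]
```

Creating the Promise before the click (here, before the timer is started) matters: a listener added after the event has already fired would wait forever.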


Alternatively, you could make the request behind the onclick handler yourself. It basically calls https://www.otrr.org/OTRRLibrary/php/files.php?qid=jukeC&ide=<entry-id>. This seems to prepare the server to provide the MP3 file, and it returns a JSON object with some information and the filename. In addition, this method avoids automatically loading the audio files into the browser and is therefore faster:

async function collectUrls() {
    // This array keeps track of all audio urls
    const urlList = [];

    const titleElementIds = Array.from(document.querySelectorAll('td.colTitle')).map(e => e.parentElement.dataset.ide);

    for (let i = 0; i < titleElementIds.length; i++) {
        const ide = titleElementIds[i];
        const data = await fetch(`https://www.otrr.org/OTRRLibrary/php/files.php?qid=jukeC&ide=${ide}`).then(r => r.json());
        urlList.push(`https://otrr.org/OTRRLibrary/jukebox/${data.file.replaceAll('+', '%20')}`);

        console.log(`${i + 1} / ${titleElementIds.length}`);
    }

    console.log(urlList);
    copy(urlList.map(x => `"${x}"`).join(' '));
}

collectUrls();
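
The URL construction from the loop above can also be pulled out into small pure functions, which makes it easy to check the output before handing it to curl. This is only a sketch: the function names and the sample file names are made up, and the real file values come from the JSON response:

```javascript
const JUKEBOX_BASE = 'https://otrr.org/OTRRLibrary/jukebox/';

// Turns the `file` value from the JSON response into a download URL,
// replacing each '+' with '%20' as in the loop above.
function toJukeboxUrl(file) {
    return JUKEBOX_BASE + file.replaceAll('+', '%20');
}

// Formats the URLs as a space-separated list of quoted strings for a shell loop.
function toShellList(urls) {
    return urls.map(url => `"${url}"`).join(' ');
}

// Made-up file names, for illustration only.
const sample = ['Show+Episode+01.mp3', 'Show+Episode+02.mp3'].map(toJukeboxUrl);
console.log(toShellList(sample));
```

String.prototype.replaceAll is assumed to be available (current browsers, Node.js 15 or later).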

1 comment thread

I keep returning to this answer every couple months:) Thanks again!
toraritte wrote 10 days ago
