How to programmatically click through a list of elements if one has to wait for a click to load a set of predefined new elements?
I would like to download the old-time radio show I Love a Mystery from the OTRR website. I figured out how to construct the list of the right URLs:
const urlStub = "https://otrr.org/OTRRLibrary/jukebox/";
copy(
  Array.from(document.getElementsByTagName('a'))
    .map( a => `"${urlStub}${a.href.split('/').pop()}"` )
    .join(' ')
)
but when I call it with curl,
for url in <copied-list>; do curl -k -O "$url"; done
I get a lot of 404 Not Found responses. I realized that clicking on a title loads a player on the page, "pre-loading" the audio file, and only then does curl work.[1] It is quite a chore to go through hundreds of files though, so I figured I would make it a bit easier by collecting the "title" elements and clicking them one by one until it errors out:
[1] TODO: Why so? It bothers me that I don't understand this part.
const t = Array.from(document.querySelectorAll('td.colTitle'));
t.pop().click()
...
t.pop().click()
but this is still not the best.
The solution thus far
I cobbled together the snippet below that uses MutationObserver recursively, based on the SO threads below, but I assume that there is a simpler solution (and one that doesn't potentially run out of memory on a page with thousands of episodes).
- stackoverflow: javascript - How to wait until an element exists?
- stackoverflow: javascript - How to check if an element has been loaded on a page before running a script?
It seems to get the job done, but the Promise will always remain pending and it feels very hacky:
function waitForElement(querySelector, elemArr) {
  return new Promise((resolve, reject) => {
    if (elemArr.length) {
      console.log(elemArr);
      elemArr.pop().click();
    } else {
      return resolve();
    }
    if (document.querySelectorAll(querySelector).length) {
      waitForElement(querySelector, elemArr);
    }
    const observer = new MutationObserver( () => {
      if (document.querySelectorAll(querySelector).length) {
        observer.disconnect();
        waitForElement(querySelector, elemArr);
      }
    });
    observer.observe(document.body, {
      childList: true,
      subtree: true
    });
  });
}
t = Array.from(document.querySelectorAll('td.colTitle'))
waitForElement("audio", t)
It seems to get the job done, but the Promise will always remain pending
You must call resolve() or reject() (or throw an error) inside the executor function of the Promise, otherwise the Promise will remain pending forever:
The executor's completion state has limited effect on the promise's state:
- The executor return value is ignored. return statements within the executor merely impact control flow and alter whether a part of the function is executed, but do not have any impact on the promise's fulfillment value. If executor exits and it's impossible for resolveFunc or rejectFunc to be called in the future (for example, there are no async tasks scheduled), then the promise remains pending forever.
- If an error is thrown in the executor, the promise is rejected, unless resolveFunc or rejectFunc has already been called.
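To make the quoted behaviour concrete, here is a minimal sketch that runs in a browser console or in Node. The helper settlesWithin and the three example promises are made up for illustration; they are not part of the code in the question.

```javascript
// Hypothetical helper: reports whether a promise settles within `ms` milliseconds.
function settlesWithin(promise, ms) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(false), ms);
  });
  // Fulfilment and rejection both count as "settled".
  const settled = promise.then(() => true, () => true);
  return Promise.race([settled, timeout]).finally(() => clearTimeout(timer));
}

// The executor returns without ever calling resolve() or reject().
const neverSettles = new Promise(() => {
  return "ignored"; // the executor's return value has no effect
});

// The executor calls resolve(), so the promise is fulfilled.
const fulfilled = new Promise((resolve) => {
  resolve("done");
});

// An error thrown inside the executor rejects the promise.
const rejected = new Promise(() => {
  throw new Error("boom");
});
rejected.catch(() => {}); // attach a handler so the rejection is not reported as unhandled

settlesWithin(neverSettles, 50).then((s) => console.log("neverSettles:", s)); // false
settlesWithin(fulfilled, 50).then((s) => console.log("fulfilled:", s));       // true
settlesWithin(rejected, 50).then((s) => console.log("rejected:", s));         // true
```

The first case is the situation in the question: the outer executor finishes, nothing will ever call its resolve, and so the promise stays pending.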
Your code has to look like this:
function waitForElement(querySelector, elemArr) {
  return new Promise((resolve, reject) => {
    if (elemArr.length) {
      console.log(elemArr);
      elemArr.pop().click();
    } else {
      resolve();
      return;
    }
    if (document.querySelectorAll(querySelector).length) {
      waitForElement(querySelector, elemArr).then(() => resolve());
    }
    const observer = new MutationObserver(() => {
      if (document.querySelectorAll(querySelector).length) {
        observer.disconnect();
        waitForElement(querySelector, elemArr).then(() => resolve());
      }
    });
    observer.observe(document.body, {
      childList: true,
      subtree: true
    });
  });
}
const t = Array.from(document.querySelectorAll('td.colTitle'));
const initialPromise = waitForElement("audio", t);
initialPromise.then(() => console.log('ready'));
Your code only resolves the innermost, last-created Promise, the one whose call finds elemArr.length equal to 0. By adding .then(() => resolve()) to the returned Promises, each call resolves its parent Promise in turn until initialPromise is resolved.
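The same chaining idea can be tried without a page. The sketch below is only an illustration: drain and its queue are hypothetical stand-ins for waitForElement and the list of title elements, and a timer replaces the MutationObserver. It also shows a standard alternative to .then(() => resolve()): passing the inner promise straight to resolve(), which makes the outer promise follow the inner one (including rejections).

```javascript
// Hypothetical stand-in for waitForElement: handles one queued item per
// timer tick instead of one click per DOM mutation.
function drain(queue, handled) {
  return new Promise((resolve) => {
    if (queue.length === 0) {
      resolve(); // base case: settles the innermost promise
      return;
    }
    handled.push(queue.pop());
    // Resolving with a promise makes this promise adopt its eventual state,
    // so settlement propagates outward through every recursion level.
    setTimeout(() => resolve(drain(queue, handled)), 0);
  });
}

const handled = [];
drain(["a", "b", "c"], handled).then(() => {
  console.log(handled.join(", ")); // c, b, a (pop() takes items from the end)
});
```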
I assume that there is a simpler solution (and one that doesn't potentially run out of memory on a page with thousands of episodes)
If you want to keep it simple you can click the "title" elements as you are already doing. Instead of a recursive function you could use something like this:
async function collectUrls() {
  // This div gets filled with the audio player (audio element, title text, ...) if you click on a radio show
  const audioContainer = document.querySelector('#mainAudio');
  // This array keeps track of all audio urls
  const urlList = [];
  const observer = new MutationObserver(() => {
    const audioElement = audioContainer.querySelector('audio');
    // If an audio element exists, the source is added to the url list
    if (audioElement) {
      urlList.push(audioElement.querySelector('source').src);
      document.dispatchEvent(new Event('playerSourceChanged'));
    }
  });
  observer.observe(audioContainer, { subtree: true, childList: true });
  const titleElements = Array.from(document.querySelectorAll('td.colTitle'));
  for (let i = 0; i < titleElements.length; i++) {
    const titleElement = titleElements[i];
    const changePromise = new Promise(resolve => document.addEventListener('playerSourceChanged', () => resolve(), { once: true }));
    titleElement.click();
    // Wait until the mutation observer detects a change
    await changePromise;
    console.log(`${i+1} / ${titleElements.length}`);
    // delay to prevent server errors
    await new Promise(resolve => setTimeout(() => resolve(), 200));
  }
  // Stop observing once every title has been clicked
  observer.disconnect();
  console.log(urlList);
  copy(urlList.map(x => `"${x}"`).join(' '));
}
collectUrls();
An event is used to pause the for loop after a click until the server provides the audio file. See Event and EventTarget for more information about events.
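The "wrap a one-time event in a Promise" pattern can be looked at in isolation. The following is a minimal sketch using only the standard EventTarget and Event classes (available in browsers and as globals in recent Node versions); nextEvent and demo are hypothetical names, and a timer stands in for the click and the server response.

```javascript
// Hypothetical helper: resolves the next time `type` is dispatched on `target`.
function nextEvent(target, type) {
  return new Promise((resolve) => {
    target.addEventListener(type, resolve, { once: true });
  });
}

async function demo() {
  const target = new EventTarget();
  const order = [];
  for (let i = 0; i < 3; i++) {
    // Subscribe before triggering the work, so the event cannot be missed.
    const changed = nextEvent(target, "playerSourceChanged");
    // Stand-in for titleElement.click(): the event arrives asynchronously.
    setTimeout(() => target.dispatchEvent(new Event("playerSourceChanged")), 10);
    await changed; // the loop pauses here until the event fires
    order.push(i);
  }
  return order;
}

demo().then((order) => console.log(order.join(", "))); // 0, 1, 2
```

Creating the promise before the click matters: if the listener were attached after the click, a very fast response could dispatch the event before anyone is listening and the loop would wait forever.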
Alternatively you could execute the onclick function yourself. It basically calls https://www.otrr.org/OTRRLibrary/php/files.php?qid=jukeC&ide=<entry-id>. This seems to prepare the server to provide the mp3 file and returns a json object with some information and the filename. In addition, this method prevents the automatic loading of audio files into the browser and is therefore faster:
async function collectUrls() {
  // This array keeps track of all audio urls
  const urlList = [];
  // Each title cell's parent row carries the entry id in its data-ide attribute
  const titleElementIds = Array.from(document.querySelectorAll('td.colTitle')).map(e => e.parentElement.dataset.ide);
  for (let i = 0; i < titleElementIds.length; i++) {
    const ide = titleElementIds[i];
    const data = await fetch(`https://www.otrr.org/OTRRLibrary/php/files.php?qid=jukeC&ide=${ide}`).then(r => r.json());
    urlList.push(`https://otrr.org/OTRRLibrary/jukebox/${data.file.replaceAll('+', '%20')}`);
    console.log(`${i+1} / ${titleElementIds.length}`);
  }
  console.log(urlList);
  copy(urlList.map(x => `"${x}"`).join(' '));
}
collectUrls();
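The click-based version pauses 200 ms between clicks to avoid server errors, and the fetch-based loop could be paced the same way. Below is a rough sketch of that idea, not a tested replacement: collectUrlsPaced, toJukeboxUrl, fetchJson and delayMs are hypothetical names, and the only assumption about the response is the file property used in the snippet above. The fetcher is passed in as a parameter so the loop can be tried with a fake one, without network access.

```javascript
const JUKEBOX_BASE = "https://otrr.org/OTRRLibrary/jukebox/";

// Builds a download URL from the `file` field, as the snippet above does.
function toJukeboxUrl(file) {
  return JUKEBOX_BASE + file.replaceAll("+", "%20");
}

// fetchJson(ide) must return a promise for the parsed JSON of one entry.
// In the browser it could be:
//   (ide) => fetch(`https://www.otrr.org/OTRRLibrary/php/files.php?qid=jukeC&ide=${ide}`).then(r => r.json())
async function collectUrlsPaced(ids, fetchJson, delayMs = 200) {
  const urls = [];
  const failed = [];
  for (const ide of ids) {
    try {
      const data = await fetchJson(ide);
      urls.push(toJukeboxUrl(data.file));
    } catch (err) {
      failed.push(ide); // remember the entry and carry on with the rest
    }
    // Pause between requests to go easy on the server.
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return { urls, failed };
}

// Usage with a fake fetcher (the ids and file names are made up):
const fakeFetchJson = async (ide) => {
  if (ide === "2") throw new Error("simulated server error");
  return { file: `Episode+${ide}.mp3` };
};

collectUrlsPaced(["1", "2", "3"], fakeFetchJson, 10).then((result) => console.log(result));
```

Collecting the failed ids instead of stopping at the first error makes it possible to retry just those entries afterwards.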