Post History

Q&A Why is this symlink() call returning successfully while apparently failing to create the sym-link?

posted 3y ago by ghost-in-the-zsh‭  ·  edited 2y ago by ghost-in-the-zsh‭

Answer
#5: Post edited by ghost-in-the-zsh · 2021-12-03T18:33:10Z (over 2 years ago)
Replace account with placeholder
  • # Summary
  • I've fixed the issue and what follows is the best explanation I have so far. What I had described in the OP were some of the observations, but I'll be including more detail so that it's (hopefully) easy to follow. This includes `fork`s, COWs, and dead children.
  • # Additional Observations
  • I got Rust to list the directory contents immediately after the `symlink` sys-call shown in the OP, with this simple loop:
  • ```rust
  • for p in std::fs::read_dir("/home/ray/Projects/vpanel").unwrap() {
  • println!("{}", p.unwrap().path().display());
  • }
  • ```
  • It actually showed that the sym-link was present in the file system. As expected, the `symlink` behavior described in the OP was an effect --not the cause.
  • I discovered that the `drop` implementation was (unexpectedly) being called *twice*, which, in short, did not really make sense due to how Rust handles object lifetimes. (I don't call `drop` on anything manually; the proxy gets destroyed automatically based on Rust's scoping rules.) This unexpected behavior was confirmed in the debugger (hitting the breakpoint twice) and also with a simple line of code that left 2 files behind instead of 1:
  • ```rust
  • // ...
  • std::process::Command::new("mktemp")
  • .arg("drop.XXXXXXX")
  • .output()
  • .expect("mktemp::drop failed");
  • ```
  • However, I never saw more than a single `eprintln!` message...
  • # Getting to the Point
  • Originally, the simulator took some CLI options including the full file system path for the PTY symlink and also a command to execute as a child process, e.g. `vpanel -p $(pwd)/ttymxc4 -c 'python3 /tmp/script.py'`. The main program would `fork` itself and the child process would use `execvp` to replace itself with the `-c`ommand given on the CLI. Here's a code excerpt:
  • ```rust
  • #[tokio::main]
  • async fn main() {
  • // ...
  • let pty = forkpty(None, None)?;
  • match pty.fork_result.is_parent() {
  • true => run_simulator(panel, proxies, sigint).await?,
  • false => exec_child(&m),
  • }
  • close(pty.master)?;
  • // ...
  • }
  • // ...
  • fn exec_child(m: &ArgMatches) {
  • let args: Vec<CString> = m.value_of("child_cmd")
  • .expect("Missing child process command")
  • .split(' ')
  • .map(|s| CString::new(s).expect("CString from String failed"))
  • .collect();
  • let argv: Vec<&CStr> = args
  • .iter()
  • .map(|s| s.as_c_str())
  • .collect();
  • execvp(argv[0], &argv)
  • .expect(format!("execvp failed: {:?}", argv).as_str());
  • }
  • ```
  • Due to some changes on how I was testing things, I stopped using the `-c ...` option, but the code was still `fork`ing and trying to `execvp` the child, and this has some implications. When a GNU+Linux process `fork`s, the child process is a copy of the parent and, due to [COW](https://en.wikipedia.org/wiki/Copy-on-write), that includes the parent's own memory pages.
  • ## Actual Cause
  • The best explanation I currently have is that when I stopped using the `-c` option, the `exec_child` function would `panic!`s instead of replacing itself with the actual command's code on `execvp`. (This much is probably obvious, since there's no CLI command for it to replace itself with.) We never saw the child's I/O in our shell because we were never directly connected to its I/O streams. However, the child would still "see" the pre-existing proxy object instance created by the parent b/c of the shared memory pages.
  • When the child copy panics and exits, the obj instance it sees also goes out of scope, and the Rust borrow-checker decides its time to `drop` it. This `drop` is the first (and unexpected) breakpoint hit, but it *is* successful in removing the link b/c it had just been created and it's actually there. This happens almost immediately after launching the main program (likely microseconds), explaining why I could never see the sym-link in the file system when checking, but would see it listed from Rust as mentioned previously.
  • The parent continues to run while this is happening, but when it wants to terminate, it hits the breakpoint a(n expected) second time b/c it's its own turn to `drop` the (original) proxy obj instance (for going out of scope). However, the `drop` implementation can no longer find the sym-link at this point, b/c that had been successfully, though unexpectedly, removed by the now dead child on the first `drop`. Since my shell *is* connected to the parent's I/O streams, I always got to see *that* `panic!` message.
  • **In short:** The parent `fork`ed a copy of itself as a child. Thanks to COW, the child shared the parent's memory space and could see the proxy/object instance responsible for managing the sym-link's life cycle. The child copy failed and exited, causing Rust to `drop` the child's own copy of the instance. This unexpected `drop` would then cause the object to "clean its own mess", taking the sym-link with it. Some time later, the parent would terminate and its own original proxy instance would get `drop`ed, causing it to also "clean up" a "mess" that had already been cleaned up and no longer existed. This would produce the documented panics.
  • This is why removing the left-over `forkpty` call following the removal of the `-c` option actually fixed the issue. It's not an immediately obvious or intuitive error to figure out and required some hindsight knowledge to understand why this even happened in the first place.
  • The code without the left-over `fork` looks as follows:
  • ```rust
  • #[tokio::main]
  • async fn main() {
  • // ...
  • run_simulator(panel, proxies, sigint).await?;
  • // ...
  • }
  • ```
  • # Summary
  • I've fixed the issue and what follows is the best explanation I have so far. What I had described in the OP were some of the observations, but I'll be including more detail so that it's (hopefully) easy to follow. This includes `fork`s, COWs, and dead children.
  • # Additional Observations
  • I got Rust to list the directory contents immediately after the `symlink` sys-call shown in the OP, with this simple loop:
  • ```rust
  • for p in std::fs::read_dir("/home/<user>/Projects/vpanel").unwrap() {
  •     println!("{}", p.unwrap().path().display());
  • }
  • ```
  • It actually showed that the sym-link was present in the file system. As expected, the `symlink` behavior described in the OP was an effect, not the cause.
  • I discovered that the `drop` implementation was (unexpectedly) being called *twice*, which, in short, did not really make sense given how Rust handles object lifetimes. (I don't call `drop` on anything manually; the proxy gets destroyed automatically based on Rust's scoping rules.) This unexpected behavior was confirmed in the debugger (hitting the breakpoint twice) and also with a simple statement in the `drop` implementation that left 2 files behind instead of 1:
  • ```rust
  • // ...
  • std::process::Command::new("mktemp")
  •     .arg("drop.XXXXXXX")
  •     .output()
  •     .expect("mktemp::drop failed");
  • ```
  • However, I never saw more than a single `eprintln!` message...
  • # Getting to the Point
  • Originally, the simulator took some CLI options including the full file system path for the PTY symlink and also a command to execute as a child process, e.g. `vpanel -p $(pwd)/ttymxc4 -c 'python3 /tmp/script.py'`. The main program would `fork` itself and the child process would use `execvp` to replace itself with the `-c`ommand given on the CLI. Here's a code excerpt:
  • ```rust
  • #[tokio::main]
  • async fn main() {
  •     // ...
  •     let pty = forkpty(None, None)?;
  •     match pty.fork_result.is_parent() {
  •         true => run_simulator(panel, proxies, sigint).await?,
  •         false => exec_child(&m),
  •     }
  •     close(pty.master)?;
  •     // ...
  • }
  • // ...
  • fn exec_child(m: &ArgMatches) {
  •     let args: Vec<CString> = m.value_of("child_cmd")
  •         .expect("Missing child process command")
  •         .split(' ')
  •         .map(|s| CString::new(s).expect("CString from String failed"))
  •         .collect();
  •     let argv: Vec<&CStr> = args
  •         .iter()
  •         .map(|s| s.as_c_str())
  •         .collect();
  •     execvp(argv[0], &argv)
  •         .expect(format!("execvp failed: {:?}", argv).as_str());
  • }
  • ```
  • Due to some changes in how I was testing things, I stopped using the `-c ...` option, but the code was still `fork`ing and trying to `execvp` the child, and this has some implications. When a GNU+Linux process `fork`s, the child process is a copy of the parent: it gets its own view of the parent's memory pages (made cheap by [COW](https://en.wikipedia.org/wiki/Copy-on-write)), so every object that was alive in the parent at the time of the `fork` also exists in the child.
  • ## Actual Cause
  • The best explanation I currently have is that when I stopped using the `-c` option, the `exec_child` function would `panic!` instead of replacing itself with the actual command's code on `execvp`. (This much is probably obvious, since there's no CLI command for it to replace itself with.) We never saw the child's I/O in our shell because we were never directly connected to its I/O streams. However, the child would still have its own copy of the pre-existing proxy object instance created by the parent, b/c `fork` duplicated the parent's memory.
  • When the child copy panics, its stack unwinds, the obj instance it holds goes out of scope, and Rust runs its `drop`. (This is runtime unwinding, not something the borrow-checker decides.) This `drop` is the first (and unexpected) breakpoint hit, but it *is* successful in removing the link b/c it had just been created and it's actually there. This happens almost immediately after launching the main program (likely microseconds), explaining why I could never see the sym-link in the file system when checking, but would see it listed from Rust as mentioned previously.
  • The parent continues to run while this is happening, but when it wants to terminate, it hits the breakpoint a(n expected) second time b/c it's its own turn to `drop` the (original) proxy obj instance (for going out of scope). However, the `drop` implementation can no longer find the sym-link at this point, b/c that had been successfully, though unexpectedly, removed by the now dead child on the first `drop`. Since my shell *is* connected to the parent's I/O streams, I always got to see *that* `panic!` message.
  • **In short:** The parent `fork`ed a copy of itself as a child. The child got its own COW-backed copy of the parent's memory, including the proxy/object instance responsible for managing the sym-link's life cycle. The child copy panicked and exited, causing Rust to `drop` the child's own copy of the instance. This unexpected `drop` would then cause the object to "clean its own mess", taking the sym-link with it. Some time later, the parent would terminate and its own original proxy instance would get `drop`ped, causing it to also "clean up" a "mess" that had already been cleaned up and no longer existed. This would produce the documented panics.
  • This is why removing the left-over `forkpty` call following the removal of the `-c` option actually fixed the issue. It's not an immediately obvious or intuitive error to figure out and required some hindsight knowledge to understand why this even happened in the first place.
  • The code without the left-over `fork` looks as follows:
  • ```rust
  • #[tokio::main]
  • async fn main() {
  •     // ...
  •     run_simulator(panel, proxies, sigint).await?;
  •     // ...
  • }
  • ```
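The double-cleanup described above can also be guarded against in the `drop` implementation itself. What follows is only a minimal sketch under stated assumptions: the proxy's real code isn't shown in the post, so `LinkGuard`, `owner_pid`, and `demo` are hypothetical names made up for illustration. The sketch uses only the standard library, so it does not actually `fork`; it simulates the child's inherited copy by building a second guard with a different `owner_pid` and letting it unwind through a panic.

```rust
use std::fs;
use std::io::ErrorKind;
use std::os::unix::fs::symlink;
use std::panic;
use std::path::{Path, PathBuf};
use std::process;
use std::sync::atomic::{AtomicUsize, Ordering};

// Gives every demo run its own link name.
static COUNTER: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the proxy object that owns the sym-link.
struct LinkGuard {
    link: PathBuf,
    owner_pid: u32, // PID of the process that is allowed to clean up
}

impl LinkGuard {
    fn create(target: &Path, link: &Path) -> std::io::Result<Self> {
        symlink(target, link)?;
        Ok(Self { link: link.to_path_buf(), owner_pid: process::id() })
    }
}

impl Drop for LinkGuard {
    fn drop(&mut self) {
        // A forked child inherits a copy of this value but has another PID,
        // so it leaves the link alone.
        if process::id() != self.owner_pid {
            return;
        }
        match fs::remove_file(&self.link) {
            Ok(()) => {}
            // Already gone: nothing left to clean up, so don't panic.
            Err(e) if e.kind() == ErrorKind::NotFound => {}
            Err(e) => eprintln!("failed to remove {}: {}", self.link.display(), e),
        }
    }
}

fn link_exists(p: &Path) -> bool {
    // symlink_metadata inspects the link itself rather than its target.
    fs::symlink_metadata(p).is_ok()
}

/// Returns (link present after the simulated child panic,
///          link present after the owner's drop).
fn demo() -> (bool, bool) {
    let dir = std::env::temp_dir();
    let n = COUNTER.fetch_add(1, Ordering::Relaxed);
    let link = dir.join(format!("link_guard_demo_{}_{}", process::id(), n));
    let _ = fs::remove_file(&link); // clear leftovers from an earlier run

    let owner = LinkGuard::create(&dir, &link).expect("symlink failed");

    // Simulated child copy: same path, but a PID that is not ours.
    let child_copy = LinkGuard {
        link: link.clone(),
        owner_pid: owner.owner_pid.wrapping_add(1),
    };
    let result = panic::catch_unwind(move || {
        let _guard = child_copy; // dropped while the panic unwinds
        panic!("simulated exec_child failure");
    });
    assert!(result.is_err());
    let after_child = link_exists(&link);

    drop(owner);
    let after_owner = link_exists(&link);
    (after_child, after_owner)
}

fn main() {
    let (after_child, after_owner) = demo();
    println!("link present after simulated child panic: {}", after_child);
    println!("link present after owner drop: {}", after_owner);
}
```

In a real `fork`, the child's `process::id()` would differ from the stored `owner_pid` without any simulation. Treating `NotFound` as a non-error is a separate design choice: it keeps the parent's `drop` from panicking if something else has already removed the link.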
#4: Post edited by ghost-in-the-zsh · 2020-12-19T00:52:37Z (over 3 years ago)
Minor wording update and code sample
  • # Summary
  • I've fixed the issue and what follows is the best explanation I have so far. What I had described in the OP were some of the observations, but I'll be including more detail so that it's (hopefully) easy to follow. This includes `fork`s, COWs, and dead children.
  • # Additional Observations
  • I got Rust to list the directory contents immediately after the `symlink` sys-call shown in the OP, with this simple loop:
  • ```rust
  • for p in std::fs::read_dir("/home/ray/Projects/vpanel").unwrap() {
  • println!("{}", p.unwrap().path().display());
  • }
  • ```
  • It actually showed that the sym-link was present in the file system. As expected, the `symlink` behavior described in the OP was an effect --not the cause.
  • I discovered that the `drop` implementation was (unexpectedly) being called *twice*, which, in short, did not really make sense due to how Rust handles object lifetimes. (I don't call `drop` on anything manually; the proxy gets destroyed automatically based on Rust's scoping rules.) This unexpected behavior was confirmed in the debugger (hitting the breakpoint twice) and also with a simple line of code that left 2 files behind instead of 1:
  • ```rust
  • // ...
  • std::process::Command::new("mktemp")
  • .arg("drop.XXXXXXX")
  • .output()
  • .expect("mktemp::drop failed");
  • ```
  • However, I never saw more than a single `eprintln!` message...
  • # Getting to the Point
  • Originally, the simulator took some CLI options including the full file system path for the PTY symlink and also a command to execute as a child process, e.g. `vpanel -p $(pwd)/ttymxc4 -c 'python3 /tmp/script.py'`. The main program would `fork` itself and the child process would use `execvp` to replace itself with the `-c`ommand given on the CLI. Here's a code excerpt:
  • ```rust
  • #[tokio::main]
  • async fn main() {
  • // ...
  • let pty = forkpty(None, None)?;
  • match pty.fork_result.is_parent() {
  • true => run_simulator(panel, proxies, sigint).await?,
  • false => exec_child(&m),
  • }
  • close(pty.master)?;
  • // ...
  • }
  • // ...
  • fn exec_child(m: &ArgMatches) {
  • let args: Vec<CString> = m.value_of("child_cmd")
  • .expect("Missing child process command")
  • .split(' ')
  • .map(|s| CString::new(s).expect("CString from String failed"))
  • .collect();
  • let argv: Vec<&CStr> = args
  • .iter()
  • .map(|s| s.as_c_str())
  • .collect();
  • execvp(argv[0], &argv)
  • .expect(format!("execvp failed: {:?}", argv).as_str());
  • }
  • ```
  • Due to some changes on how I was testing things, I stopped using the `-c ...` option, but the code was still `fork`ing and trying to `execvp` the child, and this has some implications. When a GNU+Linux process `fork`s, the child process is a copy of the parent and, due to [COW](https://en.wikipedia.org/wiki/Copy-on-write), that includes the parent's own memory pages.
  • ## Actual Cause?;
  • The best explanation I currently have is that when I stopped using the `-c` option, the `exec_child` function would `panic!`s instead of replacing itself with the actual command's code on `execvp`. (This much is obvious, since there's no CLI command for it to replace itself with.) We never saw the child's I/O in our shell because we were never directly connected to its I/O streams. However, the child would still "see" the pre-existing proxy object instance created by the parent b/c of the shared memory pages.
  • When the child copy panics and exits, the obj instance it sees also goes out of scope, and the Rust borrow-checker decides its time to `drop` it. This `drop` is the first (and unexpected) breakpoint hit, but it *is* successful in removing the link b/c it had just been created and it's actually there. This happens almost immediately after launching the main program (likely microseconds), explaining why I could never see the symlink in the file system when checking, but would see it listed from Rust as mentioned previously.
  • The parent continues to run while this is happening, but when it wants to terminate, it hits the breakpoint a(n expected) second time b/c it's its own turn to `drop` the (original) proxy obj instance (for going out of scope). However, the `drop` implementation can no longer find the symlink at this point, b/c that had been successfully, though unexpectedly, removed by the now dead child on the first `drop`. Since my shell *is* connected to the parent's I/O streams, I always got to see *that* `panic!` message.
  • **In short:** The parent `fork`ed a copy of itself as a child. Thanks to COW, the child shared the parent's memory space and could see the proxy/object instance responsible for managing the sym-link's life cycle. The child copy failed and exited, causing Rust to `drop` the child's own copy of the instance. This unexpected `drop` would then cause the object to "clean its own mess", taking the sym-link with it. Some time later, the parent would terminate and its own original proxy instance would get `drop`ed, causing it to also "clean up" a "mess" that had already been cleaned up and no longer existed. This would produce the documented panics.
  • This is why removing the `forkpty` call following the removal of the `-c` option actually fixed the issue. It's not an immediately obvious or intuitive error to figure out and required some hindsight knowledge to understand why this even happened in the first place.
  • # Summary
  • I've fixed the issue and what follows is the best explanation I have so far. What I had described in the OP were some of the observations, but I'll be including more detail so that it's (hopefully) easy to follow. This includes `fork`s, COWs, and dead children.
  • # Additional Observations
  • I got Rust to list the directory contents immediately after the `symlink` sys-call shown in the OP, with this simple loop:
  • ```rust
  • for p in std::fs::read_dir("/home/ray/Projects/vpanel").unwrap() {
  • println!("{}", p.unwrap().path().display());
  • }
  • ```
  • It actually showed that the sym-link was present in the file system. As expected, the `symlink` behavior described in the OP was an effect --not the cause.
  • I discovered that the `drop` implementation was (unexpectedly) being called *twice*, which, in short, did not really make sense due to how Rust handles object lifetimes. (I don't call `drop` on anything manually; the proxy gets destroyed automatically based on Rust's scoping rules.) This unexpected behavior was confirmed in the debugger (hitting the breakpoint twice) and also with a simple line of code that left 2 files behind instead of 1:
  • ```rust
  • // ...
  • std::process::Command::new("mktemp")
  • .arg("drop.XXXXXXX")
  • .output()
  • .expect("mktemp::drop failed");
  • ```
  • However, I never saw more than a single `eprintln!` message...
  • # Getting to the Point
  • Originally, the simulator took some CLI options including the full file system path for the PTY symlink and also a command to execute as a child process, e.g. `vpanel -p $(pwd)/ttymxc4 -c 'python3 /tmp/script.py'`. The main program would `fork` itself and the child process would use `execvp` to replace itself with the `-c`ommand given on the CLI. Here's a code excerpt:
  • ```rust
  • #[tokio::main]
  • async fn main() {
  • // ...
  • let pty = forkpty(None, None)?;
  • match pty.fork_result.is_parent() {
  • true => run_simulator(panel, proxies, sigint).await?,
  • false => exec_child(&m),
  • }
  • close(pty.master)?;
  • // ...
  • }
  • // ...
  • fn exec_child(m: &ArgMatches) {
  • let args: Vec<CString> = m.value_of("child_cmd")
  • .expect("Missing child process command")
  • .split(' ')
  • .map(|s| CString::new(s).expect("CString from String failed"))
  • .collect();
  • let argv: Vec<&CStr> = args
  • .iter()
  • .map(|s| s.as_c_str())
  • .collect();
  • execvp(argv[0], &argv)
  • .expect(format!("execvp failed: {:?}", argv).as_str());
  • }
  • ```
  • Due to some changes on how I was testing things, I stopped using the `-c ...` option, but the code was still `fork`ing and trying to `execvp` the child, and this has some implications. When a GNU+Linux process `fork`s, the child process is a copy of the parent and, due to [COW](https://en.wikipedia.org/wiki/Copy-on-write), that includes the parent's own memory pages.
  • ## Actual Cause
  • The best explanation I currently have is that when I stopped using the `-c` option, the `exec_child` function would `panic!`s instead of replacing itself with the actual command's code on `execvp`. (This much is probably obvious, since there's no CLI command for it to replace itself with.) We never saw the child's I/O in our shell because we were never directly connected to its I/O streams. However, the child would still "see" the pre-existing proxy object instance created by the parent b/c of the shared memory pages.
  • When the child copy panics and exits, the obj instance it sees also goes out of scope, and the Rust borrow-checker decides its time to `drop` it. This `drop` is the first (and unexpected) breakpoint hit, but it *is* successful in removing the link b/c it had just been created and it's actually there. This happens almost immediately after launching the main program (likely microseconds), explaining why I could never see the sym-link in the file system when checking, but would see it listed from Rust as mentioned previously.
  • The parent continues to run while this is happening, but when it wants to terminate, it hits the breakpoint a(n expected) second time b/c it's its own turn to `drop` the (original) proxy obj instance (for going out of scope). However, the `drop` implementation can no longer find the sym-link at this point, b/c that had been successfully, though unexpectedly, removed by the now dead child on the first `drop`. Since my shell *is* connected to the parent's I/O streams, I always got to see *that* `panic!` message.
  • **In short:** The parent `fork`ed a copy of itself as a child. Thanks to COW, the child shared the parent's memory space and could see the proxy/object instance responsible for managing the sym-link's life cycle. The child copy failed and exited, causing Rust to `drop` the child's own copy of the instance. This unexpected `drop` would then cause the object to "clean its own mess", taking the sym-link with it. Some time later, the parent would terminate and its own original proxy instance would get `drop`ed, causing it to also "clean up" a "mess" that had already been cleaned up and no longer existed. This would produce the documented panics.
  • This is why removing the left-over `forkpty` call following the removal of the `-c` option actually fixed the issue. It's not an immediately obvious or intuitive error to figure out and required some hindsight knowledge to understand why this even happened in the first place.
  • The code without the left-over `fork` looks as follows:
  • ```rust
  • #[tokio::main]
  • async fn main() {
  • // ...
  • run_simulator(panel, proxies, sigint).await?;
  • // ...
  • }
  • ```
#3: Post edited by ghost-in-the-zsh · 2020-12-18T02:03:16Z (over 3 years ago)
Minor wording update
  • # Summary
  • I've fixed the issue and what follows is the best explanation I have so far. What I had described in the OP were some of the observations, but I'll be including more detail so that it's (hopefully) easy to follow. This includes `fork`s, COWs, and dead children.
  • # Additional Observations
  • I got Rust to list the directory contents immediately after the `symlink` sys-call shown in the OP, with this simple loop:
  • ```rust
  • for p in std::fs::read_dir("/home/ray/Projects/vpanel").unwrap() {
  • println!("{}", p.unwrap().path().display());
  • }
  • ```
  • It actually showed that the sym-link was present in the file system. As expected, the `symlink` behavior described in the OP was an effect --not the cause.
  • I discovered that the `drop` implementation was (unexpectedly) being called *twice*, which, in short, did not really make sense due to how Rust handles object lifetimes. (I don't call `drop` on anything manually; the proxy gets destroyed automatically based on Rust's scoping rules.) This unexpected behavior was confirmed in the debugger (hitting the breakpoint twice) and also with a simple line of code that left 2 files behind instead of 1:
  • ```rust
  • // ...
  • std::process::Command::new("mktemp")
  • .arg("drop.XXXXXXX")
  • .output()
  • .expect("mktemp::drop failed");
  • ```
  • However, I never saw more than a single `eprintln!` message...
  • # Getting to the Point
  • Originally, the simulator took some CLI options including the full file system path for the PTY symlink and also a command to execute as a child process, e.g. `vpanel -p $(pwd)/ttymxc4 -c 'python3 /tmp/script.py'`. The main program would `fork` itself and the child process would use `execvp` to replace itself with the `-c`ommand given on the CLI. Here's a code excerpt:
  • ```rust
  • #[tokio::main]
  • async fn main() {
  • // ...
  • let pty = forkpty(None, None)?;
  • match pty.fork_result.is_parent() {
  • true => run_simulator(panel, proxies, sigint).await?,
  • false => exec_child(&m),
  • }
  • close(pty.master)?;
  • // ...
  • }
  • // ...
  • fn exec_child(m: &ArgMatches) {
  • let args: Vec<CString> = m.value_of("child_cmd")
  • .expect("Missing child process command")
  • .split(' ')
  • .map(|s| CString::new(s).expect("CString from String failed"))
  • .collect();
  • let argv: Vec<&CStr> = args
  • .iter()
  • .map(|s| s.as_c_str())
  • .collect();
  • execvp(argv[0], &argv)
  • .expect(format!("execvp failed: {:?}", argv).as_str());
  • }
  • ```
  • Due to some changes on how I was testing things, I stopped using the `-c ...` option, but the code was still `fork`ing and trying to `execvp` the child, and this has some implications. When a GNU+Linux process `fork`s, the child process is a copy of the parent and, due to [COW](https://en.wikipedia.org/wiki/Copy-on-write), that includes the parent's own memory pages.
  • ## Actual Cause?;
  • What I *think* was happening here was that when I stopped using the `-c` option, the `exec_child` function would `panic!`s instead of replacing itself with the actual command's code on `execvp`. (This much is obvious, since there's no CLI command for it to replace itself with.) We never saw the child's I/O in our shell because we were never directly connected to its I/O streams. However, the child would still "see" the pre-existing proxy object instance created by the parent b/c of the shared memory pages.
  • When the child copy panics and exits, the obj instance it sees also goes out of scope, and the Rust borrow-checker decides its time to `drop` it. This `drop` is the first (and unexpected) breakpoint hit, but it *is* successful in removing the link b/c it had just been created and it's actually there. This happens almost immediately after launching the main program (likely microseconds), explaining why I could never see the symlink in the file system when checking, but would see it listed from Rust as mentioned previously.
  • The parent continues to run while this is happening, but when it wants to terminate, it hits the breakpoint a(n expected) second time b/c it's its own turn to `drop` the (original) proxy obj instance (for going out of scope). However, the `drop` implementation can no longer find the symlink at this point, b/c that had been successfully, though unexpectedly, removed by the now dead child on the first `drop`. Since my shell *is* connected to the parent's I/O streams, I always got to see *that* `panic!` message.
  • **In short:** The parent `fork`ed a copy of itself as a child. Thanks to COW, the child shared the parent's memory space and could see the proxy/object instance responsible for managing the sym-link's life cycle. The child copy failed and exited, causing Rust to `drop` the child's own copy of the instance. This unexpected `drop` would then cause the object to "clean its own mess", taking the sym-link with it. Some time later, the parent would terminate and its own original proxy instance would get `drop`ed, causing it to also "clean up" a "mess" that had already been cleaned up and no longer existed. This would produce the documented panics.
  • This is why removing the `forkpty` call following the removal of the `-c` option actually fixed the issue. It's not an immediately obvious or intuitive error to figure out and required some hindsight knowledge to understand why this even happened in the first place.
  • # Summary
  • I've fixed the issue and what follows is the best explanation I have so far. What I had described in the OP were some of the observations, but I'll be including more detail so that it's (hopefully) easy to follow. This includes `fork`s, COWs, and dead children.
  • # Additional Observations
  • I got Rust to list the directory contents immediately after the `symlink` sys-call shown in the OP, with this simple loop:
  • ```rust
  • for p in std::fs::read_dir("/home/ray/Projects/vpanel").unwrap() {
  • println!("{}", p.unwrap().path().display());
  • }
  • ```
  • It actually showed that the sym-link was present in the file system. As expected, the `symlink` behavior described in the OP was an effect --not the cause.
  • I discovered that the `drop` implementation was (unexpectedly) being called *twice*, which, in short, did not really make sense due to how Rust handles object lifetimes. (I don't call `drop` on anything manually; the proxy gets destroyed automatically based on Rust's scoping rules.) This unexpected behavior was confirmed in the debugger (hitting the breakpoint twice) and also with a simple line of code that left 2 files behind instead of 1:
  • ```rust
  • // ...
  • std::process::Command::new("mktemp")
  • .arg("drop.XXXXXXX")
  • .output()
  • .expect("mktemp::drop failed");
  • ```
  • However, I never saw more than a single `eprintln!` message...
  • # Getting to the Point
  • Originally, the simulator took some CLI options including the full file system path for the PTY symlink and also a command to execute as a child process, e.g. `vpanel -p $(pwd)/ttymxc4 -c 'python3 /tmp/script.py'`. The main program would `fork` itself and the child process would use `execvp` to replace itself with the `-c`ommand given on the CLI. Here's a code excerpt:
  • ```rust
  • #[tokio::main]
  • async fn main() {
  • // ...
  • let pty = forkpty(None, None)?;
  • match pty.fork_result.is_parent() {
  • true => run_simulator(panel, proxies, sigint).await?,
  • false => exec_child(&m),
  • }
  • close(pty.master)?;
  • // ...
  • }
  • // ...
  • fn exec_child(m: &ArgMatches) {
  • let args: Vec<CString> = m.value_of("child_cmd")
  • .expect("Missing child process command")
  • .split(' ')
  • .map(|s| CString::new(s).expect("CString from String failed"))
  • .collect();
  • let argv: Vec<&CStr> = args
  • .iter()
  • .map(|s| s.as_c_str())
  • .collect();
  • execvp(argv[0], &argv)
  • .expect(format!("execvp failed: {:?}", argv).as_str());
  • }
  • ```
  • Due to some changes on how I was testing things, I stopped using the `-c ...` option, but the code was still `fork`ing and trying to `execvp` the child, and this has some implications. When a GNU+Linux process `fork`s, the child process is a copy of the parent and, due to [COW](https://en.wikipedia.org/wiki/Copy-on-write), that includes the parent's own memory pages.
  • ## Actual Cause?
  • The best explanation I currently have is that, once I stopped using the `-c` option, the `exec_child` function would `panic!` instead of replacing the child process with the command's code via `execvp`. (This much is obvious, since there's no CLI command for it to replace itself with.) We never saw the child's I/O in our shell because we were never directly connected to its I/O streams. However, the child would still "see" the pre-existing proxy object instance created by the parent, b/c `fork` gives it a copy of the parent's memory.
  • When the child copy panics and exits, the obj instance it sees also goes out of scope, and Rust runs its `drop` as the stack unwinds (this happens at run time; the borrow checker is not involved). This `drop` is the first (and unexpected) breakpoint hit, but it *is* successful in removing the link b/c it had just been created and is actually there. This happens almost immediately after launching the main program (likely microseconds), explaining why I could never see the symlink in the file system when checking, but would see it listed from Rust as mentioned previously.
  • The parent continues to run while this is happening, but when it wants to terminate, it hits the breakpoint a(n expected) second time b/c it's its own turn to `drop` the (original) proxy obj instance (for going out of scope). However, the `drop` implementation can no longer find the symlink at this point, b/c that had been successfully, though unexpectedly, removed by the now dead child on the first `drop`. Since my shell *is* connected to the parent's I/O streams, I always got to see *that* `panic!` message.
  • **In short:** The parent `fork`ed a copy of itself as a child. Thanks to COW, the child shared the parent's memory space and could see the proxy/object instance responsible for managing the sym-link's life cycle. The child copy failed and exited, causing Rust to `drop` the child's own copy of the instance. This unexpected `drop` would then cause the object to "clean its own mess", taking the sym-link with it. Some time later, the parent would terminate and its own original proxy instance would get `drop`ed, causing it to also "clean up" a "mess" that had already been cleaned up and no longer existed. This would produce the documented panics.
  • This is why removing the `forkpty` call following the removal of the `-c` option actually fixed the issue. It's not an immediately obvious or intuitive error to figure out and required some hindsight knowledge to understand why this even happened in the first place.
#2: Post edited by user avatar ghost-in-the-zsh‭ · 2020-12-10T10:42:19Z (over 3 years ago)
Minor wording and formatting improvements
  • # Summary
  • I've fixed the issue and what follows is the best explanation I have so far. What I had described in the OP were some of the observations, but I'll be including more detail so that it's (hopefully) easy to follow. This includes `fork`s, COWs, and dead children.
  • # Additional Observations
  • I asked Rust to list the directory contents immediately after returning from the `symlink` call shown in the OP, with this simple loop:
  • ```rust
  • for p in std::fs::read_dir("/home/ray/Projects/vpanel").unwrap() {
  • println!("{}", p.unwrap().path().display());
  • }
  • ```
  • It actually showed that the `symlink` call was indeed creating the link in the file system.
  • As expected, the symlink behavior described in the OP was an effect. I discovered that the `drop` implementation was (unexpectedly) being called *twice*, which, in short, did not really make sense due to how Rust handles object lifetimes. (I don't call `drop` on anything manually; the proxy gets destroyed automatically based on Rust's scoping rules.) This unexpected behavior was confirmed in the debugger (hitting the breakpoint twice) and also with a simple line of code that left 2 files behind instead of 1:
  • ```rust
  • // ...
  • std::process::Command::new("mktemp")
  • .arg("drop.XXXXXXX")
  • .output()
  • .expect("mktemp::drop failed");
  • ```
  • However, I never saw more than a single `eprintln!` message...
  • # Getting to the Point
  • Originally, the simulator took some CLI options including the full file system path for the PTY symlink and also a command to execute as a child process, e.g. `vpanel -p $(pwd)/ttymxc4 -c 'python3 /tmp/script.py'`. The main program would `fork` itself and the child process would use `execvp` to replace itself with the `-c`ommand given on the CLI. Here's a code excerpt:
  • ```rust
  • #[tokio::main]
  • async fn main() {
  • // ...
  • let pty = forkpty(None, None)?;
  • match pty.fork_result.is_parent() {
  • true => run_simulator(panel, proxies, sigint).await?,
  • false => exec_child(&m),
  • }
  • close(pty.master)?;
  • // ...
  • }
  • // ...
  • fn exec_child(m: &ArgMatches) {
  • let args: Vec<CString> = m.value_of("child_cmd")
  • .expect("Missing child process command")
  • .split(' ')
  • .map(|s| CString::new(s).expect("CString from String failed"))
  • .collect();
  • let argv: Vec<&CStr> = args
  • .iter()
  • .map(|s| s.as_c_str())
  • .collect();
  • execvp(argv[0], &argv)
  • .expect(format!("execvp failed: {:?}", argv).as_str());
  • }
  • ```
  • Due to some changes on how I was testing things, I stopped using the `-c ...` option, but the code was still `fork`ing and trying to `execvp` the child, and this has some implications. When a GNU+Linux process `fork`s, the child process is a copy of the parent and, due to [COW](https://en.wikipedia.org/wiki/Copy-on-write), that includes the parent's own memory pages.
  • What I *think* was happening is that, once I stopped using the `-c` option, the `exec_child` function would `panic!` instead of replacing the child process with the command's code via `execvp`. (This much is obvious, since there's no CLI command for it to replace itself with.) We never saw the child's I/O in our shell because we were never directly connected to its I/O streams. However, the child would still "see" the pre-existing proxy object instance created by the parent, b/c `fork` gives it a copy of the parent's memory.
  • When the child copy panics and exits, the obj instance it sees also goes out of scope, and Rust runs its `drop` as the stack unwinds (this happens at run time; the borrow checker is not involved). This `drop` is the first (and unexpected) breakpoint hit, but it *is* successful in removing the link b/c it had just been created and is actually there. This happens almost immediately after launching the main program (likely microseconds), explaining why I could never see the symlink in the file system when checking, but would see it listed from Rust as mentioned previously.
  • The parent continues to run while this is happening, but when it wants to terminate, it hits the breakpoint a(n expected) second time b/c it's its own turn to `drop` the (original) proxy obj instance (for going out of scope). However, the `drop` implementation can no longer find the symlink at this point, b/c that had been successfully, though unexpectedly, removed by the now dead child on the first `drop`. Since my shell *is* connected to the parent's I/O streams, I always got to see *that* `panic!` message.
  • **In short:** The parent `fork`ed a copy of itself as a child. Thanks to COW, the child shared the parent's memory space and could see the proxy/object instance responsible for managing the sym-link's life cycle. The child copy failed and exited, causing Rust to `drop` the child's own copy of the instance. This unexpected `drop` would then cause the object to "clean its own mess", taking the sym-link with it. Some time later, the parent would terminate and its own original proxy instance would get `drop`ed, causing it to also "clean up" a "mess" that had already been cleaned up and no longer existed. This would produce the documented panics.
  • This is why removing the `forkpty` call following the removal of the `-c` option actually fixed the issue. It's not an immediately obvious or intuitive error to figure out and required some hindsight knowledge to understand why this even happened in the first place.
  • # Summary
  • I've fixed the issue and what follows is the best explanation I have so far. What I had described in the OP were some of the observations, but I'll be including more detail so that it's (hopefully) easy to follow. This includes `fork`s, COWs, and dead children.
  • # Additional Observations
  • I got Rust to list the directory contents immediately after the `symlink` sys-call shown in the OP, with this simple loop:
  • ```rust
  • for p in std::fs::read_dir("/home/ray/Projects/vpanel").unwrap() {
  • println!("{}", p.unwrap().path().display());
  • }
  • ```
  • It actually showed that the sym-link was present in the file system. As expected, the `symlink` behavior described in the OP was an effect --not the cause.
  • I discovered that the `drop` implementation was (unexpectedly) being called *twice*, which, in short, did not really make sense due to how Rust handles object lifetimes. (I don't call `drop` on anything manually; the proxy gets destroyed automatically based on Rust's scoping rules.) This unexpected behavior was confirmed in the debugger (hitting the breakpoint twice) and also with a simple line of code that left 2 files behind instead of 1:
  • ```rust
  • // ...
  • std::process::Command::new("mktemp")
  • .arg("drop.XXXXXXX")
  • .output()
  • .expect("mktemp::drop failed");
  • ```
  • However, I never saw more than a single `eprintln!` message...
  • # Getting to the Point
  • Originally, the simulator took some CLI options including the full file system path for the PTY symlink and also a command to execute as a child process, e.g. `vpanel -p $(pwd)/ttymxc4 -c 'python3 /tmp/script.py'`. The main program would `fork` itself and the child process would use `execvp` to replace itself with the `-c`ommand given on the CLI. Here's a code excerpt:
  • ```rust
  • #[tokio::main]
  • async fn main() {
  • // ...
  • let pty = forkpty(None, None)?;
  • match pty.fork_result.is_parent() {
  • true => run_simulator(panel, proxies, sigint).await?,
  • false => exec_child(&m),
  • }
  • close(pty.master)?;
  • // ...
  • }
  • // ...
  • fn exec_child(m: &ArgMatches) {
  • let args: Vec<CString> = m.value_of("child_cmd")
  • .expect("Missing child process command")
  • .split(' ')
  • .map(|s| CString::new(s).expect("CString from String failed"))
  • .collect();
  • let argv: Vec<&CStr> = args
  • .iter()
  • .map(|s| s.as_c_str())
  • .collect();
  • execvp(argv[0], &argv)
  • .expect(format!("execvp failed: {:?}", argv).as_str());
  • }
  • ```
  • Due to some changes on how I was testing things, I stopped using the `-c ...` option, but the code was still `fork`ing and trying to `execvp` the child, and this has some implications. When a GNU+Linux process `fork`s, the child process is a copy of the parent and, due to [COW](https://en.wikipedia.org/wiki/Copy-on-write), that includes the parent's own memory pages.
  • ## Actual Cause?
  • What I *think* was happening is that, once I stopped using the `-c` option, the `exec_child` function would `panic!` instead of replacing the child process with the command's code via `execvp`. (This much is obvious, since there's no CLI command for it to replace itself with.) We never saw the child's I/O in our shell because we were never directly connected to its I/O streams. However, the child would still "see" the pre-existing proxy object instance created by the parent, b/c `fork` gives it a copy of the parent's memory.
  • When the child copy panics and exits, the obj instance it sees also goes out of scope, and Rust runs its `drop` as the stack unwinds (this happens at run time; the borrow checker is not involved). This `drop` is the first (and unexpected) breakpoint hit, but it *is* successful in removing the link b/c it had just been created and is actually there. This happens almost immediately after launching the main program (likely microseconds), explaining why I could never see the symlink in the file system when checking, but would see it listed from Rust as mentioned previously.
  • The parent continues to run while this is happening, but when it wants to terminate, it hits the breakpoint a(n expected) second time b/c it's its own turn to `drop` the (original) proxy obj instance (for going out of scope). However, the `drop` implementation can no longer find the symlink at this point, b/c that had been successfully, though unexpectedly, removed by the now dead child on the first `drop`. Since my shell *is* connected to the parent's I/O streams, I always got to see *that* `panic!` message.
  • **In short:** The parent `fork`ed a copy of itself as a child. Thanks to COW, the child shared the parent's memory space and could see the proxy/object instance responsible for managing the sym-link's life cycle. The child copy failed and exited, causing Rust to `drop` the child's own copy of the instance. This unexpected `drop` would then cause the object to "clean its own mess", taking the sym-link with it. Some time later, the parent would terminate and its own original proxy instance would get `drop`ed, causing it to also "clean up" a "mess" that had already been cleaned up and no longer existed. This would produce the documented panics.
  • This is why removing the `forkpty` call following the removal of the `-c` option actually fixed the issue. It's not an immediately obvious or intuitive error to figure out and required some hindsight knowledge to understand why this even happened in the first place.
#1: Initial revision by user avatar ghost-in-the-zsh‭ · 2020-12-10T08:05:51Z (over 3 years ago)
# Summary

I've fixed the issue and what follows is the best explanation I have so far. What I had described in the OP were some of the observations, but I'll be including more detail so that it's (hopefully) easy to follow. This includes `fork`s, COWs, and dead children.

# Additional Observations

I asked Rust to list the directory contents immediately after returning from the `symlink` call shown in the OP, with this simple loop:

```rust
for p in std::fs::read_dir("/home/ray/Projects/vpanel").unwrap() {
    println!("{}", p.unwrap().path().display());
}
```

It actually showed that the `symlink` call was indeed creating the link in the file system.

As expected, the symlink behavior described in the OP was an effect. I discovered that the `drop` implementation was (unexpectedly) being called *twice*, which, in short, did not really make sense due to how Rust handles object lifetimes. (I don't call `drop` on anything manually; the proxy gets destroyed automatically based on Rust's scoping rules.) This unexpected behavior was confirmed in the debugger (hitting the breakpoint twice) and also with a simple line of code that left 2 files behind instead of 1:

```rust
// ...
std::process::Command::new("mktemp")
    .arg("drop.XXXXXXX")
    .output()
    .expect("mktemp::drop failed");
```

However, I never saw more than a single `eprintln!` message...

# Getting to the Point

Originally, the simulator took some CLI options including the full file system path for the PTY symlink and also a command to execute as a child process, e.g. `vpanel -p $(pwd)/ttymxc4 -c 'python3 /tmp/script.py'`. The main program would `fork` itself and the child process would use `execvp` to replace itself with the `-c`ommand given on the CLI. Here's a code excerpt:

```rust
#[tokio::main]
async fn main() {
    // ...
    let pty = forkpty(None, None)?;
    match pty.fork_result.is_parent() {
        true => run_simulator(panel, proxies, sigint).await?,
        false => exec_child(&m),
    }
    close(pty.master)?;
    // ...
}

// ...

fn exec_child(m: &ArgMatches) {
    let args: Vec<CString> = m.value_of("child_cmd")
        .expect("Missing child process command")
        .split(' ')
        .map(|s| CString::new(s).expect("CString from String failed"))
        .collect();
    let argv: Vec<&CStr> = args
        .iter()
        .map(|s| s.as_c_str())
        .collect();
    execvp(argv[0], &argv)
        .expect(format!("execvp failed: {:?}", argv).as_str());
}
```

Due to some changes on how I was testing things, I stopped using the `-c ...` option, but the code was still `fork`ing and trying to `execvp` the child, and this has some implications. When a GNU+Linux process `fork`s, the child process is a copy of the parent and, due to [COW](https://en.wikipedia.org/wiki/Copy-on-write), that includes the parent's own memory pages.

What I *think* was happening is that, once I stopped using the `-c` option, the `exec_child` function would `panic!` instead of replacing the child process with the command's code via `execvp`. (This much is obvious, since there's no CLI command for it to replace itself with.) We never saw the child's I/O in our shell because we were never directly connected to its I/O streams. However, the child would still "see" the pre-existing proxy object instance created by the parent, b/c `fork` gives it a copy of the parent's memory.

When the child copy panics and exits, the obj instance it sees also goes out of scope, and Rust runs its `drop` as the stack unwinds (this happens at run time; the borrow checker is not involved). This `drop` is the first (and unexpected) breakpoint hit, but it *is* successful in removing the link b/c it had just been created and is actually there. This happens almost immediately after launching the main program (likely microseconds), explaining why I could never see the symlink in the file system when checking, but would see it listed from Rust as mentioned previously.

The parent continues to run while this is happening, but when it wants to terminate, it hits the breakpoint a(n expected) second time b/c it's its own turn to `drop` the (original) proxy obj instance (for going out of scope). However, the `drop` implementation can no longer find the symlink at this point, b/c that had been successfully, though unexpectedly, removed by the now dead child on the first `drop`. Since my shell *is* connected to the parent's I/O streams, I always got to see *that* `panic!` message.
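The double `drop` can be reproduced in isolation. What follows is only a minimal sketch, not the simulator's code: `LinkGuard` and `link_survives_child` are hypothetical names, plain `fork` stands in for `forkpty`, the libc functions are declared by hand (assuming Linux, where `pid_t` is a 32-bit integer) so that no external crate is needed, and the `Drop` here logs instead of panicking.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::{Path, PathBuf};

// Hand-written declarations (assumption: Linux, `pid_t` == i32). These symbols
// come from the C library that the Rust standard library already links against.
extern "C" {
    fn fork() -> i32;
    fn waitpid(pid: i32, status: *mut i32, options: i32) -> i32;
    fn _exit(code: i32) -> !;
}

/// Hypothetical stand-in for the proxy: creates a symlink, removes it on drop.
struct LinkGuard {
    path: PathBuf,
}

impl LinkGuard {
    fn create(target: &Path, link: &Path) -> io::Result<Self> {
        symlink(target, link)?;
        Ok(LinkGuard { path: link.to_path_buf() })
    }
}

impl Drop for LinkGuard {
    fn drop(&mut self) {
        // The code in the post panicked here; this sketch only reports the failure.
        if let Err(e) = fs::remove_file(&self.path) {
            eprintln!("pid {}: drop could not remove link: {}", std::process::id(), e);
        }
    }
}

/// Creates the link, forks, lets the child drop its copy of the guard, and
/// reports whether the link still exists in the parent afterwards.
fn link_survives_child(link: &Path) -> io::Result<bool> {
    let guard = LinkGuard::create(Path::new("/dev/null"), link)?;
    let pid = unsafe { fork() };
    if pid < 0 {
        return Err(io::Error::last_os_error());
    }
    if pid == 0 {
        // Child: it holds its own copy of `guard`. Dropping it here mimics the
        // cleanup that runs when the child panics and unwinds.
        drop(guard);
        unsafe { _exit(0) };
    }
    // Parent: wait for the child, then look for the link.
    let mut status = 0;
    unsafe { waitpid(pid, &mut status, 0) };
    let still_there = fs::symlink_metadata(link).is_ok();
    Ok(still_there)
    // `guard` is dropped here a second time, in the parent, and finds nothing.
}
```

Called with a fresh path under a temporary directory, `link_survives_child` is expected to return `false`: the child's `drop` has already removed the link by the time the parent checks, and the parent's own `drop` then reports that the file is missing.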

**In short:** The parent `fork`ed a copy of itself as a child. Thanks to COW, the child shared the parent's memory space and could see the proxy/object instance responsible for managing the sym-link's life cycle. The child copy failed and exited, causing Rust to `drop` the child's own copy of the instance. This unexpected `drop` would then cause the object to "clean its own mess", taking the sym-link with it. Some time later, the parent would terminate and its own original proxy instance would get `drop`ed, causing it to also "clean up" a "mess" that had already been cleaned up and no longer existed. This would produce the documented panics.
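One defensive pattern, which is not what I ended up doing, would be to have the object remember which process created it and skip the cleanup in any other process. A rough sketch, where `OwnedLink` and its `owner_pid` field are hypothetical names and not part of the simulator:

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::os::unix::fs::symlink;
use std::path::{Path, PathBuf};

/// Hypothetical guard that only cleans up in the process that created it.
struct OwnedLink {
    path: PathBuf,
    owner_pid: u32,
}

impl OwnedLink {
    fn create(target: &Path, link: &Path) -> io::Result<Self> {
        symlink(target, link)?;
        Ok(OwnedLink {
            path: link.to_path_buf(),
            // Recorded once, at creation time, in the creating process.
            owner_pid: std::process::id(),
        })
    }
}

impl Drop for OwnedLink {
    fn drop(&mut self) {
        // A forked child has a different pid, so its copy leaves the link alone.
        if std::process::id() != self.owner_pid {
            return;
        }
        match fs::remove_file(&self.path) {
            Ok(()) => {}
            // Already gone: nothing left to clean up, so do not panic.
            Err(e) if e.kind() == ErrorKind::NotFound => {}
            Err(e) => eprintln!("failed to remove {}: {}", self.path.display(), e),
        }
    }
}
```

With this, a child that unwinds after `fork` would drop its copy without touching the file system, and the parent's later `drop` would still find the link. Tolerating `NotFound` is a separate choice: it avoids the panic but, on its own, would not stop a child from removing the link early.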

This is why removing the `forkpty` call following the removal of the `-c` option actually fixed the issue. It's not an immediately obvious or intuitive error to figure out and required some hindsight knowledge to understand why this even happened in the first place.
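If the `-c` option had to stay as an optional feature, an alternative to removing the call outright would be to fork only when there is actually a command to run. The helper below is a sketch under that assumption; `child_argv` is a hypothetical name, and the caller would skip the `forkpty` step entirely whenever it returns `None`.

```rust
use std::ffi::CString;

/// Builds the argv for the child from the optional `-c` value.
/// Returns `None` when there is nothing to execute, i.e. the option is absent,
/// blank, or contains an interior NUL byte; the caller should then not fork.
fn child_argv(child_cmd: Option<&str>) -> Option<Vec<CString>> {
    let argv: Vec<CString> = child_cmd?
        // `split_whitespace` avoids the empty arguments that `split(' ')`
        // produces for repeated spaces.
        .split_whitespace()
        .map(|s| CString::new(s))
        .collect::<Result<_, _>>()
        .ok()?;
    if argv.is_empty() {
        None
    } else {
        Some(argv)
    }
}
```

Validating the command before forking also means a bad or missing `-c` value is reported by the parent, whose output is visible in the shell, rather than by a child whose `panic!` message is never seen. Like the original excerpt, this splits on whitespace only and does not handle shell quoting.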