Welcome to Software Development on Codidact!
Why is this symlink() call returning successfully while apparently failing to create the sym-link?
Summary
I'm building an internal system (hardware simulator, using Rust) to help test some Python-based services that talk to hardware. (The services talk to hardware via TTYs.) To trick the Python services into "believing" they're talking to the hardware they expect, I create some PTYs, where the master side is used by my simulator and the slave side of the PTY is given over to the Python-based service.
Since the Python service looks for its PTY by using a specific name, I create a symbolic link that points back to the slave PTY mentioned above, but using the name the Python service expects -- or so I thought.
The problem is that the `symlink` function reports success, but no link actually exists in the file system. I'm not yet sure what I may be missing.
My question is: Does anyone know why this is happening and how to fix it?
Details
I'm using the `std::os::unix::fs::symlink` function to create the sym-link. The function's documentation is clear about its usage, e.g. `symlink("a.txt", "b.txt")` would create a sym-link called `b.txt` that points back to `a.txt`. (Assuming `a.txt` already exists.)
When I use the function in the simulator, I observe the following:
- The function call returns successfully, and
- No actual sym-link can be found in the file system.
I wrote a simple test program (below) to verify my usage, and it works fine:
```rust
use std::os::unix::fs::symlink;

fn main() {
    symlink("/tmp/src.txt", "/tmp/dst.txt").expect("symlink failed");
}
```
The above shows that I'm using it correctly, with `dst.txt` being the link that points back to `src.txt`. Also, the above would've failed if I had the arguments inverted or didn't have the correct permissions, so it's not that, either.
I also ran the simulator under `strace` to check what the lower-level syscall was actually doing (e.g. maybe that was failing but the Rust `std` library was not handling the error?), but it shows a successful return code:

```
symlink("/dev/pts/5", "/home/<user>/Projects/vpanel/ttymxc4") = 0
```
This is despite the fact that the symlink `/home/<user>/Projects/vpanel/ttymxc4` does not really exist. A simple visual inspection with the `ls` command in the above directory does not show the file, the Python service complains that it cannot find the symlink to its PTY, and the simulator itself reports a `panic!` when it tries to clean up after itself but fails to find the symlink to remove it:
```
Drop for ServiceProxy { pty: OpenptyResult { master: 9, slave: 10 }, fspath: "/home/<user>/Projects/vpanel/ttymxc4", timeout: 8s }
thread 'tokio-runtime-worker' panicked at 'remove_file failed: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/proxy.rs:262:35
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
task join failed: JoinError::Panic(...)
```
The backtrace is not very helpful.
Reference Code
This is what the relevant code looks like in the actual simulator:
```rust
// main.rs
#[tokio::main]
async fn main() -> Result<(), Error> {
    let m = App::from(load_yaml!("cli.yaml")).get_matches();
    let ptys: Vec<&str> = m
        .values_of("ptys")
        .expect("Missing PTY paths")
        .collect();
    let proxies = ptys
        .iter()
        .map(|pty| ServiceProxy::new(*pty))
        .collect();
    // ...
}
```
```rust
// proxy.rs
impl ServiceProxy {
    pub fn new(symlink_path: &str) -> Self {
        let pty = openpty(None, None).expect("openpty failed");
        let cstr = unsafe { CStr::from_ptr(ttyname(pty.slave)) };
        let slave_path = String::from(cstr.to_str().expect("CStr::to_str failed"));
        // This call claims success, but no link ever shows up
        symlink(&slave_path, &symlink_path).expect("PTY symlink failed");
        // ...
    }
    // ...
}
```
```rust
impl ops::Drop for ServiceProxy {
    fn drop(&mut self) {
        eprintln!("Drop for {:?}", self);
        close(self.pty.master).expect("close master failed");
        close(self.pty.slave).expect("close slave failed");
        // The panic shown earlier comes from here
        remove_file(&self.fspath).expect("remove_file failed");
    }
}
```
Remarks
I don't think any of these should make a difference, but just in case:

- the simulator is using `tokio` (async/await futures, tasks, etc.);
- the simulator is working with PTYs instead of "regular" files like the short Rust test/example;
- a simple Python test script using `os.symlink(...)` works fine.
Update
I added the following code to the simulator, as a test, right after the `symlink` call:

```rust
if Path::new(&symlink_path).exists() {
    eprintln!("What?!: {}", symlink_path);
}
for p in std::fs::read_dir("/home/<user>/Projects/vpanel").unwrap() {
    eprintln!("{:?}", p.unwrap().path().display());
}
```
Interestingly, it lists the symlink as being present (irrelevant entries omitted):

```
What?!: /home/<user>/Projects/vpanel/ttymxc4
...
"/home/<user>/Projects/vpanel/ttymxc4"
...
```
However, it's never listed by commands such as `ls -la` or anything. To make sure that there weren't any unexpected `remove_file` calls, I checked as follows:

```shell
$ find src -name '*.rs' | xargs grep remove_file
src/proxy.rs:    fs::remove_file,
src/proxy.rs:    remove_file(&self.fspath).expect("remove_file failed");
```

The only hit for an actual call in the code base is from the `std::ops::Drop` implementation. (The top hit is from the `use std::{..., fs::remove_file, ...};` block.)
In short, there are no hidden/unexpected/accidental calls to `remove_file` after the `symlink` call. There's only the one we already knew about.
1 answer
Summary
I've fixed the issue and what follows is the best explanation I have so far. What I had described in the OP were some of the observations, but I'll be including more detail so that it's (hopefully) easy to follow. This includes fork
s, COWs, and dead children.
Additional Observations
I got Rust to list the directory contents immediately after the `symlink` sys-call shown in the OP, with this simple loop:

```rust
for p in std::fs::read_dir("/home/<user>/Projects/vpanel").unwrap() {
    println!("{}", p.unwrap().path().display());
}
```
It actually showed that the sym-link was present in the file system. As expected, the `symlink` behavior described in the OP was an effect -- not the cause.

I discovered that the `drop` implementation was (unexpectedly) being called twice, which, in short, did not really make sense given how Rust handles object lifetimes. (I don't call `drop` on anything manually; the proxy gets destroyed automatically based on Rust's scoping rules.) This unexpected behavior was confirmed in the debugger (hitting the breakpoint twice) and also with a simple line of code that left 2 files behind instead of 1:
```rust
// ...
std::process::Command::new("mktemp")
    .arg("drop.XXXXXXX")
    .output()
    .expect("mktemp::drop failed");
```
However, I never saw more than a single `eprintln!` message...
Getting to the Point
Originally, the simulator took some CLI options, including the full file system path for the PTY symlink and also a command to execute as a child process, e.g. `vpanel -p $(pwd)/ttymxc4 -c 'python3 /tmp/script.py'`. The main program would `fork` itself, and the child process would use `execvp` to replace itself with the command given via `-c` on the CLI. Here's a code excerpt:
```rust
#[tokio::main]
async fn main() {
    // ...
    let pty = forkpty(None, None)?;
    match pty.fork_result.is_parent() {
        true => run_simulator(panel, proxies, sigint).await?,
        false => exec_child(&m),
    }
    close(pty.master)?;
    // ...
}

// ...

fn exec_child(m: &ArgMatches) {
    let args: Vec<CString> = m
        .value_of("child_cmd")
        .expect("Missing child process command")
        .split(' ')
        .map(|s| CString::new(s).expect("CString from String failed"))
        .collect();
    let argv: Vec<&CStr> = args
        .iter()
        .map(|s| s.as_c_str())
        .collect();
    execvp(argv[0], &argv)
        .expect(format!("execvp failed: {:?}", argv).as_str());
}
```
Due to some changes in how I was testing things, I stopped using the `-c ...` option, but the code was still `fork`ing and trying to `execvp` the child, and this has some implications. When a GNU+Linux process `fork`s, the child process is a copy of the parent and, due to COW, that includes the parent's own memory pages.
Actual Cause
The best explanation I currently have is that when I stopped using the `-c` option, the `exec_child` function would `panic!` instead of replacing itself with the actual command's code on `execvp`. (This much is probably obvious, since there's no CLI command for it to replace itself with.) We never saw the child's I/O in our shell because we were never directly connected to its I/O streams. However, the child would still "see" the pre-existing proxy object instance created by the parent b/c of its COW copies of the parent's memory pages.
When the child copy panics and exits, the object instance it sees also goes out of scope, and Rust's drop rules decide it's time to `drop` it. This `drop` is the first (and unexpected) breakpoint hit, and it succeeds in removing the link b/c the link had just been created and was actually there. This happens almost immediately after launching the main program (likely within microseconds), explaining why I could never see the sym-link in the file system when checking, but would see it listed from Rust as mentioned previously.
The parent continues to run while this is happening, but when it wants to terminate, it hits the breakpoint a second (and expected) time b/c it's its own turn to `drop` the (original) proxy instance as it goes out of scope. However, the `drop` implementation can no longer find the sym-link at this point, b/c that had been successfully, though unexpectedly, removed by the now-dead child on the first `drop`. Since my shell is connected to the parent's I/O streams, I always got to see that `panic!` message.
In short: The parent `fork`ed a copy of itself as a child. Thanks to COW, the child got its own copy of the parent's memory and could see the proxy object instance responsible for managing the sym-link's life cycle. The child copy failed and exited, causing Rust to `drop` the child's own copy of the instance. This unexpected `drop` then caused the object to "clean up its own mess", taking the sym-link with it. Some time later, the parent would terminate and its own original proxy instance would get `drop`ped, causing it to also "clean up" a "mess" that had already been cleaned up and no longer existed. This produced the documented panics.
This is why removing the left-over `forkpty` call (left behind after the removal of the `-c` option) actually fixed the issue. It's not an immediately obvious or intuitive error to figure out, and it required some hindsight knowledge to understand why this even happened in the first place.

The code without the left-over `fork` looks as follows:
```rust
#[tokio::main]
async fn main() {
    // ...
    run_simulator(panel, proxies, sigint).await?;
    // ...
}
```