
How to convert Dos paths to Posix paths in Powershell

+3
−0

What is the best way to convert Dos paths to Posix paths in Powershell? For example, given:

C:\Program Files\PowerShell\Modules\

I want something like:

/Program\ Files/PowerShell/Modules/

Is the only solution to escape spaces and convert backslashes?

I've searched the web but couldn't find any existing Powershell function, and the solutions I found didn't mention escaping spaces, so I don't hold much hope that they're complete.


+5
−0

TL;DR: Use wslpath and hope for the best.

You can use it like this in Powershell: wsl wslpath -u 'C:\\some\\path\\to\\folder\\or\\file.txt'

Long answer

Converting them is hard. There is the tool wslpath for Windows with WSL, but it may not work in every case. There is also winepath, which is for Linux systems with Wine installed. Both take advantage of the fact that a Unix-like and a Windows-like system are installed side by side and can access the same files, so these tools have the information they need about where those files are.

But there are a lot of problems you can run into and wslpath will not always solve all of them.

Slashes vs backslash

Windows accepts \ as a separator in paths. In Posix paths, \ is just a "normal" character, so you have to convert every \ to /. This is probably the easiest part, and something wslpath and winepath should be able to do without any problem.
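
As a minimal sketch of just this step (the function name is made up for illustration, and drive letters are deliberately not handled here), the separator conversion is a plain character replacement:

```python
def backslashes_to_slashes(windows_path: str) -> str:
    """Replace every Windows separator (\\) with the Posix separator (/)."""
    # Nothing else is touched: drive letters, spaces and case stay as they are.
    return windows_path.replace("\\", "/")


print(backslashes_to_slashes("Program Files\\PowerShell\\Modules\\"))
```

Note that this direction is lossy in one respect: a Posix filename that really contains a \ cannot be told apart from a separator afterwards.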

Drive letters

Windows absolute paths start with a drive (partition) letter. In Posix, all files are under /. So it is impossible to convert an absolute path without some knowledge about the system you are running on.

wslpath and winepath should be able to do this without any problem for the system they are currently running on. But when you move to a different system, the paths may no longer be valid.
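
To illustrate why system knowledge is needed, here is a hedged sketch that maps a drive letter to a mount point. The function name and the mount_root parameter are hypothetical; the default of /mnt only mirrors where WSL usually mounts Windows drives, and any other system would need its own value.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_mounted_posix(path: str, mount_root: str = "/mnt") -> str:
    """Map C:\\dir\\file to <mount_root>/c/dir/file.

    mount_root is an assumption about the target system, not something
    that can be derived from the Windows path itself.
    """
    win = PureWindowsPath(path)
    # Only plain "X:\..." absolute paths are handled in this sketch.
    if len(win.drive) != 2 or win.drive[1] != ":" or not win.root:
        raise ValueError(f"not an absolute drive-letter path: {path!r}")
    letter = win.drive[0].lower()
    # parts[0] is the anchor ("C:\\"); the remaining parts are the folders.
    return str(PurePosixPath(mount_root, letter, *win.parts[1:]))


print(windows_to_mounted_posix("C:\\Program Files\\PowerShell\\Modules\\"))
```

The trailing separator is dropped by pathlib, and UNC or relative paths are rejected rather than guessed at.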

Limits of Posix filenames

Posix accepts any character (meaning any byte value) in filenames except '/' and '\0' (NUL). The limits you do have are the maximum length of a file or folder name and the maximum length of a path. In addition, no file can be named . (current folder) or .. (parent folder), and a filename needs at least 1 character.

There can, however, be additional limits imposed by the underlying filesystem or NAS server.

So there is no problem in this regard for converting from Windows to Posix filenames, but it can be a problem for the other way around.
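
The Posix limits listed in this section can be sketched as a small check on a single filename. The function name is hypothetical, and the length limit of 255 bytes is only an assumption (a common value); the real limit depends on the filesystem.

```python
NAME_MAX = 255  # assumption: a common limit, the real one depends on the filesystem


def is_valid_posix_filename(name: bytes, name_max: int = NAME_MAX) -> bool:
    """Check one filename (not a whole path) against the basic Posix rules."""
    if not name:                          # needs at least 1 byte
        return False
    if name in (b".", b".."):             # reserved for current / parent folder
        return False
    if b"/" in name or b"\x00" in name:   # the only two forbidden byte values
        return False
    return len(name) <= name_max


# Characters that Windows rejects are fine for Posix:
print(is_valid_posix_filename(b'odd: "name" * with \\ and ?'))
```

This is why the conversion from Windows to Posix is unproblematic here, while a name like the one above has no direct Windows counterpart.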

Spaces in Filenames

Posix and Windows accept spaces in filenames without any problem. Neither needs any special treatment at the kernel or syscall level.

However, a shell such as bash may split a string at spaces to create a list of arguments. This is done in the bash program itself and doesn't touch the kernel or the underlying filesystem. To prevent a shell such as bash from splitting a string at spaces, you can escape the spaces. For example:

myCommand this\ is\ a\ single\ argument\ to\ myCommand

Or by putting quotes around it, which is probably the simplest solution:

myCommand "this is also a single argument to myCommand"
myCommand 'this too'

Note that this isn't only the case on Posix-like systems; cmd.exe on Windows also needs quotes around paths that contain spaces.

There is no conversion problem to be expected in this regard.
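
Both approaches from this section can be sketched in a few lines. The helper names are made up for illustration; shlex.quote and shlex.split are from the Python standard library, and shlex.split is used here only to imitate how a Posix shell would split the command line again.

```python
import shlex


def escape_spaces(path: str) -> str:
    """Backslash-escape spaces only (other shell metacharacters are NOT handled)."""
    return path.replace(" ", "\\ ")


def quote_for_shell(path: str) -> str:
    """Quote a path so a Posix shell treats it as a single argument."""
    return shlex.quote(path)


path = "/Program Files/PowerShell/Modules/"
print(escape_spaces(path))
print(quote_for_shell(path))

# Either form comes back as one argument when split the way a shell would.
print(shlex.split("myCommand " + escape_spaces(path)))
print(shlex.split("myCommand " + quote_for_shell(path)))
```

Quoting is the safer choice, since escaping only spaces leaves characters such as $ or * unprotected.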

Encoding

Windows NT (the kernel, not just the Windows OS version) uses UTF-16. AFAIK Posix doesn't require a specific encoding, but it has to be an 8-bit encoding, and most modern Unix-like systems use UTF-8. You can solve 98% of the problems by just converting UTF-16 to UTF-8.

The kernel needs to care about encoding as soon as you mount a filesystem with a kernelspace driver that doesn't use the same byte values. For example, when you mount an NTFS drive, the driver has to convert 16-bit characters to a stream of 8-bit bytes and vice versa. On Linux you can specify how this is done with the iocharset mount option.

However, be aware that there can be more problems, depending on what level of compatibility you need. For example, some buggy software may have converted UCS-2 to CESU-8, or very old software may use ISO 8859-1 as its 8-bit encoding.
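
The simple case can be sketched as follows, assuming the name is available as raw UTF-16 little-endian bytes (the function name is hypothetical). The strict decoder is used on purpose, so that names that are not well-formed UTF-16 fail loudly instead of being converted into something like CESU-8.

```python
def utf16le_name_to_utf8(raw: bytes) -> bytes:
    """Convert a UTF-16LE encoded filename to UTF-8 bytes.

    Raises UnicodeDecodeError for input that is not well-formed UTF-16,
    for example an unpaired surrogate.
    """
    return raw.decode("utf-16-le").encode("utf-8")


windows_name = "Ünïcödé.txt".encode("utf-16-le")
print(utf16le_name_to_utf8(windows_name))

# An unpaired high surrogate (0xD800) followed by "a" is rejected.
try:
    utf16le_name_to_utf8(b"\x00\xd8a\x00")
except UnicodeDecodeError as error:
    print("cannot convert:", error.reason)
```

What to do with the rejected names is a separate decision that depends on the level of compatibility you need.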

Case insensitivity

Windows converts the case of some characters before comparing filenames. I don't know the rules for when and how this is done. (This may sound simple, but case-insensitive comparison is a very difficult task with a lot of cases that need to be defined in a non-obvious way.)

This can be a problem when you rename something on Windows by changing only its case: old paths to that file still work unchanged even though the name is different. On Posix, the new name will be a different file.

For example, you have these files:

Windows:   C:\my\path\to\file.txt
Posix:     /my/path/to/file.txt

And after renaming you have

Windows:   C:\My\Path\to\FILE.txt
Posix:     /My/Path/to/FILE.txt

The Windows paths C:\my\path\to\file.txt and C:\My\Path\to\FILE.txt are equivalent and point to the same file. On Posix, /my/path/to/file.txt and /My/Path/to/FILE.txt are different paths and most likely different files.
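
Python's pathlib models this difference, which makes for a quick illustration. Keep in mind that pathlib's case folding is only an approximation and is not claimed to match the exact rules Windows applies.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows-flavoured paths compare case-insensitively ...
windows_same = (
    PureWindowsPath(r"C:\my\path\to\file.txt")
    == PureWindowsPath(r"C:\My\Path\to\FILE.txt")
)

# ... while Posix-flavoured paths compare byte for byte.
posix_same = (
    PurePosixPath("/my/path/to/file.txt")
    == PurePosixPath("/My/Path/to/FILE.txt")
)

print(windows_same, posix_same)
```

The "pure" path classes never touch the filesystem, so this runs the same way on any operating system.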

If you know how to convert between the cases in exactly the way Windows does it, you can convert all filenames to lowercase or UPPERCASE. But is that what the user expects? And you have to know that the case-changing function never changes (which is hard, given that this is a complex task and that new letters and new language rules may be invented in the future).

I don't want to go into all the details of converting cases, since this has a long list of edge cases, information loss, undefined cases and more that make it a very difficult task. But you will have to deal with it when you want to create a 1:1 mapping between Posix filenames and Windows filenames.

wslpath will give you the correct filename at the time of running, but when you rename the file later, that path may no longer be valid in Posix, and WSL may not be able to access the file with the same path while the path on Windows still works.
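
One place where the missing 1:1 mapping shows up is when several Posix names differ only in case. The sketch below finds such groups; the function name is hypothetical, and str.lower() is only a stand-in, since the real Windows case rules are exactly the part that is hard to reproduce.

```python
from collections import defaultdict


def find_case_collisions(posix_names):
    """Group names that become identical after a (simplified) case fold."""
    groups = defaultdict(list)
    for name in posix_names:
        # Assumption: str.lower() approximates the Windows comparison.
        groups[name.lower()].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


print(find_case_collisions(["file.txt", "FILE.txt", "other.txt"]))
```

Any group reported here would end up as a single file on a case-insensitive Windows filesystem.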

8.3 filename

On Windows, files with a name longer than 8+3 characters usually also have an 8.3 short name (short-name generation can be disabled per volume). If you want to convert such paths correctly, you may have to expand them.
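
Expanding a short name needs the Windows filesystem itself, so the sketch below only flags path components that look like generated 8.3 names, such as PROGRA~1. The pattern and the function name are assumptions for illustration; a regular long name can happen to match as well, so treat a hit as a hint, not as proof.

```python
import re

# Heuristic: up to 6 characters, a tilde, digits, and an optional
# extension of up to 3 characters.
SHORT_NAME = re.compile(r"^(?P<base>[^.~\s]{1,6}~[0-9]{1,6})(\.[^.\s]{1,3})?$")


def looks_like_short_name(component: str) -> bool:
    """Return True if a single path component resembles an 8.3 short name."""
    match = SHORT_NAME.match(component)
    # The base part of an 8.3 name is never longer than 8 characters.
    return match is not None and len(match.group("base")) <= 8


for part in ["PROGRA~1", "MYDOCU~1.TXT", "Program Files", "notes.txt"]:
    print(part, looks_like_short_name(part))
```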

Special files

On Windows ./NUL is a special file, and so is ./CON and even C:\thisFolderExist\CON, but not C:\thisFolderDoesNotExist\CON. These files (./CON and C:\thisFolderExist\CON) map to /dev/tty on some (all?) Posix-like systems, but C:\thisFolderDoesNotExist\CON does not map to any file.

The question is whether you care. The best approach is probably to check whether the basename is a special file and, if it is, generate some form of error.
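
The basename check suggested above could look roughly like this. NUL and CON come from the text; the other names in the set are the further device names that Windows reserves (PRN, AUX, COM1 to COM9, LPT1 to LPT9), and the function name is hypothetical. The check looks at the name only and cannot know whether the containing folder exists.

```python
from pathlib import PureWindowsPath

RESERVED_NAMES = {
    "CON", "PRN", "AUX", "NUL",
    *(f"COM{i}" for i in range(1, 10)),
    *(f"LPT{i}" for i in range(1, 10)),
}


def is_windows_device_name(path: str) -> bool:
    """Return True if the last path component is a reserved device name."""
    name = PureWindowsPath(path).name
    # "NUL.txt" is treated like "NUL", so only the part before the first dot counts.
    stem = name.split(".", 1)[0].rstrip(" ").upper()
    return stem in RESERVED_NAMES


def reject_device_names(path: str) -> str:
    """Pass the path through unchanged, or fail for special files."""
    if is_windows_device_name(path):
        raise ValueError(f"refusing to convert special file: {path!r}")
    return path


print(is_windows_device_name(".\\NUL"))
print(is_windows_device_name("C:\\thisFolderExist\\CON"))
print(is_windows_device_name("C:\\folder\\console.txt"))
```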

Universal Naming Convention

Windows allows you to specify a path using its Universal Naming Convention (UNC). Such a path starts with \\ and can point to the local machine or contain a server address. A local path may look like this: \\?\C:\SomeFolder\file.txt. If you want to convert these as well, you need to check for such paths.

An equivalent that exists in some applications for Unix-like systems would be a URL, which can also hold a path on the local machine, like this: file:///SomeFolder/file.txt. However, most applications will probably not support this, and most (all?) kernels don't support it either.
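
Checking for such paths can be sketched as a prefix test. The function name and the category labels are made up for illustration, and only the common prefixes are covered (\\?\, \\?\UNC\, \\.\ and plain \\server\share).

```python
def classify_windows_path(path: str) -> tuple[str, str]:
    """Return (kind, path without the \\\\?\\ prefix).

    kind is "unc" for server paths, "device" for \\\\.\\ paths,
    and "local" for everything else.
    """
    if path.startswith("\\\\?\\UNC\\"):
        # \\?\UNC\server\share\... is the long form of \\server\share\...
        return "unc", "\\\\" + path[len("\\\\?\\UNC\\"):]
    if path.startswith("\\\\?\\"):
        return "local", path[len("\\\\?\\"):]
    if path.startswith("\\\\.\\"):
        return "device", path
    if path.startswith("\\\\"):
        return "unc", path
    return "local", path


print(classify_windows_path("\\\\?\\C:\\SomeFolder\\file.txt"))
print(classify_windows_path("\\\\server\\share\\file.txt"))
```

A "local" result can then go through the normal drive-letter handling, while "unc" and "device" results need a decision of their own, for example an error.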
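
For applications that do accept file URLs, building one from an absolute Posix path can be sketched like this. The function name is hypothetical; urllib.parse.quote is from the Python standard library and percent-encodes characters such as spaces while leaving / alone.

```python
from urllib.parse import quote


def posix_path_to_file_url(path: str) -> str:
    """Turn an absolute Posix path into a file:// URL for the local machine."""
    if not path.startswith("/"):
        raise ValueError(f"not an absolute Posix path: {path!r}")
    # An empty host part means "this machine", hence the three slashes.
    return "file://" + quote(path)


print(posix_path_to_file_url("/SomeFolder/file.txt"))
print(posix_path_to_file_url("/Program Files/PowerShell/Modules/"))
```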


Characters vs bytes
Karl Knechtel‭ wrote over 1 year ago · edited over 1 year ago

I see that you address the following to some extent in the "Encoding" section of the answer, but I think it could use some elaboration.

My understanding is that Posix indeed allows filenames to contain arbitrary byte sequences aside from the 0x00 NUL byte and the path separator (0x2f byte). It's also my understanding that modern Unices interpret the bytes in this path as representing text in UTF-8 - which conveniently does not require either of those byte values to represent other characters in multi-byte sequences. However, what does it do with bytes that are not valid UTF-8? And what about UTF-8 that happens to encode surrogate-pair characters, or unassigned/otherwise "special" code points?

H_H‭ wrote over 1 year ago · edited over 1 year ago

For a normal file system, neither the kernel nor the file system cares about the encoding (there is an exception for file systems that specify an encoding*). Only a tiny fraction of applications care, since most applications handle filenames as a string of bytes. And applications that show them to the user also don't care that much: if a sequence is not valid UTF-8, they may just display �, nothing at all, or whatever.

For valid UTF-8 sequences that encode a code point that does not exist, some may display a tiny square or rectangle containing the 4 or 6 hexadecimal digits.

These problems are not specific to filenames. They are also part of every application that deals with text (browsers, chat software, office applications, PDF readers, ...).

*Sidenote: The Linux mount command has the option iocharset to specify how to convert names with 16-bit characters to an 8-bit byte stream and vice versa.
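
As a small illustration of the behaviour described in this thread, here is how one application-level language (Python) can treat a filename whose bytes are not valid UTF-8. The example name and function names are made up; the "replace" and "surrogateescape" error handlers are standard Python.

```python
def display_name(raw: bytes) -> str:
    """Decode for display only: invalid bytes become U+FFFD (the � character)."""
    return raw.decode("utf-8", errors="replace")


def round_trip(raw: bytes) -> bytes:
    """Decode and re-encode without losing the original bytes."""
    text = raw.decode("utf-8", errors="surrogateescape")
    return text.encode("utf-8", errors="surrogateescape")


raw_name = b"caf\xe9.txt"  # ISO 8859-1 bytes, not valid UTF-8

print(display_name(raw_name))
print(round_trip(raw_name) == raw_name)
```

The first function is lossy and only suitable for showing the name to a user; the second keeps the bytes intact so the file can still be opened.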