Welcome to Software Development on Codidact!

How to write a bash function to sanitize filenames for Linux and Windows

I'm trying to write a bash function that can sanitize filenames to make them compatible with both Linux and Windows file systems. The function should perform the following operations:

  1. Replace invalid characters with similar valid ones (e.g., replace * or + with - or _, ? with ¿, etc.).
  2. Remove leading spaces from the name and trailing spaces before the file extension.
  3. Replace runs of whitespace (multiple spaces, carriage returns, etc.) with a single space.
  4. Replace single spaces with underscores (_).

Here's the function I've come up with:

sanitize_filename() {
    local input_str="$1"
    
    # Replace invalid characters
    input_str="${input_str//[^a-zA-Z0-9_\-\. ]/-}"
    
    # Remove spaces at the beginning and end before the extension
    input_str="${input_str#"${input_str%%[![:space:]]*[![:space:]]}"}"
    input_str="${input_str%"${input_str##*[![:space:]]}"}"
    
    # Replace multiple spaces, carriage returns, etc. with a single space
    input_str=$(echo "$input_str" | tr -s '[:space:]' | tr -s '\r')
    
    # Replace single spaces with underscore
    input_str="${input_str// /_}"
    
    echo "$input_str"
}

# Test cases
test_cases=(
    "my file 123.txt"
    "my+file+123?.txt"
    "file_with/special*characters.jpg"
    "   leading_spaces.docx"
    "trailing_spaces   .txt"
    "multiple    spaces.txt"
    "multiple  +  spaces.txt"
    "carriage\r\r return.txt"
)

# Expected sanitized filenames
expected=(
    "my_file_123.txt"
    "my-file-123¿.txt"
    "file_with_special-characters.jpg"
    "leading_spaces.docx"
    "trailing_spaces.txt"
    "multiple_spaces.txt"
    "multiple-spaces.txt"
    "carriage_return.txt"
)

# Test the function
for ((i=0; i<${#test_cases[@]}; i++)); do
    result=$(sanitize_filename "${test_cases[i]}")
    if [ "$result" == "${expected[i]}" ]; then
        echo "Test $i: PASSED"
    else
        echo "Test $i: FAILED"
        echo "Expected: ${expected[i]}"
        echo "Got: $result"
    fi
done

I get the following output:

❯ ./sanitize_filenames.sh
Test 0: PASSED
Test 1: FAILED
Expected: my-file-123¿.txt
Got: my-file-123-.txt
Test 2: FAILED
Expected: file_with_special-characters.jpg
Got: file_with-special-characters.jpg
Test 3: PASSED
Test 4: FAILED
Expected: trailing_spaces.txt
Got: trailing_spaces_.txt
Test 5: PASSED
Test 6: FAILED
Expected: multiple-spaces.txt
Got: multiple_-_spaces.txt
Test 7: FAILED
Expected: carriage_return.txt
Got: carriage-r-r_return.txt
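Two details in the script explain several of these failures. First, inside ordinary double quotes Bash leaves `\r` alone, so the test string `"carriage\r\r return.txt"` contains backslashes and the letter `r`, not carriage returns; the function then replaces each backslash with `-`, which is where `-r-r` comes from. Second, `tr -s '[:space:]'` only squeezes runs of the *same* character, so a mixed run such as CR, CR, space is not reduced to a single space. The snippet below demonstrates both points, assuming Bash and GNU or BSD `tr`:

```shell
#!/usr/bin/env bash
# 1. Quoting: "\r" inside double quotes is two characters (backslash, r).
literal="a\r\rb"        # a, \, r, \, r, b  -> 6 characters
real=$'a\r\rb'          # a, CR, CR, b      -> 4 characters (ANSI-C quoting)
echo "${#literal} ${#real}"   # prints "6 4"

# 2. Squeezing: tr -s collapses repeats of the same character only.
squeezed=$(printf 'a\r\r b' | tr -s '[:space:]')
if [[ $squeezed == $'a\r b' ]]; then
    echo "CR run squeezed to one CR; the space after it is still there"
fi

# Translating all whitespace to spaces first (tr -s SET1 SET2) leaves one space.
# printf adds no trailing newline here; with echo, that newline would become a space too.
collapsed=$(printf 'a\r\r b' | tr -s '[:space:]' ' ')
echo "$collapsed"       # prints "a b"
```

Test 4 fails for a related reason: the two trimming lines only look at the ends of the whole string, so the run of spaces before `.txt` is squeezed to one space and then turned into `_`.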

How can I fix this function to handle all the test cases correctly?
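A revised sketch that passes all eight cases is below. It keeps the name `sanitize_filename` so it can replace the original in the test script, but the character mapping (`?` to `¿`, `/` to `_`, `*` and `+` to `-`, anything else outside the allowed set to `-`) is inferred from the expected outputs and is an assumption, not a Windows rule. It maps all whitespace to spaces before collapsing, splits off the extension before trimming, and drops spaces next to a hyphen (inferred from test 6). It also assumes the last test case is written as `$'carriage\r\r return.txt'` so that it contains real carriage returns.

```shell
#!/usr/bin/env bash
# Revised sketch of sanitize_filename. The character mapping below is inferred
# from the question's expected outputs; it is an assumption, not a Windows rule.
sanitize_filename() {
    local s="$1" base ext

    # 1. Map specific characters to look-alike replacements.
    s="${s//[?]/¿}"       # ? -> ¿
    s="${s//\//_}"        # / -> _
    s="${s//[*+]/-}"      # * and + -> -

    # 2. Turn every whitespace character (space, tab, CR, LF, ...) into a space.
    s="${s//[[:space:]]/ }"

    # 3. Replace any remaining character outside the allowed set with -.
    s="${s//[^a-zA-Z0-9_.¿ -]/-}"

    # 4. Collapse runs of spaces, then drop spaces next to a hyphen ("a - b" -> "a-b").
    while [[ $s == *"  "* ]]; do
        s="${s//  / }"
    done
    s="${s// -/-}"
    s="${s//- /-}"

    # 5. Trim the whole name, then split off the extension and trim the base name.
    s="${s# }"
    s="${s% }"
    if [[ $s == *.* ]]; then
        base="${s%.*}"
        ext=".${s##*.}"
    else
        base="$s"
        ext=""
    fi
    base="${base% }"

    # 6. Replace the remaining single spaces with underscores.
    s="${base}${ext}"
    printf '%s\n' "${s// /_}"
}
```

This only addresses the four requirements listed in the question; it does not reject reserved Windows names or enforce any length limit.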

2 comment threads: "Broken by design?" (1 comment) and "Vague and unfocused" (1 comment)

Broken by design?
mauke wrote 6 months ago

As a general rule, all Windows filenames are also valid on Linux, so the "for Linux" part of the question is redundant.

The rules for Windows you've proposed are neither necessary nor sufficient. For example, spaces and apostrophes are perfectly valid in Windows filenames, but Aux.txt or nul.tar.gz are not valid names: https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file#naming-conventions
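The reserved names mentioned in that comment can be checked separately. A minimal sketch, using a hypothetical helper `is_windows_reserved`: it compares the part of the name before the first dot, case-insensitively, against CON, PRN, AUX, NUL, COM1 to COM9 and LPT1 to LPT9, so `nul.tar.gz` is caught as well as `Aux.txt`. The list in the `case` pattern is an assumption based on the linked page, which may list further variants, so check it before relying on this. `${stem^^}` needs Bash 4 or later.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit status 0) if the name is a Windows reserved
# device name, compared case-insensitively and ignoring everything from the
# first dot on (so nul.tar.gz counts as NUL). The name list is an assumption.
is_windows_reserved() {
    local stem="${1%%.*}"          # text before the first dot
    stem="${stem^^}"               # upper-case it (Bash 4+)
    case "$stem" in
        CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9]) return 0 ;;
        *) return 1 ;;
    esac
}

for name in Aux.txt nul.tar.gz auxiliary.txt com1 notes.txt; do
    if is_windows_reserved "$name"; then
        echo "$name: reserved"
    else
        echo "$name: ok"
    fi
done
```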

Also, Windows path names have a much lower length limit. A name consisting of 300 "A" characters may already cause problems, depending on how you use it.
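A minimal sketch of a length guard, using a hypothetical helper `truncate_filename` that shortens the base name so the whole name fits in `max_len` characters while keeping the extension. The default of 200 is an arbitrary assumption, not a Windows constant: the practical limit depends on the full path and on how the name is used. `${#var}` counts characters in the current locale, which only approximates how Linux file systems (bytes) and Windows (UTF-16 code units) measure names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: shorten a file name to at most max_len characters,
# keeping the extension. max_len (default 200) is an assumed value.
truncate_filename() {
    local name="$1" max_len="${2:-200}" base ext=""
    if [[ $name == *.* ]]; then
        base="${name%.*}"
        ext=".${name##*.}"
    else
        base="$name"
    fi
    # Characters left for the base name once the extension is reserved.
    local keep=$(( max_len - ${#ext} ))
    (( keep < 1 )) && keep=1
    printf '%s\n' "${base:0:keep}${ext}"
}

# 300 "A" characters plus ".txt", as in the comment above.
long_name="$(printf 'A%.0s' {1..300}).txt"
short_name=$(truncate_filename "$long_name" 200)
echo "${#long_name} -> ${#short_name}"   # prints "304 -> 200"
```

If the extension alone is longer than `max_len`, the result still exceeds the limit; this sketch does not handle that case.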

Even if you fix the supposed bugs in your implementation, I don't think the result will be fit for purpose. (And as the other comment says, this question is much too unfocused if you're only trying to understand a specific bug.)