Comments on How to write a bash function to sanitize filenames for Linux and Windows

How to write a bash function to sanitize filenames for Linux and Windows

I'm trying to write a bash function that can sanitize filenames to make them compatible with both Linux and Windows file systems. The function should perform the following operations:

  1. Replace invalid characters with similar valid ones (e.g., * or + with - or _, and ? with ¿).
  2. Remove leading spaces, and any spaces immediately before the file extension.
  3. Replace multiple consecutive spaces, carriage returns, etc., with a single space.
  4. Replace single spaces with underscores (_).

Here's the function I've come up with:

sanitize_filename() {
    local input_str="$1"
    
    # Replace invalid characters
    input_str="${input_str//[^a-zA-Z0-9_\-\. ]/-}"
    
    # Remove spaces at the beginning and end before the extension
    input_str="${input_str#"${input_str%%[![:space:]]*[![:space:]]}"}"
    input_str="${input_str%"${input_str##*[![:space:]]}"}"
    
    # Replace multiple spaces, carriage returns, etc. with a single space
    input_str=$(echo "$input_str" | tr -s '[:space:]' | tr -s '\r')
    
    # Replace single spaces with underscore
    input_str="${input_str// /_}"
    
    echo "$input_str"
}

# Test cases
test_cases=(
    "my file 123.txt"
    "my+file+123?.txt"
    "file_with/special*characters.jpg"
    "   leading_spaces.docx"
    "trailing_spaces   .txt"
    "multiple    spaces.txt"
    "multiple  +  spaces.txt"
    "carriage\r\r return.txt"
)

# Expected sanitized filenames
expected=(
    "my_file_123.txt"
    "my-file-123¿.txt"
    "file_with_special-characters.jpg"
    "leading_spaces.docx"
    "trailing_spaces.txt"
    "multiple_spaces.txt"
    "multiple-spaces.txt"
    "carriage_return.txt"
)

# Test the function
for ((i=0; i<${#test_cases[@]}; i++)); do
    result=$(sanitize_filename "${test_cases[i]}")
    if [ "$result" == "${expected[i]}" ]; then
        echo "Test $i: PASSED"
    else
        echo "Test $i: FAILED"
        echo "Expected: ${expected[i]}"
        echo "Got: $result"
    fi
done

I get the following output:

❯ ./sanitize_filenames.sh
Test 0: PASSED
Test 1: FAILED
Expected: my-file-123¿.txt
Got: my-file-123-.txt
Test 2: FAILED
Expected: file_with_special-characters.jpg
Got: file_with-special-characters.jpg
Test 3: PASSED
Test 4: FAILED
Expected: trailing_spaces.txt
Got: trailing_spaces_.txt
Test 5: PASSED
Test 6: FAILED
Expected: multiple-spaces.txt
Got: multiple_-_spaces.txt
Test 7: FAILED
Expected: carriage_return.txt
Got: carriage-r-r_return.txt

How can I fix this function to handle all the test cases correctly?
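
One way to get all eight cases passing, offered as a sketch under stated assumptions rather than a definitive answer. The character mapping (? to ¿, / to _, * and + to -, any other Windows-reserved character to -) and the rule that spaces next to a hyphen are dropped are not stated in the question; they are inferred from the expected outputs. The sketch also assumes the eighth test input is written as $'carriage\r\r return.txt', because inside ordinary double quotes \r is a literal backslash followed by r, not a carriage return.

```bash
#!/usr/bin/env bash

sanitize_filename() {
    local name="$1" stem ext=""
    local reserved='[<>:"|]'   # assumption: remaining Windows-reserved characters

    # Rule 1: map invalid characters (assumed mapping, inferred from the expected outputs).
    name="${name//[?]/¿}"
    name="${name//\//_}"
    name="${name//[*+]/-}"
    name="${name//\\/-}"
    name="${name//$reserved/-}"

    # Rule 3: turn every whitespace character (tab, CR, LF, ...) into a space,
    # then collapse runs of spaces into a single space.
    name="${name//[[:space:]]/ }"
    while [[ $name == *"  "* ]]; do
        name="${name//  / }"
    done

    # Rule 2: drop the (now single) leading and trailing space,
    # then the space just before the extension, if there is one.
    name="${name# }"
    name="${name% }"
    if [[ $name == *.* ]]; then
        stem="${name%.*}"
        ext=".${name##*.}"
    else
        stem="$name"
    fi
    name="${stem% }$ext"

    # Assumed rule (from test "multiple  +  spaces.txt"): spaces next to a hyphen vanish.
    name="${name// -/-}"
    name="${name//- /-}"

    # Rule 4: remaining single spaces become underscores.
    name="${name// /_}"

    printf '%s\n' "$name"
}

sanitize_filename "multiple  +  spaces.txt"   # prints: multiple-spaces.txt
```

The mapping step uses a blocklist instead of the original negated class [^a-zA-Z0-9_\-\. ], since a negated class would also replace the ¿ that was just inserted.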

Vague and unfocused
Karl Knechtel‭ wrote 6 months ago

There seem to be many separate questions here. Please start by trying to make functions to implement the individual rules that you have in mind, and testing them. Then, the requirements for those rules need to be clearer for anyone to be able to help with them. For example, how should we know which characters are "invalid" (according to whom? What concrete problem do you hope to solve by this replacement?), and what is "similar" to each for the replacement? And what part of the existing code are you expecting to implement that logic? I don't see ¿ anywhere in the code, so it's hard to imagine how it could be expected to replace ? with ¿.
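
As an illustration of testing one rule in isolation, here is a minimal sketch for the whitespace rule only. The helper name collapse_whitespace is hypothetical and does not appear in the question. It also shows why the carriage-return test prints carriage-r-r in the posted output: the input in double quotes contains a literal backslash and r, so there is no carriage return for any rule to act on.

```bash
#!/usr/bin/env bash

# Hypothetical helper: turn each whitespace character into a space,
# then collapse every run of spaces into one.
collapse_whitespace() {
    local s="$1"
    s="${s//[[:space:]]/ }"
    while [[ $s == *"  "* ]]; do
        s="${s//  / }"
    done
    printf '%s\n' "$s"
}

literal="carriage\r\r return.txt"   # double quotes: backslash + r, twice
real=$'carriage\r\r return.txt'     # ANSI-C quoting: two actual carriage returns

echo "${#literal} ${#real}"         # prints: 23 21
collapse_whitespace "$real"         # prints: carriage return.txt
collapse_whitespace "$literal"      # prints: carriage\r\r return.txt
```

Each of the other rules can be checked the same way, which makes it easier to see which rule a failing test case actually exercises.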