How to write a bash function to sanitize filenames for Linux and Windows

−2

I'm trying to write a bash function that can sanitize filenames to make them compatible with both Linux and Windows file systems. The function should perform the following operations:

Replace invalid characters with similar valid ones (e.g., replace * or + with - or _, ? with ¿, etc.).
Remove leading spaces, and remove trailing spaces that come before the file extension.
Replace multiple consecutive spaces, carriage returns, etc., with a single space.
Replace single spaces with underscores (_).

Here's the function I've come up with:

sanitize_filename() {
    local input_str="$1"
    
    # Replace invalid characters
    input_str="${input_str//[^a-zA-Z0-9_\-\. ]/-}"
    
    # Remove spaces at the beginning and end before the extension
    input_str="${input_str#"${input_str%%[![:space:]]*[![:space:]]}"}"
    input_str="${input_str%"${input_str##*[![:space:]]}"}"
    
    # Replace multiple spaces, carriage returns, etc. with a single space
    input_str=$(echo "$input_str" | tr -s '[:space:]' | tr -s '\r')
    
    # Replace single spaces with underscore
    input_str="${input_str// /_}"
    
    echo "$input_str"
}

# Test cases
test_cases=(
    "my file 123.txt"
    "my+file+123?.txt"
    "file_with/special*characters.jpg"
    "   leading_spaces.docx"
    "trailing_spaces   .txt"
    "multiple    spaces.txt"
    "multiple  +  spaces.txt"
    "carriage\r\r return.txt"
)

# Expected sanitized filenames
expected=(
    "my_file_123.txt"
    "my-file-123¿.txt"
    "file_with_special-characters.jpg"
    "leading_spaces.docx"
    "trailing_spaces.txt"
    "multiple_spaces.txt"
    "multiple-spaces.txt"
    "carriage_return.txt"
)

# Test the function
for ((i=0; i<${#test_cases[@]}; i++)); do
    result=$(sanitize_filename "${test_cases[i]}")
    if [ "$result" == "${expected[i]}" ]; then
        echo "Test $i: PASSED"
    else
        echo "Test $i: FAILED"
        echo "Expected: ${expected[i]}"
        echo "Got: $result"
    fi
done

I get the following output:

❯ ./sanitize_filenames.sh
Test 0: PASSED
Test 1: FAILED
Expected: my-file-123¿.txt
Got: my-file-123-.txt
Test 2: FAILED
Expected: file_with_special-characters.jpg
Got: file_with-special-characters.jpg
Test 3: PASSED
Test 4: FAILED
Expected: trailing_spaces.txt
Got: trailing_spaces_.txt
Test 5: PASSED
Test 6: FAILED
Expected: multiple-spaces.txt
Got: multiple_-_spaces.txt
Test 7: FAILED
Expected: carriage_return.txt
Got: carriage-r-r_return.txt

How can I fix this function to handle all the test cases correctly?
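
Test 7 can be reproduced in isolation, which helps separate the test data from the function. The sketch below uses made-up variable names and assumes GNU or BSD `tr`. It shows that `"\r"` inside double quotes is a literal backslash followed by `r` (which is why the output contains `-r-r`), and that `tr -s` with a single set only merges repeats of the same character rather than turning a carriage return into a space.

```bash
#!/usr/bin/env bash
# Illustrative variable names; not part of the original script.

literal="carriage\r\r return.txt"           # backslash + r, twice: no CR bytes
real=$(printf 'carriage\r\r return.txt')    # printf turns \r into real CR bytes

echo "${#literal}"   # 23
echo "${#real}"      # 21

# With one set, tr -s only merges runs of the same character: CR CR becomes
# one CR, but that CR and the space after it are both kept.
squeezed=$(printf '%s' "$real" | tr -s '[:space:]')
echo "${#squeezed}"  # 20

# With a second set, every whitespace character is first mapped to a space,
# and the resulting run is then squeezed to a single space.
converted=$(printf '%s' "$real" | tr -s '[:space:]' ' ')
echo "$converted"    # carriage return.txt
```

If the test input is meant to contain real carriage returns, it would need to be written as `$'carriage\r\r return.txt'` or built with `printf`, as above.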

bash

posted about 1 year ago

CC BY-SA 4.0

ShadowsRanger

2 comment threads

Broken by design? (1 comment)

Vague and unfocused (1 comment)

Karl Knechtel wrote about 1 year ago

There seem to be many separate questions here. Please start by trying to make functions to implement the individual rules that you have in mind, and testing them. Then, the requirements for those rules need to be clearer for anyone to be able to help with them. For example, how should we know which characters are "invalid" (according to whom? What concrete problem do you hope to solve by this replacement?), and what is "similar" to each for the replacement? And what part of the existing code are you expecting to implement that logic? I don't see ¿ anywhere in the code, so it's hard to imagine how it could be expected to replace ? with ¿.
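
One way to act on this suggestion is to give each rule its own small function and then compose them. The following is only a sketch under stated assumptions, not the asker's code. The helper names are hypothetical, and the character mapping (`?` to `¿`, `*` and `+` to `-`, and `/ \ : < > | "` to `_`) is a guess chosen to match the expected outputs in the question. It uses `sed` and `tr` rather than bash parameter expansion so that each rule stays a one-line filter.

```bash
#!/usr/bin/env bash
# Sketch only: the helper names and the character mapping are assumptions
# chosen to match the expected outputs in the question.

# Rule: turn every run of whitespace (spaces, tabs, CR, LF) into one space.
collapse_whitespace() {
    printf '%s' "$1" | tr -s '[:space:]' ' '
}

# Rule: swap problem characters for look-alikes.
#   ? -> ¿      * and + -> -      / \ : < > | " -> _
replace_invalid() {
    printf '%s' "$1" |
        sed -e 's,?,¿,g' -e 's,[*+],-,g' -e 's,[\\/:<>|"],_,g'
}

# Rule: drop spaces at the start, at the end, and just before the extension.
# Tightening spaces around "-" is inferred from the "multiple  +  spaces.txt"
# test case, not from the written rules.
trim_spaces() {
    printf '%s' "$1" |
        sed -e 's/^ *//' -e 's/ *$//' -e 's/ *- */-/g' \
            -e 's/ *\(\.[^. ]*\)$/\1/'
}

# Rule: any remaining single space becomes an underscore.
spaces_to_underscores() {
    printf '%s' "$1" | tr ' ' '_'
}

sanitize_filename() {
    local s="$1"
    s=$(collapse_whitespace "$s")
    s=$(replace_invalid "$s")
    s=$(trim_spaces "$s")
    spaces_to_underscores "$s"
    printf '\n'
}

sanitize_filename 'my+file+123?.txt'          # prints my-file-123¿.txt
sanitize_filename 'multiple  +  spaces.txt'   # prints multiple-spaces.txt
```

Because each rule is a separate function, each can be tested alone before the composed result is checked. The sketch does not attempt to handle Windows reserved device names such as `CON` or `NUL`, or names ending in a dot.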