How to write a bash function to sanitize filenames for Linux and Windows
I'm trying to write a bash function that can sanitize filenames to make them compatible with both Linux and Windows file systems. The function should perform the following operations:
- Replace invalid characters with similar valid ones (e.g., replace `*` or `+` with `-` or `_`, `?` with `¿`, etc.).
- Remove leading and trailing spaces before the file extension.
- Replace multiple consecutive spaces, carriage returns, etc., with a single space.
- Replace single spaces with underscores (`_`).
Here's the function I've come up with:
sanitize_filename() {
    local input_str="$1"
    # Replace invalid characters
    input_str="${input_str//[^a-zA-Z0-9_\-\. ]/-}"
    # Remove spaces at the beginning and end before the extension
    input_str="${input_str#"${input_str%%[![:space:]]*[![:space:]]}"}"
    input_str="${input_str%"${input_str##*[![:space:]]}"}"
    # Replace multiple spaces, carriage returns, etc. with a single space
    input_str=$(echo "$input_str" | tr -s '[:space:]' | tr -s '\r')
    # Replace single spaces with underscore
    input_str="${input_str// /_}"
    echo "$input_str"
}
# Test cases
test_cases=(
    "my file 123.txt"
    "my+file+123?.txt"
    "file_with/special*characters.jpg"
    " leading_spaces.docx"
    "trailing_spaces .txt"
    "multiple spaces.txt"
    "multiple + spaces.txt"
    "carriage\r\r return.txt"
)
# Expected sanitized filenames
expected=(
    "my_file_123.txt"
    "my-file-123¿.txt"
    "file_with_special-characters.jpg"
    "leading_spaces.docx"
    "trailing_spaces.txt"
    "multiple_spaces.txt"
    "multiple-spaces.txt"
    "carriage_return.txt"
)
# Test the function
for ((i=0; i<${#test_cases[@]}; i++)); do
    result=$(sanitize_filename "${test_cases[i]}")
    if [ "$result" == "${expected[i]}" ]; then
        echo "Test $i: PASSED"
    else
        echo "Test $i: FAILED"
        echo "Expected: ${expected[i]}"
        echo "Got: $result"
    fi
done
I get the following output:
❯ ./sanitize_filenames.sh
Test 0: PASSED
Test 1: FAILED
Expected: my-file-123¿.txt
Got: my-file-123-.txt
Test 2: FAILED
Expected: file_with_special-characters.jpg
Got: file_with-special-characters.jpg
Test 3: PASSED
Test 4: FAILED
Expected: trailing_spaces.txt
Got: trailing_spaces_.txt
Test 5: PASSED
Test 6: FAILED
Expected: multiple-spaces.txt
Got: multiple_-_spaces.txt
Test 7: FAILED
Expected: carriage_return.txt
Got: carriage-r-r_return.txt
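A side note on Test 7: inside ordinary double quotes, bash keeps `\r` as a literal backslash followed by `r`, so the test string may never contain a real carriage return. That would fit the `-r-r` in the result, since each backslash is replaced with `-`. The short check below is only a sketch (the variable names `plain` and `ansi` are made up for illustration) comparing double quotes with ANSI-C quoting (`$'...'`):

```bash
#!/usr/bin/env bash
# Double quotes keep "\r" as two literal characters: a backslash and an "r".
plain="carriage\r\r return.txt"
# ANSI-C quoting ($'...') turns each \r into one real carriage return.
ansi=$'carriage\r\r return.txt'

echo "${#plain}"   # 23 characters: the backslashes are part of the string
echo "${#ansi}"    # 21 characters: each \r is a single carriage return
```

If the goal is to test real carriage returns, the test case would probably need the `$'...'` form.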
How can I fix this function to handle all the test cases correctly?
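For reference, here is a minimal sketch of the behavior the expected values seem to describe. It is not a definitive implementation. The name `sanitize_filename_sketch` is hypothetical, the handling of `\ : " < > |` is an assumption based on the characters Windows rejects, and removing spaces around a hyphen is inferred from the expected result of Test 6. It maps specific characters before anything else, works on the part before the last dot, and leaves the extension untouched.

```bash
#!/usr/bin/env bash
shopt -s extglob   # enables the +(...) pattern used below

# Hypothetical name, so it can sit next to the original sanitize_filename.
sanitize_filename_sketch() {
    local name="$1" stem ext=""
    # Split off the last extension so the stem can be trimmed on its own.
    case "$name" in
        *.*) ext=".${name##*.}"; stem="${name%.*}" ;;
        *)   stem="$name" ;;
    esac
    # Map specific characters first, before any catch-all replacement.
    stem="${stem//\?/¿}"
    stem="${stem//\//_}"
    stem="${stem//[*+]/-}"
    # Assumption: the remaining Windows-reserved characters become "-".
    local reserved='[\\:"<>|]'
    stem="${stem//$reserved/-}"
    # Collapse any whitespace run (spaces, tabs, CR, LF) into one space.
    stem="${stem//+([[:space:]])/ }"
    # Assumption taken from Test 6: no spaces around a hyphen.
    stem="${stem// -/-}"
    stem="${stem//- /-}"
    # At most one space is left at each end; strip it.
    stem="${stem# }"
    stem="${stem% }"
    # Remaining single spaces become underscores.
    printf '%s\n' "${stem// /_}${ext}"
}
```

Called as `sanitize_filename_sketch "trailing_spaces .txt"`, this should print `trailing_spaces.txt`. The carriage-return case only applies when the input really contains carriage returns, for example when it is written as `$'carriage\r\r return.txt'`.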