Conditionally ignore files in git


I'm using git for LaTeX projects and am in a little dilemma about how to best ignore files.

  • if I add *.pdf to my .gitignore file, I keep forgetting to force-add (git add -f) the PDFs of included graphics

  • if I don't add it, I keep accidentally adding the compiled documents, which are often quite large and bloat my repositories

  • and even if I remember to add the specific filename of the compiled PDF to my .gitignore, dozens of them will clutter the ignore files in my bigger repos, and I'll need to add ignore files even to the smaller repos that normally would not need anything beyond my global ignore file.

Is there any way to solve this dilemma? Something like automatically ignoring all .pdf files for which a .tex of the same name exists?


3 answers


> Something like automatically ignoring all .pdf files for which a .tex of the same name exists?

We can do something close to that. With a git hook, we can reject the commit if your staged changes contain a .pdf and a .tex file with the same name and path. You can then unstage the PDF (for example with git restore --staged file.pdf) and try again. Hooks are not version-controlled, so this only protects your own repository; your collaborators will have to put the hook in their repositories too to get the same protection.

Create a file named pre-commit in your .git/hooks folder and make it executable (chmod +x .git/hooks/pre-commit); git does not run hooks that lack the executable bit.

```sh
#!/bin/sh

# This is just to provide a way to bypass just this check.
if test -n "${SKIP_PDF_CHECK+1}"
then
    echo "Check for PDF generated files in commit skipped."
    exit 0
fi

# This command creates a table of file names (without extension) and the number
# of times each name appears in the output of git diff.
# Example input / output:
#   Given git diff shows these changed files:
#        lonely.pdf
#        other.pdf
#        singular.tex
#        oops.pdf
#        oops.tex
#        oops.odd.periods.pdf
#        oops.odd.periods.tex
#   counts will be (sorted):
#        1 lonely
#        2 oops
#        2 oops.odd.periods
#        1 other
#        1 singular
#
# Comments for each command:
#   Get the list of staged files that were Added, Copied or Modified. See other options: https://git-scm.com/docs/git-diff#Documentation/git-diff.txt---diff-filterACDMRTUXB82308203
#   Keep only the files whose names end in .pdf or .tex
#   Remove the extension (everything after the last period).
#   Sort the list in preparation for uniq
#   Show only unique lines and include the count
counts=$(git diff --cached --name-only --diff-filter=ACM |
    awk '/\.(pdf|tex)$/' |
    sed 's/\.[^.]*$//' |
    sort |
    uniq -c)

# Check whether any count is greater than 1, which means two staged files share
# the same path and name but have different extensions (.pdf and .tex).
if echo "$counts" | awk '$1 > 1 {exit 1}'
then
    echo "No generated files detected in commit."

    # Allow git commit.
    exit 0
else
    echo "The following file names look generated. Remove them or skip this check with SKIP_PDF_CHECK=1 git commit"
    # Print the offending names without the leading count (keeps names with spaces intact).
    echo "$counts" | awk '$1 > 1 { sub(/^[ \t]*[0-9]+[ \t]/, ""); print }'

    # Prevent git commit.
    exit 1
fi
```

This hook will now run every time you run git commit. To skip this check for a particular commit, run SKIP_PDF_CHECK=1 git commit (git commit --no-verify also works, but it skips all pre-commit and commit-msg hooks).

Since the hook is just a shell script, you can use pretty much any algorithm for detecting generated files. One caveat: the script hasn't been tested thoroughly and may not handle unusual file names. For example, git diff quotes and escapes paths containing non-ASCII characters such as emoji by default, so such files will not be checked.

Of course, it's strange that these files are getting accidentally added to the change list at all. Are you using git add . or similar? Perhaps use git add -ip to have git interactively prompt you about each change so you can review them.


I'm not familiar with LaTeX, but it seems the PDFs are generated from the LaTeX files.

It then seems the real problem is that you are trying to keep source files and objects derived from them in the same git repository. Ideally, a git repository holds only the actual source files (those directly edited by humans). Put the files that are automatically derived from source elsewhere. Your build scripts can automate this by writing derived objects to a different place, such as a subdirectory within the repository that is listed in .gitignore.
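As a sketch of that idea: pdflatex (in TeX Live and MiKTeX) accepts -output-directory, so a build script can send every generated file into one directory that is ignored once, instead of listing each PDF. The function names and the build/ directory name below are hypothetical:

```sh
# Hypothetical helper: compile one .tex file into the ignored build/ directory.
build_pdf() {
    mkdir -p build
    pdflatex -interaction=nonstopmode -output-directory=build "$1"
}

# Hypothetical one-time setup: ignore the whole output directory.
# Safe to run repeatedly; the line is only appended if it is missing.
ignore_build_dir() {
    grep -qxF '/build/' .gitignore 2>/dev/null || echo '/build/' >> .gitignore
}
```

With this, ignore_build_dir followed by build_pdf paper.tex would leave the compiled document at build/paper.pdf, while PDFs used as included graphics elsewhere in the repository stay visible to git.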

Another possibility is to have a cleaner script that you run before each commit. This script would delete all the derived objects.
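A minimal sketch of such a cleaner, assuming the only derived objects are PDFs sitting next to a .tex file of the same name (the function name is hypothetical, and file names containing newlines are not handled):

```sh
# Hypothetical cleaner: delete every .pdf that has a .tex of the same name
# in the same directory. Takes the directory to clean (default: current one).
clean_generated_pdfs() {
    root=${1:-.}
    # List all .tex files, skipping the .git directory.
    find "$root" -name .git -prune -o -type f -name '*.tex' -print |
    while IFS= read -r tex; do
        pdf="${tex%.tex}.pdf"
        if [ -f "$pdf" ]; then
            echo "removing $pdf"
            rm -f -- "$pdf"
        fi
    done
}
```

Running clean_generated_pdfs from the repository root before git add removes compiled documents but leaves standalone PDFs such as included graphics untouched.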

Yet another possibility, although I really don't like this one, is to have the cleaner script instead edit the .gitignore file to ignore the known derived objects, based on the existence of particular source files.


Per man gitignore there are four sources of patterns for ignoring files. Command-line arguments are probably too much hassle; .gitignore is itself version-controlled (unless you include .gitignore in it), which creates complications. That leaves $GIT_DIR/info/exclude and the file listed in config variable core.excludesFile. Sadly, it seems that $GIT_DIR/info/exclude has to be a file and not a directory whose contents are concatenated.
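To see which of these sources is responsible for ignoring a given path, git check-ignore -v prints the matching source file, line number and pattern. A small demonstration in a throwaway repository (paper.pdf is just an example name); note that when core.excludesFile is unset, git falls back to $XDG_CONFIG_HOME/git/ignore, or $HOME/.config/git/ignore if XDG_CONFIG_HOME is unset:

```sh
# Create a throwaway repository in a temporary directory.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

# Add a pattern to the per-repository, untracked exclude file.
mkdir -p .git/info
echo '/paper.pdf' >> .git/info/exclude
touch paper.pdf

# Prints the source, line number and pattern that matched, in the form
# .git/info/exclude:<line>:/paper.pdf<TAB>paper.pdf
git check-ignore -v paper.pdf

# Show the global excludes file, if one is configured.
git config --get core.excludesFile || echo "core.excludesFile is not set"
```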

A somewhat complicated solution which probably doesn't conflict directly with anything you're already doing would be:

  1. Create a script which updates $GIT_DIR/info/exclude to list the .pdf files for which there are .tex files. If you want to be really cautious you could delimit a section of the file with comments and replace only that section.
  2. alias git to invoke the script and then pass the arguments along to /usr/bin/git. (In practice this probably means having the script pass the arguments along itself, and pointing the alias at the script.)
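A rough sketch of both steps, assuming a POSIX shell. It uses shell functions instead of a separate script, and the marker comments and function names are made up for illustration. It rewrites only the section between the markers, listing every tracked or untracked (non-ignored) .tex file with its extension changed to .pdf:

```sh
# Hypothetical markers delimiting the section this sketch owns in info/exclude.
PDF_EXCLUDE_BEGIN='# BEGIN generated PDFs (managed by update_pdf_excludes)'
PDF_EXCLUDE_END='# END generated PDFs (managed by update_pdf_excludes)'

# Runs in a subshell ( ... ) so the cd and variables don't leak out.
update_pdf_excludes() (
    # Outside a work tree there is nothing to update.
    top=$(command git rev-parse --show-toplevel 2>/dev/null) || exit 0
    cd "$top" || exit 0
    # Location of $GIT_DIR/info/exclude, relative to the top of the work tree.
    exclude=$(command git rev-parse --git-path info/exclude) || exit 0
    mkdir -p "$(dirname "$exclude")"
    tmp="$exclude.tmp.$$"

    # Step 1a: keep every existing line outside the managed section.
    if [ -f "$exclude" ]; then
        awk -v b="$PDF_EXCLUDE_BEGIN" -v e="$PDF_EXCLUDE_END" '
            $0 == b { skip = 1; next }
            $0 == e { skip = 0; next }
            !skip' "$exclude" > "$tmp"
    else
        : > "$tmp"
    fi

    # Step 1b: append a fresh section with one root-anchored pattern per .tex file.
    # Names containing gitignore special characters (*, ?, [, \) are not escaped.
    {
        printf '%s\n' "$PDF_EXCLUDE_BEGIN"
        command git -c core.quotePath=false \
            ls-files --cached --others --exclude-standard -- '*.tex' |
            sed -e 's/\.tex$/.pdf/' -e 's|^|/|'
        printf '%s\n' "$PDF_EXCLUDE_END"
    } >> "$tmp"
    mv "$tmp" "$exclude"
)

# Step 2: refresh the exclude list, then pass all arguments on to the real git.
# "command git" bypasses aliases and functions, so this cannot call itself.
git_with_pdf_excludes() {
    update_pdf_excludes
    command git "$@"
}
```

Put this in a shell startup file such as ~/.bashrc and add alias git=git_with_pdf_excludes. Because the section is regenerated on every call, deleting a .tex file also drops its PDF from the list the next time git runs.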