Welcome to Software Development on Codidact!
How to find the last line number of a section in a text file (particular format)
I'm writing a function which reads specified sections in a text file formatted a particular way, but I'm having a hard time figuring out how to locate the last line number. Here's an example of the text file's format, where the numbers correspond to unique sections contained within the square brackets:
[1
Some text
]
[8
Some text
Some text
]
And here's the relevant parts of my code so far:
def readSection(sectionNumber):
    fileLines = open(file).readlines()
    for lineNumber, line in enumerate(fileLines, 1):
        start = '[' + str(sectionNumber)
        if start in line:
            firstLineNumber = str(lineNumber)
Expected output based on format example: if sectionNumber is 1, firstLineNumber is 1 and lastLineNumber is 3. If sectionNumber is 8, firstLineNumber is 5 and lastLineNumber is 9.
Assigning the variable end to ']' gets the line number of every closing bracket, so I need a way to get only the first one after firstLineNumber, but Google hasn't given me any useful information.
1 answer
Assigning the variable end to ']' gets the line number of every closing bracket
To solve this, just set the last line number only if the first line number is already set. Like this:
def readSection(sectionNumber):
    with open(file) as f:
        # before reading the file, first and last line numbers are not set
        firstLineNumber = lastLineNumber = None
        for lineNumber, line in enumerate(f, start=1):
            # start of section found
            if line.startswith(f'[{sectionNumber}'):
                firstLineNumber = lineNumber
            # first line number is set, which means it's the end of the section
            elif firstLineNumber and line[0] == ']':
                lastLineNumber = lineNumber
                break # end of section found, no need to continue the loop
    print(f'first line={firstLineNumber}, last line={lastLineNumber}')
With this, if you find a ] but the first line number is not set yet, it means that it's not the end of the desired section.
And when the end of the section is found, I can interrupt the loop using break. There's no need to continue looping, assuming that each section occurs just once in the file.
I've tested with:
readSection(1)
readSection(8)
And the output is:
first line=1, last line=3
first line=5, last line=9
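One caveat with startswith: '[1' is also a prefix of '[10' or '[12', so a search for section 1 could stop at section 10 if that one comes first in the file. Below is a minimal sketch of a stricter check, assuming that a header line contains nothing but the opening bracket and the number. The name find_section and the in-memory sample list are only illustrative, they are not part of the original code:

```python
def find_section(lines, section_number):
    """Return (first, last) 1-based line numbers; a missing one is None."""
    header = f'[{section_number}'
    first = last = None
    for line_number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if first is None:
            # compare the whole line, so '[1' does not match '[10'
            if stripped == header:
                first = line_number
        elif stripped == ']':
            # first closing bracket after the header ends the section
            last = line_number
            break
    return first, last


# hypothetical sample where section 10 appears before section 1
sample = ['[10\n', 'text\n', ']\n', '[1\n', 'text\n', 'more text\n', ']\n']
print(find_section(sample, 1))   # (4, 7)
print(find_section(sample, 10))  # (1, 3)
```

If header lines can contain anything after the number, this equality test would be too strict and a different rule (for example, checking that the character after the number is not a digit) would be needed.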
PS: I'm assuming that if a line starts with ], it's closing a section. But if there are lines such as ]whatever and they are part of a section, you should change the code to:
def readSection(sectionNumber):
    with open(file) as f:
        # before reading the file, first and last line numbers are not set
        firstLineNumber = lastLineNumber = None
        for lineNumber, line in enumerate(f, start=1):
            # start of section found
            if line.startswith(f'[{sectionNumber}'):
                firstLineNumber = lineNumber
            # first line number is set, which means it's the end of the section
            elif firstLineNumber and line.strip() == ']': # remove line break from line before comparing to ']'
                lastLineNumber = lineNumber
                break
    print(f'first line={firstLineNumber}, last line={lastLineNumber}')
Note that I can use enumerate directly on the file object.
And I didn't use readlines() because it reads the whole file and creates a list with its contents, and when you loop through that list, you're basically going through the file contents twice. For small files it won't make any significant difference, but for big files it may have an impact (but always test to see if it makes some difference).
But if the function will be called many times, maybe you could use readlines just once outside the function, so you'll have a list with the file's contents. Then you pass this list to the function:
def readSection(lines, sectionNumber):
    # before scanning the lines, first and last line numbers are not set
    firstLineNumber = lastLineNumber = None
    for lineNumber, line in enumerate(lines, start=1):
        # start of section found
        if line.startswith(f'[{sectionNumber}'):
            firstLineNumber = lineNumber
        # first line number is set, which means it's the end of the section
        elif firstLineNumber and line.strip() == ']': # remove line break from line
            lastLineNumber = lineNumber
            break
    print(f'first line={firstLineNumber}, last line={lastLineNumber}')

with open(file) as f:
    # read the file once, create a list with all the lines
    lines = f.readlines()

# use the same list to search for different sections
readSection(lines, 1)
readSection(lines, 8)
With this approach, you read the file just once, but keep all the contents in memory during the execution (which can be a fair trade-off, depending on the context).
If there are many sections to be searched, this could be a better approach, instead of reading from the file all the time.
Return values instead of printing
You could change the function to return the line numbers instead of printing them. By doing this, the code that calls the function decides what to do with the line numbers (print, pass them to another function, etc).
And as said in the comments, the file name could be a parameter of the function, so it becomes more flexible, as it can work with many different files (but it could also receive the list with all the lines, as previously said):
def readSection(fileName, sectionNumber):
    with open(fileName) as f:
        # first and last lines start unset
        firstLineNumber = lastLineNumber = None
        for lineNumber, line in enumerate(f, start=1):
            if line.startswith(f'[{sectionNumber}'):
                firstLineNumber = lineNumber
            elif firstLineNumber and line[0] == ']':
                lastLineNumber = lineNumber
                break
    return firstLineNumber, lastLineNumber

first, last = readSection('file.txt', 1)
# check if first or last line is not set
if first is None:
    print('section not found')
elif last is None:
    print('section without closing bracket')
else: # both are set
    print(f'first line number={first}, last line number={last}')
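Since the original goal was to read the section itself, the returned line numbers can be used to pull out the lines between the brackets. Here is a rough sketch using itertools.islice, assuming both numbers were found (neither is None). The function name section_body and the temporary sample file are hypothetical, they only exist for this demo:

```python
import itertools
import os
import tempfile


def section_body(file_name, first, last):
    """Return the lines strictly between the 1-based lines `first` and `last`."""
    with open(file_name) as f:
        # islice uses 0-based positions: index `first` is the line right after
        # the header, and the stop index `last - 1` excludes the closing bracket
        return [line.rstrip('\n') for line in itertools.islice(f, first, last - 1)]


# hypothetical sample file, written to a temporary location for the demo
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('[1\nfirst section\n]\n[8\nline a\nline b\n]\n')

try:
    # in this sample, section 8 starts at line 4 and ends at line 7
    print(section_body(tmp.name, 4, 7))  # ['line a', 'line b']
finally:
    os.remove(tmp.name)
```

Like the versions that loop over the file object, this stops reading as soon as the last needed line is reached, so it doesn't load the whole file into memory.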
Pre-processing line numbers
And finally, if the file is big (lots of different sections) and the line numbers will be searched many times, it may not be ideal to read the file every time or keep all of its contents in memory.
A better approach is to pre-process all the sections and their respective line numbers. In this case, you could read the file just once and return a dictionary that maps each section number to a tuple containing the respective first and last lines:
def getSectionsLineNumbers(fileName):
    sections = {} # dictionary with all the sections
    with open(fileName) as f:
        firstLineNumber = currentSection = None
        for lineNumber, line in enumerate(f, start=1):
            # start of section found
            if line.startswith('['):
                firstLineNumber = lineNumber
                # assuming that the rest of the line contains only digits
                currentSection = int(line.strip()[1:])
            # first line number is set, which means it's the end of the section
            elif firstLineNumber and line.strip() == ']': # remove line break from line
                # key=section number, value=tuple with first and last lines
                sections[currentSection] = (firstLineNumber, lineNumber)
                firstLineNumber = currentSection = None
    return sections

# just look for the section number in the dictionary
def readSection(sections, sectionNumber):
    if sectionNumber in sections:
        firstLineNumber, lastLineNumber = sections[sectionNumber]
        print(f'first line={firstLineNumber}, last line={lastLineNumber}')
    else:
        print(f'section {sectionNumber} not found')

# process all sections (assuming the file name is "file.txt")
sections = getSectionsLineNumbers('file.txt')
# search for different section numbers
readSection(sections, 1) # first line=1, last line=3
readSection(sections, 8) # first line=5, last line=9
readSection(sections, 4) # section 4 not found
The function assumes that the file is well formed (all the sections have the start and close brackets). And if the same section number occurs more than once, the last occurrence will overwrite the previous ones.
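If the file can't be trusted to be well formed, one option is to fail loudly instead of silently skipping or overwriting sections. This is only a sketch under my own assumptions: it takes any iterable of lines (a file object or a list), it treats every line starting with [ as a section header, even inside another section, and the name get_sections_checked is made up for this example:

```python
def get_sections_checked(lines):
    """Map section number -> (first, last); raise ValueError on malformed input."""
    sections = {}
    first = current = None
    for line_number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if stripped.startswith('['):
            if current is not None:
                raise ValueError(f'line {line_number}: section {current} was never closed')
            number = int(stripped[1:])  # raises ValueError if it's not only digits
            if number in sections:
                raise ValueError(f'line {line_number}: duplicate section {number}')
            first, current = line_number, number
        elif stripped == ']' and current is not None:
            sections[current] = (first, line_number)
            first = current = None
    if current is not None:
        # reached the end of the input while still inside a section
        raise ValueError(f'section {current} was never closed')
    return sections


print(get_sections_checked(['[1\n', 'a\n', ']\n', '[8\n', 'b\n', ']\n']))
# {1: (1, 3), 8: (4, 6)}
```

Whether raising an error is better than keeping the last occurrence depends on how the file is produced, so take this as one possible policy, not the only one.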