Assigning the variable end whenever a line is ']' gets the line number of every closing bracket, not only the one that closes the section you want.

To solve this, just set the last line number only if the first line number is already set. Like this:

def readSection(sectionNumber):
    with open(file) as f:
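        # before reading the file, neither line number is set
        # (lastLineNumber must exist even if no closing bracket is found, or print raises NameError)
        firstLineNumber = lastLineNumber = None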
        for lineNumber, line in enumerate(f, start=1):
            # start of section found
            if line.startswith(f'[{sectionNumber}'):
                firstLineNumber = lineNumber
            # first line number is set, which means it's the end of the section
            elif firstLineNumber and line[0] == ']':
                lastLineNumber = lineNumber
                break # end of section found, no need to continue the loop
        print(f'first line={firstLineNumber}, last line={lastLineNumber}')

With this, if you find a ] but the first line number is not set yet, it means that it's not the end of the desired section.

And when the end of the section is found, I can interrupt the loop using break. There's no need to continue looping, assuming that each section occurs just once in the file.

I've tested with:

readSection(1)
readSection(8)

And the output is:

first line=1, last line=3
first line=5, last line=9
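
The file from the question is not reproduced in this answer, so here is a rough, self-contained sketch with hypothetical contents, chosen only so that the line numbers match the output above. The name findSection, the io.StringIO object standing in for the open file, and the returned tuple are assumptions made for this demo:

```python
import io

# Hypothetical file contents: picked only to reproduce the line numbers
# quoted above; the real file from the question may look different.
SAMPLE = """[1
first section
]

[8
line a
line b
line c
]
"""

def findSection(f, sectionNumber):
    # same loop as above, but f is any open file-like object
    firstLineNumber = lastLineNumber = None
    for lineNumber, line in enumerate(f, start=1):
        if line.startswith(f'[{sectionNumber}'):
            firstLineNumber = lineNumber
        elif firstLineNumber and line[0] == ']':
            lastLineNumber = lineNumber
            break
    return firstLineNumber, lastLineNumber

for number in (1, 8):
    first, last = findSection(io.StringIO(SAMPLE), number)
    print(f'first line={first}, last line={last}')
```

With these contents, the ] on line 3 is ignored while searching for section 8, because the first line number is still unset at that point.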

PS: I'm assuming that if a line starts with ], it's closing a section. But if there are lines such as ]whatever and they are part of a section, you should change the code to:

def readSection(sectionNumber):
    with open(file) as f:
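        # before reading the file, neither line number is set
        # (lastLineNumber must exist even if no closing bracket is found, or print raises NameError)
        firstLineNumber = lastLineNumber = None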
        for lineNumber, line in enumerate(f, start=1):
            # start of section found
            if line.startswith(f'[{sectionNumber}'):
                firstLineNumber = lineNumber
            # first line number is set, which means it's the end of the section
            elif firstLineNumber and line.strip() == ']':  # remove line break from line before comparing to ']'
                lastLineNumber = lineNumber
                break
        print(f'first line={firstLineNumber}, last line={lastLineNumber}')
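
To make the difference between the two checks concrete, here is a small sketch that applies both to a few made-up lines (the sample strings are assumptions, not taken from the question's file):

```python
# lines as they come from the file, including the trailing line break
candidates = [']\n', ']whatever\n', '  ]  \n', 'text\n']

# for each line: (result of the line[0] check, result of the strip() check)
results = [(line[0] == ']', line.strip() == ']') for line in candidates]

for line, (startsWithBracket, isOnlyBracket) in zip(candidates, results):
    print(f'{line!r}: line[0] check={startsWithBracket}, strip() check={isOnlyBracket}')
```

The two checks disagree on ]whatever (only line[0] accepts it) and on a bracket surrounded by spaces (only strip() accepts it), so which one to use depends on what the file can actually contain.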

Note that I can use enumerate directly on the file object.

And I didn't use readlines() because it reads the whole file and creates a list with its contents; when you then loop through that list, you're basically going over the file contents twice. For small files it won't make any significant difference, but for big files it may have an impact (always test to see if it actually makes a difference).
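
One possible way to run such a test is sketched below with timeit on a throwaway file. The function names, the file size and the number of repetitions are arbitrary assumptions; the numbers you get depend on your machine and your files, and this measures time only, not memory:

```python
import os
import tempfile
import timeit

def countLazily(fileName):
    # iterate over the file object: lines are produced one at a time
    with open(fileName) as f:
        return sum(1 for _ in f)

def countWithReadlines(fileName):
    # readlines() builds the whole list first, then the loop walks it
    with open(fileName) as f:
        return sum(1 for _ in f.readlines())

def compare(lineCount=200_000, repeat=5):
    # write a temporary file (the size is arbitrary, adjust it to your case)
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
        tmp.writelines(f'line {i}\n' for i in range(lineCount))
    try:
        lazy = timeit.timeit(lambda: countLazily(tmp.name), number=repeat)
        eager = timeit.timeit(lambda: countWithReadlines(tmp.name), number=repeat)
        return lazy, eager
    finally:
        os.remove(tmp.name)  # always clean up the temporary file

lazy, eager = compare()
print(f'iterating the file: {lazy:.3f}s, readlines(): {eager:.3f}s')
```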

But if the function will be called many times, maybe you could use readlines just once outside the function, so you'll have a list with the file's contents. Then you pass this list to the function:

def readSection(lines, sectionNumber):
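    # before reading the lines, neither line number is set
    # (lastLineNumber must exist even if no closing bracket is found, or print raises NameError)
    firstLineNumber = lastLineNumber = None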
    for lineNumber, line in enumerate(lines, start=1):
        # start of section found
        if line.startswith(f'[{sectionNumber}'):
            firstLineNumber = lineNumber
        # first line number is set, which means it's the end of the section
        elif firstLineNumber and line.strip() == ']':  # remove line break from line
            lastLineNumber = lineNumber
            break
    print(f'first line={firstLineNumber}, last line={lastLineNumber}')

with open(file) as f:
    # read the file once, create a list with all the lines
    lines = f.readlines()

    # use the same list to search for different sections
    readSection(lines, 1)
    readSection(lines, 8)

With this approach, you read the file just once, but keep all the contents in memory during the execution (which can be a fair trade-off, depending on the context).

If there are many sections to be searched, this could be a better approach, instead of reading from the file all the time.

Return values instead of printing

You could change the function to return the line numbers instead of printing them. By doing this, the code that calls the function decides what to do with the line numbers (print them, pass them to another function, etc.).

And as said in the comments, the file name could be a parameter of the function, so it becomes more flexible, as it can work with many different files (but it could also receive the list with all the lines, as previously said):

def readSection(fileName, sectionNumber):
    with open(fileName) as f:
        # first and last lines start unset
        firstLineNumber = lastLineNumber = None
        for lineNumber, line in enumerate(f, start=1):
            if line.startswith(f'[{sectionNumber}'):
                firstLineNumber = lineNumber
            elif firstLineNumber and line[0] == ']':
                lastLineNumber = lineNumber
                break
        return firstLineNumber, lastLineNumber

first, last = readSection('file.txt', 1)
# check if first or last line is not set
if first is None:
    print('section not found')
elif last is None:
    print('section without closing bracket')
else: # both are set
    print(f'first line number={first}, last line number={last}')
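
As an illustration of passing the returned numbers to another function, here is a sketch of a hypothetical helper, sliceSection, that extracts the lines of a section from a list of lines. Both the helper and the sample list are assumptions made for the example:

```python
def sliceSection(lines, first, last):
    # line numbers are 1-based and inclusive; an unset number means "nothing to return"
    if first is None or last is None:
        return []
    return lines[first - 1:last]

# hypothetical contents, standing in for the result of f.readlines()
lines = ['[1\n', 'first section\n', ']\n', '\n', '[8\n', 'a\n', 'b\n', 'c\n', ']\n']

print(sliceSection(lines, 1, 3))        # the three lines of section 1
print(sliceSection(lines, None, None))  # section not found: empty list
```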

Pre-processing line numbers

And finally, if the file is big (lots of different sections) and the line numbers will be searched many times, it may not be ideal to read the file every time or keep all of its contents in memory.

A better approach is to pre-process all the sections and their respective line numbers. In this case, you could read the file just once and return a dictionary that maps each section number to a tuple containing the respective first and last lines:

def getSectionsLineNumbers(fileName):
    sections = {} # dictionary with all the sections
    with open(fileName) as f:
        firstLineNumber = currentSection = None
        for lineNumber, line in enumerate(f, start=1):
            # start of section found
            if line.startswith('['):
                firstLineNumber = lineNumber
                # assuming that the rest of the line contains only digits
                currentSection = int(line.strip()[1:])
            # first line number is set, which means it's the end of the section
            elif firstLineNumber and line.strip() == ']':  # remove line break from line
                # key=section number, value=tuple with first and last lines
                sections[currentSection] = (firstLineNumber, lineNumber)
                firstLineNumber = currentSection = None

    return sections

# just look for the section number in the dictionary
def readSection(sections, sectionNumber):
    if sectionNumber in sections:
        firstLineNumber, lastLineNumber = sections[sectionNumber]
        print(f'first line={firstLineNumber}, last line={lastLineNumber}')
    else:
        print(f'section {sectionNumber} not found')


# process all sections (assuming the file name is "file.txt")
sections = getSectionsLineNumbers('file.txt')

# search for different section numbers
readSection(sections, 1)  # first line=1, last line=3
readSection(sections, 8)  # first line=5, last line=9
readSection(sections, 4)  # section 4 not found

The function assumes that the file is well formed (every section has both its opening and closing brackets). And if the same section number occurs more than once, the last occurrence will overwrite the previous ones.
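
If you'd rather detect those cases than rely on the assumption, a stricter variant could look like the sketch below. The name getSectionsLineNumbersStrict is hypothetical, it takes any iterable of lines (an open file or a list) so it can be tried without a file, and which situations count as errors is a choice you may want to make differently:

```python
def getSectionsLineNumbersStrict(lines):
    sections = {}
    firstLineNumber = currentSection = None
    for lineNumber, line in enumerate(lines, start=1):
        if line.startswith('['):
            # a new section starts while the previous one is still open
            if firstLineNumber is not None:
                raise ValueError(f'line {lineNumber}: section {currentSection} was never closed')
            # still assuming that the rest of the line contains only digits
            currentSection = int(line.strip()[1:])
            if currentSection in sections:
                raise ValueError(f'line {lineNumber}: section {currentSection} appears twice')
            firstLineNumber = lineNumber
        elif line.strip() == ']':
            if firstLineNumber is None:
                raise ValueError(f'line {lineNumber}: closing bracket without a section')
            sections[currentSection] = (firstLineNumber, lineNumber)
            firstLineNumber = currentSection = None
    # end of input reached with a section still open
    if firstLineNumber is not None:
        raise ValueError(f'section {currentSection} was never closed')
    return sections

print(getSectionsLineNumbersStrict(['[1\n', 'x\n', ']\n']))
try:
    getSectionsLineNumbersStrict(['[1\n', 'x\n'])
except ValueError as error:
    print(error)
```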

posted 1 day ago

CC BY-SA 4.0

8h ago

hkotsubo‭


How to find the last line number of a section in a text file (particular format)
