Communities

Writing
Writing
Codidact Meta
Codidact Meta
The Great Outdoors
The Great Outdoors
Photography & Video
Photography & Video
Scientific Speculation
Scientific Speculation
Cooking
Cooking
Electrical Engineering
Electrical Engineering
Judaism
Judaism
Languages & Linguistics
Languages & Linguistics
Software Development
Software Development
Mathematics
Mathematics
Christianity
Christianity
Code Golf
Code Golf
Music
Music
Physics
Physics
Linux Systems
Linux Systems
Power Users
Power Users
Tabletop RPGs
Tabletop RPGs
Community Proposals
Community Proposals
tag:snake search within a tag
answers:0 unanswered questions
user:xxxx search by author id
score:0.5 posts with 0.5+ score
"snake oil" exact phrase
votes:4 posts with 4+ votes
created:<1w created < 1 week ago
post_type:xxxx type of post
Search help
Notifications
Mark all as read See all your notifications »
Q&A

Welcome to Software Development on Codidact!

Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.

Queries to count points lying on arbitrary line

+5
−1

Suppose we have N points on XY plane, ie. (x, y) and x, y are integers and multiple queries where each query is of the form y = mx + c and m, c are integers.

Is it possible to count number of points that lie on the given line more efficiently than O(N) per query?

I believe some preprocessing might help or some data structure like QuadTree that counts points inside arbitrary rectangle.

The queries are offline so if it's possible using some batch processing technique or even SQRT optimization it would be better than naive brute force.

History
Why does this post require attention from curators or moderators?
You might want to add some details to your flag.
Why should this post be closed?

2 comment threads

Are the points distinct? (1 comment)
Worst case or average case? (1 comment)

2 answers

You are accessing this answer with a direct link, so it's being shown above all other answers regardless of its score. You can return to the normal view.

+4
−0

Some answer-specific terminology to make describing things simpler:

  • I will refer to the N points as candidate points.
  • I will refer to the integer points inside the bounding box of the N points as lattice points.

Two approaches

There are two ways to approach this question:

  • Check whether each of the N candidate points is on the line.
  • Check whether each of the lattice points on the line is one of the N candidate points.

Which approach is most efficient will depend on the specific details of each case, and how much can be precomputed.

Example

As an example, consider the case where N = 50 and all 50 candidate points lie in the region where:

  • -100 <= x <= 100
  • -100 <= y <= 100.

How many lattice points in this region need to be checked to see if they are one of the 50 candidate points? There are 201 * 201 = 40,401 lattice points, but most of these do not need to be checked as the line does not pass through them. This affects which of the two approaches is more efficient:

  • If the line is y = x + 1 then it will pass through 200 lattice points in this region.
    It will be more efficient to check by asking whether each of the 50 candidate points satisfies the equation (rather than having to check all 200 lattice points).
  • If the line is y = 100x + 1 then it will only pass through 2 lattice points in this region.
    It will be more efficient to check by asking whether each of the 2 lattice points is present in the set of 50 candidate points (rather than having to check all 50 candidate points).

This is simplified for the sake of example. In reality the check for presence in a set will not necessarily take the same time as the check for satisfying the equation. The point is that there will be some threshold where one approach is replaced by the other as the most efficient.

Improving the check for whether a candidate point satisfies the equation

In those cases where you are checking whether each of the N candidate points satisfies the equation, a data structure such as your suggestion of a quadtree (or perhaps a k-d tree) will allow you to exclude some of the N points, as they are in a region that the line does not pass through, so you will only have to check a subset of the N points.

Improving the check for whether a lattice point is in the set of candidate points

In those cases where you are checking whether each of the lattice points that the line passes through are in the set of N candidate points, you could avoid checking each of the N candidate points (checking for presence is O(N)) by putting them in a data structure such as a hashmap (checking for presence is O(1)).

Deciding which approach to take

Once you have made each of the two approaches as efficient as you can, you can have some method of estimating which one is best for a given line and set of candidate points. Alternatively you could start both approaches and take the answer from the approach that finishes first, terminating the other.

Which approach is more efficient will also depend on how efficiently you can determine the set of lattice points that a given line passes through within the bounding box of the N candidate points. If either the set of lines or the set of candidate points is available in advance for precomputing the lattice points or the bounding box then the resulting improved efficiency is likely to affect which approach is best for some cases.

History
Why does this post require attention from curators or moderators?
You might want to add some details to your flag.

0 comment threads

+0
−0

Since you didn't mention memory, I'll assume O(N^2) is acceptable.

Setup pseudocode:

for every p1, p2:
    ln = line(p1, p2)
    points_on_line[ln.m, ln.c] += [p1, p2]

If there are at least 2 colinear points on your line, you can get them in O(1) with points_on_line[m, c]. If not, you'll have to do brute force.

The initialization takes O(N^2) so it is practical if the number of searches you need to do is much more than N^2 (eg. 5 points, 1000 searches). Note that you can add points:

def add_point(p):
    for m, c in points_on_line:
        if p.y == m * p.x + c:
            points_on_line[m, c] += [p]

This will be O(N), so adding new points is okay so long as you are doing a lot of searches for each point you add.

History
Why does this post require attention from curators or moderators?
You might want to add some details to your flag.

0 comment threads

Sign up to answer this question »