Welcome to Software Development on Codidact!

Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.

Post History

75%

+4 −0

Q&A Queries to count points lying on arbitrary line

Some answer-specific terminology to make describing things simpler: I will refer to the N points as candidate points. I will refer to the integer points inside the bounding box of the N points ...

posted 2y ago by trichoplax‭ · edited 2y ago by trichoplax‭

Answer

#4: Post edited by

trichoplax‭ · 2023-05-29T00:09:14Z (about 2 years ago)
Typo

Copy Link

Raw

Markdown

Some answer-specific terminology to make describing things simpler:
- I will refer to the N points as ***candidate points***.
- I will refer to the integer points inside the [bounding box] of the N points as ***lattice points***.
---
## Two approaches
There are two ways to approach this question:
- Check whether each of the N candidate points is on the line.
- Check whether each of the lattice points on the line is one of the N candidate points.
Which approach is most efficient will depend on the specific details of each case, and how much can be precomputed.
### Example
As an example, consider the case where N = 50 and all 50 candidate points lie in the region where:
- -100 <= x <= 100
- -100 <= y <= 100.
How many lattice points in this region need to be checked to see if they are one of the 50 candidate points? There are 201 * 201 = 40,401 lattice points, but most of these do not need to be checked as the line does not pass through them. This affects which of the two approaches is more efficient:
- If the line is y = x + 1 then it will pass through 200 lattice points in this region.
It will be more efficient to check by asking whether each of the 50 candidate points satisfies the equation (rather than having to check all 200 lattice points).
- If the line is y = 100x + 1 then it will only pass through 2 lattice points in this region.
It will be more efficient to check by asking []()whether each of the 2 lattice points is present in the set of 50 candidate points (rather than having to check all 50 candidate points).
This is simplified for the sake of example. In reality the check for presence in a set will not necessarily take the same time as the check for satisfying the equation. The point is that there will be some threshold where one approach is replaced by the other as the most efficient.
### Improving the check for whether a candidate point satisfies the equation
In those cases where you are checking whether each of the N candidate points satisfies the equation, a data structure such as your suggestion of a [quadtree] (or perhaps a [k-d tree]) will allow you to exclude some of the N points, as they are in a region that the line does not pass through, so you will only have to check a subset of the N points.
### Improving the check for whether a lattice point is in the set of candidate points
In those cases where you are checking whether each of the lattice points that the line passes through are in the set of N candidate points, you could avoid checking each of the N candidate points (checking for presence is O(N)) by putting them in a data structure such as a [hashmap] (checking for presence is O(1)).
### Deciding which approach to take
Once you have made each of the two approaches as efficient as you can, you can have some method of estimating which one is best for a given line and set of candidate points. Alternatively you could start both approaches and take the answer from the approach that finishes first, terminating the other.
Which approach is more efficient will also depend on how efficiently you can determine the set of lattice points that a given line passes through within the bounding box of the N candidate points. If either the set of lines of the set of candidate points is available in advance for precomputing the lattice points or the bounding box then the resulting improved efficiency is likely to affect which approach is best for some cases.
[bounding box]: https://en.wikipedia.org/wiki/Minimum_bounding_box#Axis-aligned_minimum_bounding_box
[k-d tree]: https://en.wikipedia.org/wiki/K-d_tree
[quadtree]: https://en.wikipedia.org/wiki/Quadtree
[hashmap]: https://en.wikipedia.org/wiki/Hash_table

Some answer-specific terminology to make describing things simpler:
- I will refer to the N points as ***candidate points***.
- I will refer to the integer points inside the [bounding box] of the N points as ***lattice points***.
---
## Two approaches
There are two ways to approach this question:
- Check whether each of the N candidate points is on the line.
- Check whether each of the lattice points on the line is one of the N candidate points.
Which approach is most efficient will depend on the specific details of each case, and how much can be precomputed.
### Example
As an example, consider the case where N = 50 and all 50 candidate points lie in the region where:
- -100 <= x <= 100
- -100 <= y <= 100.
How many lattice points in this region need to be checked to see if they are one of the 50 candidate points? There are 201 * 201 = 40,401 lattice points, but most of these do not need to be checked as the line does not pass through them. This affects which of the two approaches is more efficient:
- If the line is y = x + 1 then it will pass through 200 lattice points in this region.
It will be more efficient to check by asking whether each of the 50 candidate points satisfies the equation (rather than having to check all 200 lattice points).
- If the line is y = 100x + 1 then it will only pass through 2 lattice points in this region.
It will be more efficient to check by asking []()whether each of the 2 lattice points is present in the set of 50 candidate points (rather than having to check all 50 candidate points).
This is simplified for the sake of example. In reality the check for presence in a set will not necessarily take the same time as the check for satisfying the equation. The point is that there will be some threshold where one approach is replaced by the other as the most efficient.
### Improving the check for whether a candidate point satisfies the equation
In those cases where you are checking whether each of the N candidate points satisfies the equation, a data structure such as your suggestion of a [quadtree] (or perhaps a [k-d tree]) will allow you to exclude some of the N points, as they are in a region that the line does not pass through, so you will only have to check a subset of the N points.
### Improving the check for whether a lattice point is in the set of candidate points
In those cases where you are checking whether each of the lattice points that the line passes through are in the set of N candidate points, you could avoid checking each of the N candidate points (checking for presence is O(N)) by putting them in a data structure such as a [hashmap] (checking for presence is O(1)).
### Deciding which approach to take
Once you have made each of the two approaches as efficient as you can, you can have some method of estimating which one is best for a given line and set of candidate points. Alternatively you could start both approaches and take the answer from the approach that finishes first, terminating the other.
Which approach is more efficient will also depend on how efficiently you can determine the set of lattice points that a given line passes through within the bounding box of the N candidate points. If either the set of lines or[]() the set of candidate points is available in advance for precomputing the lattice points or the bounding box then the resulting improved efficiency is likely to affect which approach is best for some cases.
[bounding box]: https://en.wikipedia.org/wiki/Minimum_bounding_box#Axis-aligned_minimum_bounding_box
[k-d tree]: https://en.wikipedia.org/wiki/K-d_tree
[quadtree]: https://en.wikipedia.org/wiki/Quadtree
[hashmap]: https://en.wikipedia.org/wiki/Hash_table

#3: Post edited by

trichoplax‭ · 2023-05-29T00:07:58Z (about 2 years ago)
Clarification

Copy Link

Raw

Markdown

Some answer-specific terminology to make describing things simpler:
- I will refer to the N points as ***candidate points***.
- I will refer to the integer points inside the [bounding box] of the N points as ***lattice points***.
---
## Two approaches
There are two ways to approach this question:
- Check whether each of the N candidate points is on the line.
- Check whether each of the lattice points on the line is one of the N candidate points.
Which approach is most efficient will depend on the specific details of each case, and how much can be precomputed.
### Example
As an example, consider the case where N = 50 and all 50 candidate points lie in the region where:
- -100 <= x <= 100
- -100 <= y <= 100.
How many lattice points in this region need to be checked to see if they are one of the 50 candidate points? There are 201 * 201 = 40,401 lattice points, but most of these do not need to be checked as the line does not pass through them. This affects which of the two approaches is more efficient:
- If the line is y = x + 1 then it will pass through 200 lattice points in this region.
It will be more efficient to check by asking whether each of the 50 candidate points satisfies the equation (rather than having to check all 200 lattice points).
- If the line is y = 100x + 1 then it will only pass through 2 lattice points in this region.
It will be more efficient to check by asking []()whether each of the 2 lattice points is present in the set of 50 candidate points (rather than having to check all 50 candidate points).
This is simplified for the sake of example. In reality the check for presence in a set will not necessarily take the same time as the check for satisfying the equation. The point is that there will be some threshold where one approach is replaced by the other as the most efficient.
### Improving the check for whether a candidate point satisfies the equation
In those cases where you are checking whether each of the N candidate points satisfies the equation, a data structure such as your suggestion of a [quadtree] (or perhaps a [k-d tree]) will allow you to exclude some of the N points, as they are in a region that the line does not pass through, so you will only have to check a subset of the N points.
### Improving the check for whether a lattice point is in the set of candidate points
In those cases where you are checking whether each of the lattice points that the line passes through are in the set of N candidate points, you could avoid checking each of the N candidate points (O(N)) by putting them in a data structure such as a [hashmap] (O(1)).
### Deciding which approach to take
Once you have made each of the two approaches as efficient as you can, you can have some method of estimating which one is best for a given line and set of candidate points. Alternatively you could start both approaches and take the answer from the approach that finishes first, terminating the other.
Which approach is more efficient will also depend on how efficiently you can determine the set of lattice points that a given line passes through within the bounding box of the N candidate points. If either the set of lines of the set of candidate points is available in advance for precomputing the lattice points or the bounding box then the resulting improved efficiency is likely to affect which approach is best for some cases.
[bounding box]: https://en.wikipedia.org/wiki/Minimum_bounding_box#Axis-aligned_minimum_bounding_box
[k-d tree]: https://en.wikipedia.org/wiki/K-d_tree
[quadtree]: https://en.wikipedia.org/wiki/Quadtree
[hashmap]: https://en.wikipedia.org/wiki/Hash_table

Some answer-specific terminology to make describing things simpler:
- I will refer to the N points as ***candidate points***.
- I will refer to the integer points inside the [bounding box] of the N points as ***lattice points***.
---
## Two approaches
There are two ways to approach this question:
- Check whether each of the N candidate points is on the line.
- Check whether each of the lattice points on the line is one of the N candidate points.
Which approach is most efficient will depend on the specific details of each case, and how much can be precomputed.
### Example
As an example, consider the case where N = 50 and all 50 candidate points lie in the region where:
- -100 <= x <= 100
- -100 <= y <= 100.
How many lattice points in this region need to be checked to see if they are one of the 50 candidate points? There are 201 * 201 = 40,401 lattice points, but most of these do not need to be checked as the line does not pass through them. This affects which of the two approaches is more efficient:
- If the line is y = x + 1 then it will pass through 200 lattice points in this region.
It will be more efficient to check by asking whether each of the 50 candidate points satisfies the equation (rather than having to check all 200 lattice points).
- If the line is y = 100x + 1 then it will only pass through 2 lattice points in this region.
It will be more efficient to check by asking []()whether each of the 2 lattice points is present in the set of 50 candidate points (rather than having to check all 50 candidate points).
This is simplified for the sake of example. In reality the check for presence in a set will not necessarily take the same time as the check for satisfying the equation. The point is that there will be some threshold where one approach is replaced by the other as the most efficient.
### Improving the check for whether a candidate point satisfies the equation
In those cases where you are checking whether each of the N candidate points satisfies the equation, a data structure such as your suggestion of a [quadtree] (or perhaps a [k-d tree]) will allow you to exclude some of the N points, as they are in a region that the line does not pass through, so you will only have to check a subset of the N points.
### Improving the check for whether a lattice point is in the set of candidate points
In those cases where you are checking whether each of the lattice points that the line passes through are in the set of N candidate points, you could avoid checking each of the N candidate points (checking for presence is O(N)) by putting them in a data structure such as a [hashmap] (checking for presence is O(1)).
### Deciding which approach to take
Once you have made each of the two approaches as efficient as you can, you can have some method of estimating which one is best for a given line and set of candidate points. Alternatively you could start both approaches and take the answer from the approach that finishes first, terminating the other.
Which approach is more efficient will also depend on how efficiently you can determine the set of lattice points that a given line passes through within the bounding box of the N candidate points. If either the set of lines of the set of candidate points is available in advance for precomputing the lattice points or the bounding box then the resulting improved efficiency is likely to affect which approach is best for some cases.
[bounding box]: https://en.wikipedia.org/wiki/Minimum_bounding_box#Axis-aligned_minimum_bounding_box
[k-d tree]: https://en.wikipedia.org/wiki/K-d_tree
[quadtree]: https://en.wikipedia.org/wiki/Quadtree
[hashmap]: https://en.wikipedia.org/wiki/Hash_table

#2: Post edited by

trichoplax‭ · 2023-05-29T00:05:20Z (about 2 years ago)
Typo

Copy Link

Raw

Markdown

Some answer-specific terminology to make describing things simpler:
- I will refer to the N points as ***candidate points***.
- I will refer to the integer points inside the [bounding box] of the N points as ***lattice points***.
---
## Two approaches
There are two ways to approach this question:
- Check whether each of the N candidate points is on the line.
- Check whether each of the lattice points on the line is one of the N candidate points.
Which approach is most efficient will depend on the specific details of each case, and how much can be precomputed.
### Example
As an example, consider the case where N = 50 and all 50 candidate points lie in the region where:
- -100 <= x <= 100
- -100 <= y <= 100.
How many lattice points in this region need to be checked to see if they are one of the 50 candidate points? There are 201 * 201 = 40,401 lattice points, but most of these do not need to be checked as the line does not pass through them. This affects which of the two approaches is more efficient:
- If the line is y = x + 1 then it will pass through 200 lattice points in this region.
It will be more efficient to check by asking whether each of the 50 candidate points satisfies the equation (rather than having to check all 200 lattice points).
- If the line is y = 100x + 1 then it will only pass through 2 lattice points in this region.
~~It will be more efficient to check by asking whether each of the 2 lattice points are present in the set of 50 candidate points (rather than having to check all 50 candidate points).~~
This is simplified for the sake of example. In reality the check for presence in a set will not necessarily take the same time as the check for satisfying the equation. The point is that there will be some threshold where one approach is replaced by the other as the most efficient.
### Improving the check for whether a candidate point satisfies the equation
In those cases where you are checking whether each of the N candidate points satisfies the equation, a data structure such as your suggestion of a [quadtree] (or perhaps a [k-d tree]) will allow you to exclude some of the N points, as they are in a region that the line does not pass through, so you will only have to check a subset of the N points.
### Improving the check for whether a lattice point is in the set of candidate points
In those cases where you are checking whether each of the lattice points that the line passes through are in the set of N candidate points, you could avoid checking each of the N candidate points (O(N)) by putting them in a data structure such as a [hashmap] (O(1)).
### Deciding which approach to take
Once you have made each of the two approaches as efficient as you can, you can have some method of estimating which one is best for a given line and set of candidate points. Alternatively you could start both approaches and take the answer from the approach that finishes first, terminating the other.
Which approach is more efficient will also depend on how efficiently you can determine the set of lattice points that a given line passes through within the bounding box of the N candidate points. If either the set of lines of the set of candidate points is available in advance for precomputing the lattice points or the bounding box then the resulting improved efficiency is likely to affect which approach is best for some cases.
[bounding box]: https://en.wikipedia.org/wiki/Minimum_bounding_box#Axis-aligned_minimum_bounding_box
[k-d tree]: https://en.wikipedia.org/wiki/K-d_tree
[quadtree]: https://en.wikipedia.org/wiki/Quadtree
[hashmap]: https://en.wikipedia.org/wiki/Hash_table

Some answer-specific terminology to make describing things simpler:
- I will refer to the N points as ***candidate points***.
- I will refer to the integer points inside the [bounding box] of the N points as ***lattice points***.
---
## Two approaches
There are two ways to approach this question:
- Check whether each of the N candidate points is on the line.
- Check whether each of the lattice points on the line is one of the N candidate points.
Which approach is most efficient will depend on the specific details of each case, and how much can be precomputed.
### Example
As an example, consider the case where N = 50 and all 50 candidate points lie in the region where:
- -100 <= x <= 100
- -100 <= y <= 100.
How many lattice points in this region need to be checked to see if they are one of the 50 candidate points? There are 201 * 201 = 40,401 lattice points, but most of these do not need to be checked as the line does not pass through them. This affects which of the two approaches is more efficient:
- If the line is y = x + 1 then it will pass through 200 lattice points in this region.
It will be more efficient to check by asking whether each of the 50 candidate points satisfies the equation (rather than having to check all 200 lattice points).
- If the line is y = 100x + 1 then it will only pass through 2 lattice points in this region.
It will be more efficient to check by asking []()whether each of the 2 lattice points is present in the set of 50 candidate points (rather than having to check all 50 candidate points).
This is simplified for the sake of example. In reality the check for presence in a set will not necessarily take the same time as the check for satisfying the equation. The point is that there will be some threshold where one approach is replaced by the other as the most efficient.
### Improving the check for whether a candidate point satisfies the equation
In those cases where you are checking whether each of the N candidate points satisfies the equation, a data structure such as your suggestion of a [quadtree] (or perhaps a [k-d tree]) will allow you to exclude some of the N points, as they are in a region that the line does not pass through, so you will only have to check a subset of the N points.
### Improving the check for whether a lattice point is in the set of candidate points
In those cases where you are checking whether each of the lattice points that the line passes through are in the set of N candidate points, you could avoid checking each of the N candidate points (O(N)) by putting them in a data structure such as a [hashmap] (O(1)).
### Deciding which approach to take
Once you have made each of the two approaches as efficient as you can, you can have some method of estimating which one is best for a given line and set of candidate points. Alternatively you could start both approaches and take the answer from the approach that finishes first, terminating the other.
Which approach is more efficient will also depend on how efficiently you can determine the set of lattice points that a given line passes through within the bounding box of the N candidate points. If either the set of lines of the set of candidate points is available in advance for precomputing the lattice points or the bounding box then the resulting improved efficiency is likely to affect which approach is best for some cases.
[bounding box]: https://en.wikipedia.org/wiki/Minimum_bounding_box#Axis-aligned_minimum_bounding_box
[k-d tree]: https://en.wikipedia.org/wiki/K-d_tree
[quadtree]: https://en.wikipedia.org/wiki/Quadtree
[hashmap]: https://en.wikipedia.org/wiki/Hash_table

#1: Initial revision by

trichoplax‭ · 2023-05-29T00:03:05Z (about 2 years ago)

Copy Link

Raw

Markdown

Some answer-specific terminology to make describing things simpler:

- I will refer to the N points as ***candidate points***.
- I will refer to the integer points inside the [bounding box] of the N points as ***lattice points***.

---

## Two approaches
There are two ways to approach this question:
- Check whether each of the N candidate points is on the line.
- Check whether each of the lattice points on the line is one of the N candidate points.

Which approach is most efficient will depend on the specific details of each case, and how much can be precomputed.

### Example
As an example, consider the case where N = 50 and all 50 candidate points lie in the region where:
- -100 <= x <= 100
- -100 <= y <= 100.

How many lattice points in this region need to be checked to see if they are one of the 50 candidate points? There are 201 * 201 = 40,401 lattice points, but most of these do not need to be checked as the line does not pass through them. This affects which of the two approaches is more efficient:

- If the line is y = x + 1 then it will pass through 200 lattice points in this region.
It will be more efficient to check by asking whether each of the 50 candidate points satisfies the equation (rather than having to check all 200 lattice points).
- If the line is y = 100x + 1 then it will only pass through 2 lattice points in this region.
It will be more efficient to check by asking whether each of the 2 lattice points are present in the set of 50 candidate points (rather than having to check all 50 candidate points).

This is simplified for the sake of example. In reality the check for presence in a set will not necessarily take the same time as the check for satisfying the equation. The point is that there will be some threshold where one approach is replaced by the other as the most efficient.

### Improving the check for whether a candidate point satisfies the equation
In those cases where you are checking whether each of the N candidate points satisfies the equation, a data structure such as your suggestion of a [quadtree] (or perhaps a [k-d tree]) will allow you to exclude some of the N points, as they are in a region that the line does not pass through, so you will only have to check a subset of the N points.

### Improving the check for whether a lattice point is in the set of candidate points
In those cases where you are checking whether each of the lattice points that the line passes through are in the set of N candidate points, you could avoid checking each of the N candidate points (O(N)) by putting them in a data structure such as a [hashmap] (O(1)).

### Deciding which approach to take
Once you have made each of the two approaches as efficient as you can, you can have some method of estimating which one is best for a given line and set of candidate points. Alternatively you could start both approaches and take the answer from the approach that finishes first, terminating the other.

Which approach is more efficient will also depend on how efficiently you can determine the set of lattice points that a given line passes through within the bounding box of the N candidate points. If either the set of lines of the set of candidate points is available in advance for precomputing the lattice points or the bounding box then the resulting improved efficiency is likely to affect which approach is best for some cases.

[bounding box]: https://en.wikipedia.org/wiki/Minimum_bounding_box#Axis-aligned_minimum_bounding_box
[k-d tree]: https://en.wikipedia.org/wiki/K-d_tree
[quadtree]: https://en.wikipedia.org/wiki/Quadtree
[hashmap]: https://en.wikipedia.org/wiki/Hash_table

Communities

Post History