Software Development

matthewsnyder wrote 11 months ago

Also, the following code is simpler:

# Part 1: Filter
filtered = [row for row in all_data if row["value"] > row["threshold"]]

# Part 2: Cumsum
s = 0
for row in filtered:
    s += row["value"]
    row["cumsum"] = s

congusbongus wrote 11 months ago

You have misinterpreted the question. The condition for the cumsum compares "value" and "threshold" taken from different rows, so the filter and the cumsum cannot be split into two parts. Doing so gives the incorrect result "3, 4, 6" instead of the expected "3, 1, 3". Also, though this may not have been your intention, the question specifies apache-spark/pyspark, so plain Python code cannot be the answer because it would be inefficient. It can, however, be used to clarify the question.
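Plain Python can still help pin down the semantics. Below is a minimal sketch contrasting the two readings, under an assumed rule: each row's running sum includes the values of the current row and all previous rows whose "value" exceeds the current row's "threshold". That rule, the sample rows, and the function names split_cumsum and conditional_cumsum are hypothetical illustrations, not taken from the original question, so the outputs are not meant to reproduce its expected "3, 1, 3".

```python
# Hypothetical sample rows; only the "value"/"threshold" keys come from the thread.
rows = [
    {"value": 3, "threshold": 1},
    {"value": 1, "threshold": 2},
    {"value": 2, "threshold": 0},
]


def split_cumsum(rows):
    """Filter on each row's own threshold, then take a running sum (the split approach)."""
    total = 0
    out = []
    for row in rows:
        if row["value"] > row["threshold"]:
            total += row["value"]
            out.append(total)
    return out


def conditional_cumsum(rows):
    """ASSUMED rule: for row i, sum value[j] over j <= i where value[j] > threshold[i]."""
    out = []
    for i, current in enumerate(rows):
        # Compare earlier rows' values against the *current* row's threshold.
        out.append(
            sum(r["value"] for r in rows[: i + 1] if r["value"] > current["threshold"])
        )
    return out


print(split_cumsum(rows))        # [3, 5]
print(conditional_cumsum(rows))  # [3, 3, 6]
```

Because the condition depends on the current row's threshold, earlier rows must be re-evaluated for every row, which is why a single filter followed by a cumsum cannot express it. This sketch rescans the prefix for each row (O(n²)), so it only clarifies the semantics and is not a substitute for a Spark implementation.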

Comments on How to get conditional running cumulative sum based on current row and previous rows?