Communities

Writing
Writing
Codidact Meta
Codidact Meta
The Great Outdoors
The Great Outdoors
Photography & Video
Photography & Video
Scientific Speculation
Scientific Speculation
Cooking
Cooking
Electrical Engineering
Electrical Engineering
Judaism
Judaism
Languages & Linguistics
Languages & Linguistics
Software Development
Software Development
Mathematics
Mathematics
Christianity
Christianity
Code Golf
Code Golf
Music
Music
Physics
Physics
Linux Systems
Linux Systems
Power Users
Power Users
Tabletop RPGs
Tabletop RPGs
Community Proposals
Community Proposals
tag:snake search within a tag
answers:0 unanswered questions
user:xxxx search by author id
score:0.5 posts with 0.5+ score
"snake oil" exact phrase
votes:4 posts with 4+ votes
created:<1w created < 1 week ago
post_type:xxxx type of post
Search help
Notifications
Mark all as read See all your notifications »
Q&A

Welcome to Software Development on Codidact!

Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.

Post History

84%
+9 −0
Q&A Is it dangerous to use json.loads on untrusted data?

Short answer: No, it's not dangerous. Short of bugs in the implementation or monkey-patching, there's no reason it would or should allow executing of anything other than the JSON parsing code. Thi...

posted 2y ago by Derek Elkins‭  ·  edited 2y ago by hkotsubo‭

Answer
#4: Post edited by user avatar hkotsubo‭ · 2021-10-20T05:08:14Z (over 2 years ago)
minor fixes (removed duplicated "when", added punctuation, etc)
  • Short answer: No, it's not dangerous.
  • Short of bugs in the implementation or monkey-patching, there's no reason it would or should allow executing of anything other than the JSON parsing code. [This StackOverflow answer](https://stackoverflow.com/questions/38813298/is-json-loads-vulnerable-to-arbitrary-code-execution/45483187#45483187) goes into detail for the actual implementation at that time. I didn't find any CVEs related to `json.loads`.
  • As a potental denial-of-service vector, I haven't looked at the code, but there's no real reason to expect it to be super-linear in either time or space with respect to the length of the string input.
  • While it doesn't seem relevant here, you can sometimes have security issues where different pieces of code interpret the same input in different ways. This is usually security relevant when when one of those pieces of code is doing some kind of security check. This doesn't seem to be the case here.
  • However, to that end, this code is likely slightly better than passing the string through untouched *assuming* some other part of your system processes it as JSON. This code will then ensure that you are passing valid JSON to those deeper parts of the code. (Ideally, you would also check the JSON against a schema.) Even better, would be to parse into a JSON object and then pass than around rather than a string. You can convert it to a string when you need it as a string. If you still want to pass it around as a string though, it would be better to *reserialize* (i.e. `json.dumps`) so that you *know* the JSON is in a consistent, well-behaved format. For example, let's say the JSON parser used elsewhere in your code does have some vulnerability when exposed to legal but carefully formatted JSON. If the input to that JSON parser is only ever the output of `json.dumps`, then you only need to consider the output `json.dumps` can produce, and not all possible legal JSON strings. In general, anything you can do to reduce the number of representations for the same value is a good thing for correctness, ease of testing, and security
  • If the string is truly an opaque blob with respect to your code *as well as to any 3rd party services you pass it to*, then [you shouldn't be touching it at all](https://www.usenix.org/legacy/event/osdi04/tech/full_papers/walfish/walfish.pdf). I assume that isn't the case, because otherwise you'd have no reason to expect it to be JSON at all.
  • Generally speaking, you want to parse into a structured representation as soon as you can and as strictly as you can^[To be clear, "as strictly as you can" means respecting the protocol/file format. If you claim to accept JSON matching a schema, you should accept *all* JSON matching that schema, otherwise you aren't accepting JSON.]. (If this seems to conflict with Postel's Principle, often sloganized as "Be conservative in what you do, and liberal in what you accept from others", that's because it is commonly misunderstood and the misunderstood form is *terrible* advice. See [A Patch for Postel's Robustness Principle](https://langsec.org/papers/postel-patch.pdf).) I recommend the [LangSec site](https://langsec.org/) generally for more discussion of these matters.
  • Short answer: No, it's not dangerous.
  • Short of bugs in the implementation or monkey-patching, there's no reason it would or should allow executing of anything other than the JSON parsing code. [This Stack Overflow answer](https://stackoverflow.com/questions/38813298/is-json-loads-vulnerable-to-arbitrary-code-execution/45483187#45483187) goes into detail for the actual implementation at that time. I didn't find any CVEs related to `json.loads`.
  • As a potental denial-of-service vector, I haven't looked at the code, but there's no real reason to expect it to be super-linear in either time or space with respect to the length of the string input.
  • While it doesn't seem relevant here, you can sometimes have security issues where different pieces of code interpret the same input in different ways. This is usually security relevant when one of those pieces of code is doing some kind of security check. This doesn't seem to be the case here.
  • However, to that end, this code is likely slightly better than passing the string through untouched *assuming* some other part of your system processes it as JSON. This code will then ensure that you are passing valid JSON to those deeper parts of the code. (Ideally, you would also check the JSON against a schema.) Even better, would be to parse into a JSON object and then pass that around rather than a string. You can convert it to a string when you need it as a string. If you still want to pass it around as a string though, it would be better to *reserialize* (i.e. `json.dumps`) so that you *know* the JSON is in a consistent, well-behaved format. For example, let's say the JSON parser used elsewhere in your code does have some vulnerability when exposed to legal but carefully formatted JSON. If the input to that JSON parser is only ever the output of `json.dumps`, then you only need to consider the output `json.dumps` can produce, and not all possible legal JSON strings. In general, anything you can do to reduce the number of representations for the same value is a good thing for correctness, ease of testing, and security.
  • If the string is truly an opaque blob with respect to your code *as well as to any 3rd party services you pass it to*, then [you shouldn't be touching it at all](https://www.usenix.org/legacy/event/osdi04/tech/full_papers/walfish/walfish.pdf). I assume that isn't the case, because otherwise you'd have no reason to expect it to be JSON at all.
  • Generally speaking, you want to parse into a structured representation as soon as you can and as strictly as you can^[To be clear, "as strictly as you can" means respecting the protocol/file format. If you claim to accept JSON matching a schema, you should accept *all* JSON matching that schema, otherwise you aren't accepting JSON.]. (If this seems to conflict with Postel's Principle, often sloganized as "Be conservative in what you do, and liberal in what you accept from others", that's because it is commonly misunderstood and the misunderstood form is *terrible* advice. See [A Patch for Postel's Robustness Principle](https://langsec.org/papers/postel-patch.pdf).) I recommend the [LangSec site](https://langsec.org/) generally for more discussion of these matters.
#3: Post edited by user avatar Derek Elkins‭ · 2021-10-19T08:03:41Z (over 2 years ago)
  • Short answer: No, it's not dangerous.
  • Short of bugs in the implementation or monkey-patching, there's no reason it would or should allow executing of anything other than the JSON parsing code. [This StackOverflow answer](https://stackoverflow.com/questions/38813298/is-json-loads-vulnerable-to-arbitrary-code-execution/45483187#45483187) goes into detail for the actual implementation at that time. I didn't find any CVEs related to `json.loads`.
  • As a potental denial-of-service vector, I haven't looked at the code, but there's no real reason to expect it to be super-linear in either time or space with respect to the length of the string input.
  • While it doesn't seem relevant here, you can sometimes have security issues where different pieces of code interpret the same input in different ways. This is usually security relevant when when one of those pieces of code is doing some kind of security check. This doesn't seem to be the case here.
  • However, to that end, this code is likely slightly better than passing the string through untouched *assuming* some other part of your system processes it as JSON. This code will then ensure that you are passing valid JSON to those deeper parts of the code. (Ideally, you would also check the JSON against a schema.) Even better, would be to parse into a JSON object and then pass than around rather than a string. You can convert it to a string when you need it as a string. If you still want to pass it around as a string though, it would be better to *reserialize* (i.e. `json.dumps`) so that you *know* the JSON is in a consistent, well-behaved format. For example, let's say the JSON parser used elsewhere in your code does have some vulnerability when exposed to legal but carefully formatted JSON. If the input to that JSON parser is only ever the output of `json.dumps`, then you only need to consider the output `json.dumps` can produce, and not all possible legal JSON strings. In general, anything you can do to reduce the number of representations for the same value is a good thing for correctness, ease of testing, and security
  • If the string is truly an opaque blob with respect to your code *as well as to any 3rd party services you pass it to*, then [you shouldn't be touching it at all](https://www.usenix.org/legacy/event/osdi04/tech/full_papers/walfish/walfish.pdf). I assume that isn't the case, because otherwise you'd have no reason to expect it to be JSON at all.
  • Generally speaking, you want to parse into a structured representation as soon as you can and as strictly as you can. (If this seems to conflict with Postel's Principle, often sloganized as "Be conservative in what you do, and liberal in what you accept from others", that's because it is commonly misunderstood and the misunderstood form is *terrible* advice. See [A Patch for Postel's Robustness Principle](https://langsec.org/papers/postel-patch.pdf).) I recommend the [LangSec site](https://langsec.org/) generally for more discussion of these matters.
  • Short answer: No, it's not dangerous.
  • Short of bugs in the implementation or monkey-patching, there's no reason it would or should allow executing of anything other than the JSON parsing code. [This StackOverflow answer](https://stackoverflow.com/questions/38813298/is-json-loads-vulnerable-to-arbitrary-code-execution/45483187#45483187) goes into detail for the actual implementation at that time. I didn't find any CVEs related to `json.loads`.
  • As a potental denial-of-service vector, I haven't looked at the code, but there's no real reason to expect it to be super-linear in either time or space with respect to the length of the string input.
  • While it doesn't seem relevant here, you can sometimes have security issues where different pieces of code interpret the same input in different ways. This is usually security relevant when when one of those pieces of code is doing some kind of security check. This doesn't seem to be the case here.
  • However, to that end, this code is likely slightly better than passing the string through untouched *assuming* some other part of your system processes it as JSON. This code will then ensure that you are passing valid JSON to those deeper parts of the code. (Ideally, you would also check the JSON against a schema.) Even better, would be to parse into a JSON object and then pass than around rather than a string. You can convert it to a string when you need it as a string. If you still want to pass it around as a string though, it would be better to *reserialize* (i.e. `json.dumps`) so that you *know* the JSON is in a consistent, well-behaved format. For example, let's say the JSON parser used elsewhere in your code does have some vulnerability when exposed to legal but carefully formatted JSON. If the input to that JSON parser is only ever the output of `json.dumps`, then you only need to consider the output `json.dumps` can produce, and not all possible legal JSON strings. In general, anything you can do to reduce the number of representations for the same value is a good thing for correctness, ease of testing, and security
  • If the string is truly an opaque blob with respect to your code *as well as to any 3rd party services you pass it to*, then [you shouldn't be touching it at all](https://www.usenix.org/legacy/event/osdi04/tech/full_papers/walfish/walfish.pdf). I assume that isn't the case, because otherwise you'd have no reason to expect it to be JSON at all.
  • Generally speaking, you want to parse into a structured representation as soon as you can and as strictly as you can^[To be clear, "as strictly as you can" means respecting the protocol/file format. If you claim to accept JSON matching a schema, you should accept *all* JSON matching that schema, otherwise you aren't accepting JSON.]. (If this seems to conflict with Postel's Principle, often sloganized as "Be conservative in what you do, and liberal in what you accept from others", that's because it is commonly misunderstood and the misunderstood form is *terrible* advice. See [A Patch for Postel's Robustness Principle](https://langsec.org/papers/postel-patch.pdf).) I recommend the [LangSec site](https://langsec.org/) generally for more discussion of these matters.
#2: Post edited by user avatar Derek Elkins‭ · 2021-10-19T07:50:23Z (over 2 years ago)
  • Short answer: No, it's not dangerous.
  • Short of bugs in the implementation or monkey-patching, there's no reason it would or should allow executing of anything other than the JSON parsing code. [This StackOverflow answer](https://stackoverflow.com/questions/38813298/is-json-loads-vulnerable-to-arbitrary-code-execution/45483187#45483187) goes into detail for the actual implementation at that time. I didn't find any CVEs related to `json.loads`.
  • As a potental denial-of-service vector, I haven't looked at the code, but there's no real reason to expect it to be super-linear in either time or space with respect to the length of the string input.
  • While it doesn't seem relevant here, you can sometimes have security issues where different pieces of code interpret the same input in different ways. This is usually security relevant when when one of those pieces of code is doing some kind of security check. This doesn't seem to be the case here.
  • However, to that end, this code is likely slightly better than passing the string through untouched *assuming* some other part of your system processes it as JSON. This code will then ensure that you are passing valid JSON to those deeper parts of the code. (Ideally, you would also check the JSON against a schema.) Even better, would be to parse into a JSON object and then pass than around rather than a string. You can convert it to a string when you need it as a string. If you still want to pass it around as a string though, it would be better to *reserialize* (i.e. `json.dumps`) so that you *know* the JSON is in a consistent, well-behaved format. For example, let's say the JSON parser used elsewhere in your code does have some vulnerability when exposed to legal but carefully formatted JSON. If the input to that JSON parser is only ever the output of `json.dumps`, then you only need to consider the output `json.dumps` can produce, and not all possible legal JSON strings.
  • If the string is truly an opaque blob with respect to your code *as well as to any 3rd party services you pass it to*, then [you shouldn't be touching it at all](https://www.usenix.org/legacy/event/osdi04/tech/full_papers/walfish/walfish.pdf). I assume that isn't the case, because otherwise you'd have no reason to expect it to be JSON at all.
  • Generally speaking, you want to parse into a structured representation as soon as you can and as strictly as you can. (If this seems to conflict with Postel's Principle, often sloganized as "Be conservative in what you do, and liberal in what you accept from others", that's because it is commonly misunderstood and the misunderstood form is *terrible* advice. See [A Patch for Postel's Robustness Principle](https://langsec.org/papers/postel-patch.pdf).) I recommend the [LangSec site](https://langsec.org/) generally for more discussion of these matters.
  • Short answer: No, it's not dangerous.
  • Short of bugs in the implementation or monkey-patching, there's no reason it would or should allow executing of anything other than the JSON parsing code. [This StackOverflow answer](https://stackoverflow.com/questions/38813298/is-json-loads-vulnerable-to-arbitrary-code-execution/45483187#45483187) goes into detail for the actual implementation at that time. I didn't find any CVEs related to `json.loads`.
  • As a potental denial-of-service vector, I haven't looked at the code, but there's no real reason to expect it to be super-linear in either time or space with respect to the length of the string input.
  • While it doesn't seem relevant here, you can sometimes have security issues where different pieces of code interpret the same input in different ways. This is usually security relevant when when one of those pieces of code is doing some kind of security check. This doesn't seem to be the case here.
  • However, to that end, this code is likely slightly better than passing the string through untouched *assuming* some other part of your system processes it as JSON. This code will then ensure that you are passing valid JSON to those deeper parts of the code. (Ideally, you would also check the JSON against a schema.) Even better, would be to parse into a JSON object and then pass than around rather than a string. You can convert it to a string when you need it as a string. If you still want to pass it around as a string though, it would be better to *reserialize* (i.e. `json.dumps`) so that you *know* the JSON is in a consistent, well-behaved format. For example, let's say the JSON parser used elsewhere in your code does have some vulnerability when exposed to legal but carefully formatted JSON. If the input to that JSON parser is only ever the output of `json.dumps`, then you only need to consider the output `json.dumps` can produce, and not all possible legal JSON strings. In general, anything you can do to reduce the number of representations for the same value is a good thing for correctness, ease of testing, and security
  • If the string is truly an opaque blob with respect to your code *as well as to any 3rd party services you pass it to*, then [you shouldn't be touching it at all](https://www.usenix.org/legacy/event/osdi04/tech/full_papers/walfish/walfish.pdf). I assume that isn't the case, because otherwise you'd have no reason to expect it to be JSON at all.
  • Generally speaking, you want to parse into a structured representation as soon as you can and as strictly as you can. (If this seems to conflict with Postel's Principle, often sloganized as "Be conservative in what you do, and liberal in what you accept from others", that's because it is commonly misunderstood and the misunderstood form is *terrible* advice. See [A Patch for Postel's Robustness Principle](https://langsec.org/papers/postel-patch.pdf).) I recommend the [LangSec site](https://langsec.org/) generally for more discussion of these matters.
#1: Initial revision by user avatar Derek Elkins‭ · 2021-10-19T07:47:18Z (over 2 years ago)
Short answer: No, it's not dangerous.

Short of bugs in the implementation or monkey-patching, there's no reason it would or should allow executing of anything other than the JSON parsing code. [This StackOverflow answer](https://stackoverflow.com/questions/38813298/is-json-loads-vulnerable-to-arbitrary-code-execution/45483187#45483187) goes into detail for the actual implementation at that time. I didn't find any CVEs related to `json.loads`.

As a potental denial-of-service vector, I haven't looked at the code, but there's no real reason to expect it to be super-linear in either time or space with respect to the length of the string input.

While it doesn't seem relevant here, you can sometimes have security issues where different pieces of code interpret the same input in different ways. This is usually security relevant when when one of those pieces of code is doing some kind of security check. This doesn't seem to be the case here.

However, to that end, this code is likely slightly better than passing the string through untouched *assuming* some other part of your system processes it as JSON. This code will then ensure that you are passing valid JSON to those deeper parts of the code. (Ideally, you would also check the JSON against a schema.) Even better, would be to parse into a JSON object and then pass than around rather than a string. You can convert it to a string when you need it as a string. If you still want to pass it around as a string though, it would be better to *reserialize* (i.e. `json.dumps`) so that you *know* the JSON is in a consistent, well-behaved format. For example, let's say the JSON parser used elsewhere in your code does have some vulnerability when exposed to legal but carefully formatted JSON. If the input to that JSON parser is only ever the output of `json.dumps`, then you only need to consider the output `json.dumps` can produce, and not all possible legal JSON strings.

If the string is truly an opaque blob with respect to your code *as well as to any 3rd party services you pass it to*, then [you shouldn't be touching it at all](https://www.usenix.org/legacy/event/osdi04/tech/full_papers/walfish/walfish.pdf). I assume that isn't the case, because otherwise you'd have no reason to expect it to be JSON at all.

Generally speaking, you want to parse into a structured representation as soon as you can and as strictly as you can. (If this seems to conflict with Postel's Principle, often sloganized as "Be conservative in what you do, and liberal in what you accept from others", that's because it is commonly misunderstood and the misunderstood form is *terrible* advice. See [A Patch for Postel's Robustness Principle](https://langsec.org/papers/postel-patch.pdf).) I recommend the [LangSec site](https://langsec.org/) generally for more discussion of these matters.