Communities

Writing
Writing
Codidact Meta
Codidact Meta
The Great Outdoors
The Great Outdoors
Photography & Video
Photography & Video
Scientific Speculation
Scientific Speculation
Cooking
Cooking
Electrical Engineering
Electrical Engineering
Judaism
Judaism
Languages & Linguistics
Languages & Linguistics
Software Development
Software Development
Mathematics
Mathematics
Christianity
Christianity
Code Golf
Code Golf
Music
Music
Physics
Physics
Linux Systems
Linux Systems
Power Users
Power Users
Tabletop RPGs
Tabletop RPGs
Community Proposals
Community Proposals
tag:snake search within a tag
answers:0 unanswered questions
user:xxxx search by author id
score:0.5 posts with 0.5+ score
"snake oil" exact phrase
votes:4 posts with 4+ votes
created:<1w created < 1 week ago
post_type:xxxx type of post
Search help
Notifications
Mark all as read See all your notifications »
Q&A

Welcome to Software Development on Codidact!

Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.

Is it dangerous to use json.loads on untrusted data?

+9
−0

I manage a wsgi application that accepts JSON data via POST from potentially untrusted sources. Normally it is treated as a text blob and never parsed, but there is a value in the expected input that I would like to log.

The obvious way to do it looks like this (where payload is the untrusted input):

try:
    data = json.loads(payload)
except json.JSONDecodeError as ex:
    logging.warning("bad payload: %s", ex)
else:
    value = data.get('something', 'ERR_MISSING')
    logging.debug("something: %.30s", value)

This feels like it should be safe, which automatically makes me wonder if it is not. Broken json is handled, and the contents of data are not passed to anything other than the logger. But I am uncertain if it's safe to run json.loads on untrusted input, and I'm not sure that all possible json values are safe to stringify for logging.

What am I missing, if anything?

History
Why does this post require moderator attention?
You might want to add some details to your flag.
Why should this post be closed?

0 comment threads

1 answer

+9
−0

Short answer: No, it's not dangerous.

Short of bugs in the implementation or monkey-patching, there's no reason it would or should allow executing of anything other than the JSON parsing code. This Stack Overflow answer goes into detail for the actual implementation at that time. I didn't find any CVEs related to json.loads.

As a potental denial-of-service vector, I haven't looked at the code, but there's no real reason to expect it to be super-linear in either time or space with respect to the length of the string input.

While it doesn't seem relevant here, you can sometimes have security issues where different pieces of code interpret the same input in different ways. This is usually security relevant when one of those pieces of code is doing some kind of security check. This doesn't seem to be the case here.

However, to that end, this code is likely slightly better than passing the string through untouched assuming some other part of your system processes it as JSON. This code will then ensure that you are passing valid JSON to those deeper parts of the code. (Ideally, you would also check the JSON against a schema.) Even better, would be to parse into a JSON object and then pass that around rather than a string. You can convert it to a string when you need it as a string. If you still want to pass it around as a string though, it would be better to reserialize (i.e. json.dumps) so that you know the JSON is in a consistent, well-behaved format. For example, let's say the JSON parser used elsewhere in your code does have some vulnerability when exposed to legal but carefully formatted JSON. If the input to that JSON parser is only ever the output of json.dumps, then you only need to consider the output json.dumps can produce, and not all possible legal JSON strings. In general, anything you can do to reduce the number of representations for the same value is a good thing for correctness, ease of testing, and security.

If the string is truly an opaque blob with respect to your code as well as to any 3rd party services you pass it to, then you shouldn't be touching it at all. I assume that isn't the case, because otherwise you'd have no reason to expect it to be JSON at all.

Generally speaking, you want to parse into a structured representation as soon as you can and as strictly as you can[1]. (If this seems to conflict with Postel's Principle, often sloganized as "Be conservative in what you do, and liberal in what you accept from others", that's because it is commonly misunderstood and the misunderstood form is terrible advice. See A Patch for Postel's Robustness Principle.) I recommend the LangSec site generally for more discussion of these matters.


  1. To be clear, "as strictly as you can" means respecting the protocol/file format. If you claim to accept JSON matching a schema, you should accept all JSON matching that schema, otherwise you aren't accepting JSON. ↩︎

History
Why does this post require moderator attention?
You might want to add some details to your flag.

0 comment threads

Sign up to answer this question »