Welcome to Software Development on Codidact!
How should I organize material about text encoding in Python into questions?
I want to write one or more self-answered Q&As on the topic of text encoding in Python, to serve as canonicals and preempt future lower-quality questions. I can think of the following things that need to be addressed:

* What is *an encoding*?
* What are *encoding* (the process) and *decoding*? How do I know which is which?
* Why do I need a text encoding? *When* do I need one?
* How can I know which text encoding to use?
* How can I know *if/how much freedom I have* in choosing a text encoding?
* Are encodings used for other things? Why?
* How do I specify an encoding...
    * for converting bytes to a string or vice-versa?
    * for reading and writing files?
    * when working with web libraries such as Requests, BeautifulSoup etc.?
    * when using a library to parse formats like CSV, JSON etc.?
* What is the `codecs` standard library module for, and how does it relate to text encoding?
* What are `UnicodeEncodeError` and `UnicodeDecodeError`? What do they mean; what causes them; and how do I resolve them?
* Historical: in Python 2.x, why can attempts to decode cause `UnicodeEncodeError`, and vice-versa?
* Historical / migration: how should I understand the type names `bytes`, `str` and `unicode` in 2.x vs 3.x?
* Historical: What was `basestring` in 2.x and why was it needed?
* Historical / migration: why did 2.x treat those types the way it did, and why does 3.x treat them differently? Why shouldn't I try to emulate the old approaches in new code?

There might be more that I'm forgetting.

My question here is, *how should I organize* these facets of the topic into questions? I don't think all of this material can be covered in a single post, but making things too fine-grained makes things awkward in the future: it becomes too hard to search for the right question because you find the other ones instead, and the material becomes redundant between questions.
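
To make the scope concrete, here is a minimal sketch (Python 3 only) of the kind of code several of these facets are about: converting between `str` and `bytes`, specifying an encoding for file I/O, and the two exception types. The sample string, the file name `example.txt` and the choice of UTF-8 and ASCII are just assumptions for illustration, not a proposal for how the eventual answers should look.

```python
import os
import tempfile

text = "café"  # illustrative sample containing one non-ASCII character

# Encoding goes from str to bytes; decoding goes from bytes back to str.
data = text.encode("utf-8")
decoded = data.decode("utf-8")

# Specifying the encoding explicitly when writing and reading a text file.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, encoding="utf-8") as f:
        round_tripped = f.read()

# UnicodeEncodeError: the target encoding cannot represent a character.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    encode_error = exc

# UnicodeDecodeError: the bytes are not valid in the assumed encoding.
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    decode_error = exc
```

Even this small snippet touches four or five of the bullet points at once, which is part of why I'm unsure where to draw the lines between questions.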