While Lundin has written a reasonably complete answer, I think it suffers a bit from the expert view: conflating core issues with implementation details related to modern hardware architecture. I'd like to try to separate these notions.
Kinds of variables
To start with I'm going to divide the uses of program storage into three categories. The categories are defined by the way the lifetime of the stored object is determined.
For simplicity I'm going to refer to these stored whatsits as "variables" even if they are constant in your program, and I'm going to assume a compiled implementation in which the compiler reads all your code ahead of time and then creates either machine code or byte-code to implement it.
It will turn out that the properties of these categories encourage different strategies for assigning (allocating) them space in memory, and those strategies are the basis of an abstract model of memory use that features "the stack" and "the heap" as the main players.
Global and static
Some kinds of variables "exist" for the whole duration of the program (for values of "exist" that mean you could reference them in code and the compiler or interpreter would give you what you expect). The most common example is global variables, though many languages also support the concept of "static" variables associated with some sub-program or named data structure (such as an object).
Examples include global variables in most languages, and the hidden pointer that makes C's strtok such a troublesome beast (in effect a static variable hidden inside the library).
Scoped
Some kinds of variables exist during the execution of a sub-program (whether you call that a "function", a "procedure", or even just a "scope block").
Generally a single name may exist in different scopes, where it names different storage, and the location in memory is allowed to differ from one call to the next. Indeed, if the language supports recursion, multiple instances of the "same" variable may exist in memory at once.
Examples include function local variables, loop counters and so on.
On demand
Sometimes you want to be able to say to your execution system "Hey, give me some space to store this thing" and have it hang around until you are ready to give it back. You may not even know how many times you will want to say that in a given run of the program.
The lifetime of these variables is not linked to the execution of any particular sub-program: they can come into being in one function, exist after that function returns, and be destroyed much later, but still before the program ends.
Examples include the nodes in a linked list.
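In C terms (the helper names here are mine): each node below is requested on demand with `malloc`, outlives the call that created it, and lives until someone explicitly hands it back with `free`:

```c
#include <stdlib.h>

/* A linked-list node: its lifetime is controlled by malloc/free,
   not by which function happens to be executing. */
struct node {
    int value;
    struct node *next;
};

/* Returns a new head; the node outlives this call. */
struct node *push(struct node *head, int value)
{
    struct node *n = malloc(sizeof *n);  /* ask the system for space */
    if (!n) abort();
    n->value = value;
    n->next = head;
    return n;
}

int list_sum(const struct node *head)
{
    int total = 0;
    for (; head; head = head->next)
        total += head->value;
    return total;
}

void list_free(struct node *head)
{
    while (head) {
        struct node *next = head->next;
        free(head);                      /* hand the storage back */
        head = next;
    }
}
```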
I'm not going to talk about data that is persisted between invocations of a program.
Implementing these categories requires different strategies. I'm going to take them in reverse order, and describe the most common approach to dealing with each one. These approaches are nearly ubiquitous today, but the history of computing features variation and outright violations of the rules I write here.
On-demand storage
To implement an on-demand storage system you will need a chunk of available memory, a catalog of what is currently in use and what is free, a way to mark a new chunk as in use, and a way to mark a currently used chunk as no longer in use.
This structure is unambiguously called a "free store" but is often also termed "the heap" (not to be confused with the heap data structure). Actual implementations range from straightforward (one might actually use a heap data structure to maintain the free list) to very subtle.
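To make "a catalog of what is free" concrete, here is a toy fixed-size-block free store, nothing like a real `malloc` but enough to show the moving parts. The catalog is a linked list threaded through the free chunks themselves:

```c
#include <stddef.h>

#define POOL_BLOCKS 64
#define BLOCK_SIZE  32

/* While a block is free, its first bytes hold the catalog link;
   while it is in use, the whole block belongs to the caller. */
static union block {
    union block *next;            /* valid only while the block is free */
    char payload[BLOCK_SIZE];
} pool[POOL_BLOCKS];

static union block *free_list = NULL;

void pool_init(void)
{
    free_list = NULL;
    for (int i = 0; i < POOL_BLOCKS; i++) {  /* every block starts out free */
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

void *pool_alloc(void)            /* mark a chunk as in use */
{
    if (!free_list)
        return NULL;              /* the store is exhausted */
    union block *b = free_list;
    free_list = b->next;
    return b;
}

void pool_free(void *p)           /* mark a chunk as free again */
{
    union block *b = p;
    b->next = free_list;
    free_list = b;
}
```

A real allocator must also handle variable sizes, alignment, and fragmentation, which is where the subtlety comes in.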
Scoped storage
Two factors drive the handling of scoped memory allocations: (a) the storage is not needed all the time, and (b) duplicate copies of "one" variable may exist if recursion occurs.
The former suggests that the association of a particular part of memory with a variable name should be temporary (to keep memory usage down to what is currently needed), and the latter suggests that we should build up and tear down this storage as we enter and leave scopes (so that we can build the local storage for a function and, when it calls itself, build fresh storage for the new invocation).
That is, we want to "add" storage to the end of a list when we enter a sub-program and remove that storage when we exit the sub-program. We can build onto the end if we call another sub-program, but we can't remove our own storage until we get control back (which means when everything added "beyond" our memory has already been reaped). So a data structure where we only add and remove stuff at one end: a stack.
Now, a stack of this kind could be implemented in several ways and in some sense the safest would be as a linked list on the free store, but that leaves a problem with tracking how names connect to nodes. On the other hand, most hardware architectures already have a function call stack and by allocating scoped storage on that stack the naming issue reduces to stack offsets that can be computed at compile-time.
By far the most common way of implementing scoped storage is on the call stack, generally shortened to "the stack".
Global and static storage
In principle this could simply be placed at the beginning of the stack, but hardware limits on stack size were ubiquitous and relatively small back in the good ol' days, so that wasn't generally a good idea. Instead a separate section of memory was set aside. Due to the nature of this storage, its size could be computed in advance.
Names like "the data segment" and "bss" are basically implementation details from early systems that have become ubiquitous because we have to call them something.
Wait! So, how does this relate to registers and caches and stuff?
The thing about the discussion above is that it implicitly assumes that memory is a nice uniform thing that is all equivalent and has limited internal structure. That was reasonably true of, say, my Apple ][+, but isn't really true at all of a modern PC.
These days CPU cycles are fast and your main RAM (while much faster than it used to be) simply can't keep up. Not because no one can build fast enough memory, but because that memory is unreasonably expensive. So the way a modern PC works is by having lots of slow memory (your RAM) and a small amount of fast memory (one or more layers of cache) which mirrors an ever-changing part of RAM. And there is some sophisticated circuitry that decides what part of RAM to mirror at any given time. (On top of that, if you want to use more memory than you actually have the computer can use some of your disk or SSD to fake it, but that is even slower!)
On top of all this, processors have a very small amount of internal storage called "registers". These are even faster than cache.
How compilers take advantage of the memory hierarchy
A very simple compiler that you might write as a hobby would implement the abstract scheme discussed above more or less as I wrote it, and because "the heap", "the stack", and "the data segment" all reside (ultimately) in main memory the code it produced would be relatively slow (somewhat relieved by the hardware cache management).
But serious compiler projects (commercial or open source) compete on the speed of the code they produce. So they will try to optimize any way they can. Maybe the heap can be managed to optimize cache performance. (Certainly some flow control constructs can be, but that is a different story.) Maybe some local variables (especially loop counters that don't live long) don't need to be put into memory at all but can just live in a register. The compiler could also give a variable a place in memory, but temporarily "move" it to a register in a stretch of code where it is in heavy rotation.
All of these tricks require the compiler to make sophisticated decisions, but a modern compiler is a very powerful program.
In short, register and cache storage don't play any part in the abstract model which divides program memory into "stack, heap, and data segment", but they do mean that reasoning based only on that model may miss important points.
Local variables are notionally allocated "on the stack", dynamic variables "on the heap", and global or static variables in a third location, but the compiler may play games behind the scenes for reasons of its own and the multilayered structure of memory means that the actual authoritative location of any value could vary in time.