Post History

71% +3 −0
Q&A: Would a MySQL database run more efficiently with smaller varchar lengths?


posted 3y ago by manassehkatz · edited 3y ago by manassehkatz

Answer
#2: Post edited by manassehkatz · 2020-10-22T01:29:17Z (over 3 years ago)

### YES or NO: It all depends on the storage engine

Fairly universally (though IIRC from looking at PostgreSQL a while back, PostgreSQL may not even do this), there is a difference between CHAR/VARCHAR/BINARY/BLOB/TEXT etc. types based on declared size, where 1, 2, 3, or 4 bytes are used to store the actual length of the object:

* 1 - 255 = 1 byte
* 256 - 65,535 = 2 bytes
* 65,536 - 2^24-1 = 3 bytes
* 2^24 - 2^32-1 (4 GiB) = 4 bytes
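
As a small illustration of those tiers (the table and column names below are invented for this sketch), the declared maximum mainly decides how many length-prefix bytes get used, assuming a single-byte character set:

```sql
-- Hypothetical table, purely to illustrate the length-prefix tiers.
-- latin1 is chosen so that one character is one byte; with utf8mb4 the
-- byte limits (and therefore the prefix sizes) shift accordingly.
CREATE TABLE length_prefix_demo (
    short_text  VARCHAR(255),   -- max 255 bytes      -> 1 length byte
    longer_text VARCHAR(1000),  -- max 1,000 bytes    -> 2 length bytes
    big_text    MEDIUMTEXT,     -- max 2^24 - 1 bytes -> 3 length bytes
    huge_text   LONGTEXT        -- max 2^32 - 1 bytes -> 4 length bytes
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- Storing the string 'abc' in any of these columns costs 3 bytes of data
-- plus the length prefix noted above (plus engine overhead); the declared
-- maximum by itself does not pre-allocate 255 or 1,000 bytes per row here.
```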

In MySQL (some other databases handle this differently) there are even different names, at least for some of the data types:

https://dev.mysql.com/doc/refman/8.0/en/storage-requirements.html#data-types-storage-reqs-strings

* TINYTEXT
* TEXT
* MEDIUMTEXT
* LONGTEXT

But the data has to be stored *somewhere* too. That can be as part of the actual record, which is how databases typically used to work. That is still the case for MySQL when using MyISAM static tables:

https://dev.mysql.com/doc/refman/8.0/en/static-format.html

With MyISAM static tables, the declared field length (within certain limits) determines the row size, which determines the on-disk table size. Create a table where each record has an integer ID (typically 4 bytes) and four VARCHAR(255) columns, and each record will take just over 1 kilobyte on disk - for 1,000 records that's a megabyte (not counting overhead and indexes), and for 1,000,000 records that's a gigabyte. Create the same table with VARCHAR(25) and you've cut the on-disk size by 90%. If the table has 100 records, that makes no real difference. If it has 1,000,000 records, it makes a big difference - possibly the difference between "entire table read into memory" vs. "lots of reads every time you access a different record".

While storing full VARCHAR capacity may seem like a total waste, the table manipulation is incredibly simple - no pointers, just read a fixed-size row and chop it up. MyISAM is, of course, no longer the default engine for MySQL, and InnoDB (and even other variants of MyISAM) have many other advantages, but for this specific case of MyISAM static tables, the answer to the original question is a clear **yes**.
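
To see that difference rather than take it on faith, here is a minimal sketch (invented table and column names, latin1 so one character is one byte). It uses CHAR instead of VARCHAR because MyISAM uses its static format when a table has no variable-length columns, so CHAR guarantees the fixed-length layout described above:

```sql
-- Hypothetical tables: the same four string columns, declared wide vs. narrow.
CREATE TABLE wide_rows (
    id INT NOT NULL,
    a CHAR(255), b CHAR(255), c CHAR(255), d CHAR(255)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;   -- roughly 4 + 4*255 = 1024 bytes per row

CREATE TABLE narrow_rows (
    id INT NOT NULL,
    a CHAR(25), b CHAR(25), c CHAR(25), d CHAR(25)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;   -- roughly 4 + 4*25 = 104 bytes per row

-- After loading the same rows into both, compare what the server reports:
-- Row_format, Avg_row_length and Data_length.
SHOW TABLE STATUS WHERE Name IN ('wide_rows', 'narrow_rows');
```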

Other database engines work differently. For example, InnoDB (and, by inference from its documentation, also some of the non-static variants of MyISAM) stores VARCHAR strings using just the amount of space needed for the actual text, plus the length byte(s), plus some overhead (e.g., InnoDB uses a 4-byte alignment, which means every string gets to start on a typical 32-bit word boundary).
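
A hedged way to check this on an InnoDB installation (again with invented table names) is to create the same two layouts under InnoDB and compare what the data dictionary reports; the sizes should come out close, because only the actual text plus its length prefix is stored:

```sql
-- Hypothetical InnoDB twins of the tables above, differing only in declared width.
CREATE TABLE innodb_wide   (id INT NOT NULL, a VARCHAR(255), b VARCHAR(255),
                            c VARCHAR(255), d VARCHAR(255)) ENGINE=InnoDB;
CREATE TABLE innodb_narrow (id INT NOT NULL, a VARCHAR(25),  b VARCHAR(25),
                            c VARCHAR(25),  d VARCHAR(25))  ENGINE=InnoDB;

-- With the same rows loaded into both, refresh the statistics and compare
-- the (estimated) per-row and total sizes.
ANALYZE TABLE innodb_wide, innodb_narrow;

SELECT TABLE_NAME, ENGINE, AVG_ROW_LENGTH, DATA_LENGTH
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('innodb_wide', 'innodb_narrow');
```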

On balance, with *typical* current installations, the answer is either **no** or "so little difference that it just doesn't matter". But I am almost always in favor of optimizing the declared structure to match the actual usage. I often process a new batch of data using arbitrarily long fields, then analyze it to see what is really needed and adjust my production schema accordingly.
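
A rough sketch of that last step (the staging table and column names here are placeholders): load the batch into deliberately generous columns, then ask the data how wide it actually is before declaring the production schema:

```sql
-- Hypothetical staging table with deliberately generous column sizes.
-- CHAR_LENGTH counts characters; LENGTH would count bytes instead.
SELECT
    MAX(CHAR_LENGTH(customer_name)) AS max_name_len,
    MAX(CHAR_LENGTH(email))         AS max_email_len,
    AVG(CHAR_LENGTH(notes))         AS avg_notes_len
FROM staging_import;
-- If max_name_len comes back as, say, 42, a production VARCHAR(64) (with some
-- headroom) is a better declaration than the VARCHAR(1000) used for staging.
```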

#1: Initial revision by manassehkatz · 2020-10-21T18:21:20Z (over 3 years ago)