Would a MySQL database run more efficiently with smaller varchar lengths?
I have a database with quite a few VARCHAR fields. When the database was first built, the column lengths were set somewhat larger than absolutely necessary.
Now, after having used the DB for a while and run a lot of data through it, I have a better idea of how long the fields need to be, and I'm wondering whether reducing the VARCHAR lengths would make it run better.
If I set the lengths to, say, 10 characters plus the current maximum length, would that help the SELECT and JOIN times?
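For example, something like this (table and column names are hypothetical):

```sql
-- Longest value ever observed in customer_name is 30 characters;
-- 30 + 10 characters of headroom = 40:
ALTER TABLE orders MODIFY customer_name VARCHAR(40);
```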
3 answers
A simple Google search of VARCHAR size showed that it is not an arbitrarily sized string, which means VARCHAR(150) and VARCHAR(2) would take up the same amount of space. So, no: I don't think there would be any speed gains (unless caching comes into play, which could affect performance). My advice is to go with the smaller size only if your profiling efforts show that it is actually faster; otherwise it would be premature optimization.
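If you do profile, a minimal sketch you could run before and after shrinking the columns (assuming MySQL 8.0.18+ for EXPLAIN ANALYZE; table and column names are hypothetical):

```sql
-- Reports actual per-step execution times, so the two schema
-- variants can be compared directly on the same data.
EXPLAIN ANALYZE
SELECT o.id, c.customer_name
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
WHERE c.customer_name = 'Alice';
```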
0 comment threads
YES or NO: It all depends on the storage engine
Fairly universally (though IIRC from looking at PostgreSQL a while back, PostgreSQL may not even do this), there is a difference between CHAR/VARCHAR/BINARY/BLOB/TEXT etc. types based on declared size, where 1, 2, 3, or 4 bytes are used to store the actual length of the object (a VARCHAR sketch follows the list):
- 1 to 255: 1 byte
- 256 to 65,535: 2 bytes
- 65,536 to 2^24-1: 3 bytes
- 2^24 to 2^32-1 (4 GiB): 4 bytes
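For VARCHAR specifically, the prefix size depends on the column's maximum capacity in bytes, not in characters. A sketch (hypothetical tables; the comments give the expected prefix size):

```sql
CREATE TABLE t1 (s VARCHAR(255)) CHARACTER SET latin1;
-- max 255 bytes (1 byte/char)   -> 1 length byte

CREATE TABLE t2 (s VARCHAR(255)) CHARACTER SET utf8mb4;
-- max 1020 bytes (4 bytes/char) -> 2 length bytes
```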
In MySQL (some other databases handle this differently) there are even different names, at least for some of the data types (a sketch mapping them to the size tiers above follows this list):
https://dev.mysql.com/doc/refman/8.0/en/storage-requirements.html#data-types-storage-reqs-strings
- TINYTEXT
- TEXT
- MEDIUMTEXT
- LONGTEXT
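Mapping those names onto the size tiers, a sketch (the per-type limits are from the storage-requirements page linked above):

```sql
CREATE TABLE text_tiers (
  a TINYTEXT,   -- up to 2^8-1  bytes, 1-byte length prefix
  b TEXT,       -- up to 2^16-1 bytes, 2-byte length prefix
  c MEDIUMTEXT, -- up to 2^24-1 bytes, 3-byte length prefix
  d LONGTEXT    -- up to 2^32-1 bytes, 4-byte length prefix
);
```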
But the data has to be stored somewhere too. That can be as part of the actual record, which is how databases typically used to work. That is still the case for MySQL when using MyISAM static tables:
https://dev.mysql.com/doc/refman/8.0/en/static-format.html
With MyISAM static tables, the declared field length (within certain limits) determines the row size, which determines the on-disk table size. Create a table where each record has an integer ID (typically 4 bytes) and four VARCHAR(255) columns, and each record will take just over 1 kilobyte on disk: for 1,000 records that's a megabyte (not counting overhead and indexes), and for 1,000,000 records that's a gigabyte. Create the same table with VARCHAR(25) and you've cut the on-disk size by 90%. If the table has 100 records, that makes no real difference. If it has 1,000,000 records, then it makes a big difference, possibly the difference between "entire table read into memory" and "lots of reads every time you access a different record".
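A rough back-of-envelope check of that arithmetic (ignoring row overhead and indexes), runnable as plain SQL:

```sql
SELECT
  1000000 * (4 + 4 * 255) / POW(1024, 3) AS gib_varchar_255, -- ~0.95 GiB
  1000000 * (4 + 4 * 25)  / POW(1024, 3) AS gib_varchar_25;  -- ~0.10 GiB
```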
While storing the full VARCHAR capacity may seem like a total waste, the table manipulation is incredibly simple: no pointers, just read a fixed-size row and chop it up. This is, of course, no longer the default engine for MySQL, and InnoDB (and even other variants of MyISAM) have many other advantages, but for this specific case of MyISAM static tables, the answer to the original question is a clear yes.
Other database engines work differently. For example, InnoDB (and, by inference from the documentation about it, also some of the non-static variants of MyISAM) stores VARCHAR strings using just the space needed for the actual text, plus the byte count, plus some overhead (e.g., InnoDB uses 4-byte alignment, which means every string starts on a typical 32-bit word boundary).
On balance, with typical current installations, the answer is either no or "so little difference that it just doesn't matter". But I am almost always in favor of optimizing the stated structure to match the actual usage. I often process a new batch of data using arbitrarily long fields and then analyze it to see what is really needed, and adjust my production schema accordingly.
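A sketch of the kind of analysis query I mean, with hypothetical table and column names:

```sql
-- How long do the fields actually need to be?
SELECT
  MAX(CHAR_LENGTH(customer_name)) AS max_name_chars,
  MAX(CHAR_LENGTH(email))         AS max_email_chars
FROM customers;
```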
The MySQL manual states:

> In contrast to CHAR, VARCHAR values are stored as a 1-byte or 2-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
That is, the only effect of specifying a shorter length limit is that it can enable the database to use 1 rather than 2 bytes to store the (byte) length of the text.
Specifically, this means that a VARCHAR(1) and a VARCHAR(63) are always stored and retrieved in exactly the same way (assuming a database character encoding with at most 4 bytes per character, such as utf8mb4). Depending on the character encoding used, a VARCHAR with a higher limit may require 1 additional byte to store the length.
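To make that boundary concrete, a sketch (hypothetical table; the comments assume utf8mb4 at up to 4 bytes per character):

```sql
CREATE TABLE prefix_demo (
  a VARCHAR(1),  -- max 4 bytes   -> 1 length byte
  b VARCHAR(63), -- max 252 bytes -> 1 length byte, stored like a
  c VARCHAR(64)  -- max 256 bytes -> 2 length bytes
) CHARACTER SET utf8mb4;
```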
That is, VARCHAR length limits have negligible performance impact, and in many cases have no performance impact at all.
The database supports VARCHAR length limits not because the database needs a limit, but because an application might. For instance, if your user interface has room for only 20 characters, you may want to express this constraint in the database, depending on your application architecture.
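A sketch of such an application-driven limit (hypothetical schema):

```sql
CREATE TABLE users (
  id BIGINT PRIMARY KEY,
  display_name VARCHAR(20) NOT NULL -- the UI field allows at most 20 characters
);
```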
Historically, this was an important feature because many popular application programming languages such as COBOL used fixed-length strings, and databases were often shared by many applications. Nowadays, applications usually handle overly long strings gracefully enough that such constraints are no longer needed (or at least, no longer needed at the database level).
1 comment thread