Communities

Writing
Writing
Codidact Meta
Codidact Meta
The Great Outdoors
The Great Outdoors
Photography & Video
Photography & Video
Scientific Speculation
Scientific Speculation
Cooking
Cooking
Electrical Engineering
Electrical Engineering
Judaism
Judaism
Languages & Linguistics
Languages & Linguistics
Software Development
Software Development
Mathematics
Mathematics
Christianity
Christianity
Code Golf
Code Golf
Music
Music
Physics
Physics
Linux Systems
Linux Systems
Power Users
Power Users
Tabletop RPGs
Tabletop RPGs
Community Proposals
Community Proposals
tag:snake search within a tag
answers:0 unanswered questions
user:xxxx search by author id
score:0.5 posts with 0.5+ score
"snake oil" exact phrase
votes:4 posts with 4+ votes
created:<1w created < 1 week ago
post_type:xxxx type of post
Search help
Notifications
Mark all as read See all your notifications »
Q&A

Welcome to Software Development on Codidact!

Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.

Comments on What's the better way to store base64 in SQLite Database?

Parent

What's the better way to store base64 in SQLite Database?

+2
−0

What's the better way to store base64 in SQLite Database? I was thinking to put base64 as TEXT since base64 has lots of chars.

CREATE TABLE name (id INTEGER PRIMARY KEY AUTO_INCREMENT, image TEXT);

Is there any other value in SQL for base64? I don't want to use bitmap (BLOB)[1].


  1. not removing the whole text since someone had mentioned it in answer. And I forgot the name that's why I used bitmap earlier. ↩︎

History
Why does this post require attention from curators or moderators?
You might want to add some details to your flag.
Why should this post be closed?

0 comment threads

Post
+3
−0

The question seems to show a certain level of confusion about terminology. I shall address what I believe to be the underlying question: How can I store binary data in SQLite?

SQLite supports 5 datatypes: NULL, INTEGER, REAL, TEXT, and BLOB. The natural datatype to use for binary data is BLOB. However, I assume that this is what you mean by bitmap when you say "I don't want to use bitmap".

That leaves text encodings. Note that

The value is a text string, stored using the database encoding (UTF-8, UTF-16BE or UTF-16LE)

so the most efficient text encoding to use depends on the database configuration.

If the database is using UTF-8 then the optimal encoding would be base-128 using codepoints \u0000 to \u007f. If for some reason you want to avoid certain codepoints, binary base encoding is fastest when it can work with blocks, i.e. with bases whose base-2 logarithm is just more than a fraction whose numerator is a small multiple of 8. E.g. if lg b = (8n + epsilon)/d you can encode a block of n bytes as a block of d bytes for efficiency n/d. That gives the following good bases:

Base b     lg b         Efficiency
   41    16.03 / 3         0.67
   49    56.15 / 10        0.70
   53    40.10 / 7         0.71
   57    64.16 / 11        0.72
   64    24.00 / 4         0.75
   75    56.06 / 9         0.78
   85    32.05 / 5         0.80
  102    40.03 / 6         0.83
  116    48.01 / 7         0.86
  128    56.00 / 8         0.88

So basically you want to pick the highest base which gives you enough codepoints in the range \u0000 to \u007f which you deem acceptable.

If the database is using UTF-16 then the basic principles are the same but you have the complication of working around the surrogate codepoints. I would avoid this if at all possible.

History
Why does this post require attention from curators or moderators?
You might want to add some details to your flag.

1 comment thread

I wonder about the relevance of `lg b` if encoding efficiency is expressed simply by `n/d`... (8 comments)
I wonder about the relevance of `lg b` if encoding efficiency is expressed simply by `n/d`...
elgonzo‭ wrote about 3 years ago

...see comment thread title...

Peter Taylor‭ wrote about 3 years ago

I'm not sure what your question is.

elgonzo‭ wrote about 3 years ago · edited about 3 years ago

Okay, let me rephrase it into two questions: How is the bit about lg b relevant to the information conveyed by the answer? Why is it important to talk about lg b if the encoding efficiency is equivalently expressed by n/d, which is both simpler and easier to grasp than going into log₂ b? I hope the intent of my comment is easier to understand now...

Peter Taylor‭ wrote about 3 years ago

How do you find n and d without first calculating lg b?

elgonzo‭ wrote about 3 years ago · edited about 3 years ago

For the numbers involved here it's rather simple and intuitive. Let's take base-128, for example. A UTF-8 code unit is 8 bit (a byte). Base-128 can encode 7 source/data bits per UTF-8 code unit (assuming we stick with code points <= \u007f for the encoded symbols, as you mentioned). Therefore, the encoding efficiency is 7/8 = 0.875. With the numbers at play here, i claim most programmers, even relative novices, can deduce that a range of 128 values/symbols requires 7 bits to represent -- or in other words, a range of 128 values/symbols can encode 7 bits worth of information. I furthermore claim, that for those that are not able to understand this, binary logarithm and that formula in your answer will be an even greater mystery if left unexplained.

elgonzo‭ wrote about 3 years ago · edited about 3 years ago

To be clear, i am not saying that your answer is wrong. I just personally find the stuff about the binary logarithm and formula a distraction that in my opionion is in the best case not adding much to the answer and in the worst case is an impediment for a reader consuming your answer.

Peter Taylor‭ wrote about 3 years ago

Take a non-trivial example. If I tell you that base 57 can encode 5.83289 bits per symbol, what block size should you use?

Skipping 1 deleted comment.

elgonzo‭ wrote about 3 years ago · edited about 3 years ago

Okay, i see your point now. I stand corrected. I actually hadn't thought of unusual non-power-2 [*] BaseN encodings before much, but it seems to be rather nifty to spread the value range of a group of data bits across the symbol value range(s) of multiple symbols that are being otherwise left unused when the symbol value space is larger than required for encoding a group of "whole" bits. ([*] Come to think of it, Base10 is anything but unusual, lol - so much for a self-defeating argument (: )