Extracting Firefox Local Data
How does one parse/decode/extract information stored by a web page in Firefox's browser local storage without the browser?
So far, I've worked out that the data lives at `~/.mozilla/firefox/${profile}/storage/default/${site}/ls/data.sqlite`, where the site looks something like `https+++software.codidact.com`. Because the browser locks the database, at least while the page is open, I copy it to a temporary file.
SQLite3 then shows two tables: `database`, which mostly only describes the site, and `data`, which holds what we find under Developer Tools/Storage/Local Storage/(relevant URL). Querying `SELECT value FROM data WHERE key = 'whateverKeyWeCareAbout'` gives...something.
So far, so good.
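For reference, the copy-then-query step can be scripted. This is a minimal sketch using only the Python standard library; `read_raw_value` is a name made up for illustration, and it assumes only the `data` table with its `key` and `value` columns as described above.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def read_raw_value(db_path, key):
    """Copy the database aside, then return the raw `value` for `key` (or None)."""
    with tempfile.TemporaryDirectory() as tmp:
        # Work on a private copy so the browser's lock on the original is irrelevant.
        copy = Path(tmp) / "data.sqlite"
        shutil.copy2(db_path, copy)
        conn = sqlite3.connect(copy)
        try:
            row = conn.execute(
                "SELECT value FROM data WHERE key = ?", (key,)
            ).fetchone()
        finally:
            conn.close()
    # fetchone() returns None when no row matches the key.
    return row[0] if row is not None else None
```

The returned object is whatever SQLite stored, typically `bytes` for a blob, so it still needs decoding before it can be treated as JSON.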
However, the "something" that we get from the `value` column might come through as plain JSON (it looks like this happens when the browser serializes a smaller object), or it might come through as something more complicated, which resembles small fragments of the expected JSON interspersed with opaque binary sequences. The latter is the concern here.
The site that interests me happens to be open source, so I checked whether they used some bizarre encryption, but no: they store to local storage with `localStorage.setItem(key, JSON.stringify(object))` from a single central utility function, so I can't pull the mechanism out of there.
I've seen suggestions that Firefox uses MessagePack for this storage, but the data doesn't seem to decode with that assumption. Another suggestion is that this might be the internal representation of the structured clone algorithm, but I couldn't find any corroboration for this, nor any code or tool that produces a serialized form to verify it against.
The ideal would be to run something like `sqlite3 "copied-data.sqlite" "SELECT value FROM data WHERE key = 'whateverKeyWeCareAbout'" | tool-to-get-JSON | jq .field` and use that further in a script, though I'm open to almost anything that doesn't tie up the browser.
1 answer
The following users marked this post as Works for me:
User | Comment | Date |
---|---|---|
John C | Thread: Works for me Ah, thanks! That was the missing piece. For those coming into this with a similar problem, I grabbed one of the [Snappy implementations](https://google... | May 11, 2025 at 21:42 |
Having looked up how Snappy compression works, I'm going to upgrade my comment to an answer as I'm almost certain this is what's happening.
Based on the Firefox source code, it inserts `LSValue`s, which are strings with some meta-information specifying how they should be stored. Relevant here is the `CompressionType`, which can be either `UNCOMPRESSED` or `SNAPPY`. This gets stored in the `data` table as the `compression_type` field, with a value of 1 indicating Snappy compression.
Snappy compression is a relatively simple byte-oriented, dictionary-based compression algorithm. The compressed output consists of a sequence of elements which are either uncompressed parts of the input, or references to spans of decompressed output. The "opaque binary sequences" you're seeing are almost certainly the tags for the elements (which is all there is of a reference element).
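To make the element structure concrete, here is a minimal pure-Python sketch of a decoder for the raw (unframed) Snappy block format, as I understand it from the Snappy format description. `snappy_decompress` is a name made up for illustration, and it is an assumption on my part that Firefox stores the raw block format rather than the framed one. A maintained Snappy library is the better choice for real use.

```python
def snappy_decompress(data: bytes) -> bytes:
    """Decode a raw Snappy block: a varint length followed by tagged elements."""
    # Preamble: uncompressed length as a little-endian base-128 varint.
    pos = 0
    expected = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        expected |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7

    out = bytearray()
    while pos < len(data):
        tag = data[pos]
        pos += 1
        kind = tag & 0x03  # the low two bits select the element type
        if kind == 0:
            # Literal: uncompressed bytes copied straight from the input.
            length = (tag >> 2) + 1
            if length > 60:
                # Tag values 60..63 mean the length follows in 1..4 bytes.
                extra = length - 60
                length = int.from_bytes(data[pos:pos + extra], "little") + 1
                pos += extra
            out += data[pos:pos + length]
            pos += length
            continue
        if kind == 1:
            # Copy with 11-bit offset: 3 bits of length, 3 high offset bits in the tag.
            length = ((tag >> 2) & 0x07) + 4
            offset = ((tag >> 5) << 8) | data[pos]
            pos += 1
        else:
            # Copy with a 2-byte (kind 2) or 4-byte (kind 3) little-endian offset.
            width = 2 if kind == 2 else 4
            length = (tag >> 2) + 1
            offset = int.from_bytes(data[pos:pos + width], "little")
            pos += width
        if offset == 0 or offset > len(out):
            raise ValueError("copy offset points outside the decoded output")
        # Copy byte by byte because the source span may overlap the destination.
        for _ in range(length):
            out.append(out[-offset])

    if len(out) != expected:
        raise ValueError("decoded length does not match the preamble")
    return bytes(out)
```

For example, the hand-built input `bytes([0x09, 0x08]) + b"abc" + bytes([0x09, 0x03])` holds a length preamble of 9, a three-byte literal, and a single copy element (length 6, offset 3), and decodes to `b"abcabcabc"`. The two bytes of that copy element are exactly the kind of "opaque binary" that shows up between readable JSON fragments.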
To specifically answer your question: to read the `value` field of a row in the `data` table, you need to take into account the `conversion_type` field and the `compression_type` field. If `compression_type` is 1, the value needs to be decompressed with the Snappy algorithm first; if it is 0, it is not compressed. After that, `conversion_type` indicates whether the data is stored as UTF-8 or UTF-16: if `conversion_type` is 1, the data is stored as UTF-8; otherwise it's stored as UTF-16.
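Put together, the two checks might look like the sketch below. `decode_ls_value` and its `decompress` parameter are hypothetical names: `decompress` stands for whatever raw-Snappy decompression function you have available. Treating the UTF-16 case as little-endian is an assumption on my part, based on typical desktop platforms.

```python
from typing import Callable


def decode_ls_value(
    value: bytes,
    conversion_type: int,
    compression_type: int,
    decompress: Callable[[bytes], bytes],
) -> str:
    """Turn one row's `value` blob back into the string the page stored."""
    # compression_type 1 means Snappy; 0 means the bytes are stored as-is.
    if compression_type == 1:
        value = decompress(value)
    # conversion_type 1 means UTF-8; otherwise UTF-16 (little-endian assumed here).
    encoding = "utf-8" if conversion_type == 1 else "utf-16-le"
    return value.decode(encoding)
```

The resulting string is the text the page passed to local storage, so for the site in the question it can be handed to `json.loads` or piped on to `jq`.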