Welcome to Software Development on Codidact!
Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.
Can pandas be used as a database backend for persistent storage?
Question
What is the current state of the art database app? How does it compare to SQL? Can pandas be used in place of either?
If not, is there something that bridges the gap between SQL and pandas or the current state of the art?
Context
I have become proficient in pandas and have come to like the syntax. I have experience with SQL, but would have to study my notes to get back up to speed, and find the syntax rather tedious.
2 answers
SQL is as good as it gets, generally speaking. The best engine for a small project is SQLite. If you need more, the next step is Postgres. Postgres is very capable and it's unlikely that you'll need more even as a medium sized company. But it sounds like you're asking about small personal projects.
"State of the art" that goes beyond SQL/Postgres is usually about scalability and division of labor. These are concerns relevant to large companies (Facebook, Netflix, ...) and not for personal projects. By using these cutting edge DBs, you will be accepting compromises that make your life more difficult, and gaining benefits that don't apply to you.
As stated elsewhere, you can read/write from a SQL DB with Pandas, and many DB connectors support some kind of .to_pandas
method. However, this won't save you form learning SQL, since you'll still have to pass the queries to fetch and upsert the data. Pandas also has a looser typing system than the typical DB, so you might often run into annoying type mismatch/conversion bugs there.
Instead of a DB, you can use pandas to read/write some simple file format, like CSV or Parquet. I would recommend CSV, it's easy to work with using other tools as well. If you want something nested, then JSON is a good start.
You should really just learn SQL. Pandas at its heart is a crude reimplementation of things SQL has perfected half a century ago.
You can use Pandas to_sql
method to upload records to a relational database.
However, be aware this functionality is aimed at using Pandas for its original purpose (data analytics and transformations), so that Pandas can make complex processing and send the results to a SQL table, but it is not advisable to write a traditional "database app" using Pandas as the database connector, since it will be slow performing and less scalable (Pandas is not meant for high throughput applications).
You can use an ORM like Django ORM or SQLAlchemy if writing bare SQL is annoying for you.
0 comment threads