The following guest post was authored by Peter Olson of Ponder. Developers from Ponder, Intel, and others contribute to open-source Modin*, which is part of the Intel end-to-end AI/ML software portfolio.
We are proud to announce that Modin, the open-source scalable drop-in replacement for pandas, has now been downloaded over 10 million times.
Modin was created in 2018 by then-Berkeley PhD student Devin Petersohn. In February 2022, it hit the 2.5-million download mark. In January 2023, it hit the 5-million mark. And now, only six months later, it’s at 10 million.
What explains Modin’s popularity?
- It addresses a critical problem: pandas, the foundation of data manipulation in Python*, is single-threaded and does not scale to large datasets. Modin provides a parallelized implementation of the pandas API.
- It solves the scale problems of pandas without introducing user complexity. To use Modin, simply replace your pandas import statement with “import modin.pandas as pd”. There is no need to learn a new dataframe API.
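As a rough illustration of the import swap described above, here is a minimal sketch. It assumes Modin is installed along with an execution engine such as Ray or Dask. The try/except fallback and the sample data are not part of Modin's guidance. They are added here only so the snippet still runs where Modin is missing.

```python
# Before: import pandas as pd
# After: the one-line swap. The fallback to plain pandas is an assumption
# of this sketch so that it still runs where Modin is not installed.
try:
    import modin.pandas as pd
except ImportError:
    import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame(
    {
        "city": ["Austin", "Boston", "Austin", "Boston"],
        "sales": [10, 20, 30, 40],
    }
)

# The rest of the code is ordinary pandas and does not change.
totals = df.groupby("city")["sales"].sum()
austin_total = int(totals["Austin"])
boston_total = int(totals["Boston"])
print(austin_total, boston_total)
```

Everything after the import is written against the familiar pandas API, which is why existing scripts typically need no other changes.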
There have been several inflection points in Modin’s growth. In 2021, Intel began distributing Modin as part of its oneAPI AI Analytics Toolkit. At the end of 2022, Amazon Web Services (AWS)* incorporated Modin into two different products – AWS Glue and the AWS SDK for pandas. And in early 2023, Ponder announced that Modin was an integral part of its product that translates pandas into SQL and runs it in various data warehouses.
Many companies rely on Modin to accelerate their data processing workloads.
Modin creator Devin Petersohn had this to say on hitting the 10-million-download mark: “Modin is an interesting project for a lot of technical reasons, but I’m most proud of the community. Our community is kind, transparent, and welcoming. It’s been really exciting to grow the Modin project from 0 to 10M, but I still think the most exciting work is ahead of us.”