Siuba: Dplyr style dataframes in Python

07.02.22

Pandas' group by operations are a pain to use. Siuba fixes that.

Python's success has come down to picking the best bits of other languages and stealing them for itself. Originally, Python had no array library, so it copied MATLAB's to make NumPy. Pandas' data frames were inspired by R's. Since my general move to Python a few years ago, the main things I have missed are good graphics and the group-by operations of the dplyr library.

A new library, siuba, addresses the second of these:

from siuba import group_by, summarize, _
from siuba.data import mtcars

(mtcars
  >> group_by(_.cyl)
  >> summarize(
      hp_mean = _.hp.mean(),
      hp_sd  = _.hp.std())
  )
Out[2]:
   cyl     hp_mean      hp_sd
0    4   82.636364  20.934530
1    6  122.285714  24.260491
2    8  209.214286  50.976886

I really like this, as I do a lot of this type of manipulation when exploring a new dataset. Good work, Michael!
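For comparison, here is a rough sketch of the same kind of summary in plain pandas, assuming a version with named aggregation (0.25 or later). The small `cars` frame is made-up illustrative data standing in for mtcars, so the numbers will not match the output above.

```python
import pandas as pd

# Hypothetical toy data standing in for mtcars (not the real values).
cars = pd.DataFrame({
    "cyl": [4, 4, 6, 6, 8, 8],
    "hp": [93, 62, 110, 105, 175, 245],
})

# Group by cylinder count, then compute the mean and standard deviation of hp.
# as_index=False keeps cyl as an ordinary column, like the siuba result.
summary = (
    cars
    .groupby("cyl", as_index=False)
    .agg(hp_mean=("hp", "mean"), hp_sd=("hp", "std"))
)

print(summary)
```

It works, but the column names and aggregation functions are passed as strings inside tuples, whereas the siuba version reads as ordinary method calls on `_`.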
