07.02.22
Pandas' group by operations are a pain to use. Siuba fixes that.
Python's success has been down to picking the best bits of other languages and stealing them for itself. Originally, Python had no array library, so it copied MATLAB's to make NumPy. Pandas' data frames were inspired by R's. Since my general move to Python a few years ago, the main things I have missed are good graphics and dplyr's group-by operations.
A new library, siuba, addresses the latter:
from siuba import group_by, summarize, _
from siuba.data import mtcars
(mtcars
    >> group_by(_.cyl)
    >> summarize(
        hp_mean = _.hp.mean(),
        hp_sd = _.hp.std())
)
Out[2]:
   cyl     hp_mean      hp_sd
0    4   82.636364  20.934530
1    6  122.285714  24.260491
2    8  209.214286  50.976886
I really like this, as I do a lot of this type of manipulation when exploring a new dataset. Good work, Michael!
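For comparison, here is a rough sketch of the same kind of summary in plain pandas, using named aggregation. The `cars` frame and its values are made up for illustration (a toy stand-in, not the real mtcars data), so the numbers will not match the output above.

```python
import pandas as pd

# Hypothetical toy data standing in for mtcars; the values are invented.
cars = pd.DataFrame({
    "cyl": [4, 4, 6, 6, 8, 8],
    "hp": [90, 110, 100, 120, 200, 250],
})

# Named aggregation: each keyword is an output column,
# given as a (source column, aggregation function) pair.
summary = (
    cars
    .groupby("cyl", as_index=False)
    .agg(hp_mean=("hp", "mean"), hp_sd=("hp", "std"))
)

print(summary)
```

This works, but the column names and aggregations end up as strings inside tuples, whereas the siuba version reads like ordinary method calls on the columns.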