Use the Power of Machine Learning to Compress

Why choose a compression method yourself when machine learning can predict the best one for your data and requirements?

How prediction works:

Input: titanic.csv

y,Pclass,sex,age
0,3,female,22
1,1,male,38
0,2,male,30
1,3,female,26
[×97 more rows]

Extracted features:

{ "num_observations": 100, "num_columns": 4, "num_string_columns": 1, ... }
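The features above can be approximated in a few lines of pandas. This is a minimal sketch: the function name and the exact feature list are illustrative, not shrynk's actual internals.

```python
import pandas as pd

def extract_features(df: pd.DataFrame) -> dict:
    # Hypothetical feature extractor mirroring the JSON shown above
    return {
        "num_observations": len(df),
        "num_columns": df.shape[1],
        "num_string_columns": sum(df[c].dtype == object for c in df.columns),
    }

df = pd.DataFrame({
    "y": [0, 1, 0, 1],
    "Pclass": [3, 1, 2, 3],
    "sex": ["female", "male", "male", "female"],
    "age": [22, 38, 30, 26],
})
print(extract_features(df))
# → {'num_observations': 4, 'num_columns': 4, 'num_string_columns': 1}
```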

Model predictions:

Smallest size: csv+bz2 ✓
Fastest write time: csv+gzip ✓
Fastest read time: csv+gzip ✓
Weighted (3, 1, 1): csv+bz2 ✓
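One way to combine the three objectives into a single weighted pick is to z-score-normalize each benchmarked metric and take the weighted sum. This is an illustrative sketch, not shrynk's actual scoring code, and the benchmark numbers are made up.

```python
from statistics import mean, pstdev

# Hypothetical benchmark results per method: (size_bytes, write_s, read_s)
results = {
    "csv+bz2":  (40_000, 0.90, 0.50),
    "csv+gzip": (55_000, 0.20, 0.10),
    "parquet":  (60_000, 0.30, 0.20),
}

def best_method(results, size=3, write=1, read=1):
    weights = (size, write, read)
    cols = list(zip(*results.values()))  # one column per metric

    # z-score each metric so bytes and seconds are comparable
    def z(value, col):
        sd = pstdev(col) or 1.0
        return (value - mean(col)) / sd

    scores = {
        name: sum(w * z(v, col) for v, col, w in zip(vals, cols, weights))
        for name, vals in results.items()
    }
    return min(scores, key=scores.get)  # lower weighted score is better

print(best_method(results, size=3, write=1, read=1))  # → csv+bz2
```

With size weighted 3× as heavily, the slower-but-smaller bz2 wins; drop the size weight to 0 and gzip wins on speed alone.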

You can either read the blog post, or keep reading and try the demo:
$ pip install shrynk

Now, in Python:

>>> from shrynk import save

You control how important size, write_time, and read_time are.
Here, size is 3 times more important than write and read:

>>> save(df, "mydata", size=3, write=1, read=1)
"mydata.csv.bz2"
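To see what such a save amounts to without shrynk: pandas alone can write and read the chosen format-plus-compression pair. Here the csv+bz2 choice is hard-coded rather than predicted.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"y": [0, 1], "sex": ["female", "male"]})

# shrynk's model chose csv+bz2 above; we hard-code that choice here
path = os.path.join(tempfile.mkdtemp(), "mydata.csv.bz2")
df.to_csv(path, index=False)   # pandas infers bz2 compression from the suffix
roundtrip = pd.read_csv(path)  # and transparently decompresses on read
print(roundtrip.equals(df))    # → True
```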
Or from the command line (this will predict and compress, or decompress):

$ shrynk compress mydata.csv
$ shrynk decompress mydata.csv.gz
Contribute your data:
"Data & Model for the community, by the community"
1. Click below to upload a CSV file and have the best compression predicted.
2. All compression methods will also be run to check whether the prediction is correct.
3. If the prediction does not match the ground truth, the features (not the data) will be added to the training data!
Or run the example.