Use the Power of Machine Learning to Compress

Why choose a compression method yourself when machine learning can predict the best one for your data and requirements?

How prediction works:

Input: titanic.csv

y,Pclass,sex,age
0,3,female,22
1,1,male,38
0,2,male,30
1,3,female,26
[×97 more rows]

Extracted features:

{ "num_observations": 100, "num_columns": 4, "num_string_columns": 1, ... }
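The features above can be approximated in a few lines of pandas. This is a minimal sketch: the function name and the exact feature list are illustrative, not shrynk's actual internals.

```python
import pandas as pd

def extract_features(df: pd.DataFrame) -> dict:
    # Hypothetical feature extractor mirroring the JSON shown above
    return {
        "num_observations": len(df),
        "num_columns": df.shape[1],
        "num_string_columns": sum(df[c].dtype == object for c in df.columns),
    }

df = pd.DataFrame({
    "y": [0, 1, 0, 1],
    "Pclass": [3, 1, 2, 3],
    "sex": ["female", "male", "male", "female"],
    "age": [22, 38, 30, 26],
})
print(extract_features(df))
# → {'num_observations': 4, 'num_columns': 4, 'num_string_columns': 1}
```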

Model predictions:

Smallest size: csv+bz2 ✓
Fastest write time: csv+gzip ✓
Fastest read time: csv+gzip ✓
Weighted (3, 1, 1): csv+bz2 ✓
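One way to combine the three objectives into a single weighted pick is to z-score-normalize each benchmarked metric and take the weighted sum. This is an illustrative sketch, not shrynk's actual scoring code, and the benchmark numbers are made up.

```python
from statistics import mean, pstdev

# Hypothetical benchmark results per method: (size_bytes, write_s, read_s)
results = {
    "csv+bz2":  (40_000, 0.90, 0.50),
    "csv+gzip": (55_000, 0.20, 0.10),
    "parquet":  (60_000, 0.30, 0.20),
}

def best_method(results, size=3, write=1, read=1):
    weights = (size, write, read)
    cols = list(zip(*results.values()))  # one column per metric

    # z-score each metric so bytes and seconds are comparable
    def z(value, col):
        sd = pstdev(col) or 1.0
        return (value - mean(col)) / sd

    scores = {
        name: sum(w * z(v, col) for v, col, w in zip(vals, cols, weights))
        for name, vals in results.items()
    }
    return min(scores, key=scores.get)  # lower weighted score is better

print(best_method(results, size=3, write=1, read=1))  # → csv+bz2
```

With size weighted 3× as heavily, the slower-but-smaller bz2 wins; drop the size weight to 0 and gzip wins on speed alone.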

You can either read the blog post, or keep reading and try the demo:
$ pip install shrynk

Now, in Python:

>>> from shrynk import save

You control how important size, write_time, and read_time are.
Here, size is 3 times more important than write and read:

>>> save(df, "mydata", size=3, write=1, read=1)
"mydata.csv.bz2"
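To see what such a save amounts to without shrynk: pandas alone can write and read the chosen format-plus-compression pair. Here the csv+bz2 choice is hard-coded rather than predicted.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"y": [0, 1], "sex": ["female", "male"]})

# shrynk's model chose csv+bz2 above; we hard-code that choice here
path = os.path.join(tempfile.mkdtemp(), "mydata.csv.bz2")
df.to_csv(path, index=False)   # pandas infers bz2 compression from the suffix
roundtrip = pd.read_csv(path)  # and transparently decompresses on read
print(roundtrip.equals(df))    # → True
```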
Or from the command line (this will predict and compress, or decompress):

$ shrynk compress mydata.csv
$ shrynk decompress mydata.csv.gz
Contribute your data:
"Data & Model for the community, by the community"
1. Click below to upload a CSV file and have the best compression predicted.
2. All compression methods will also be run to check whether the prediction is correct.
3. If the prediction does not match the ground truth, the features (not the data) will be added to the training data!
Or run the example.