07-11, 12:00–12:30 (Europe/Amsterdam), Else (1.3)
Many of us know scikit-learn for its ability to construct pipelines that can do .fit().predict(). It's an amazing feature for sure. But once you dive into the codebase ... you realise that there is just so much more.
This talk is an attempt to demonstrate some features of scikit-learn, and its ecosystem, that are less common but deserve to be in the spotlight.
In particular, I hope to discuss these things that scikit-learn can do:
- sparse datasets and models
- larger than memory datasets
- sample weight techniques
- image classification via embeddings
- tabular embeddings/vectorisation
- data deduplication
- pipeline caching (a small sketch of this one follows the list)
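As a taste of one of the topics above, here is a minimal sketch of pipeline caching using the standard `memory` argument of scikit-learn's `Pipeline`. The dataset, step choices, and cache directory are purely illustrative assumptions, not necessarily what the talk itself will use.

```python
from tempfile import mkdtemp

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Illustrative toy dataset, just so there is something to fit on.
X, y = make_classification(n_samples=1_000, n_features=20, random_state=42)

# The `memory` argument tells the pipeline to cache fitted transformers
# (here in a temporary directory), so repeated fits with an unchanged
# PCA step can skip recomputation, which pays off during grid search.
cache_dir = mkdtemp()
pipe = Pipeline(
    steps=[("pca", PCA(n_components=10)), ("clf", LogisticRegression())],
    memory=cache_dir,
)
pipe.fit(X, y)
print(pipe.predict(X[:5]))
```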
If time allows I may also touch on extra topics.
There may be an opportunity to live-code some of these examples; if live coding is not possible, it would be preferable to know this ahead of time.
It would really help to be somewhat familiar with scikit-learn.
Vincent is a senior data professional, and recovering consultant, who has worked as an engineer, researcher, team lead, and educator in the past. He is especially interested in understanding algorithmic systems so that one may prevent failure. As such, he prefers simpler solutions that scale and worries more about data quality than the number of tensors we throw at a problem. He's also well known for creating calmcode as well as a dozen or so open-source packages.
He's currently employed at probabl where he works together with scikit-learn core maintainers to improve the ecosystem of tooling.