Accepted talks

We are still working actively to prepare the detailed schedule for the conference.

However, you can find below the list of accepted talks as well as their descriptions.

CEILS: Counterfactual Explanations as Interventions in Latent Space

Riccardo Crupi, Alessandro, Daniele Regoli

Abstract: In the Machine Learning context, Counterfactual Explanations as Interventions in Latent Space (CEILS) is an Explainable AI (XAI) methodology to generate counterfactual explanations that capture by design the underlying causal relations between the variables at hand, and at the same time to provide feasible recommendations to reach the proposed profile. For instance, in a loan-granting problem you may be interested in the actions to take in order to overturn a rejection decision. Moreover, there are features, like the credit score, that are very important for the final decision but are not directly actionable; they can, however, change in response to changes in other (actionable) variables, such as income, bank seniority, etc. By taking this causal impact among variables into account, CEILS makes it possible to recommend actions on actionable features only, while nevertheless leveraging their impact on the other variables to eventually reach the desired counterfactual outcome.

BioConvert: a comprehensive format converter for life sciences

Thomas Cokelaer

Abstract: Life science uses many different formats. They may be old or have complex syntax, and converting between these formats can be challenging for scientists. Bioconvert aims to provide a standard tool/interface to convert life science data formats from one to another.

Many conversion tools already exist but they may be dispersed, focused on a few specific formats, difficult to install, or not optimised. With Bioconvert, we plan to cover a wide spectrum of format conversions; we will re-use existing tools when possible and provide an interface to compare different conversion tools or methods via benchmarking. New implementations are provided when considered better than existing ones.

Bioconvert is developed in Python using continuous integration, a test suite and extensive Sphinx documentation. As of March 2022, it supported 48 formats and 98 direct conversions (125 different methods).

Discover Pythran through 10 code samples

Serge « sans » Paille

Abstract: The Pythran compiler is used to speed up generic Python scientific kernels across the world. Through ten code samples taken from the SciPy and scikit-image codebases and Stack Overflow snippets, this talk is going to demonstrate the major features of the compiler, as well as some technical nits!
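
A hedged sketch of the workflow (a classic documentation-style kernel, not necessarily one of the ten samples from the talk): a plain Python function is annotated with a Pythran export comment, then compiled ahead of time with the pythran command.

    # pythran export dprod(int list, int list)
    def dprod(l0, l1):
        # Element-wise dot product: valid Python as-is, and compiled
        # to a native module by running `pythran dprod.py`.
        return sum(x * y for x, y in zip(l0, l1))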

Sequana: a set of Next Generation Sequencing pipelines

Thomas Cokelaer

Abstract: The Sequana software is developed within a sequencing platform at Institut Pasteur. It provides a Python library dedicated to Next Generation Sequencing (NGS) analysis, including visualisation of NGS formats. Sequana is also a project that provides (i) a set of pipelines dedicated to NGS in the form of Snakefiles (Makefile-like files with Python syntax, based on the Snakemake framework), (ii) tools to help in the creation of such pipelines, (iii) a graphical interface for the Snakemake framework, and (iv) standalone applications for NGS analysis. Pipelines can be run locally or on HPC clusters. A common user interface is provided to ease usage. These NGS pipelines are ready for production and have been applied to hundreds of projects including Covid variant detection, genomics, transcriptomics, etc.

How to increase diversity in open source communities

Maren Westermann

Abstract: Today state of the art scientific research strongly depends on open source libraries. The demographic of the contributors to these libraries is predominantly white and male [1][2][3][4]. In recent years there have been a number of various recommendations and initiatives to increase the participation in open source projects of groups who are underrepresented in this domain [1][3][5][6]. While these efforts are valuable and much needed, contributor diversity remains a challenge in open source communities [2][3][7]. This talk highlights the underlying problems and explores how we can overcome them.

A Primer to Maintainable Code

Alexander CS Hendorf

Abstract: In this talk, I’ll give an overview of software quality and why it’s important, especially for scientists. I’ll provide best practices and libraries to dive deeper into, hypes to ignore, and simple guidelines to follow to write code that your peers will love.

After the talk, the audience will have a guide on how to develop better code and be aware of potential blind spots.

Revolutionise Data Visualisation with PyScript

Cheuk Ting Ho

Abstract: Since the announcement of PyScript, it has gained lots of attention and sparked the imagination about how we can run Python applications in the browser. Out of everything that I have come across, most of the use cases are data visualisation. Let’s see how we can up our data viz game with PyScript.

Conda Store: easy environment management & reproducibility for teams and enterprises

Pierre-Olivier Simonard

Abstract: End users think in terms of environments, not packages. Conda Store makes it easy for data scientists to define their environments; it ensures reproducibility, eases productionizing and collaboration, and reduces friction and latency between developers and IT.

Real-time estimation of a heat pump I/O state with IoT data

Davide Poggiali

Abstract: At present, we are facing continuously growing energy prices. It is therefore important to optimize the use of heat pumps, both in domestic and industrial environments. Using a suitably labeled dataset of accelerometer, speed, or relative-position readings over time, coming from a cheap sensor, it is possible to estimate the I/O state of any heating or cooling engine. This new real-time measure then allows us to compute the energy consumption and to study the cheapest usage scheme. In this presentation we will show a real-world implementation of some fast binary classifiers, from basic statistics to machine learning, assessing the performance of each method in terms of computational time, precision, and accuracy.

Pragmatic Panel: Build and Deploy Complex Data-Driven WebApps

Pierre-Olivier Simonard

Abstract: Panel is one of the leading choices for building dashboards in Python. In this talk, we discuss the practical aspects of complex data-driven dashboards. There are tutorials and guides available which help teach new users the basics, but this talk focuses on the challenges of building more complex, industry-ready, deployed dashboards. There are a variety of niche issues which arise when you push the limits of complexity, and we will share the solutions we have developed. We will demonstrate these solutions as we walk through the entire lifecycle from data ingestion, through exploratory analysis, to deployment as a finished website.
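
For orientation, a minimal, hedged sketch of a Panel app (a toy example, far simpler than the industry dashboards discussed in the talk); it can be deployed with `panel serve app.py`.

    import panel as pn

    pn.extension()

    # A widget driving a reactive view: pn.bind re-renders the function
    # output whenever the slider value changes.
    slider = pn.widgets.IntSlider(name="n", start=1, end=20, value=5)

    def view(n):
        return pn.pane.Markdown(f"Sum of 1..{n} is {sum(range(1, n + 1))}")

    # Marking the layout servable lets `panel serve app.py` deploy it.
    pn.Column(slider, pn.bind(view, slider)).servable()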

Learning Natural Language Processing with Python

Grishma Jena

Abstract: With the advent of voice-based assistants and chatbots in our homes, our phones and our computers, businesses, stakeholders and developers want to learn about language processing. But how exactly do these devices understand human language? How can they interpret similar intents even when different words are used? What does Natural Language Processing mean and what are the steps involved?
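
As a hedged illustration of one such step (tokenization plus bag-of-words counting with scikit-learn; the example sentences are invented), two different phrasings of the same intent end up with overlapping feature vectors:

    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["Turn on the living room lights",
            "Switch the lights on in the living room"]

    # Tokenize and count words: similar intents share most features.
    vec = CountVectorizer()
    X = vec.fit_transform(docs)
    print(vec.get_feature_names_out())
    print(X.toarray())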

Discrete event simulations of ‘all electric’ mines

Nicholas Hall

Abstract: This talk shows how a discrete event simulation can help mining companies reduce their dependence on diesel as a fuel for their large haulage trucks. Using open source software, mining environments are modelled to support decision making for building an all-electric mine, where diesel-powered vehicles are made obsolete.
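
A minimal sketch of the idea using SimPy, an open source discrete event simulation library (the talk does not name its exact toolchain, and all numbers here are invented): haul trucks cycle and queue for a limited number of chargers.

    import simpy

    def truck(env, name, charger):
        while True:
            yield env.timeout(4)              # one haul cycle (hours)
            with charger.request() as req:    # queue for a free charger
                yield req
                yield env.timeout(1)          # recharge
                print(f"{name} recharged at t={env.now}")

    env = simpy.Environment()
    charger = simpy.Resource(env, capacity=2)  # only two chargers
    for i in range(3):
        env.process(truck(env, f"truck-{i}", charger))
    env.run(until=24)                          # simulate one day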

JupyterLab 4 and the future of Jupyter Notebook

Jeremy Tuloup, Frédéric Collonval

Abstract: JupyterLab is a powerful computational environment for interactive data science in the browser, and the new version 4 release comes with many new features and improvements.

The Jupyter Notebook project decided to base its next major version 7 on JupyterLab components and extensions, which means many JupyterLab features are also available to Jupyter Notebook users.

In this presentation, we will demo all the features coming in these new versions and how users can seamlessly switch from one notebook interface to another.

The Beauty of Zarr

Sanket Verma

Abstract: In this talk, I’ll be talking about Zarr, an open-source data format for storing chunked, compressed N-dimensional arrays. This talk presents a systematic approach to understanding and implementing Zarr by showing how it works, the need for using it, and a hands-on session at the end. Zarr is based on an open technical specification, making implementations across several languages possible. I’ll mainly be talking about Zarr’s Python implementation and will show how it beautifully interoperates with the existing libraries in the PyData stack.
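
A small, hedged taste of the Python API (not taken from the talk): create a chunked, compressed on-disk array and write to it as if it were a NumPy array.

    import numpy as np
    import zarr

    # A 2-D array backed by a directory store, split into 1000x1000 chunks.
    z = zarr.open("example.zarr", mode="w", shape=(10000, 10000),
                  chunks=(1000, 1000), dtype="f4")

    # Writing only touches the chunks that overlap the selection.
    z[0:1000, 0:1000] = np.random.random((1000, 1000))

    print(z.info)  # shape, chunks, compressor, storage size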

scikit-learn and fairness, tools and challenges

Adrin Jalali

Abstract: Fairness, accountability, and transparency in machine learning have become a major part of the ML discourse. Since these issues have attracted attention from the public, and certain legislation is being put in place regulating the usage of machine learning in certain domains, the industry has been catching up with the topic, and a few groups have been developing toolboxes to allow practitioners to incorporate fairness constraints into their pipelines and make their models more transparent and accountable. Some examples are fairlearn, AIF360, LiFT, fairness-indicators (TF), …

This talk explores some of the tools existing in this domain and discusses work being done in scikit-learn to make it easier for practitioners to adopt these tools.
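
As a hedged sketch of what such toolboxes provide (here fairlearn, one of the libraries named above; the data are invented placeholders), a per-group metric report looks roughly like this:

    from fairlearn.metrics import MetricFrame
    from sklearn.metrics import accuracy_score

    # Placeholder labels, predictions and sensitive feature.
    y_true = [0, 1, 1, 0, 1, 0]
    y_pred = [0, 1, 0, 0, 1, 1]
    group = ["a", "a", "b", "b", "a", "b"]

    mf = MetricFrame(metrics=accuracy_score, y_true=y_true,
                     y_pred=y_pred, sensitive_features=group)
    print(mf.overall)   # accuracy on the full data
    print(mf.by_group)  # accuracy per sensitive group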

Continuous and on demand benchmarking

Mridul Seth

Abstract: We all know and love our carefully designed CI pipelines, which test our code and make sure that adding some code or fixing a bug doesn’t introduce a regression in the codebase. But we often don’t give benchmarking the same treatment as correctness. Benchmarking tests are usually one-off scripts written to test a specific change. In this talk, we will discuss various strategies to test our code for performance regressions using ASV (airspeed velocity) for Python projects.
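
For orientation, an ASV benchmark suite is just a Python class following naming conventions (a minimal sketch, not from the talk); `asv run` then tracks these timings across commits.

    # benchmarks/bench_example.py
    # Methods prefixed with `time_` are timed by ASV; `setup` runs
    # before each benchmark and is excluded from the measurement.
    class TimeSuite:
        def setup(self):
            self.data = list(range(100_000))

        def time_sort(self):
            sorted(self.data)

        def time_sum(self):
            sum(self.data)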

Scipp plot: modular interactive plotting from graph nodes

nvaytet

Abstract: We present the plotting framework of the Scipp package for multi-dimensional arrays. Based on the Model View Controller pattern, it uses a set of nodes connected in a graph to represent a sequence of processing steps that can be applied to the data before plotting it onto the figure axes. A common example of this could be a 2D scatter plot, accompanied by 1D histograms of the data points on the top and right-hand side of the scatter axes. The histogramming nodes, which lie below the original root data node in the graph, perform a histogram operation in each of the X and Y dimensions, and their results get sent to the top and right plotting axes.

The use of a graph of connected nodes opens up the opportunity for a very modular way of creating interactive plots. For instance, using a library of widgets (such as ipywidgets), we can change the input to one of the nodes, which notifies all the nodes below it about the change. This means that modifying a parameter of the scatter data with e.g. a slider, would automatically update not only the main scatter plot, but also the histograms on the sides.

Any function (smoothing, fitting, filtering …) can be used inside a node, and any number of axes (or views) can be attached to a given node. This flexibility allows users to create complicated interactive visualizations with just a few lines of code.

Machine learning with missing values

Gaël Varoquaux

Abstract: This talk will cover how to build predictive models that handle missing values well, using scikit-learn. It will cover, on the one hand, the statistical considerations (both classic statistical missing-values theory and recent developments in machine learning) and, on the other hand, how to efficiently code solutions.
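
A hedged sketch of the two typical scikit-learn routes (toy data; the talk covers the statistical reasoning behind choosing between them):

    import numpy as np
    from sklearn.ensemble import HistGradientBoostingClassifier
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan], [6.0, 5.0]])
    y = np.array([0, 0, 1, 1])

    # Route 1: impute first, then fit a model that needs complete data.
    make_pipeline(SimpleImputer(strategy="mean"), LogisticRegression()).fit(X, y)

    # Route 2: gradient-boosted trees handle NaN values natively.
    HistGradientBoostingClassifier().fit(X, y)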

Elevating Contributor Experience: Development of SciPy command-line interface (CLI)

Sayantika Banik

Abstract: Open-source contributions can be daunting to start with; new contributors often find it challenging to navigate documentation and get started. I had similar experiences as a newcomer to the world of open source. Efforts to elevate the contributor experience can vary widely in scope, but even a tiny step forward can be very helpful.

How can we achieve this?

  • Identifying the bottlenecks that degrade the contributing experience.
  • Getting feedback from the community and contributors.
  • Coming up with a feasible solution.
  • Experimenting and improving.

At SciPy, we went through a similar journey and developed an informative and intuitive command-line interface, aiming to guide contributors.

Scaling scikit-learn performances: introducing new sets of computational routines

Julien Jerphanion

Abstract: scikit-learn is an open-source scientific library for machine learning in Python. In this talk, we will present the recent work carried out by the scikit-learn core developer team to improve its native performance.

CLAIMED - An open source unified platform for batch, streaming and microservices based data science

Romeo Kienzler

Abstract: Data are processed in pipelines – either an entire data set at once, in batches, or one record at a time. A variety of programming languages, frameworks and libraries exist. In CLAIMED – the component library for AI, Machine Learning, ETL and Data Science – we provide an opinionated set of coarse-grained components implemented as Jupyter notebooks. Through C3, the CLAIMED component compiler, those can (as of now) be transformed into Kubeflow Pipeline components, Airflow operators, or simple (Docker) container images to be executed on Knative. An adapter implemented as a sidecar transforms those into either streaming components (currently http(s) and Kafka) or microservices – with scale-to-zero support. Using the JupyterLab Elyra pipeline editor and CLAIMED, anybody can create data science pipelines without programming skills. But the source code is only one click away: the Jupyter notebook backing a component is available for review, adjustments, or improvements.

Increase citations, ease review & collaboration – Making machine learning in research reproducible

Jesper Dramsch

Abstract: Every scientific conference has seen a massive uptick in applications that use some type of machine learning. Whether it’s a linear regression using scikit-learn, a transformer from Hugging Face, or a custom convolutional neural network in Jax, the breadth of applications is as vast as the quality of contributions.

This tutorial aims to provide easy ways to increase the quality of scientific contributions that use machine learning methods. The reproducible aspect will make it easy for fellow researchers to use and iterate on a publication, increasing citations of published work. The use of appropriate validation techniques and increase in code quality accelerates the review process during publication and avoids possible rejection due to deficiencies in the methodology. Making models, code and possibly data available increases the visibility of work and enables easier collaboration on future work.

This work to make machine learning applications reproducible has an outsized impact compared to the limited additional work that is required using existing Python libraries.

Parallelizing your ETL with Dask on Kubeflow

Jacob Tomlinson

Abstract: Dask now has even better integration with Kubeflow, allowing folks to leverage advanced parallelism in Python for both interactive and pipeline workflows. Hardware acceleration and parallelism have long been leveraged in Machine Learning and Deep Learning tasks, but now you can get the same superpowers in your data exploration, processing, and ETL steps too.

In this talk, we will cover Dask’s new Kubernetes Operator, installing it on your Kubeflow cluster, and show examples of leveraging it in interactive sessions and scheduled workflows.
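
A hedged sketch of what using the new operator looks like from Python (cluster name and worker count are illustrative, and exact arguments depend on the dask-kubernetes version):

    import dask.array as da
    from dask.distributed import Client
    from dask_kubernetes.operator import KubeCluster

    # Ask the operator to create a Dask cluster on Kubernetes.
    cluster = KubeCluster(name="demo-cluster", n_workers=2)
    client = Client(cluster)

    # Any Dask workload now runs on the Kubernetes-backed workers.
    print(da.random.random((20000, 20000)).mean().compute())

    cluster.close()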

Open Source Mission Support System for research aircraft missions

Reimar Bauer

Abstract: The Mission Support System Software (MSS) is a client/server application developed in the community to collaboratively create flight plans based on model data. Through conda-forge, the components of MSS can be used on different platforms.

Discovering Mathematical Optimization with Python

Pamela Alejandra Bustamante Faúndez

Abstract: Mathematical optimization is the selection of the best alternative with respect to some criterion, among a set of candidate options.

There are multiple applications of mathematical optimization. For example, in investment portfolio optimization, we search for the best way to invest capital given different alternatives. In this case, an optimization problem will allow us to choose a portfolio that minimizes risk (or maximizes profit), among all possible allocations that meet the defined requirements.

In most cases, mathematical optimization is used as a tool to facilitate decision-making. Sometimes these decisions can be made automatically in real-time.

This talk will explore how to formulate and solve mathematical optimization problems with Python, using different optimization libraries.
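
A minimal, hedged sketch of the portfolio example above as a linear program, using scipy.optimize (the numbers are invented; the talk surveys several libraries):

    from scipy.optimize import linprog

    # Maximize expected return (linprog minimizes, so negate it)
    # over three assets, with full investment and a 60% per-asset cap.
    expected_returns = [0.08, 0.05, 0.12]
    c = [-r for r in expected_returns]
    A_eq, b_eq = [[1, 1, 1]], [1]      # weights sum to 1
    bounds = [(0, 0.6)] * 3            # no short selling, 60% cap

    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    print(res.x)                       # optimal portfolio weights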

Emergent structures in noisy channel message-passing

Iliya Zhechev

Abstract: In this talk, we will explain a mechanism for generating neural network glyphs, like the glyphs we use in human languages. Glyphs are purposeful marks: images with 2D structure used to communicate information. We will use neural networks to generate those structured images by optimizing for robustness.

Python in Storm

Soundharya

Abstract: The advancement and development in the field of technology and communication call for real-time data processing that is fast and fault-tolerant. Apache Storm provides a platform to develop applications that can process a multitude of data in real time. Being distributed, Storm is very fast and maintains high accuracy with its topological analysis and task-completion checks.

Introduction to scikit-learn II

To Be Defined

Abstract: This tutorial will provide a beginner introduction to scikit-learn. Scikit-learn is a Python package for machine learning.

This tutorial will be subdivided into three parts. First, we will present how to design a predictive modeling pipeline that deals with heterogeneous types of data. Then, we will go more into detail in the evaluation of models and the type of trade-off to consider. Finally, we will show how to tune the hyperparameters of the pipeline.
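
For a flavour of the first part, a predictive pipeline mixing numerical and categorical columns typically looks like this (a hedged sketch; the column names are invented placeholders):

    from sklearn.compose import make_column_transformer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    # Scale the numerical columns, one-hot encode the categorical one.
    preprocess = make_column_transformer(
        (StandardScaler(), ["age", "income"]),
        (OneHotEncoder(handle_unknown="ignore"), ["city"]),
    )
    model = make_pipeline(preprocess, LogisticRegression())
    # model.fit(X_train, y_train); model.score(X_test, y_test)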

Sliding into Causal Inference, with Python!

Alon Nir

Abstract: What would the world look like if Russia had won the cold war? If the Boston Tea Party never happened? And where would we all be if Guido van Rossum had decided to pursue a career in theatre? Unfortunately we don’t have the technology to slide into parallel worlds and explore alternative histories. However it turns out we do have the tools to simulate parallel realities and give decent answers to intriguing ‘what if’ questions. This talk will provide a gentle introduction to these tools, professionally known as Causal Inference.
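
A tiny, hedged illustration of the core idea (simulated data, not from the talk): under random assignment, a simple difference in means estimates the average treatment effect.

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulated randomized experiment with a true effect of +2.0.
    n = 10_000
    treated = rng.integers(0, 2, size=n)
    outcome = 1.0 + 2.0 * treated + rng.normal(size=n)

    ate = outcome[treated == 1].mean() - outcome[treated == 0].mean()
    print(f"estimated average treatment effect: {ate:.2f}")  # close to 2.0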

Introduction to scikit-learn I

To Be Defined

Abstract: This tutorial will provide a beginner introduction to scikit-learn. Scikit-learn is a Python package for machine learning.

This tutorial will be subdivided into three parts. First, we will present how to design a predictive modeling pipeline that deals with heterogeneous types of data. Then, we will go more into detail in the evaluation of models and the type of trade-off to consider. Finally, we will show how to tune the hyperparameters of the pipeline.

Introduction to Audio & Speech Recognition

Vaibhav Srivastav

Abstract: The audio (& speech) domain is going through a massive shift in terms of end-user performance. It is at the same tipping point as NLP was in 2017, before the Transformers revolution took over. We’ve gone from needing copious amounts of data to create Spoken Language Understanding systems to just needing a 10-minute snippet.

This tutorial will help you create strong code-first & scientific foundations in dealing with audio data and build real-world applications like Automatic Speech Recognition (ASR), Audio Classification, and Speaker Verification using backbone models like Wav2Vec2.0, HuBERT, etc.
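
As a hedged taste of the code-first approach (using the Hugging Face transformers pipeline; the model checkpoint and audio file are illustrative choices, not necessarily the tutorial’s):

    from transformers import pipeline

    # Wav2Vec2-based speech recognition in a few lines.
    asr = pipeline("automatic-speech-recognition",
                   model="facebook/wav2vec2-base-960h")
    print(asr("speech_sample.wav")["text"])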

Introduction to NumPy

To Be Defined

Abstract: This tutorial will provide an introduction to the NumPy library intended for beginners.

NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.
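
A few one-liners of the kind the tutorial builds up to (a hedged sketch of typical NumPy usage):

    import numpy as np

    a = np.arange(12).reshape(3, 4)   # a 3x4 array of 0..11
    print(a.sum(axis=0))              # column sums
    print(a[a % 2 == 0])              # boolean masking
    print(a @ a.T)                    # matrix product
    print(np.fft.fft(np.ones(4)))     # discrete Fourier transform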

How to make the most precise measurement

Markus Gruber

Abstract: Computer chips are created using photolithography. Today’s lithography machines are highly complex machines containing ultra-high precision optics. How do you create and in particular measure these optics? That’s easy, you build the world’s best interferometer. But what if that’s not enough?

Introduction to pandas

To Be Defined

Abstract: This tutorial is an introduction to pandas intended for beginners.

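A hedged sketch of the kind of operations such an introduction covers (invented data):

    import pandas as pd

    df = pd.DataFrame({
        "city": ["Basel", "Zurich", "Basel", "Geneva"],
        "temp": [21.5, 19.0, 23.1, 18.4],
    })

    print(df.describe())                      # summary statistics
    print(df.groupby("city")["temp"].mean())  # split-apply-combine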

Informative and pleasant dataviz with Raincloud plot

Davide Poggiali

Abstract: Categorical plots offer a variety of plotting styles that allow the user to picture even large datasets by showing some summary statistics of the data. In some cases, graphs can be misleading, either unwittingly or on purpose, and the reader can be confused or even get an incorrect idea of the phenomenon underlying the data. In this talk we introduce the Raincloud plot, a multi-language plotting style aimed at creating charming and informative graphical representations of a dataset. After some introduction, we will offer a simple tutorial covering different use cases. We will then compare Raincloud plots with some other plot styles, showing that some data misunderstandings can be avoided with a sufficiently detailed plot.
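
A hedged, matplotlib-only sketch of the anatomy of a raincloud plot (half violin for the "cloud", jittered raw points for the "rain", and a slim box plot in between); the talk may well use a dedicated library instead.

    import matplotlib.pyplot as plt
    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.gamma(2.0, 1.5, size=200)

    fig, ax = plt.subplots()

    # "Cloud": keep only the upper half of a horizontal violin.
    parts = ax.violinplot(data, positions=[0], vert=False, showextrema=False)
    for body in parts["bodies"]:
        verts = body.get_paths()[0].vertices
        verts[:, 1] = np.clip(verts[:, 1], 0, None)

    # "Rain": the raw observations, jittered below the cloud.
    ax.scatter(data, rng.uniform(-0.35, -0.15, size=data.size), s=6, alpha=0.5)

    # Summary statistics: a slim box plot between cloud and rain.
    ax.boxplot(data, positions=[-0.07], vert=False, widths=0.06,
               showfliers=False)

    ax.set_yticks([])
    plt.show()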

Network Science with Python

Mridul Seth

Abstract: This workshop is for data scientists and other programmers who want to add another tool in their data science toolkit. Modelling, analysing and visualising data as networks! Network Science deals with analysing network data, and the data can come from different fields like politics, finance, computer science, law and even Game of Thrones!
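
A hedged sketch with NetworkX, the standard Python network-analysis library (the dataset is a classic example, not necessarily the workshop’s):

    import networkx as nx

    G = nx.karate_club_graph()          # a classic small social network

    # Who are the most central actors?
    centrality = nx.degree_centrality(G)
    print(sorted(centrality, key=centrality.get, reverse=True)[:3])

    # How do two members connect?
    print(nx.shortest_path(G, source=0, target=33))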

Data-Driven Thresholding for Extreme Event Detection in Geosciences

Milton Gomez

Abstract: Extreme weather events are a well-known source of human suffering, loss of life, and financial hardship. Amongst these, tropical cyclones are notoriously impactful, leading to significant interest in predicting the genesis, tracks, and intensity of these storms - a task which continues to present significant challenges. In particular, tropical cyclogenesis (TCG) can be described as a “needle in a haystack” problem, and steps must be taken to make predictions tractable. Previously, the filtering of non-genesis points by thresholding predictive variables has been described, with thresholds being selected to reduce the number of discarded TCG cases. In the literature, this thresholding has often been carried out empirically, which, while effective, relies on domain knowledge. This talk instead proposes a systematic, machine-learning-based approach implemented in Python using the SciPy optimization library. The method is designed to be interpretable to the point of becoming transparent machine learning. Threshold values that minimize the false-alarm rate and maintain a high recall are found, and then combined in a forward selection algorithm. As other extreme events in the geosciences are considered needle-in-a-haystack problems, the described approach can be of use in reducing the variable space in which to study and predict such events. Finally, the transparent nature of the proposed approach can provide simple insight into the conditions in which these events occur.

Deep learning at the Radiology & Nuclear Medicine Clinic / University Hospital Basel

Joshy Cyriac, Jakob Wasserthal

Abstract: Deep learning can assist radiology doctors in interpreting and analyzing radiology images. We will present use cases which are used today in clinical practice. These range from organ segmentation to image classification.

Renku-Python: Reproducible and Reusable Workflows

Ralf Grubenmann

Abstract: Renku is a platform that bundles together various tools for reproducible and collaborative data analysis projects. Here we take a deep dive into the Python CLI and library component of the Renku platform, highlighting its functionality for recording and executing workflows both locally and remotely, as well as its architecture for storing recorded metadata in a knowledge graph and how this can be extended by third-party plugins.

dirty_cat: a Python package for Machine Learning on Dirty Categorical Data

Lilian Boulard

Abstract: In this talk, we will introduce “dirty_cat”, a Python library for encoding dirty, non-curated categorical features into numerical features while preserving similarities. We will focus on a few of the implemented methods: the similarity encoder, the Gamma-Poisson encoder, the min-hash encoder, and the super-vectorizer.
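
A hedged sketch of one of these encoders (the strings are invented, and exact class names may vary between dirty_cat versions):

    import numpy as np
    from dirty_cat import MinHashEncoder

    # Dirty categories: typos and variants of the same underlying entity.
    jobs = np.array([["senior software engineer"],
                     ["softwre engineer"],          # typo on purpose
                     ["data scientist"],
                     ["senior data scientist"]])

    enc = MinHashEncoder(n_components=8)
    X = enc.fit_transform(jobs)   # similar strings get similar vectors
    print(X.shape)                # (4, 8)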

Optimizing inference for state-of-the-art Python models

Ed Shee

Abstract: This talk will take state-of-the-art Python models and show how, through advanced inference techniques, we can drastically increase the performance of the models at runtime. You’ll learn about the open source MLServer project and see live how easily it helps serve Python-based machine learning models.

Array expressions and symbolic gradients in SymPy

Francesco Bonazzi

Abstract: SymPy is an open source computer algebra system (CAS) written in Python.

The recent addition of the array expression module provides an alternative to the matrix expression module, with generalized support to higher dimensions (matrices are constrained to 2 dimensions).

Given the importance of multidimensional arrays in machine learning and mathematical optimization problems, this talk will illustrate examples of tensorial expressions in mathematics and how they can be manipulated using either module, or in the index-explicit form.

Conversion tools have been provided to SymPy to allow users to switch an expression between the array form and either the matrix or index-explicit form. In particular, the conversion from array to matrix form attempts to represent contractions, diagonalizations and axis-permutations with operations commonly used in matrix algebra, such as matrix multiplication, transposition, trace, Hadamard and Kronecker products.

A gradient algorithm for array expressions has been implemented, returning a closed-form array expression equivalent to the derivative of arrays by arrays. The derivative algorithm for matrix expressions now uses this algorithm, attempting to convert the array back to matrix form if trivial dimensions can be dropped.
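
A small, hedged example of the matrix-derivative behaviour described above (exact output formatting may vary between SymPy versions):

    from sympy import MatrixSymbol, Trace

    X = MatrixSymbol("X", 3, 3)
    A = MatrixSymbol("A", 3, 3)

    # Differentiating a scalar matrix expression with respect to a
    # matrix; internally SymPy routes this through array expressions
    # and converts the result back to matrix form.
    print(Trace(A * X).diff(X))   # expected: A.T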

Scipp: Multi-dimensional data arrays with labeled dimensions for dense and binned data

Simon Heybrock

Abstract: Inspired by Xarray, Scipp enriches raw NumPy-like multi-dimensional arrays of data by adding named dimensions and associated coordinates. For an even more intuitive and less error-prone user experience, Scipp furthermore adds physical units to arrays and their coordinates. Scipp data arrays additionally support a dictionary of masks, basic propagation of uncertainties, and bin-edge coordinates.

On top of the above, Scipp’s key feature is support for multi-dimensional non-destructive binning of record-based “tabular” data into arrays of bins. The use of labeled arrays with coordinates to represent the table of records allows for clear conceptual association of a record’s metadata with dimensions and coordinates of the array of bins. Based on this, Scipp can provide fast, flexible, and efficient binning, rebinning, and filtering operations, all while preserving the original individual records.

Scipp ships with data display and visualization features for Jupyter notebooks, including a powerful plotting interface.
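
For orientation, a hedged sketch of a labeled, unit-aware Scipp data array (a toy example, not from the talk):

    import scipp as sc

    # A 1-D data array with a named dimension, coordinate and units.
    da = sc.DataArray(
        data=sc.array(dims=["x"], values=[10.0, 20.0, 30.0], unit="counts"),
        coords={"x": sc.array(dims=["x"], values=[0.1, 0.2, 0.3], unit="m")},
    )

    print(da.sum("x"))   # reductions keep track of units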

What is Contributor Experience?

Noa Tamir

Abstract: In my current work as a contributor experience lead, I am supporting and growing Matplotlib’s and Pandas’ communities by organizing events, meetings, and proactive engagement with a focus on equity and inclusion of historically marginalized groups. In my talk I’ll give an introduction to this new role, the grant that supports it, and some of the work done so far…

I will share takeaways for maintainers and contributors: from simple changes that can be implemented relatively easily, to bigger topics which one might want to learn more about, in order to slowly yet proactively tweak the contributor experience of a project.

Elephants, ibises and a more Pythonic way to work with databases

Marlene Mhangami

Abstract: In this talk, I will be talking about Ibis, a software package that provides a more Pythonic way of interacting with multiple database engines. In my own adventures living in Zimbabwe, I’ve always encountered ibises (the bird versions) perched on top of elephants. If you’ve never seen an elephant in real life, I can confirm that they are huge, complex creatures. The image of a small bird sitting on top of a large elephant serves as a metaphor for how Ibis (the package) provides a less complex, more performant way for Pythonistas to interact with multiple big data engines.

I’ll use the metaphor of elephants and ibises to show how this package can make a data workflow more Pythonic. The Zen of Python lets us know that simple is better than complex. The bigger and more complex your data, the more of an argument there is to use Ibis. Raw SQL can be quite difficult to maintain when your queries are very complex. For Python programmers, Ibis offers a way to write SQL in Python that allows for unit-testing, composability, and abstraction over specific query engines (e.g. BigQuery)! You can carry out joins, filters, and other operations on your data in a familiar, Pandas-like syntax. Overall, using Ibis simplifies your workflows, makes you more productive, and keeps your code readable.
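
A hedged sketch of the Pandas-like expression API (using an in-memory table, available in recent Ibis versions; the same expression compiles to SQL for engines such as BigQuery):

    import ibis

    t = ibis.memtable({"city": ["Basel", "Basel", "Geneva"],
                       "amount": [10.0, 20.0, 5.0]})

    # Composable, testable query building instead of raw SQL strings.
    expr = (t.group_by("city")
             .aggregate(total=t.amount.sum())
             .order_by("city"))
    print(expr.execute())   # runs on the default local backend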

ReservoirPy: Efficient Training of Recurrent Neural Networks for Timeseries Processing

Xavier Hinaut

Abstract: ReservoirPy is a simple, user-friendly library based on Python scientific modules. It provides a flexible interface to implement efficient Reservoir Computing (RC) architectures, with a particular focus on Echo State Networks (ESN). Advanced features of ReservoirPy improve computation-time efficiency on a simple laptop compared to a basic Python implementation, for datasets of any size.

Some of its features are: offline and online training, parallel implementation, sparse matrix computation, fast spectral initialization, advanced learning rules (e.g. Intrinsic Plasticity), etc. It also makes it possible to easily create complex architectures with multiple reservoirs (e.g. deep reservoirs), readouts, and complex feedback loops. Moreover, graphical tools are included to easily explore hyperparameters with the help of the hyperopt library. The package includes several tutorials exploring exotic architectures, as well as examples reproducing scientific papers. ReservoirPy is available on GitHub under the open source MIT license; it includes detailed documentation and a PyPI package for easy installation.

Introduction to Python for scientific programming

To Be Defined

Abstract: This tutorial will provide an introduction to Python intended for beginners.

It will notably introduce the following aspects (a short sketch follows the list):

  • built-in types
  • control flow (i.e. conditions, loops, etc.)
  • built-in functions
  • basic Python class
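
A minimal sketch touching each of these aspects:

    # Built-in types
    point = {"x": 1.5, "y": 2.5}          # dict
    names = ["Ada", "Grace", "Alan"]      # list

    # Control flow
    for name in names:
        if name.startswith("A"):
            print(name)

    # Functions
    def norm(p):
        return (p["x"] ** 2 + p["y"] ** 2) ** 0.5

    # A basic class
    class Particle:
        def __init__(self, mass):
            self.mass = mass

    print(norm(point), Particle(1.0).mass)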

Interactive Data Science in the browser with JupyterLite and Emscripten Forge

Jeremy Tuloup, Martin Renou, Thorsten Beier

Abstract: JupyterLite is a Jupyter distribution that runs entirely in the web browser, backed by in-browser language kernels including WebAssembly powered Jupyter Xeus kernels and Pyodide.

JupyterLite enables data science and interactive computing with the PyData scientific stack, directly in the browser, without installing anything or running a server.

JupyterLite leverages the Emscripten and Conda Forge infrastructure, making it possible to easily install custom packages with binary extensions in the browser, such as numpy, scipy and scikit-learn.

conda-forge, mamba, boa and quetz - the evolution of package management for data science and beyond

Wolf Vollprecht

Abstract: Mamba is a fast, cross-platform and language independent package manager that is fully compatible with conda packages. It has enabled the conda-forge project to scale way beyond what was previously possible. In this talk we present further innovations in the mamba ecosystem, including boa, a new build tool based on mamba and quetz, an open-source and extensible package server for conda packages.

Time Series Forecasting with scikit-learn’s Quantile Gradient Boosted Regression Trees

Olivier Grisel

Abstract: This tutorial will introduce how to leverage scikit-learn’s powerful histogram-based gradient boosted regression trees with various loss functions (least squares, Poisson, and the pinball loss for quantile estimation) on a time series forecasting problem. We will see how to leverage pandas to build lag and windowing features, and how to use scikit-learn’s time-series cross-validation and other model-evaluation tools.
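
A condensed, hedged sketch of the ingredients (synthetic data; the quantile loss requires a recent scikit-learn, 1.1 or later):

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import HistGradientBoostingRegressor
    from sklearn.model_selection import TimeSeriesSplit, cross_val_score

    # A toy series with pandas-built lag features.
    rng = np.random.default_rng(0)
    s = pd.Series(rng.normal(size=500)).rolling(24).mean().dropna()
    df = pd.DataFrame({"lag_1": s.shift(1), "lag_24": s.shift(24),
                       "y": s}).dropna()
    X, y = df[["lag_1", "lag_24"]], df["y"]

    # Pinball loss for the 95th-percentile forecast, evaluated with
    # time-ordered splits instead of shuffled cross-validation.
    model = HistGradientBoostingRegressor(loss="quantile", quantile=0.95)
    print(cross_val_score(model, X, y, cv=TimeSeriesSplit(n_splits=5)).mean())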

Introduction to matplotlib

To Be Defined

Abstract: This tutorial is an introduction to matplotlib intended for beginners.

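A hedged sketch of the sort of first plot such an introduction starts from:

    import matplotlib.pyplot as plt
    import numpy as np

    x = np.linspace(0, 2 * np.pi, 200)

    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x), label="sin(x)")
    ax.plot(x, np.cos(x), "--", label="cos(x)")
    ax.set_xlabel("x")
    ax.set_ylabel("amplitude")
    ax.legend()
    plt.show()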

conda-forge: supporting the growth of the volunteer-driven, community-based packaging project

Wolf Vollprecht, Jannis Leidel

Abstract: The conda-forge project is one of the fastest growing Open Source communities out there – and most data scientists have probably heard of it. In this talk we explain the inner workings of conda-forge, its relationship to conda and PyPI, and we will explain how everyone can package software with conda-forge.

Evaluating your machine learning models: beyond the basics

Gaël Varoquaux

Abstract: This tutorial will guide you towards good evaluation of machine-learning models, choosing metrics and procedures that match the intended usage, with code examples using the latest scikit-learn features. We will discuss how good metrics should characterize all aspects of error, e.g. on the positive and negative class; the probability of a detection, or the probability of a true event given a detection; and they may need to cater for class imbalance. Metrics may also evaluate confidence scores, e.g. calibration. Model-evaluation procedures should gauge not only the expected generalization performance, but also its variations.
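
A hedged sketch of metrics beyond plain accuracy (placeholder labels and scores):

    from sklearn.calibration import calibration_curve
    from sklearn.metrics import (balanced_accuracy_score, brier_score_loss,
                                 precision_score, recall_score)

    y_true = [0, 0, 1, 1, 1, 0, 1, 0]
    y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
    y_prob = [0.1, 0.6, 0.8, 0.9, 0.4, 0.2, 0.7, 0.3]

    print(balanced_accuracy_score(y_true, y_pred))   # robust to imbalance
    print(precision_score(y_true, y_pred))   # P(true event | detection)
    print(recall_score(y_true, y_pred))      # P(detection | true event)
    print(brier_score_loss(y_true, y_prob))  # quality of confidence scores
    prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=2)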

Lessons learned from 10 years of Python in industrial research and development

Tim Hoffmann

Abstract: This talk explains why Python is a good choice for research and development. It spans the arc from a conceptual, almost philosophical, understanding of the software needs of research and development to concrete organizational strategies.

Getting started with JupyterLab

Mike Müller

Abstract: JupyterLab is very widely used in the Python scientific community. Most, if not all, of the other tutorials will use Jupyter as a tool. Therefore, a solid understanding of the basics is very helpful for the rest of the conference as well as for your later daily work. This tutorial provides an overview of important basic Jupyter features.

pyLife – a python package for mechanical lifetime assessment

Johannes Mueller

Abstract: pyLife is a Python package covering state-of-the-art algorithms for mechanical lifetime assessment and material fatigue. In this talk we will take a very quick glance at mechanical lifetime estimation and see how we can combine classical methods from mechanical engineering with methods from data science. We will see how pyLife’s modules can be used to build versatile solutions for the engineer’s desktop as well as server-based solutions for manufacturing and quality assurance with a high degree of automation. As pyLife is an Open Source project, everyone is welcome to collaborate. We are curious whether we can establish a developer community in the realm of mechanical engineering. We are aiming especially at university teachers using pyLife for teaching and research purposes.

Decision making under uncertainty

Christian Barz

Abstract: Python is the most popular programming language in the data space and is one of the major drivers of many advancements in machine learning. However, it is much less known that the Python library Pyomo is a great tool for solving mathematical optimization problems common in operations research. In this talk I demonstrate how Pyomo can be used to combine data-driven forecasts with optimal decision making. To this end I give a short introduction to stochastic programming, which allows us to solve a vehicle routing problem with uncertain demand.
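
A deliberately tiny, hedged Pyomo model to show the modelling style (deterministic, unlike the stochastic program in the talk; it requires an installed LP solver such as glpk):

    from pyomo.environ import (ConcreteModel, Constraint, NonNegativeReals,
                               Objective, SolverFactory, Var, maximize)

    m = ConcreteModel()
    m.x = Var(domain=NonNegativeReals)   # units of product A
    m.y = Var(domain=NonNegativeReals)   # units of product B
    m.profit = Objective(expr=3 * m.x + 5 * m.y, sense=maximize)
    m.capacity = Constraint(expr=m.x + 2 * m.y <= 14)

    SolverFactory("glpk").solve(m)       # any installed LP solver works
    print(m.x(), m.y())                  # optimal production plan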

Scientific Python in the browser with Pyodide

Roman Yurchak

Abstract: In this talk, we will look at the growing Python in the browser ecosystem, with a focus on the Pyodide project. We will discuss the remaining challenges as well as new possibilities it offers for scientific computing, education, and research.

Memory maps to accelerate machine learning training

Hristo Vrigazov

Abstract: Memory-mapped files are an underused tool in machine learning projects. They offer very fast I/O operations, making them suitable for storing training datasets that don’t fit into memory. In this talk, we will discuss the benefits of using memory maps, their downsides, and how to address them.
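
A minimal, hedged sketch with NumPy’s memmap (invented file name and shapes): the dataset lives on disk and only the slices a training step touches are paged into memory.

    import numpy as np

    # Write a large float32 array to disk once...
    data = np.random.random((100_000, 128)).astype("float32")
    fp = np.memmap("train.dat", dtype="float32", mode="w+", shape=data.shape)
    fp[:] = data
    fp.flush()

    # ...then map it lazily at training time: reading one minibatch
    # only pulls the corresponding pages from disk.
    batch = np.memmap("train.dat", dtype="float32", mode="r",
                      shape=(100_000, 128))[0:256]
    print(batch.mean())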

Image processing with scikit-image

Emmanuelle Gouillart

Abstract: Image data are used in many scientific fields such as astronomy, life sciences or material sciences. This tutorial will walk you through image processing with the scikit-image library, which is the numpy-native image processing library of the scientific python ecosystem.

The first hour of the tutorial will be accessible to beginners in image processing (some experience with numpy arrays is a prerequisite), and will focus on some basic concepts of digital image manipulation and processing (filters, segmentation, measures). In the last half hour, I will focus on more advanced aspects, and in particular I will speak about performance and acceleration of image processing.
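
A hedged sketch of the basic filters, segmentation and measures workflow (using a sample image bundled with scikit-image):

    from skimage import data, filters, measure

    image = data.coins()                       # sample grayscale image

    # Filtering and thresholding...
    smooth = filters.gaussian(image, sigma=1)
    binary = smooth > filters.threshold_otsu(smooth)

    # ...then label connected components and measure them.
    labels = measure.label(binary)
    regions = measure.regionprops(labels)
    print(len(regions), regions[0].area)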

napari: a multi-dimensional image visualization, annotation, and analysis platform in Python

Kevin Yamauchi

Abstract: Napari is an interactive, GPU-accelerated, nD image viewer written in Python. It displays images in a 2D or 3D canvas, then provides sliders for any additional dimensions in a dataset. It can also overlay associated data such as segmentations, points, polygons, surfaces, vectors, and tracks. Finally, napari is well integrated with the scientific Python ecosystem: NumPy arrays are the primary data structure used for visualization, and other standard arrays (such as Zarr or Dask arrays) are also supported. This makes it easy to insert interactive visualization, curation, and annotation steps into any workflow using standard SciPy libraries such as NumPy, SciPy, dask, and scikit-image. In this talk, I will introduce napari and demonstrate how it can be used for interactive image analysis.
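
For orientation, a hedged sketch of opening the viewer and adding an annotation layer from a script (sample image from scikit-image; in Jupyter the final napari.run() is unnecessary):

    import napari
    from skimage import data

    viewer = napari.view_image(data.coins(), name="coins")

    # Overlay an annotation layer; points are (row, column) positions.
    viewer.add_points([[50, 60], [100, 120]], size=10)

    napari.run()   # start the GUI event loop when run as a script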

Introduction to SciPy

To Be Defined

Abstract: This tutorial will provide an introduction to SciPy intended for beginners.

SciPy is a collection of mathematical algorithms and convenience functions built on the NumPy extension of Python. It adds significant power to the interactive Python session by providing the user with high-level commands and classes for manipulating and visualizing data.
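
A hedged sketch of two such high-level commands (root finding and numerical integration):

    import numpy as np
    from scipy import integrate, optimize

    # Solve cos(x) = x on the interval [0, 1]...
    root = optimize.brentq(lambda x: np.cos(x) - x, 0, 1)

    # ...and integrate exp(x) from 0 to 1.
    area, err = integrate.quad(np.exp, 0, 1)
    print(root, area)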