{"id":2083,"date":"2024-08-09T13:46:17","date_gmt":"2024-08-09T13:46:17","guid":{"rendered":"https:\/\/2024.automl.cc\/?page_id=2083"},"modified":"2024-09-09T12:18:13","modified_gmt":"2024-09-09T12:18:13","slug":"manual-and-automatic-preprocessing-tables-for-machine-learning-with-scikit-learn-and-skrub","status":"publish","type":"page","link":"https:\/\/2024.automl.cc\/?page_id=2083","title":{"rendered":"Manual and automatic preprocessing tables for machine learning, with scikit-learn and skrub"},"content":{"rendered":"<div data-colibri-id=\"2083-c1\" class=\"style-959 style-local-2083-c1 position-relative\">\n  <!---->\n  <div data-colibri-component=\"section\" data-colibri-id=\"2083-c2\" id=\"custom\" class=\"h-section h-section-global-spacing d-flex align-items-lg-center align-items-md-center align-items-center style-964 style-local-2083-c2 position-relative\">\n    <!---->\n    <!---->\n    <div class=\"h-section-grid-container h-section-boxed-container\">\n      <!---->\n      <div data-colibri-id=\"2083-c3\" class=\"h-row-container gutters-row-lg-2 gutters-row-md-2 gutters-row-0 gutters-row-v-lg-2 gutters-row-v-md-2 gutters-row-v-2 style-965 style-local-2083-c3 position-relative\">\n        <!---->\n        <div class=\"h-row justify-content-lg-center justify-content-md-center justify-content-center align-items-lg-stretch align-items-md-stretch align-items-stretch gutters-col-lg-2 gutters-col-md-2 gutters-col-0 gutters-col-v-lg-2 gutters-col-v-md-2 gutters-col-v-2\">\n          <!---->\n          <div class=\"h-column h-column-container d-flex h-col-lg-auto h-col-md-auto h-col-auto style-966-outer style-local-2083-c4-outer\">\n            <div data-colibri-id=\"2083-c4\" class=\"d-flex h-flex-basis h-column__inner h-px-lg-2 h-px-md-2 h-px-2 v-inner-lg-2 v-inner-md-2 v-inner-2 style-966 style-local-2083-c4 position-relative\">\n              <!---->\n              <!---->\n              <div class=\"w-100 h-y-container h-column__content h-column__v-align 
flex-basis-100 align-self-lg-start align-self-md-start align-self-start\">\n                <!---->\n                <div data-colibri-id=\"2083-c5\" class=\"h-text h-text-component style-967 style-local-2083-c5 position-relative h-element\">\n                  <!---->\n                  <!---->\n                  <div class=\"\">\n                    <p><strong>Date:<\/strong>&nbsp;10.09.2024, 15:30-17:00<\/p>\n                    <p>Room: Auditorium<\/p>\n                    <h2>Speakers<\/h2>\n                    <p>\n                      <a href=\"https:\/\/gael-varoquaux.info\/about.html\" style=\"font-family: &quot;Open Sans&quot;; font-weight: 400; font-size: 1em; color: rgb(3, 169, 244);\" class=\"customize-unpreviewable\">Ga\u00ebl Varoquaux<\/a>, INRIA<\/p>\n                    <h2>Motivation<\/h2>\n                    <p>Tables typically require a lot of data preparation before they can be fed to a machine learning model. I will explore this preprocessing and how skrub can help. In particular, I will hint at some advanced features coming up in skrub to bridge\n                      ML to database practice.<\/p>\n                    <ul>\n                      <li>1. Particularities of tabular data<\/li>\n                      <li>2. Missing values<\/li>\n                      <li>3. Categorical and string values<\/li>\n                      <li>4. 
Advanced pipelining<\/li>\n                    <\/ul>\n                    <h2>Speakers<\/h2>\n                    <p>\n                      <br>\n                    <\/p>\n                  <\/div>\n                <\/div>\n              <\/div>\n            <\/div>\n          <\/div>\n        <\/div>\n      <\/div>\n    <\/div>\n  <\/div>\n  <div data-colibri-component=\"section\" data-colibri-id=\"2083-c6\" id=\"overlappable\" class=\"h-section h-section-global-spacing d-flex align-items-lg-center align-items-md-center align-items-center style-1014 style-local-2083-c6 position-relative\">\n    <!---->\n    <!---->\n    <div class=\"h-section-grid-container h-section-boxed-container\">\n      <!---->\n      <div data-colibri-id=\"2083-c7\" class=\"h-row-container gutters-row-lg-0 gutters-row-md-0 gutters-row-0 gutters-row-v-lg-0 gutters-row-v-md-0 gutters-row-v-0 style-1015 style-local-2083-c7 position-relative\">\n        <!---->\n        <div class=\"h-row justify-content-lg-center justify-content-md-center justify-content-center align-items-lg-stretch align-items-md-stretch align-items-stretch gutters-col-lg-0 gutters-col-md-0 gutters-col-0 gutters-col-v-lg-0 gutters-col-v-md-0 gutters-col-v-0\">\n          <!---->\n          <div class=\"h-column h-column-container d-flex h-col-lg-auto h-col-md-auto h-col-auto style-1016-outer style-local-2083-c8-outer\">\n            <div data-colibri-id=\"2083-c8\" class=\"d-flex h-flex-basis h-column__inner h-ui-empty-state-container h-px-lg-0 h-px-md-0 h-px-0 v-inner-lg-0 v-inner-md-0 v-inner-0 style-1016 style-local-2083-c8 position-relative\">\n              <!---->\n              <!---->\n              <div class=\"w-100 h-y-container h-column__content h-column__v-align flex-basis-100\">\n                <!---->\n              <\/div>\n            <\/div>\n          <\/div>\n          <div class=\"h-column h-column-container d-flex h-col-lg-auto h-col-md-auto h-col-auto style-1017-outer style-local-2083-c9-outer\">\n 
           <div data-colibri-id=\"2083-c9\" class=\"d-flex h-flex-basis h-column__inner h-px-lg-2 h-px-md-2 h-px-2 v-inner-lg-2 v-inner-md-2 v-inner-2 style-1017 style-local-2083-c9 position-relative\">\n              <!---->\n              <!---->\n              <div class=\"w-100 h-y-container h-column__content h-column__v-align flex-basis-100 align-self-lg-center align-self-md-center align-self-center\">\n                <!---->\n                <div data-colibri-id=\"2083-c10\" class=\"h-global-transition-all h-heading style-1018 style-local-2083-c10 position-relative h-element\">\n                  <!---->\n                  <div class=\"h-heading__outer style-1018 style-local-2083-c10\">\n                    <!---->\n                    <!---->\n                    <h4 class=\"\">Ga\u00ebl Varoquaux<\/h4>\n                  <\/div>\n                <\/div>\n                <div data-colibri-id=\"2083-c11\" class=\"h-text h-text-component style-1019 style-local-2083-c11 position-relative h-element\">\n                  <!---->\n                  <!---->\n                  <div class=\"\">\n                    <p>\n                      <a href=\"https:\/\/gael-varoquaux.info\/about.html\" style=\"color: rgb(3, 169, 244); font-size: 16px; font-weight: 400; font-family: &quot;Open Sans&quot;;\">Ga\u00ebl Varoquaux<\/a><span style=\"color: rgb(70, 112, 127); font-size: 16px; font-weight: 400; font-family: &quot;Open Sans&quot;;\">&nbsp;is a research director working on data science at Inria (French computer science national research) where he leads the&nbsp;<\/span>\n                      <span\n                        style=\"color: rgb(3, 169, 244); font-size: 16px; font-weight: 400; font-family: &quot;Open Sans&quot;;\">Soda team<\/span><span style=\"color: rgb(70, 112, 127); font-size: 16px; font-weight: 400; font-family: &quot;Open Sans&quot;;\">. 
Varoquaux\u2019s research covers fundamentals of artificial intelligence, statistical learning, natural language processing, causal inference, as well as applications to health, with a current focus on public health and epidemiology. He also creates technology: he co-founded scikit-learn, one of the reference machine-learning toolboxes, and helped build various central tools for data analysis in Python. Varoquaux has worked at UC Berkeley, McGill, and the University of Florence. He did a PhD in quantum physics supervised by&nbsp;<\/span>\n                        <a\n                          href=\"https:\/\/fr.wikipedia.org\/wiki\/Alain_Aspect\" style=\"color: rgb(3, 169, 244); font-size: 16px; font-weight: 400; font-family: &quot;Open Sans&quot;;\">Alain Aspect<\/a><span style=\"color: rgb(70, 112, 127); font-size: 16px; font-weight: 400; font-family: &quot;Open Sans&quot;;\">&nbsp;and is a graduate of Ecole Normale Superieure, Paris.<\/span><\/p>\n                  <\/div>\n                <\/div>\n              <\/div>\n            <\/div>\n          <\/div>\n        <\/div>\n      <\/div>\n    <\/div>\n  <\/div>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>Date:&nbsp;10.09.2024, 15:30-17:00 Room: Auditorium Speakers Ga\u00ebl Varoquaux, INRIA Motivation Tables typically require much data preparation before feeding to a machine learning model. I will explore this preprocessing and how skrub can help. In particular, I will hint at some advanced features coming up in skrub to bridge ML to database practice. 1. 
Particularities of tabular [&hellip;]<\/p>\n","protected":false},"author":9,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"page-templates\/full-width-page.php","meta":{"footnotes":""},"class_list":["post-2083","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/2024.automl.cc\/index.php?rest_route=\/wp\/v2\/pages\/2083","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/2024.automl.cc\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/2024.automl.cc\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/2024.automl.cc\/index.php?rest_route=\/wp\/v2\/users\/9"}],"replies":[{"embeddable":true,"href":"https:\/\/2024.automl.cc\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2083"}],"version-history":[{"count":3,"href":"https:\/\/2024.automl.cc\/index.php?rest_route=\/wp\/v2\/pages\/2083\/revisions"}],"predecessor-version":[{"id":2584,"href":"https:\/\/2024.automl.cc\/index.php?rest_route=\/wp\/v2\/pages\/2083\/revisions\/2584"}],"wp:attachment":[{"href":"https:\/\/2024.automl.cc\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2083"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}