GETTING STARTED | DATA SCIENCE | KNIME ANALYTICS PLATFORM

KNIME for the Holidays

25 days of data science treats for the holidays

Rosaria Silipo
Low Code for Data Science
11 min read · Jan 9, 2023


In December 2022, I ran this advent calendar series on social media. Every day I proposed a public resource to learn more about data science and KNIME Software. The initiative received quite some appreciation (or number of likes, for our marketing analytics fellows), and I received a few requests to publish the summary as a blog post. Well … here it is! Enjoy!

Day 0. 25 days to Christmas! Starting tomorrow, December 1st, it will be 25 days to Christmas!

So, I thought of creating an advent calendar of little sweet data science and KNIME treats. One treat per day, every day, till December 25th.

Fig. 1. Let’s open together a door per day.

Day 1. The JOIN Operation. For the first day I chose a classic ETL operation: the JOIN. When teaching, I often notice that the concept of JOIN is a hard one to grasp. In 2021, in an attempt to join fun and education, I created this very short video, practically showing how the decoration of a Xmas tree changes based on the adopted JOIN mode.

Fig. 2. A festive JOIN operation.
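
For readers who prefer code to Xmas trees, here is a minimal Python sketch of how the JOIN mode changes the result. It is not how the KNIME Joiner node is implemented; the two tables, their column names, and the join() helper are invented for illustration.

```python
# Two tiny tables as lists of dicts; the contents are invented for illustration.
tree_spots = [
    {"spot": 1, "branch": "top"},
    {"spot": 2, "branch": "middle"},
    {"spot": 3, "branch": "bottom"},
]
ornaments = [
    {"spot": 2, "ornament": "star"},
    {"spot": 3, "ornament": "bell"},
    {"spot": 4, "ornament": "candle"},
]


def join(left, right, key, mode="inner"):
    """Join two lists of dicts on `key`; mode is inner, left, right, or full."""
    # Index the right table by key value for quick lookup.
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)
    left_keys = {row[key] for row in left}

    result = []
    for row in left:
        matches = right_by_key.get(row[key], [])
        if matches:
            # Matched rows appear in every join mode.
            result.extend({**row, **match} for match in matches)
        elif mode in ("left", "full"):
            # Unmatched left rows survive only in left and full outer joins.
            result.append(dict(row))
    if mode in ("right", "full"):
        # Unmatched right rows survive only in right and full outer joins.
        result.extend(dict(row) for row in right if row[key] not in left_keys)
    return result


for mode in ("inner", "left", "right", "full"):
    print(mode, [row["spot"] for row in join(tree_spots, ornaments, "spot", mode)])
```

In this sketch, unmatched rows simply lack the columns of the other table; a real tool would typically fill them with missing values.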

Day 2. The Gradient Descent Algorithm. For day 2, I chose another common concept at the base of many machine learning algorithms: the gradient descent. You have heard of it, for sure. Do you know what it is and how it works? If not, here is a popular, short, and clear explanation by Roberto Cadili. The video is part of the “Data Science Pronto!” series available on the KNIME TV channel on YouTube.

Fig. 3. Gradient Descent in 3 minutes.
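
If you would like to see the idea in a few lines before watching the video, here is a minimal sketch of gradient descent on a one-dimensional function. The function, the learning rate, and the number of steps are assumptions chosen for illustration; they do not come from the video.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to approach a minimum."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x


# Example: f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(minimum)
```

Each step moves x a little in the direction where the function decreases; the learning rate controls how large that step is.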

Day 3. Will they blend? For day 3, I chose something old and something new: the “Will they blend?” e-book from the KNIME Press. Something old, because this book contains a collection of past articles about performing data blending with KNIME. Something new, because this book recently got a new look & feel, a new order of the stories, and a new introduction. It contains 35 use cases showing the power of KNIME for data blending. The book is still free to download from the KNIME Press.

Fig. 4. The “Will They Blend?” ebook for more data blending examples.

Day 4. The Beginners Space. Some ETL, some machine learning, some data blending, but where can you go to learn how to assemble KNIME workflows from zero? Learning by doing and not by studying. For today, I chose the Beginners’ space on the KNIME Hub. The Beginners space is for the hands-on hackers, for people who learn by doing. It contains numerous example workflows for beginners to read, explore, transform, and analyze your data and to deploy the final application. Let’s build a workflow or two today!

Fig. 5. The KNIME Beginners space on the KNIME Hub.

Day 5. Choropleth Map. For day 5, let’s get into the wonderful world of components. You know, a component is a node encapsulating other nodes or code. A component also combines the views of the inner nodes into its composite view. As an example of a useful component with a beautiful view, I chose today the Choropleth Map component, available for free on the KNIME Hub. This component gets numbers and countries as input and produces the choropleth according to a color heatmap of choice.

Fig. 6. The Choropleth Map component.

Day 6. Beginners Cheat Sheet. Remember the Beginners’ space from day 4? Well, it is built around the Beginners cheat sheet. So, it is only natural that after the Beginners’ space, I show the Beginners cheat sheet. Here are the nodes that every beginner (and expert) KNIME user should know: nodes for data access, data transformation, data visualization, analysis, and result export. It is free! Download it, and keep it always at hand. You never know when you’ll need it!

Fig. 7. The beginners cheat sheet for KNIME beginners.

Day 7. The Ungroup node. There is a node that everybody has used: the GroupBy node. The GroupBy node groups records according to values in selected columns; then it calculates a selected metric on the obtained groups. A metric that could be an average, a sum, a simple count, or the List of the values in other selected columns. Once a List is created, how can we extract the values back from it?

The Ungroup node does exactly that: it takes a List object, extracts the values, and creates a table with one row for each value in the List. That works not only for Lists created by a GroupBy node, but for all kinds of Lists, including those resulting from web service calls. It is like squeezing oranges and being able to retrieve the original oranges from the squeezed juice! Let’s not cry anymore over squeezed oranges…

Fig. 8. Use the Ungroup node to recover items from a List object.
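
The round trip from rows to a List and back can be sketched in plain Python. This is only an analogy for the GroupBy and Ungroup nodes, not their implementation; the data and the helper names group_to_lists() and ungroup() are made up for illustration.

```python
from collections import defaultdict

# Invented example rows: one row per weighed fruit.
rows = [
    {"fruit": "orange", "weight": 150},
    {"fruit": "lemon", "weight": 90},
    {"fruit": "orange", "weight": 170},
]


def group_to_lists(rows, key, value):
    """Group rows by `key` and collect the `value` entries into a list."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return [{key: k, value: v} for k, v in groups.items()]


def ungroup(rows, list_column):
    """Create one output row per item of the list stored in `list_column`."""
    return [{**row, list_column: item} for row in rows for item in row[list_column]]


grouped = group_to_lists(rows, "fruit", "weight")
ungrouped = ungroup(grouped, "weight")
print(grouped)
print(ungrouped)
```

The ungrouped table contains the same rows as the original one, although not necessarily in the original order.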

Day 8. Bagging & Boosting. For day 8, I would like to go back to some machine learning algorithms. No, no. Not yet the neural networks … Today let’s learn something more about ensemble algorithms, and in particular the difference between bagging and boosting algorithms. This video by Satoru Hayasaka explains it in less than 2 minutes. Do you have 2 minutes to spare?

Fig. 9. Bagging and Boosting as part of the “Data Science Pronto!” video series.

Day 9. Data Connect Events. For day 9, let’s take a rest and chill with other KNIME users. Let’s see what they are working on, whether they have advice on best practices, and whether they are hiring or looking for a new job; that is, let’s do some networking. Where can I meet like-minded KNIME expert users? In the Data Connect events, of course! Data Connects are local events, in your language, organized for and by the community, and repeated a few times over the year. Usually, one or two presentations are offered about use cases, tutorials, or solutions, followed by some networking time. After COVID, the format has changed from fully in-person, on-site events to hybrid events. Check for KNIME on Meetup.com to see where the next Data Connect event near you will take place! Read the article “KNIME Data Connects” to learn more about the Data Connect series.

Fig. 10. The blog post announcing and describing the KNIME Data Connect events around the world.

Day 10. Animated Bar Chart. For day 10, another interesting verified component with a popular interactive view: the Animated Bar Chart. In the GIF below, you can see the evolution of the most popular music artists throughout the 70s.

Fig. 11. The Animated Bar Chart component.

Day 11. Connectors Cheat Sheet. One thing that KNIME is famous for is data blending. You can connect to a very large variety of data sources, local or remote, on premise or on the cloud, file based or database, SQL or noSQL, and so on … Today is dedicated to all KNIME connector nodes and, to represent that, I chose the Connector Cheat Sheet. Download it. It is free!

Fig. 12. Another popular cheat sheet: the Connector cheat sheet for data blending.

Day 12. Codeless Deep Learning. One of the big topics of the past few years, probably the topic that rekindled the interest in data science, is surely deep learning. As such, a little treat about deep learning could not be missing from this advent calendar. I re-propose here the book “Codeless Deep Learning with KNIME”, written by me and Kathrin Melcher and published by Packt. The book covers basic and advanced concepts of neural networks, in theory and in practice, with KNIME Analytics Platform and its integration with Keras. The exploration of different architectures and algorithms proceeds with practical examples: every neural architecture is investigated in depth within a common use case. The book can be purchased on the Packt website, on Amazon, or from any online bookstore.

Fig. 13. The book “Codeless Deep Learning” introduces you to building neural networks for different tasks, all (or almost all) codeless.

Day 13. Global Thresholder node. Today’s treat belongs to the KNIME Image Processing extension: the Global Thresholder node. This node implements a threshold-based algorithm to distinguish background from foreground and bring back to life objects hidden within the background of an image. Below is an example created for last Halloween!

Fig. 14. The Global Thresholder node can reveal scary things… use at your own risk!
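
To give an idea of what a global threshold does, here is a minimal sketch in plain Python. It is not the node’s implementation: the tiny image is invented, and using the mean intensity as the threshold is an assumption made for brevity, not necessarily a method the node offers.

```python
# A tiny grayscale "image" (0 = black, 255 = white); the values are made up.
image = [
    [12, 15, 200, 14],
    [10, 210, 220, 11],
    [13, 16, 205, 12],
]


def global_threshold(image, threshold):
    """Mark pixels brighter than one image-wide threshold as foreground (1)."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


# Assumption: use the mean intensity of the whole image as the single threshold.
pixels = [pixel for row in image for pixel in row]
threshold = sum(pixels) / len(pixels)

mask = global_threshold(image, threshold)
for row in mask:
    print(row)
```

The word “global” refers to the fact that the same threshold is applied to every pixel, regardless of its neighborhood.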

Day 14. Convolutional Neural Networks (CNN). Let’s remain in the field of image processing, but let’s move from traditional techniques to more recent algorithms, like for example Convolutional Neural Networks (CNN). What is a CNN, how can it be used, and what are 1D, 2D, and 3D convolutions? It is all explained in less than 3 minutes in this “Data Science Pronto!” video.

Fig. 15. Convolutional Neural Networks (CNN) quick and easy.

Day 15. Codeless Time Series Analysis. There are many specific niches in data science. One of the least known areas is time series analysis. Time series analysis, like many other areas in data science, comes from the union of two different fields: the classic statistics-based algorithms, such as ARIMA, and the machine-learning-based algorithms, such as LSTM networks. The piece of wisdom for today consists of the book “Codeless Time Series Analysis with KNIME”, written by Corey Weisinger, Maarit Widmann, and Daniele Tonini, and published by Packt. This book offers a practical introduction to time series analysis from all perspectives: classical algorithms, neural networks, data visualization, and data preparation. As in the other book, “Codeless Deep Learning with KNIME”, theoretical knowledge is associated with practical solutions for common use cases. All solutions are based on the KNIME time series components, hence the word “codeless” in the title. The book can be purchased on the Packt website, on Amazon, or from any online bookstore.

Fig. 16. The book “Codeless Time Series Analysis” introduces you to the rudiments of classic and modern techniques for time series analysis.

Day 16. Widget nodes. Today I recommend a post from the KNIME blog, “Explore the Wonderful World of KNIME Widgets”. Indeed, KNIME Analytics Platform is a well-established tool for codeless data blending, ETL operations, machine learning algorithms, text mining, image processing, and also for building graphical UIs. Every application, more or less smart, more or less AI-driven, needs a user interface, ideally a simple, easy-to-use UI that dramatically improves the user experience. Coding a UI takes time and experience. The KNIME Widget nodes allow you to build a dashboard, a form, or any interactive web page with just a few nodes wrapped in a component! With KNIME, you can easily make your ETL application or your AI engine more appealing with the right user interface!

Fig. 17. Widget nodes can give a handy user interface to make your web data app more user friendly.

Day 17. Auto-SARIMA. This is the latest arrival in the time series component family. This component implements a SARIMA algorithm, that is, an ARIMA model for the time series and for its seasonality pattern. The orders of the SARIMA model are determined automatically, so that you can avoid the tedious phase where you try to guess the best values for (p, d, q) and (P, D, Q). If you are into time series analysis, you cannot miss this component!

Fig. 18. The auto-SARIMA component for time series analysis.

Day 18. Component Cheat Sheet. In the past few days, we talked about components: components in general, components for data visualization, components for time series analysis, widget nodes for components, and there is more! Today, I would like to summarize what a component is, how to build one, how to use it, how to give it a UI, how to give it a view, and how to share it. I cannot do this obviously with this short post. So, I will point you to another cheat sheet, the cheat sheet about components.

Fig. 19. The Component cheat sheet contains all you need to build any KNIME component.

Day 19. Best of KNIME. Time for another pause to dedicate to the KNIME community. Today I would like to highlight the top experts in the KNIME community. They are just a handful: it takes time to reach that level of expertise in both data science and KNIME usage. Every month since August 2020, we have rewarded one or more of these experts with the KNIME COTM award. Some of them are support experts, some are educators, some are social media influencers, some are workflow builders, some are bloggers, some are YouTubers. We have collected the contributions of the first COTMs in a booklet, “Best of KNIME”. Download it. It’s free!

Fig. 20. Celebrating top KNIME users in “Best of KNIME” collection booklet.

Day 20. Linear & Logistic Regression. Back to data science algorithms. The classic of the classics: linear regression and logistic regression. They are related, but one produces numbers and the other classes. So, how are they related? What is their common root? Can I use one instead of the other? All your questions will be answered in this great short video “Linear vs Logistic Regression” from the “Data Science Pronto!” series on the KNIME TV channel on YouTube.

Fig. 21. Linear vs. Logistic Regression, again from the “Data Science Pronto!” series on YouTube.
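
As a small teaser for the video, here is a minimal sketch of the common root of the two models: both start from the same weighted sum of the inputs. Linear regression returns that number directly, while logistic regression squashes it into a probability and turns it into a class. The weights, bias, input values, and the 0.5 cutoff are assumptions chosen for illustration, not fitted values.

```python
import math


def linear_score(features, weights, bias):
    """The shared building block: a weighted sum of the inputs plus a bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def linear_regression_predict(features, weights, bias):
    """Linear regression outputs the score itself: a number."""
    return linear_score(features, weights, bias)


def logistic_regression_predict(features, weights, bias, cutoff=0.5):
    """Logistic regression maps the score to a probability, then to a class."""
    probability = 1 / (1 + math.exp(-linear_score(features, weights, bias)))
    label = 1 if probability >= cutoff else 0
    return probability, label


# Hypothetical, hand-picked parameters (not fitted on any data).
weights, bias = [0.8, -0.5], 0.1
features = [2.0, 1.0]

number = linear_regression_predict(features, weights, bias)
probability, label = logistic_regression_predict(features, weights, bias)
print(number, probability, label)
```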

Day 21. Component Building. We know a lot about components by now. But are we truly component experts? Today I want to share a blog post that will allow you to make that last step to become really proficient in KNIME component building. This article on the KNIME blog shares “11 best practices for component building”.

Fig. 22. Blog post on best practices for component building.

Day 22. XAI. Is AI ethical? How can we make sure that our AI model does not hurt anybody? How can we know what kind of decisions our model implements? Well, there is a full branch of data science that implements methods to explain the decision process that the so-called black-box algorithms implement. This branch is called eXplainable AI or for short XAI. Today I want to feature one of the XAI components available on the KNIME Hub: the Global Feature Importance component. This component implements a few surrogate models — like for example a surrogate decision tree — to simulate the decisions that the black-box algorithm is implementing.

Fig. 23. The Global Feature Importance component for XAI.
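
The idea of a surrogate model can be sketched in plain Python: query the black box, then fit a simple, readable model to its predictions instead of to the original targets. This is only a toy illustration and not the component’s implementation; the black_box() function, the feature grid, and the one-split “tree” (a decision stump) are all invented for the example.

```python
def black_box(age, income):
    """Stand-in for an opaque model; the formula is invented for illustration."""
    return 1 if age * 1500 + income > 100000 else 0


# A grid of hypothetical inputs and the black box's decisions on them.
samples = [(age, income)
           for age in range(20, 71, 10)
           for income in range(20000, 100001, 20000)]
predictions = [black_box(age, income) for age, income in samples]


def fit_stump(samples, labels):
    """Find the single-feature threshold that best reproduces the labels."""
    best = None
    for feature in range(len(samples[0])):
        for threshold in sorted({sample[feature] for sample in samples}):
            guesses = [1 if sample[feature] > threshold else 0 for sample in samples]
            # Fidelity: share of inputs where the surrogate agrees with the black box.
            fidelity = sum(g == l for g, l in zip(guesses, labels)) / len(labels)
            if best is None or fidelity > best[0]:
                best = (fidelity, feature, threshold)
    return best


fidelity, feature, threshold = fit_stump(samples, predictions)
print(f"feature {feature} > {threshold} mimics the black box with fidelity {fidelity:.2f}")
```

The feature picked by the surrogate hints at what drives the black box’s decisions, and the fidelity tells you how far that simple explanation can be trusted.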

Day 23. Machine Learning Cheat Sheet. We are getting close to the end. What about a summary of the most common machine learning algorithms? Here, the last cheat sheet of the series: The machine learning cheat sheet. In this cheat sheet, the most common machine learning algorithms are summarized and the corresponding nodes are described. It includes algorithms for classification, numeric prediction, clustering, ensemble learning, and more.

Fig. 24. The cheat sheet about machine learning algorithms could not be missing from this list of KNIME resources.

Day 24. From SPSS Modeler to KNIME. Today something new! A new booklet by the KNIME Press: “From SPSS Modeler to KNIME”. A long-awaited book, it is now available for free download thanks to Robin Richter from Avantum Consult. This e-book leverages your pre-existing SPSS knowledge to teach you more about KNIME usage. Download it now for free from the KNIME Press page! If you still need a Christmas gift…

Fig. 25. This is the book to help you transition from SPSS Modeler to KNIME.

Day 25. The Rule Engine node and Santa Claus. Happy holidays everybody! Have you ever wondered how Santa delivers the gifts to the good kids and ignores the bad ones? He executes a KNIME workflow … of course! With a good old Rule Engine node in it, to decide whether a kid in this holiday season deserves a gift or not. How else did you think he was getting the job done for millions of kids around the world? :-) Happy Holidays!

Fig. 26. And to conclude … even Santa Claus uses KNIME!
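
For the curious, Santa’s decision logic can be imitated in a few lines of Python. This is an analogy and not Rule Engine syntax; it assumes that rules are checked from top to bottom and that the first matching rule decides the outcome. The column names, the thresholds, and the kids are, of course, invented.

```python
# Each rule is a (condition, outcome) pair; rules are checked in order.
rules = [
    (lambda kid: kid["good_deeds"] >= 10, "gift"),
    (lambda kid: kid["good_deeds"] > kid["tantrums"], "gift"),
    (lambda kid: True, "no gift"),  # catch-all default rule
]


def apply_rules(kid, rules):
    """Return the outcome of the first rule whose condition matches."""
    for condition, outcome in rules:
        if condition(kid):
            return outcome
    return None  # no rule matched


# Hypothetical input table: one row per kid.
kids = [
    {"name": "Anna", "good_deeds": 12, "tantrums": 20},
    {"name": "Ben", "good_deeds": 4, "tantrums": 1},
    {"name": "Carl", "good_deeds": 2, "tantrums": 9},
]

decisions = {kid["name"]: apply_rules(kid, rules) for kid in kids}
print(decisions)
```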

I hope you enjoyed this short walk around the learning resources available for KNIME and data science! At this point all that is left to do is to wish you a happy new year 2023!


Rosaria has been mining data since her master’s degree, through her doctorate and the job positions after that. She is now a data scientist and KNIME evangelist.