Ten hidden gems from the KNIME community in 2020
Surfing the web for blog posts and journal articles about KNIME software and data science
Author: Rosaria Silipo
If you want to learn more about KNIME Analytics Platform, you can of course explore the KNIME website. There you can find a whole LEARNING page, including links to in-house courses, external courses, certification exams, and YouTube videos. Beyond those, the KNIME community provides many more resources elsewhere on the web.
It is the end of the year: time for summaries and rankings. Here is my very personal list of the top 10 most interesting blog posts about KNIME software, published in 2020 by the KNIME community. As you all know, KNIME Analytics Platform is an open-source platform for all your data science needs: from data access to machine learning, from data visualization to deep learning, … but why am I telling you this? Let the community speak!
# 10 — dkyto — “KNIME — The undermined tool for reporting productivity” — Medium — Mar 29, 2020
“Start your learning journey today to walk away from reporting nightmare to a self-driven insight analytics.”
This blog post is an introduction to KNIME Analytics Platform. It does not describe how to use KNIME Analytics Platform in detail, and you will not find a step-by-step guide to building your first workflow here. However, it explains why KNIME Analytics Platform can help you in all stages of a data science task, be it in the usage of machine learning algorithms or in the preparation of data for reporting, all without having to write a single line of code. The post also gives a few hints on where to find help, if needed.
# 9 — Nattapat Juthaprachakul, Rui Wang, Siyu Wu, Yihan Lan — “Want to do Data Analysis without coding? Use KNIME!” — Students’ blog at Simon Fraser University — Feb 3, 2020
“However, there is some GOOD NEWS! With great development in GUI-based applications, the introduction of KNIME is a major game changer for common people who generally do not identify themselves as a programmer.”
More than a blog post, this is a full tutorial on what KNIME Analytics Platform is, how it works, why to use it, and what it can do. For the last part in particular, it shows a number of solutions for common data science tasks, such as topic detection, simple classification, churn prediction, and credit scoring. All of these solutions are already available on the KNIME Hub. After that, as an example, it uses the Titanic data set to show how to read, clean, and visualize the data, and how to train and evaluate a machine learning model. If you want a quick tutorial that is still detailed and thorough, I definitely recommend reading this blog post.
# 8 — Fabio Rebecchi — “Codeless Data Science with KNIME” — LinkedIn — Dec 28, 2020
“In this article I create a step by step data science pipeline using a visual and codeless workflow with KNIME.”
I almost missed this blog post, since it came out just a few days ago. Thanks to Alexander Fillbrunn for bringing it to my attention. This is another great tutorial on how to build a full data science pipeline with KNIME Analytics Platform. It includes all steps: from data access to the training of a decision tree, from data preparation to model evaluation, from data exploration to model visualization. The task is to predict employee attrition using the “IBM HR Analytics Employee Attrition & Performance” dataset. It is a must-read if you want to learn the basics of implementing a full data science or data wrangling pipeline.
# 7 — Jitendra Kumar Singh — “Knime: Accessing a REST API with dynamic query param” — Knoldus blog — Jul 2, 2020
“In this post, we will learn how to generate dynamic URLs by adding query parameters and get data. Knime platform supports Rest interface with Get-Request and Post-Request Node.”
A blog post by Knoldus could not be missing from this list. Indeed, their blog is a large repository of posts to learn more about KNIME Analytics Platform, data science, data wrangling, and data blending. I chose this post because Jitendra Kumar Singh explains some quite important concepts very clearly, while describing a simple yet necessary task: accessing external REST services through multiple queries. The blog contains many more similar posts, which makes it a useful resource for newbies. If you are one, I advise you to take a look at it.
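For readers who think in code, the idea of generating one URL per set of query parameters can be sketched in a few lines of Python. This is only an illustration of the concept, not the workflow from the post: the endpoint `https://api.example.com/v1/records` and the parameter names `country` and `year` are made up for this example.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration (not from the post).
BASE_URL = "https://api.example.com/v1/records"


def build_url(base_url, params):
    """Append URL-encoded query parameters to a base URL."""
    return f"{base_url}?{urlencode(params)}"


# One parameter set per request, much like one table row per REST call.
# The parameter names and values are assumptions for this sketch.
queries = [
    {"country": "Italy", "year": 2020},
    {"country": "New Zealand", "year": 2020},
]

urls = [build_url(BASE_URL, q) for q in queries]
for url in urls:
    print(url)
```

Note that `urlencode` takes care of escaping, so a value such as "New Zealand" is encoded safely instead of breaking the URL.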
# 6 — Ulrich Johannes — “It will go away with the heat — or it won’t Comparing infection rates and temperature” — Medium — May 5
“The data is in a slightly unpleasant format, so we need to perform some preprocessing, …”
For the mid-position of this list, I chose this blog post. It shows how easy it can be to perform some data blending (the author uses three data sources) and data pre-processing. With a loop and a few joining and aggregation nodes, the final structure of the data is easily achieved. Note the usage of the node for moving aggregation. The pre-processing here is not limited to classic operations on static observations in the dataset; it operates on time series as well.
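If you have never met a moving aggregation before, here is a minimal Python sketch of the idea, using a backward-looking moving average. It is not taken from the post: the function name `moving_average`, the window size, and the case numbers are all invented for illustration, and the KNIME node offers more aggregation methods and window types than this.

```python
from collections import deque


def moving_average(values, window):
    """Backward-looking mean over the last `window` values.

    Positions without a full window yet are returned as None.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    buffer = deque(maxlen=window)  # holds the values currently in the window
    total = 0.0
    result = []
    for value in values:
        if len(buffer) == window:
            total -= buffer[0]  # this value is about to drop out of the window
        buffer.append(value)
        total += value
        result.append(total / window if len(buffer) == window else None)
    return result


# Made-up daily counts, for illustration only.
daily_cases = [10, 20, 30, 40, 50]
smoothed = moving_average(daily_cases, 3)
print(smoothed)
```

The point of such an operation is that each output row depends on its neighbors in time, which is exactly what distinguishes time series pre-processing from row-by-row cleaning.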
# 5 — Tate Lowry — “An in-depth guide for cleaning Server Log Data in KNIME” — Medium — Mar 23, 2020
“KNIME excels at allowing users to visually create data workflows without code.”
With this blog post, we leave the realm of the generic usage of KNIME Analytics Platform and get into specific solutions for specific tasks. The task at hand is the extraction of data from a log file, after accessing, reading, parsing, and cleaning that same log file. Beyond that, however, everybody can benefit from a tip or two about data cleaning and data extraction. This blog post offers a useful description of data cleaning operations for anybody working with data at any level, especially when dealing with String data.
# 4 — Abhishek Kumar — “Eliciting important features impacting COVID-19 cases through ML algorithms” — Medium — Aug 24
“I thought to investigate and decipher the features/variables which are impacting the total number of COVID cases.”
We are now entering the top part of the list. Here the authors focus more on the machine learning part of the data science cycle. In addition, this being 2020, it is inevitable that we start talking more and more about COVID-19. This blog post focuses on the five European countries most impacted by COVID-19 at the beginning of the pandemic: Italy, France, Spain, Germany, and the UK. The statistics of candidate and split attributes from a trained random forest are investigated to understand the key factors in predicting the spread of the virus.
# 3 — Israel Fernandez Pina — “UMAP dimension reduction and DBSCAN for clustering MNIST database within KNIME” — Towards Data Science — Nov 13
This is a great blog post! It really is. It combines dimensionality reduction and visualization, the UMAP algorithm and the DBSCAN algorithm, and finally KNIME Analytics Platform and Python. The goal is to visualize clusters of data from the MNIST dataset, which contains images of handwritten digits. Visualization is performed via 2-D or 3-D scatter plots available from the KNIME Plotly integration; clustering is performed via the DBSCAN algorithm through native KNIME nodes; and finally, the dimensionality reduction to just two or three attributes is performed via the UMAP algorithm from Python libraries. Indeed, the Python code is written in the Python Source node (available from the KNIME Python integration) and becomes just one new node inside the KNIME workflow. If you are interested in data visualization via Plotly, in integrating your Python script within a KNIME workflow, or just in dimensionality reduction and clustering, this is a must-read.
# 2 — Angus Veitch — “TweetKollidR — A Knime workflow for creating text-rich visualisations of Twitter data” — seenanotherway blog — Oct 5
“Since writing that post, I have revised and tidied up the workflow so that anyone can use it, and I have made it available on the Knime Hub.”
This was an easy placement in the list. Thanks to this blog post, Angus Veitch was named KNIME Contributor of the Month for November. It is a full, detailed description of Angus’ application, TweetKollidR, which analyzes and visualizes tweets. The application connects to Twitter, performs the required text processing operations, and visualizes user communities and activities. It is an interesting and powerful application. Even if you do not need to use the application itself, by reading the blog post you might learn a thing or two about connecting to Twitter, text processing, and network visualization.
Curious about the blog post at position #1? Let’s see …
# 1 — Dennis Ganzaroli — “Covid 19-Projections with Knime, Jupyter and Tableau” — The Startup — Nov 19
“Make projections for covid 19 for the next 30 days by combining KNIME for data integration, Jupyter to fit models and Tableau to create visualizations.”
Another great story about projections of COVID-19 obtained via a logistic model and visualized on a dashboard. It is a great story of technical integration as well, since the data was on a Google Drive, the data preparation was implemented with KNIME Analytics Platform, the logistic model with Jupyter, and the dashboard with Tableau. The art director of the whole movie, controlling the data pipeline, is a KNIME workflow. It is a great read, also if you want to know more about how far we are from the end of the pandemic.
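To give a rough feeling for what a logistic model projects, here is a tiny Python sketch of the logistic curve evaluated over a 30-day horizon. It is not the author's code: the function name `logistic` and the parameter values (capacity, growth rate, midpoint) are assumptions picked for illustration, whereas in the post the model is fitted to real data in Jupyter.

```python
import math


def logistic(t, capacity, growth_rate, midpoint):
    """Logistic curve: slow start, fast growth, then saturation at `capacity`."""
    return capacity / (1.0 + math.exp(-growth_rate * (t - midpoint)))


# Assumed, hand-picked parameters (not fitted to any real data).
CAPACITY = 1000.0   # level at which the curve saturates
GROWTH_RATE = 0.2   # steepness of the curve
MIDPOINT = 50       # day on which half of the capacity is reached

# Project 30 days ahead, starting from day 60.
projection = [
    logistic(day, CAPACITY, GROWTH_RATE, MIDPOINT) for day in range(60, 90)
]

for day, value in zip(range(60, 90), projection):
    print(day, round(value, 1))
```

With a fitted curve, the distance between the projected values and the capacity is what tells you how far you are from the plateau.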
This is my list of the top 10 most interesting blog posts using KNIME Analytics Platform published in 2020 by the KNIME community. The list was compiled with two criteria in mind: how much there is to learn, and how interesting the topic and the results are. Please leave a comment below to point out other important articles or blog posts that I might have missed.
I leave you with this reading list for the holidays. In the meantime, I wish all of you a happy and healthy new year!