Python Wrangling for Beginners — Where and How?

Patcha Pangatungan
7 min read · Sep 24, 2020


A reticulated python

When I was a kid, I had a close encounter with a venomous snake at our doorstep. I was just watching a few people cutting the grass around our house when a snake slithered into our main doorway. When the snake curled itself up in a corner of the cabinet a few feet away from me, I stood on top of our sofa stunned, scared and on the verge of crying. Luckily, the people who were cutting the grass came to the rescue, though unfortunately they killed the snake with their blades.

Well, in this blog, we won’t be dealing with actual snakes. Instead, I’ll be discussing how to code in Python, and how not to be stunned and scared when faced with this high-level, general-purpose programming language. We’ll handle it in two different environments: a Colab notebook and the Spyder Integrated Development Environment (IDE).

In my previous blogs, I tackled some basics in coding with Python using Spyder and basics in using Colab separately. Now, let’s delve into the comparisons and differences between these two tools and see their advantages and disadvantages.

1. Where can you access them?

Google Colab works like a Jupyter notebook that runs on remote servers, and Colab notebooks are stored in Google Drive. What’s good with Colab is that it runs on Google’s cloud servers, so you can write and execute your Python code without any setup on your computer (FREE stuff alert!). You can access it by creating a new notebook from your Google Drive. Just click New > More > Google Colaboratory.

You can also just type colab.research.google.com in your browser and voila! You can already create a new notebook or open notebooks from your Drive, computer or GitHub account.

On the other hand, to use Spyder you will need to install it on your local computer. A common way to do that is to install Anaconda, a Python distribution that comes with Spyder.

2. What do they look like?

At first glance, a Colab notebook looks just like a simple Google document. (There are some kitties and corgis running around my notebook because I turned on Kitty mode and Corgi mode in the miscellaneous settings, found on the upper right-hand side beside the Share button.)

As previously mentioned, a Colab notebook is an executable document, similar to a Jupyter notebook. It looks like a standard text editor: you type your code in code cells, and the output of each cell is shown directly below it. What’s good with Colab is that you can also add text and images to your notebook using text cells. These let you explain or tell a story about what you are doing, instead of writing it all down as comments in the code cells. The text is formatted with Markdown for better readability and organization. So Colab is not only a place where you can type and execute your code, but also a good presentation and educational tool for showing how to code or how your program works. (See an example of Python basics explained using Google Colab in one of our mentors’ lectures on GitHub.)

You will also find a Table of Contents on the sidebar, which helps you organize and jump to different sections without having to scroll through the whole document to find what you are looking for.

By default, you will see three panes in one window when you open Spyder. Code and comments are written in the editor, while outputs are shown in the console once the code is executed. Compared with Colab, Spyder doesn’t have text cells or a Table of Contents for organizing and accessing sections of the code efficiently. You will have to scroll through the whole file to find the sections or code you want to see. Any text that is not part of the program must also be written as comments.

What Spyder has that a Colab notebook lacks is the Variable Explorer, where you can monitor your variables and see their type, size and value. This is really helpful, especially when debugging or going through the code step by step, which I’ll discuss in a later section.

3. How do you execute code?

In Colab, you have the option to write code in separate code cells. Variables that you define in one cell can later be used in other cells, so you can easily run the code piece by piece instead of always running everything in one go.
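As a minimal sketch of that idea, the listing below mimics two Colab code cells in one place. The comments marking the cells and the names `prices` and `total` are made up for illustration.

```python
# Cell 1: define a variable and run this cell once
prices = [120, 80, 45]

# Cell 2: run later, on its own; `prices` is still in memory
total = sum(prices)
print("Total:", total)
```

If you change `prices` in the first cell, you need to run that cell again before the second cell picks up the new values.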

In Spyder, on the other hand, all of the code is written in a single file in the editor. However, this doesn’t mean that you always have to run the whole program in one go while you are still building your code. Spyder gives you options to run the file, run the current cell, run the selection or current line, and debug the file.
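For the "run current cell" option, Spyder treats a comment line starting with `# %%` as the start of a cell. Here is a small sketch of a script split that way; the cell titles and variable names are only examples.

```python
# %% Load the values
heights_cm = [150, 162, 171]

# %% Compute the average
average_cm = sum(heights_cm) / len(heights_cm)

# %% Show the result
print(f"Average height: {average_cm:.1f} cm")
```

With the cursor inside one of these cells, running the current cell executes only that part, much like running a single code cell in Colab.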

The debug feature is one of the most helpful, especially when dealing with lengthy code, since you can step into different functions or methods line by line while monitoring the values of the variables through the Variable Explorer. You can inspect the program at specific points by inserting a breakpoint: press F12 on the line or double-click on the line number, and a red dot will appear to mark it as a breakpoint.

Stepping into specific lines and monitoring the values of the variables
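A short script like the sketch below is handy for practising this. The function name `running_total` and the numbers are made up; the idea is to put a breakpoint on the line inside the loop, start the debugger, and watch `total` and `value` change as you step.

```python
def running_total(values):
    """Add the values one by one so each step can be inspected."""
    total = 0
    for value in values:
        total += value  # a breakpoint here pauses on every pass of the loop
    return total


result = running_total([5, 10, 20])
print(result)
```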

4. How do you work with data?

It is easy to work in Colab with datasets from other Google products, such as Google Sheets and files uploaded to your Google Drive. You can access these files by mounting your Google Drive.
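Mounting usually takes only a couple of lines with the `google.colab` module. The sketch below guards the import so it also runs outside Colab. The file name `my_dataset.csv` is hypothetical, and the `MyDrive` folder name is an assumption that may differ on your account.

```python
try:
    # This module only exists inside a Colab runtime
    from google.colab import drive
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

MOUNT_POINT = "/content/drive"

if IN_COLAB:
    # Asks for permission, then attaches your Drive at MOUNT_POINT
    drive.mount(MOUNT_POINT)

# Hypothetical file; assumes your Drive files appear under "MyDrive"
csv_path = f"{MOUNT_POINT}/MyDrive/my_dataset.csv"
print(csv_path)
```

Once the Drive is mounted, a path like `csv_path` can be passed to whatever you use to read the file.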

With Spyder, your dataset needs to be in your current working directory, or you need to give its file path when loading it. As discussed in my previous blog, clicking the dataframe in the Variable Explorer will show you your dataset.
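To make the working directory and file path idea concrete, here is a small sketch that uses only the standard library. The file `sample_scores.csv` and its contents are invented, and it is written to a temporary folder just so the example runs on its own. pandas’ `read_csv` accepts a path in the same way.

```python
import csv
import tempfile
from pathlib import Path

# Relative file names are resolved against this folder
print("Working directory:", Path.cwd())

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical dataset, created only for this example
    data_file = Path(folder) / "sample_scores.csv"
    data_file.write_text("name,score\nAna,90\nBen,85\n")

    # The file is not in the working directory, so open it by its full path
    with open(data_file, newline="") as f:
        rows = list(csv.DictReader(f))

print(rows)
```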

This is one of the major differences between Colab and Spyder. Spyder can easily show you the values of your variables and datasets in the Variable Explorer, while in Colab you will need to run a couple of print commands in several sections to see the current values of your variables or datasets. Spyder’s Variable Explorer comes in pretty handy for keeping track of all your variables throughout the whole program.
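If you want something like the Variable Explorer’s type, size and value columns in a notebook, a tiny helper can print them for you. This is only a sketch: `summarize` is a made-up function, not part of Colab, and the sample variables are invented.

```python
def summarize(name, value):
    """Return one line with the name, type, size and value of a variable."""
    size = len(value) if hasattr(value, "__len__") else 1
    return f"{name}: type={type(value).__name__}, size={size}, value={value!r}"


ages = [23, 31, 47]
city = "Manila"

print(summarize("ages", ages))
print(summarize("city", city))
```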

5. What are their other features?

With Colab, you can load ready-made code snippets that Google provides by opening them from the sidebar. Here, you can see an example where the inserted code snippet imports a dataset and shows an interactive data visualization with information on cars from three different places.

Colab and Spyder also have file explorers in their interfaces, which let you access files, notebooks, datasets or programs from different locations.

In terms of speed, since Spyder runs on your local computer, it can feel faster at runtime than Colab, which runs on Google’s cloud servers. Terminating other sessions that are not in use will help speed up the Colab runtime.

6. Where to save and who can access it?

Since Google Colab is readily available in the cloud, notebooks can be created, uploaded and stored on your Google Drive. Because a notebook is a Google Drive document, it is very simple to share: just click the Share button in the upper right-hand corner and enter the email addresses of the people you want to give access to your notebook. You also have the option to save a copy of the notebook to your GitHub account in just a few clicks!

With Colab, people have the convenience of sharing their notebooks publicly and collaborating online. You can find and explore many useful notebooks on GitHub or in Google Seedbank, where you can run, modify and study other people’s code.

Colab and Spyder are powerful, easy-to-use and interactive tools for writing and executing programs. I’m grateful that Google offers Colab as a free platform and that Spyder is an easily accessible program that can be installed on a local computer. I can use both to learn more and move forward in my data science journey.

Even with the similarities and differences that give one an advantage over the other, both of these IDEs are perfect for anyone who is starting out with data science!
