22 Mar, 2023
A couple of days ago, I shared some Python and Pandas tricks to help Data Analysts and Data Scientists quickly learn new valuable concepts that they might not be aware of. This is also part of the collection of tricks I share daily on LinkedIn.
Incorrect data types are a common challenge when dealing with real-world data.
For instance, you might have a numerical value that is stored as a string, such as “34” instead of 34.
✅ Using the astype() function, you can easily convert data from one type to another (e.g. string to numerical).
Below is an illustration.
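Here is a minimal sketch of the conversion described above. The DataFrame and its column names are made up for illustration, not taken from the original post.

```python
import pandas as pd

# Hypothetical data: the "age" column holds numbers stored as strings
df = pd.DataFrame({"name": ["Ann", "Bob", "Cleo"], "age": ["34", "27", "45"]})

# Convert the column from string to a 64-bit integer type
df["age"] = df["age"].astype("int64")

print(df.dtypes)

# Arithmetic now works on the column
total_age = df["age"].sum()
print(total_age)
```

Keep in mind that astype() raises an error if a value cannot be parsed as a number, so messy columns may need cleaning first.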
Two columns with the same name may not contain the same values, and two rows with the same index may not be identical.
To know if two DataFrames are equal, you need to go deeper and check whether they have the same shape and the same elements.
This is where the Pandas equals() function comes in handy.
✅ It returns True if the two DataFrames are equal.
❌ It returns False if they are not equal.
Below is an illustration.
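A small sketch of the comparison, using hypothetical DataFrames built only for this example. It also shows a case that is easy to miss: the same values stored with a different dtype are not considered equal.

```python
import pandas as pd

# Hypothetical DataFrames for illustration
df1 = pd.DataFrame({"id": [1, 2], "score": [3.5, 4.0]})
df2 = pd.DataFrame({"id": [1, 2], "score": [3.5, 4.0]})
df3 = pd.DataFrame({"id": [1, 2], "score": [3.5, 9.9]})

# Same values as df1, but the "id" column is stored as float
df4 = df1.astype({"id": "float64"})

same = df1.equals(df2)              # identical shape, values and dtypes
different_values = df1.equals(df3)  # one value differs
different_dtype = df1.equals(df4)   # values match, dtype does not

print(same, different_values, different_dtype)
```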
Sometimes it is necessary to go beyond the default output provided by Python to make it more understandable by humans.
✅ This can be achieved using the humanize library.
The full video tutorial is available here for more examples.
Natural language is everywhere, even in our DataFrames.
This is not a bad thing in itself, because it is the perfect type of data for natural language processing tasks.
However, its limitations become obvious when you try to perform numerical computations.
✅ To tackle this issue, you can use the numerize() function from the Python library numerizer.
✨ It converts natural language expressions of numbers into their actual numerical values.
Below is an illustration.
Using the + sign is probably the most common approach to combine lists.
However, typing the + sign over and over quickly becomes tedious when you have to deal with multiple lists.
✅ Instead, you can use the add function from the operator module together with the reduce function from the functools module.
Below is an illustration.
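A minimal sketch of this approach, with made-up lists:

```python
from functools import reduce
from operator import add

# Hypothetical lists to combine
list_a = [1, 2]
list_b = [3, 4]
list_c = [5, 6]

# Manual approach: one + per list
manual = list_a + list_b + list_c

# reduce applies add pairwise across the whole collection
combined = reduce(add, [list_a, list_b, list_c])

print(combined)
```

The benefit shows when the lists already live in a collection of unknown length, since the reduce call stays the same no matter how many lists there are.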
If you have been using the zip() function, then you might be aware of this limitation: when the iterables have different sizes, it stops at the shortest one, so the remaining items of the longer ones are silently dropped.
✅ You can tackle this issue with the zip function’s cousin: the zip_longest() function from the itertools module.
Instead of dropping the remaining items, it fills in the missing values with None.
That’s good, and it gets even better with the fillvalue parameter, which replaces None with a more meaningful value.
Below is an illustration.
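A small sketch comparing the three behaviours, using hypothetical names and scores:

```python
from itertools import zip_longest

# Hypothetical data: one score is missing
names = ["Ann", "Bob", "Cleo"]
scores = [90, 85]

# zip stops at the shortest iterable, so "Cleo" is lost
short = list(zip(names, scores))

# zip_longest keeps every item and pads the gap with None
padded = list(zip_longest(names, scores))

# fillvalue swaps None for a value of your choice
filled = list(zip_longest(names, scores, fillvalue=0))

print(short)
print(padded)
print(filled)
```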
Thank you for reading!
I hope you found this list of Python and Pandas tricks helpful! Keep an eye on this page, because it will be updated with more tricks on a daily basis.
Also, if you like reading my stories and wish to support my writing, consider becoming a Medium member. With a $5-a-month commitment, you unlock unlimited access to stories on Medium.
Would you like to buy me a coffee ☕️? → Here you go!
Feel free to follow me on Medium, Twitter, and YouTube, or say Hi on LinkedIn. It is always a pleasure to discuss AI, ML, Data Science, NLP, and MLOps stuff!
Before you leave, you can find the other parts of this series below:
Pandas & Python Tricks for Data Science & Data Analysis — Part 1
Pandas & Python Tricks for Data Science & Data Analysis — Part 2
Pandas & Python Tricks for Data Science & Data Analysis — Part 3