If you are a data scientist then most likely you use pandas on a regular basis. As a result, it is important to stay up to date on the latest features. This article will go over a few highlights of pandas 1.1.0. For more information see the official release notes.
It is interesting to see that pandas has averaged roughly two feature releases per year:
Pandas Version | Release Date |
1.1.0 | July 28, 2020 |
1.0.0 | January 29, 2020 |
0.25.0 | July 19, 2019 |
0.24.0 | January 25, 2019 |
0.23.0 | May 15, 2018 |
0.22.0 | December 29, 2017 |
0.21.0 | October 27, 2017 |
0.20.0 | May 5, 2017 |
0.19.0 | October 2, 2016 |
0.18.0 | March 13, 2016 |
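The cadence can be checked directly from the dates in the table. The following is a minimal sketch that uses pandas itself; the variable names are illustrative, and the dates are copied from the table as given.

```python
import pandas as pd

# Release dates copied from the table above (oldest first).
releases = pd.Series(
    pd.to_datetime([
        "2016-03-13", "2016-10-02", "2017-05-05", "2017-10-27", "2017-12-29",
        "2018-05-15", "2019-01-25", "2019-07-19", "2020-01-29", "2020-07-28",
    ]),
    index=["0.18.0", "0.19.0", "0.20.0", "0.21.0", "0.22.0",
           "0.23.0", "0.24.0", "0.25.0", "1.0.0", "1.1.0"],
)

# Days between consecutive releases; the first entry has no predecessor.
gaps = releases.diff().dropna()
mean_gap_days = gaps.dt.days.mean()

# Approximate number of releases per year implied by the mean gap.
releases_per_year = 365.25 / mean_gap_days

print(gaps.dt.days)
print(round(releases_per_year, 2))
```

The individual gaps vary quite a bit (0.22.0 followed 0.21.0 by about two months), so "two per year" is an average rather than a fixed schedule.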
If you are wondering what the main differences are between 0.25.0 and 1.0.0, pandas states:
Starting with 1.0.0, pandas will adopt a variant of SemVer to version releases. Briefly: 1) Deprecations will be introduced in minor releases (e.g. 1.1.0, 1.2.0, 2.1.0, …). 2) Deprecations will be enforced in major releases (e.g. 1.0.0, 2.0.0, 3.0.0, …) 3) API-breaking changes will be made only in major releases (except for experimental features). See Version policy for more.
From Pandas Official Documentation
There are several interesting new features and enhancements in pandas 1.1.0; this article focuses on two of them:
- DataFrame.compare and Series.compare
- Sorting with keys
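Both features need pandas 1.1.0 or later, so code that has to run on older installations may want to check the version first. The snippet below is a minimal sketch of such a guard; `release_tuple` and `HAS_COMPARE_AND_SORT_KEY` are hypothetical names introduced here, not part of pandas.

```python
import re

import pandas as pd


def release_tuple(version):
    """Return (major, minor) from a version string such as '1.1.0' or '2.0.0rc1'.

    Hypothetical helper for illustration; only the leading 'major.minor' is read.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


# True when the installed pandas is at least 1.1, the release this article covers.
HAS_COMPARE_AND_SORT_KEY = release_tuple(pd.__version__) >= (1, 1)

print(pd.__version__, HAS_COMPARE_AND_SORT_KEY)
```

Comparing tuples of integers avoids the pitfall of comparing version strings directly, where "1.10.0" would sort before "1.9.0".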
Here is a simple example of comparing two DataFrames; Series.compare works the same way.
```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "col1": ["a", "b", "c"],
        "col2": [1.0, 2.0, 3.0],
        "col3": [1.0, 2.0, 3.0],
    },
    columns=["col1", "col2", "col3"],
)

# Copy df1 and change two cells so there is something to compare.
df2 = df1.copy()
df2.loc[0, 'col1'] = 'c'
df2.loc[1, 'col3'] = 4.0

print(df1.compare(df2))
# Returns
#   col1       col3
#   self other self other
# 0    a     c  NaN   NaN
# 1  NaN   NaN  2.0   4.0
```
This is a powerful new feature: it reports only the rows and columns that differ, and it is much simpler than writing a column-by-column comparison like the following:
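```python
import numpy as np

# One helper column per original column, flagging where df1 and df2 agree.
# Note that the flags are the strings 'True'/'False', not booleans.
df1['col?1'] = np.where(df1['col1'] == df2['col1'], 'True', 'False')
df1['col?2'] = np.where(df1['col2'] == df2['col2'], 'True', 'False')
df1['col?3'] = np.where(df1['col3'] == df2['col3'], 'True', 'False')

print(df1)
#   col1  col2  col3  col?1  col?2  col?3
# 0    a   1.0   1.0  False   True   True
# 1    b   2.0   2.0   True   True  False
# 2    c   3.0   3.0   True   True   True
```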
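The same method exists on Series. The following is a minimal sketch with made-up data (`s1` and `s2` are illustrative) that also shows the `keep_shape` and `keep_equal` options, which control whether unchanged rows appear in the result.

```python
import pandas as pd

s1 = pd.Series(["a", "b", "c"])
s2 = pd.Series(["a", "x", "c"])

# Default: only the positions that differ are returned.
diff = s1.compare(s2)
print(diff)
#   self other
# 1    b     x

# keep_shape=True keeps every row; rows that are equal are filled with NaN.
full = s1.compare(s2, keep_shape=True)

# Adding keep_equal=True shows the equal values instead of NaN.
full_values = s1.compare(s2, keep_shape=True, keep_equal=True)
print(full_values)
```

Keeping the full shape can be handy when the result needs to line up row for row with the original data.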
Another interesting new feature is that the DataFrame and Series sorting methods, such as sort_values, now accept a key argument.
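```python
s = pd.Series(['C', 'a', 'B', '1', 'aA', 'A'])

# Default sort: digits, then uppercase letters, then lowercase letters.
print(s.sort_values())
# 3     1
# 5     A
# 2     B
# 0     C
# 1     a
# 4    aA

# Case-insensitive sort: the key lowercases the values before comparing.
print(s.sort_values(key=lambda x: x.str.lower()))
# 3     1
# 1     a
# 5     A
# 4    aA
# 2     B
# 0     C
```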
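This gives more control over the sorting criteria. Notice that by default pandas sorts strings by the code points of their characters, so every uppercase letter comes before any lowercase one. That can be misleading if you expect ‘a’ to be sorted before ‘B’; a key that lowercases the values gives a case-insensitive sort instead.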
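The key is called with the whole column (a Series) and should return an object of the same length, so it works for DataFrames and for non-string data too. Here is a minimal sketch with made-up data; the frame and column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["cherry", "Banana", "apple"],
    "score": [3, 2, 1],
})

# Default sort puts the capitalised "Banana" first.
default_order = df.sort_values(by="name")

# With a lowercasing key the order is alphabetical regardless of case.
by_name = df.sort_values(by="name", key=lambda col: col.str.lower())
print(by_name)

# The key does not have to involve strings: sort numbers by absolute value.
s = pd.Series([-3, 1, -2])
by_magnitude = s.sort_values(key=lambda x: x.abs())
print(by_magnitude)
```

Because the key receives a Series rather than individual values, vectorized operations such as `.str.lower()` or `.abs()` are the natural fit.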
If you are interested in learning more about Python and engaging with VersionBay consultants, contact us and we can tell you more about our Python services.