If you are a data scientist, you most likely use pandas on a regular basis, so it pays to stay up to date on its latest features. This article goes over a few highlights of pandas 1.1.0. For more information, see the official release notes.

It is interesting to see that pandas has averaged roughly two feature releases per year:

Pandas Version    Release Date
1.1.0             July 28, 2020
1.0.0             January 29, 2020
0.25.0            July 19, 2019
0.24.0            January 25, 2019
0.23.0            May 15, 2018
0.22.0            December 29, 2017
0.21.0            October 27, 2017
0.20.0            May 5, 2017
0.19.0            October 2, 2016
0.18.0            March 13, 2016

If you are wondering what changed in the versioning scheme between 0.25.0 and 1.0.0, pandas states:

Starting with 1.0.0, pandas will adopt a variant of SemVer to version releases. Briefly: 1) Deprecations will be introduced in minor releases (e.g. 1.1.0, 1.2.0, 2.1.0, …). 2) Deprecations will be enforced in major releases (e.g. 1.0.0, 2.0.0, 3.0.0, …) 3) API-breaking changes will be made only in major releases (except for experimental features). See Version policy for more.

From Pandas Official Documentation
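One practical consequence of this policy is that deprecation warnings seen on a minor release are worth fixing before the next major release. The sketch below uses only the standard library warnings module to turn such warnings into errors, for example inside a test suite. It assumes the deprecation is reported as a FutureWarning, which pandas has commonly used but which should be checked against the release in use; legacy_helper is a hypothetical stand-in, not a pandas function.

```python
import warnings


def legacy_helper():
    """Hypothetical stand-in for a library call that has been deprecated."""
    warnings.warn("legacy_helper is deprecated", FutureWarning, stacklevel=2)
    return 42


# Inside this block, any FutureWarning is raised as an exception instead of
# being printed, so deprecated usage cannot go unnoticed.
with warnings.catch_warnings():
    warnings.simplefilter("error", FutureWarning)
    try:
        legacy_helper()
        deprecation_detected = False
    except FutureWarning:
        deprecation_detected = True

print(deprecation_detected)
```

The filter is scoped to the with block, so code outside it keeps the default warning behaviour.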

There are several interesting new features and enhancements in pandas 1.1.0; however, this article will focus on these two:

  • DataFrame.compare and Series.compare
  • Sorting with Keys
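Both features require pandas 1.1.0 or newer, so a script that relies on them may want to check the installed version first. The following is a minimal sketch: version_tuple and HAS_COMPARE_AND_KEY are illustrative names, not part of pandas, and the parsing assumes the version string starts with a numeric major and minor part.

```python
import pandas as pd


def version_tuple(version: str) -> tuple:
    """Return the leading (major, minor) numbers of a version string."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


# compare() and the sorting key argument were both introduced in 1.1.0
HAS_COMPARE_AND_KEY = version_tuple(pd.__version__) >= (1, 1)

if not HAS_COMPARE_AND_KEY:
    print("pandas", pd.__version__, "is older than 1.1.0; please upgrade")
```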

Here is a simple example of comparing DataFrames and Series.

import pandas as pd


df1 = pd.DataFrame({
    "col1": ["a", "b", "c"],
    "col2": [1.0, 2.0, 3.0],
    "col3": [1.0, 2.0, 3.0],
})

# Copy df1 and change two cells so the DataFrames differ
df2 = df1.copy()

df2.loc[0, 'col1'] = 'c'
df2.loc[1, 'col3'] = 4.0

# Only the rows and columns that differ are returned
print(df1.compare(df2))
# Returns
#   col1       col3
#   self other self other
# 0    a     c  NaN   NaN
# 1  NaN   NaN  2.0   4.0
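The example above uses the default arguments. As a short sketch of the other options, compare also accepts align_axis, keep_shape and keep_equal, and Series.compare works the same way on a single column. The variable names below are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({
    "col1": ["a", "b", "c"],
    "col2": [1.0, 2.0, 3.0],
    "col3": [1.0, 2.0, 3.0],
})
df2 = df1.copy()
df2.loc[0, "col1"] = "c"
df2.loc[1, "col3"] = 4.0

# Stack "self" and "other" as alternating rows instead of side-by-side columns
stacked = df1.compare(df2, align_axis=0)
print(stacked)

# Keep every row and column, with NaN wherever the values are equal
full_shape = df1.compare(df2, keep_shape=True)
print(full_shape)

# Keep every row and column and show the equal values as well
full_values = df1.compare(df2, keep_shape=True, keep_equal=True)
print(full_values)

# Series.compare returns a DataFrame with "self" and "other" columns
series_diff = df1["col3"].compare(df2["col3"])
print(series_diff)
```

Keeping the full shape can be handy when the result has to line up with the original DataFrames, at the cost of a much larger output.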

This is a powerful new feature because it replaces column-by-column comparisons such as the following:

import numpy as np

# Flag, column by column, whether df1 and df2 hold the same value
df1['col?1'] = np.where(df1['col1'] == df2['col1'], 'True', 'False')
df1['col?2'] = np.where(df1['col2'] == df2['col2'], 'True', 'False')
df1['col?3'] = np.where(df1['col3'] == df2['col3'], 'True', 'False')
print(df1)
#   col1  col2  col3  col?1 col?2  col?3
# 0    a   1.0   1.0  False  True   True
# 1    b   2.0   2.0   True  True  False
# 2    c   3.0   3.0   True  True   True
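For environments still on a pandas release older than 1.1.0, a more compact alternative to the np.where approach is an element-wise inequality mask. This is a rough sketch rather than a drop-in replacement for compare: the variable names are illustrative, and DataFrame.ne treats a NaN in both DataFrames as a difference.

```python
import pandas as pd

df1 = pd.DataFrame({
    "col1": ["a", "b", "c"],
    "col2": [1.0, 2.0, 3.0],
    "col3": [1.0, 2.0, 3.0],
})
df2 = df1.copy()
df2.loc[0, "col1"] = "c"
df2.loc[1, "col3"] = 4.0

# True wherever the two DataFrames disagree
mask = df1.ne(df2)

# Rows and columns that contain at least one difference
changed_rows = mask.any(axis=1)
changed_cols = mask.columns[mask.any(axis=0)]

# Show only the differing region of each DataFrame
print(df1.loc[changed_rows, changed_cols])
print(df2.loc[changed_rows, changed_cols])
```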

Another interesting new feature is that the DataFrame and Series sorting methods, such as sort_values, now accept a key argument. The key is a function that receives the values being sorted and returns the values to sort by.

s = pd.Series(['C', 'a', 'B','1','aA','A'])
print(s.sort_values())
# 3     1
# 5     A
# 2     B
# 0     C
# 1     a
# 4    aA
print(s.sort_values(key=lambda x: x.str.lower()))
# 3     1
# 1     a
# 5     A
# 4    aA
# 2     B
# 0     C

This gives more control over the sorting criteria. Notice that by default pandas sorts strings by the code point of each character, so every uppercase letter comes before every lowercase one; that can be surprising if you expect ‘a’ to be sorted before ‘B’.
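The same idea carries over to DataFrames and to index sorting. The sketch below uses made-up data and illustrative variable names; it first shows why ‘B’ sorts before ‘a’ by default, then applies a case-insensitive key with DataFrame.sort_values and Series.sort_index.

```python
import pandas as pd

# Uppercase letters have smaller code points than lowercase ones
print(ord("B"), ord("a"))  # 66 97

df = pd.DataFrame({
    "name": ["bob", "Alice", "carol", "Dave"],
    "score": [3, 1, 2, 4],
})

# Default ordering: capitalised names come first
default_order = df.sort_values(by="name")
print(default_order)

# The key function receives the column as a Series and returns the sort values
case_insensitive = df.sort_values(by="name", key=lambda col: col.str.lower())
print(case_insensitive)

# sort_index accepts a key as well; here it receives the Index
labelled = pd.Series([1, 2, 3], index=["b", "A", "c"])
by_label = labelled.sort_index(key=lambda idx: idx.str.lower())
print(by_label)
```

The key only affects the ordering; the original values are returned unchanged.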

If you are interested in learning more about Python and engaging with VersionBay consultants, contact us and we can tell you more about our Python services.