Python: lambda functions.
Posted: September 4th, 2023, 2:15 am
I'm trying to find an easy way of calling a UDF (user-defined function) from a lambda function on a pandas DataFrame.
The objective is to search all rows in both columns for values that meet specific, user-defined criteria and to create two new columns from the result: the first new column will hold a boolean value, and the second will hold a specific text value.
Writing the conditions directly works. Of course, in a real-world dataset the conditions become more convoluted, so it's easier to keep all the combinations in a separate UDF and call that instead. I wrote such a UDF and called it two different ways, but it returns True for every row, which is clearly wrong. I suspected that the .any() call was causing the problem, but if I remove it, I get the "The truth value of a Series is ambiguous" error.
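The error mentioned above comes from using a boolean Series where Python needs a single True/False, such as an `if` statement; pandas refuses to guess which reduction you meant. A minimal sketch on the post's sample data showing the per-row mask, the error, and how `.any()` collapses the mask into one value that then gets broadcast to every row (`mask` and `flag` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'baz', 'foo'],
                   'B': ['qux', 'quux', 'quuz', 'xyz']})

# Element-wise comparison: one boolean per row
mask = (df['A'] == 'foo') & (df['B'] == 'qux')
print(mask.tolist())  # [True, False, False, False]

# Using the Series directly as an if-condition raises ValueError
try:
    if mask:
        pass
except ValueError as err:
    print(err)  # message starts with "The truth value of a Series is ambiguous."

# .any() reduces the whole Series to a single scalar...
print(bool(mask.any()))  # True
# ...so assigning it fills every row with that same value
print(df.assign(flag=mask.any())['flag'].tolist())  # [True, True, True, True]
```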
Here's the dataframe:
Code:
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'baz', 'foo'],
                   'B': ['qux', 'quux', 'quuz', 'xyz']})
df
First, here's what works: I write the conditions directly, and col2 and col3 both get the correct boolean values. Based on the search conditions, only the first row returns True; all the other rows are False.
Code:
df['col2'] = (df['A'] == 'foo') & (df['B'] == 'qux')
# Inside assign, the lambda receives the DataFrame as x, so x can be used instead of df
df = df.assign(col3=lambda x: (x['A'] == 'foo') & (x['B'] == 'qux'))
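The same vectorized boolean mask can also drive the second, text-valued column from the objective. A minimal sketch using `numpy.where`; the column names `match` and `label` and the text values `'matched'` / `'no match'` are hypothetical placeholders:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'baz', 'foo'],
                   'B': ['qux', 'quux', 'quuz', 'xyz']})

# Build the condition once as a boolean Series (one value per row)
mask = (df['A'] == 'foo') & (df['B'] == 'qux')

# First new column: the flag itself; second: a text value chosen per row from the flag
df = df.assign(match=mask,
               label=np.where(mask, 'matched', 'no match'))
print(df)
```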
Here's the UDF:
Code:
def test(df):
    # With a DataFrame argument, this builds a boolean Series and .any() reduces it to one bool
    if ((df['A'] == 'foo') & (df['B'] == 'qux')).any():
        x = True
    else:
        x = False
    return x
Then I called it two different ways:
Code:
# Both calls below pass the full df to test(), not the lambda's argument x
df = df.assign(col4=lambda x: test(df))
# df['col5'] = df.apply(test, axis=1)
df['col6'] = df.apply(lambda x: test(df), axis=1)
Would appreciate any suggestions!!
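For the real-world case with many combinations plus a text column, one option is a UDF that lists (condition, label) pairs and combines them with `numpy.select`, which takes the label of the first condition that is True in each row. A sketch only: `classify` is a hypothetical name, and the second combination (`'bar'`/`'quux'`) and all label strings are made up to show more than one row being labeled:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'baz', 'foo'],
                   'B': ['qux', 'quux', 'quuz', 'xyz']})

def classify(frame):
    """Return (flag, label) arrays for every row from a list of combos."""
    # Each entry pairs a vectorized boolean condition with the text to emit
    combos = [
        ((frame['A'] == 'foo') & (frame['B'] == 'qux'), 'foo-qux'),
        ((frame['A'] == 'bar') & (frame['B'] == 'quux'), 'bar-quux'),
    ]
    conditions = [cond for cond, _ in combos]
    labels = [label for _, label in combos]
    # Per row, np.select picks the label of the first True condition, else the default
    label = np.select(conditions, labels, default='no match')
    # The flag is True wherever at least one condition matched
    flag = np.logical_or.reduce(conditions)
    return flag, label

flag, label = classify(df)
df = df.assign(match=flag, label=label)
print(df)
```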