wilmott.com

Posted: **September 4th, 2023, 2:15 am**

I'm trying to find an easy way to call a user-defined function (UDF) from a lambda function on a pandas DataFrame.
Here's the DataFrame:

import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'baz', 'foo'],
                   'B': ['qux', 'quux', 'quuz', 'xyz']})
df

The objective is to find all rows where the two columns meet specific user-defined criteria, and to create two new columns based on that. The first new column will hold a boolean value, and the second will hold a specific text value.
First, here's what works: if I write the conditions directly, col2 and col3 both get the correct boolean values. Based on the search conditions, only the first row returns True; all the other rows are False.

df['col2'] = (df['A'] == 'foo') & (df['B'] == 'qux' )

df = df.assign(col3=lambda x: (x['A'] == 'foo') & (x['B'] == 'qux'))
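The post mentions a second column holding a text value but does not show it. Here is a minimal sketch of one way to build both columns from the same mask, using `numpy.where`. The column names `flag` and `label` and the labels 'match' / 'no match' are placeholders I made up, not values from the post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'baz', 'foo'],
                   'B': ['qux', 'quux', 'quuz', 'xyz']})

# Element-wise mask: True only where both conditions hold on the same row.
mask = (df['A'] == 'foo') & (df['B'] == 'qux')

# Boolean column taken straight from the mask.
df['flag'] = mask

# Text column: placeholder labels chosen for illustration only.
df['label'] = np.where(mask, 'match', 'no match')
```

Because the mask is computed once, the boolean and text columns cannot drift apart if the conditions change later.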

Of course, in a real-world dataset the conditions become more convoluted. It's easier to keep all the combinations in a separate UDF and then call that function. To that end, I wrote the UDF:

def test(df):
    if (  (df['A'] == 'foo') & (df['B'] == 'qux' ) ).any():
        x = True
    else:
        x = False
    return x

and then called it two different ways.

df = df.assign(col4 = lambda x: test(df))
# df['col5'] = df.apply(test, axis=1)
df['col6'] = df.apply(lambda x: test(df), axis=1)

As you can see from the output, the function returns True for every row, which is clearly wrong. I suspected that the "any()" call might be causing the problem, but if I remove it, I get the "...truth value of a series is ambiguous..." error.
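One likely explanation: `test(df)` receives the whole DataFrame on every call, and `.any()` collapses the mask to a single scalar, which is then broadcast to every row. Without `.any()`, the `if` statement is handed a whole Series, which pandas refuses to evaluate as one truth value. The sketch below shows two possible ways around this, assuming the same conditions as above. The helper names `match_mask` and `match_row` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'baz', 'foo'],
                   'B': ['qux', 'quux', 'quuz', 'xyz']})

def match_mask(frame):
    """Vectorized: return one boolean per row (hypothetical helper name)."""
    return (frame['A'] == 'foo') & (frame['B'] == 'qux')

def match_row(row):
    """Row-wise: `row` is a single row, so each comparison is a plain bool."""
    return row['A'] == 'foo' and row['B'] == 'qux'

# .any() reduces the whole mask to one scalar, which is why every row got True.
collapsed = match_mask(df).any()

# assign passes the frame to the lambda as x; use x rather than the outer df.
df = df.assign(col4=lambda x: match_mask(x))

# With axis=1, apply passes each row to the function in turn.
df['col5'] = df.apply(match_row, axis=1)
```

The vectorized version is usually the faster of the two on large frames. The row-wise version may be easier to read when the conditions need ordinary `if`/`elif` branching.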

Would appreciate any suggestions!!

Python: lambda functions.
