Looking at the new Spark DataFrame API, it is unclear whether it is possible to modify dataframe columns. How would I go about changing a value in row x, column y of a dataframe?
While you cannot modify a column in place, you can operate on it and return a new dataframe with the desired modifications.
If you want to change the value in a column based on a condition, like np.where:
    from pyspark.sql import functions as F

    update_func = (F.when(F.col('update_col') == replace_val, new_value)
                    .otherwise(F.col('update_col')))
    df = df.withColumn('new_column_name', update_func)
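For example, here is a minimal runnable sketch of that conditional update; the toy data and the placeholder values bound to replace_val and new_value are assumptions for illustration only:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # toy dataframe with a hypothetical 'update_col'
    df = spark.createDataFrame([(0, 'a'), (1, 'b'), (0, 'c')],
                               ['update_col', 'other_col'])

    replace_val, new_value = 0, -1  # illustrative condition and replacement

    update_func = (F.when(F.col('update_col') == replace_val, new_value)
                    .otherwise(F.col('update_col')))
    df = df.withColumn('new_column_name', update_func)

    df.show()  # rows where update_col == 0 get -1 in new_column_name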
If you want to perform some operation on a column and create a new column that is added to the dataframe:
    import pyspark.sql.functions as F
    import pyspark.sql.types as T

    def my_func(col):
        # do whatever you want to do to each column value here
        return transformed_value

    # if we assume that my_func returns a string
    my_udf = F.UserDefinedFunction(my_func, T.StringType())

    df = df.withColumn('new_column_name', my_udf('update_col'))
If you want the new column to have the same name as the old column, you can add the additional step:
    df = df.drop('update_col').withColumnRenamed('new_column_name', 'update_col')
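Putting the last two snippets together, here is a hedged end-to-end sketch; my_func, the sample data, and the column names are illustrative rather than part of the original answer, and F.udf is used as the equivalent modern spelling of F.UserDefinedFunction:

    import pyspark.sql.functions as F
    import pyspark.sql.types as T
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([('foo',), ('bar',)], ['update_col'])

    def my_func(value):
        # illustrative transformation: upper-case the string
        return value.upper()

    my_udf = F.udf(my_func, T.StringType())

    df = (df.withColumn('new_column_name', my_udf('update_col'))
            .drop('update_col')
            .withColumnRenamed('new_column_name', 'update_col'))

    df.show()  # update_col now holds 'FOO' and 'BAR'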
While you cannot modify a column as such, you can operate on a column and return a new dataframe reflecting that change. To do that, first implement the operation as a UserDefinedFunction, and then apply that function selectively to the targeted column only. In Python:
    from pyspark.sql.functions import UserDefinedFunction
    from pyspark.sql.types import StringType

    name = 'target_column'
    udf = UserDefinedFunction(lambda x: 'new_value', StringType())
    new_df = old_df.select(*[udf(column).alias(name) if column == name else column
                             for column in old_df.columns])
new_df now has the same schema as old_df (assuming that old_df.target_column is of type StringType as well), but all values in column target_column will be new_value.
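As a usage illustration, here is a minimal sketch of this select-based approach on toy data; old_df and its columns are assumptions made up for the example:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import UserDefinedFunction
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.getOrCreate()
    old_df = spark.createDataFrame([('a', 1), ('b', 2)], ['target_column', 'id'])

    name = 'target_column'
    udf = UserDefinedFunction(lambda x: 'new_value', StringType())
    new_df = old_df.select(*[udf(column).alias(name) if column == name else column
                             for column in old_df.columns])

    new_df.show()  # target_column is 'new_value' in every row; id is unchanged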