Filter rows

Create a sample dataset. I want to filter out rows that contain unwanted phrases such as "[code]Task currently being worked on by automation", "<h3/>", "<p>", and "<th>". Give me Python code.

ChatGPT said:

Here's how you can create a sample dataset and then filter out rows that contain unwanted phrases in the RTSK Worknote column. I’ll include some entries with specific unwanted phrases like "[code]Task currently being worked on by automation", "<h3/>", "<p>", and "<th>".

After generating the sample data, I'll show you how to filter out rows containing any of these unwanted phrases.

Step 1: Generate Sample Data

I'll create a small sample with 20 rows for demonstration, and you can adjust it to 100 if needed.
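One way to make the row count adjustable is to wrap the generation in a small helper. This is only a sketch: make_sample_data, n_rows, and seed are hypothetical names that do not appear in the script further down, and only two of the columns are shown to keep it short.

```python
import random


def make_sample_data(n_rows=20, seed=0):
    """Build sample columns as a dict of lists (hypothetical helper)."""
    # A seeded generator makes the sample reproducible between runs.
    rng = random.Random(seed)
    worknotes = [
        "Automation completed successfully",
        "[code]Task currently being worked on by automation",
        "Automation failed due to timeout <h3/>",
        "<p>Task requires manual intervention</p>",
        "Issue: Server unreachable <th>",
    ]
    return {
        "RTSK Number": [f"RTSK-{i + 1:05}" for i in range(n_rows)],
        "RTSK Worknote": [rng.choice(worknotes) for _ in range(n_rows)],
    }


# Ask for 100 rows instead of the default 20.
data = make_sample_data(n_rows=100)
print(len(data["RTSK Number"]), data["RTSK Number"][-1])
```

The returned dictionary can be passed straight to pd.DataFrame, the same way the data dictionary is used in the full script.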

Step 2: Apply the Filter

The script below generates the sample data and then applies the filter.

python
import random
import re  # needed for re.escape below

import pandas as pd

# Define unwanted phrases for filtering
unwanted_phrases = [
    "[code]Task currently being worked on by automation",
    "<h3/>",
    "<p>",
    "<th>",
]

# Generate sample data (20 rows)
data = {
    "RTSK Number": [f"RTSK-{i+1:05}" for i in range(20)],
    "RTSK Short Desc": [
        random.choice([
            "Request for DBaaS Database - Create Oracle",
            "Request for Physical Server - Install",
        ])
        for _ in range(20)
    ],
    "RTSK Worknote": [
        random.choice([
            "Automation completed successfully",
            "[code]Task currently being worked on by automation",
            "Automation failed due to timeout <h3/>",
            "<p>Task requires manual intervention</p>",
            "Issue: Server unreachable <th>",
        ])
        for _ in range(20)
    ],
    "Execution Type": [random.choice(["Automated", "Manual"]) for _ in range(20)],
    "Automation Status": [random.choice(["Completed", "Pending", "In Progress"]) for _ in range(20)],
    "Automation Execution Status": [random.choice(["Success", "Failed"]) for _ in range(20)],
}

# Convert to DataFrame
df = pd.DataFrame(data)

# Display the original data
print("Original Dataset:")
print(df)

# Filter out rows that contain any of the unwanted phrases in 'RTSK Worknote'.
# re.escape makes characters such as "[" and "]" match literally.
pattern = "|".join(map(re.escape, unwanted_phrases))
filtered_df = df[~df["RTSK Worknote"].str.contains(pattern, na=False)]

# Display the filtered dataset
print("\nFiltered Dataset:")
print(filtered_df)

Explanation

  1. Define Unwanted Phrases: The unwanted_phrases list contains phrases you want to filter out.
  2. Generate Sample Data: Creates a sample dataset with mixed values in the RTSK Worknote column.
  3. Apply Filter: The filter uses a regular expression pattern that combines all unwanted phrases with an OR (|) operator and then excludes rows that contain any of those phrases in RTSK Worknote.
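To see why the phrases are escaped before being joined, here is a minimal standalone sketch using only the standard re module. The has_unwanted helper and the notes list are illustrative names, not part of the script above. Without escaping, "[code]" is read as a character class (one of c, o, d, e), so the literal phrase would not match.

```python
import re

unwanted_phrases = [
    "[code]Task currently being worked on by automation",
    "<h3/>",
    "<p>",
    "<th>",
]

# Escaped: every phrase is matched as literal text.
pattern = "|".join(map(re.escape, unwanted_phrases))

# Unescaped: "[code]" becomes a character class, so the literal
# "[code]Task ..." text no longer matches.
naive_pattern = "|".join(unwanted_phrases)


def has_unwanted(note):
    """Return True if the note contains any unwanted phrase (hypothetical helper)."""
    return re.search(pattern, note) is not None


notes = [
    "Automation completed successfully",
    "[code]Task currently being worked on by automation",
    "Automation failed due to timeout <h3/>",
]

kept = [note for note in notes if not has_unwanted(note)]
print(kept)  # ['Automation completed successfully']
print(re.search(naive_pattern, notes[1]))  # None: the unescaped pattern misses it
```

In the pandas version, str.contains applies the same escaped pattern to every value in the column, and the ~ operator inverts the result so that only non-matching rows are kept.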

This code will output both the original and filtered datasets so you can see the effect of the filtering. Let me know if you need further customization!
