Create a sample dataset. I want to filter out rows that contain unwanted phrases such as "[code]Task currently being worked on by automation", "<h3/>", "<p>", and "<th>". Give me Python code.
ChatGPT said:
Here's how you can create a sample dataset and then filter out rows that contain unwanted phrases in the RTSK Worknote column. I’ll include some entries with specific unwanted phrases like "[code]Task currently being worked on by automation", "<h3/>", "<p>", and "<th>".
After generating the sample data, I'll show you how to filter out rows containing any of these unwanted phrases.
Step 1: Generate Sample Data
I'll create a small sample of 20 rows for demonstration. To get 100 rows, change every range(20) in the script to range(100).
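One way to make the row count adjustable in a single place is to wrap the generation step in a small helper. The sketch below is only an illustration: make_sample_rows, its n_rows and seed parameters, and the WORKNOTES constant are hypothetical names, and only two of the columns are built to keep it short.

```python
import random

# Worknote values used in this sample; extend the list as needed.
WORKNOTES = [
    "Automation completed successfully",
    "[code]Task currently being worked on by automation",
    "Automation failed due to timeout <h3/>",
    "<p>Task requires manual intervention</p>",
    "Issue: Server unreachable <th>",
]


def make_sample_rows(n_rows, seed=None):
    """Build column lists for n_rows sample tasks (hypothetical helper)."""
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return {
        "RTSK Number": [f"RTSK-{i + 1:05}" for i in range(n_rows)],
        "RTSK Worknote": [rng.choice(WORKNOTES) for _ in range(n_rows)],
    }


sample = make_sample_rows(100, seed=42)
print(len(sample["RTSK Number"]), sample["RTSK Number"][-1])  # 100 RTSK-00100
```

Passing the returned dictionary to pd.DataFrame gives a DataFrame with n_rows rows, and reusing the same seed reproduces the same sample.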
Step 2: Apply the Filter
import re
import random
import pandas as pd
# Define unwanted phrases for filtering
unwanted_phrases = ["[code]Task currently being worked on by automation", "<h3/>", "<p>", "<th>"]
# Generate sample data
data = {
    "RTSK Number": [f"RTSK-{i+1:05}" for i in range(20)],
    "RTSK Short Desc": [
        random.choice(["Request for DBaaS Database - Create Oracle", "Request for Physical Server - Install"])
        for _ in range(20)
    ],
    "RTSK Worknote": [
        random.choice([
            "Automation completed successfully",
            "[code]Task currently being worked on by automation",
            "Automation failed due to timeout <h3/>",
            "<p>Task requires manual intervention</p>",
            "Issue: Server unreachable <th>"
        ])
        for _ in range(20)
    ],
    "Execution Type": [random.choice(["Automated", "Manual"]) for _ in range(20)],
    "Automation Status": [random.choice(["Completed", "Pending", "In Progress"]) for _ in range(20)],
    "Automation Execution Status": [random.choice(["Success", "Failed"]) for _ in range(20)]
}
# Convert to DataFrame
df = pd.DataFrame(data)
# Display the original data
print("Original Dataset:")
print(df)
# Filter out rows that contain any of the unwanted phrases in 'RTSK Worknote'
pattern = '|'.join(map(re.escape, unwanted_phrases))
filtered_df = df[~df['RTSK Worknote'].str.contains(pattern, na=False)]
# Display the filtered dataset
print("\nFiltered Dataset:")
print(filtered_df)
Explanation
- Define Unwanted Phrases: The unwanted_phrases list contains the phrases you want to filter out.
- Generate Sample Data: Creates a sample dataset with mixed values in the RTSK Worknote column.
- Apply Filter: The filter builds a regular expression pattern that joins all unwanted phrases with an OR (|) operator, escaping each phrase with re.escape so that characters such as [ and ] are matched literally. It then excludes rows that contain any of those phrases in RTSK Worknote.
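The re.escape call in the pattern matters more than it may look. The sketch below uses only the standard re module, with two worknote strings from the sample, to compare an escaped and an unescaped pattern. The variable names are illustrative.

```python
import re

unwanted_phrases = [
    "[code]Task currently being worked on by automation",
    "<h3/>",
    "<p>",
    "<th>",
]

# Escaped: every phrase is matched literally.
escaped = re.compile("|".join(map(re.escape, unwanted_phrases)))
# Unescaped: "[code]" is parsed as a character class (one of c, o, d, e).
unescaped = re.compile("|".join(unwanted_phrases))

clean_note = "Automation completed successfully"
noisy_note = "[code]Task currently being worked on by automation"

escaped_hits = [bool(escaped.search(n)) for n in (clean_note, noisy_note)]
unescaped_hits = [bool(unescaped.search(n)) for n in (clean_note, noisy_note)]

print(escaped_hits)    # [False, True]
print(unescaped_hits)  # [False, False]
```

Without escaping, the pattern looks for a single c, o, d, or e immediately before "Task", which never occurs in the worknote, so the unwanted row would silently survive the filter.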
This code will output both the original and filtered datasets so you can see the effect of the filtering. Let me know if you need further customization!
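If you would rather avoid regular expressions altogether, one alternative is plain substring matching with regex=False, checking one phrase at a time. This is a sketch on a tiny hand-written DataFrame, not a replacement for the script above. The four-row data and the mask variable are assumptions made for illustration.

```python
import pandas as pd

unwanted_phrases = [
    "[code]Task currently being worked on by automation",
    "<h3/>",
    "<p>",
    "<th>",
]

# Tiny illustrative dataset, including a missing worknote.
df = pd.DataFrame({
    "RTSK Worknote": [
        "Automation completed successfully",
        "[code]Task currently being worked on by automation",
        "Automation failed due to timeout <h3/>",
        None,
    ]
})

# Start with "no row flagged" and OR in each literal substring match.
mask = pd.Series(False, index=df.index)
for phrase in unwanted_phrases:
    mask |= df["RTSK Worknote"].str.contains(phrase, regex=False, na=False)

filtered_df = df[~mask]
print(filtered_df)
```

Here rows 0 and 3 remain: the clean worknote and the missing one, which na=False treats as not matching. Literal matching needs no escaping, at the cost of one pass over the column per phrase.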