Dynamic code summary1

You can modify the script to remove any spaces from the Error Code column after extracting it, so that values such as "#WD 12" and "#WD12" are stored in the same form. Here's the updated code:

python
import pandas as pd
from bs4 import BeautifulSoup
import re

# Assume 'dataset' holds the input data for this script
df = dataset

# Convert RTSK Worknote to string
df["RTSK Worknote"] = df["RTSK Worknote"].astype(str)

# Function to parse HTML content
def parse_html(content):
    if '<' in content and '>' in content:
        soup = BeautifulSoup(content, 'html.parser')
        # Join text nodes with a space so words from adjacent tags do not run together
        return soup.get_text(separator=" ", strip=True)
    return content

# Function to extract error codes and issue keywords
def extract_issues(text):
    # Match "#WD", an optional whitespace character, then 2 or 3 digits;
    # whatever follows on the same line is captured as the issue keyword
    match = re.search(r'(#WD\s?\d{2,3})\s*(.*)', text)
    if match:
        error_code = match.group(1).replace(" ", "")  # Extract the error code and remove spaces
        issue_keyword = match.group(2)  # Extract the issue keyword
        return pd.Series([error_code, issue_keyword])
    return pd.Series(["No error code", "No issue keyword"])

# Apply HTML parsing to RTSK Worknote
df['RTSK Worknote'] = df['RTSK Worknote'].apply(parse_html)

# Apply the extract_issues function to RTSK Worknote
df[['Error Code', 'Issue Keywords']] = df['RTSK Worknote'].apply(extract_issues)

# Display the updated DataFrame
print(df.head())

# The final dataset
dataset = df

Explanation:

  • match.group(1).replace(" ", ""): This takes the error code (the first capture group of the regular expression) and removes any spaces from it with the string replace method, so "#WD 12" becomes "#WD12". This ensures that the Error Code column contains clean, space-free values.

By applying this updated script, you should be able to dynamically extract error codes and issue keywords, while also ensuring that any spaces in the error codes are removed.
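
To check the extraction logic outside the full script, here is a minimal sketch that uses only the standard library. It reuses the same regular expression but returns plain tuples instead of a pandas Series. The function name extract_issue and the sample work notes are hypothetical and exist only for illustration.

```python
import re

# Same pattern as in the script: "#WD", optional whitespace, then 2-3 digits,
# followed by the rest of the line as the issue keyword.
PATTERN = re.compile(r'(#WD\s?\d{2,3})\s*(.*)')

def extract_issue(text):
    """Return (error_code, issue_keyword) for a single work note."""
    match = PATTERN.search(text)
    if match:
        # Remove spaces so "#WD 12" and "#WD12" end up identical
        return match.group(1).replace(" ", ""), match.group(2)
    return "No error code", "No issue keyword"

# Hypothetical sample work notes, not taken from any real dataset
samples = [
    "#WD 12 printer offline",
    "Ticket updated #WD345 password reset",
    "no code in this note",
]

results = [extract_issue(note) for note in samples]
for note, result in zip(samples, results):
    print(note, "->", result)
```

If the first two notes come back with codes "#WD12" and "#WD345" and the third falls back to the placeholder values, the pattern is behaving as described.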
