Dynamic code summary

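Here’s the updated code. It strips HTML from the RTSK Worknote column, then extracts each note's #WD error code and the issue text that follows it: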

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

# Assume 'dataset' holds the input data for this script
df = dataset

# Convert RTSK Worknote to string; fill missing values first so they become "" instead of "nan"
df["RTSK Worknote"] = df["RTSK Worknote"].fillna("").astype(str)

# Function to parse HTML content
def parse_html(content):
    # Only run the parser when the text looks like it contains markup
    if '<' in content and '>' in content:
        soup = BeautifulSoup(content, 'html.parser')
        # Join text from separate tags with a space so words and codes don't run together
        return soup.get_text(separator=" ", strip=True)
    return content

# Function to extract error codes and issue keywords
def extract_issues(text):
    # Regular expression to find patterns like #WD__ where __ are 2 or 3 digits
    match = re.search(r'(#WD\d{2,3})\s*(.*)', text)
    if match:
        error_code = match.group(1)  # Extract the error code
        issue_keyword = match.group(2)  # Extract the issue keyword
        return pd.Series([error_code, issue_keyword])
    return pd.Series(["No error code", "No issue keyword"])

# Apply HTML parsing to RTSK Worknote
df['RTSK Worknote'] = df['RTSK Worknote'].apply(parse_html)

# Apply extract_issues to each note; it returns a two-element Series, so the result has two columns
df[['Error Code', 'Issue Keywords']] = df['RTSK Worknote'].apply(extract_issues)

# Display the first rows of the updated DataFrame
print(df.head())

# The final dataset
dataset = df
```
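If the worknotes are already plain text, the row-by-row apply call can be swapped for pandas' vectorized Series.str.extract, which runs the same pattern against every row and returns one column per capture group. The sketch below is an alternative, not the author's code, and uses a hypothetical three-row Series in place of the real dataset:

```python
import pandas as pd

# Hypothetical sample worknotes, assumed to be HTML-free already
notes = pd.Series([
    "#WD12 Printer offline",
    "Routine check, nothing found",
    "Escalated: #WD105 VPN timeout",
])

# str.extract uses the first match in each row and returns one column per group;
# rows with no match get NaN in every column
extracted = notes.str.extract(r"(#WD\d{2,3})\s*(.*)")
extracted.columns = ["Error Code", "Issue Keywords"]

# Replace the NaNs with the same placeholders extract_issues returns
extracted = extracted.fillna({"Error Code": "No error code",
                              "Issue Keywords": "No issue keyword"})
print(extracted)
```

Unmatched rows come back as NaN, which is why fillna is needed to restore the placeholder strings.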

Explanation:

  1. Regular Expression (Regex):

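    • r'(#WD\d{2,3})\s*(.*)' identifies the pattern #WD__, where __ stands for 2 or 3 digits, followed by the issue keywords.

    • #WD\d{2,3} matches #WD followed by 2 or 3 digits. With a longer number such as #WD1234, it matches only the first three digits (#WD123).

    • \s* matches zero or more whitespace characters after the error code.

    • (.*) captures the text that follows the code, up to the end of the line, as the issue keywords.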

  2. extract_issues Function:

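    • Uses re.search with this regex, so only the first #WD code in each note is extracted.

    • Returns the code and keywords as a two-element Series (or the placeholders "No error code" and "No issue keyword" when nothing matches), so they can be added as new columns in the DataFrame.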

  3. Applying the Function:

    • Series.apply calls extract_issues on each RTSK Worknote value. Because the function returns a Series, apply produces a two-column DataFrame.

    • Those two columns are stored in the new Error Code and Issue Keywords columns.
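To see what the pattern actually captures, the snippet below runs it with Python's re module on a few made-up worknotes. STRICT is a hypothetical variant, not part of the original script: its (?!\d) lookahead rejects codes with four or more digits instead of truncating them.

```python
import re

# Pattern used in the script
PATTERN = re.compile(r"(#WD\d{2,3})\s*(.*)")
# Hypothetical stricter variant: (?!\d) fails if another digit follows the code
STRICT = re.compile(r"(#WD\d{2,3})(?!\d)\s*(.*)")

def split_note(pattern, text):
    """Return (error_code, issue_keywords) for the first match, or None."""
    m = pattern.search(text)
    return (m.group(1), m.group(2)) if m else None

samples = [
    "#WD42 Disk full",                    # two digits
    "Ticket note: #WD317   Login loop",   # text before the code is skipped
    "#WD1234 Unknown",                    # four digits: PATTERN truncates, STRICT rejects
    "No code here",                       # no match for either pattern
]
for note in samples:
    print(repr(note), split_note(PATTERN, note), split_note(STRICT, note))
```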
