Summarization_gensim

You can add a filtering step to remove specific lines before summarizing the text. Here's how you can update the script to skip comments containing specific phrases like "Integrator Record Response" and "Function Name: UPDATE_SYSTEM":

Updated Script with Filtering

python
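import pandas as pd
# Note: gensim.summarization was removed in gensim 4.0,
# so this import needs gensim < 4.0 (for example 3.8.3)
from gensim.summarization import summarize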

# Load dataset
df = pd.read_csv('path_to_your_dataset.csv')

# Function to prioritize specific comments in the summary
def prioritize_comments(text, keyword="Automation Failed"):
    # Split on periods; strip whitespace/newlines and drop empty fragments
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    prioritized = [sentence for sentence in sentences if keyword in sentence]
    remaining = [sentence for sentence in sentences if keyword not in sentence]
    return '. '.join(prioritized + remaining)

# Function to filter out specific lines
def filter_comments(text, unwanted_phrases):
    # Split on periods; strip whitespace/newlines and drop empty fragments
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    # Keep a sentence only if it contains none of the unwanted phrases
    filtered = [sentence for sentence in sentences if all(phrase not in sentence for phrase in unwanted_phrases)]
    return '. '.join(filtered)

# Group by "RTSK Short Desc" and "Automation Execution Status"
grouped = df.groupby(['RTSK Short Desc', 'Automation Execution Status'])

unwanted_phrases = ["Integrator Record Response", "Function Name: UPDATE_SYSTEM"]

# Summarize worknotes for each group
summaries = []
for name, group in grouped:
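    # Drop missing worknotes and coerce to str so join() does not fail on NaN
    combined_notes = '. '.join(group['RTSK Worknote'].dropna().astype(str).tolist())
    # Filter out specific lines
    filtered_notes = filter_comments(combined_notes, unwanted_phrases)
    # Generate summary; summarize() raises ValueError on single-sentence input,
    # so fall back to the filtered text when no summary can be produced
    try:
        summary = summarize(filtered_notes, ratio=0.2) or filtered_notes
    except ValueError:
        summary = filtered_notes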
    # Customize summary
    customized_summary = prioritize_comments(summary)
    summaries.append({
        'RTSK Short Desc': name[0],
        'Automation Execution Status': name[1],
        'Summary': customized_summary
    })

# Convert summaries to DataFrame
summary_df = pd.DataFrame(summaries)

# Save to CSV
summary_df.to_csv('summarized_worknotes.csv', index=False)
print("Summarization complete. Check the summarized_worknotes.csv file.")

Explanation:

  1. Filtering Function: The filter_comments function splits the text on periods and drops every sentence that contains any of the unwanted phrases.

  2. Integration: The filtering step runs before the summary is generated, so the unwanted lines never reach the summarizer and cannot appear in the summarized content.
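
To see the filtering and prioritizing steps on their own, without pandas or gensim, here is a minimal sketch. The two helpers are restated with whitespace stripping added, and the sample worknote text is made up purely for illustration; real worknotes will look different.

```python
def filter_comments(text, unwanted_phrases):
    """Drop every period-delimited sentence that contains an unwanted phrase."""
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    kept = [s for s in sentences if all(p not in s for p in unwanted_phrases)]
    return '. '.join(kept)


def prioritize_comments(text, keyword="Automation Failed"):
    """Move sentences containing the keyword to the front, keeping relative order."""
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    prioritized = [s for s in sentences if keyword in s]
    remaining = [s for s in sentences if keyword not in s]
    return '. '.join(prioritized + remaining)


# Hypothetical worknote text (an assumption, not taken from a real dataset)
sample = (
    "Ticket picked up by bot. Integrator Record Response received. "
    "Automation Failed at step 3. Function Name: UPDATE_SYSTEM called. "
    "Ticket reassigned to support team."
)
unwanted = ["Integrator Record Response", "Function Name: UPDATE_SYSTEM"]

filtered = filter_comments(sample, unwanted)
reordered = prioritize_comments(filtered)

print(filtered)
print(reordered)
```

Because the split is on every period, text such as version numbers or abbreviations will be cut mid-sentence, so treat this as a rough filter rather than a real sentence tokenizer.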

Feel free to tweak the filtering and summarization parameters further to suit your needs. If there's anything more specific you need help with, I'm here to assist! 
