Summarization_transformers
Absolutely! Let's use the transformers library from Hugging Face, which provides access to pretrained transformer models for text summarization. The script also imports Sumy, which offers additional extractive summarization methods, although the version below only calls the transformers pipeline. Here's how you can do it:
Updated Script with Transformers and Sumy
import pandas as pd
from transformers import pipeline
# Sumy is imported for additional (extractive) methods; it is not used in this script
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer

# Load dataset
df = pd.read_csv('path_to_your_dataset.csv')

# Split text into trimmed, non-empty sentences (naive split on '.')
def split_sentences(text):
    return [s.strip() for s in text.split('.') if s.strip()]

# Function to prioritize specific comments in the summary
def prioritize_comments(text, keyword="Automation Failed"):
    sentences = split_sentences(text)
    prioritized = [sentence for sentence in sentences if keyword in sentence]
    remaining = [sentence for sentence in sentences if keyword not in sentence]
    return '. '.join(prioritized + remaining)

# Function to filter out specific lines
def filter_comments(text, unwanted_phrases):
    sentences = split_sentences(text)
    filtered = [sentence for sentence in sentences if all(phrase not in sentence for phrase in unwanted_phrases)]
    return '. '.join(filtered)

# Initialize summarization pipeline from transformers
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Group by "RTSK Short Desc" and "Automation Execution Status"
grouped = df.groupby(['RTSK Short Desc', 'Automation Execution Status'])
unwanted_phrases = ["Integrator Record Response", "Function Name: UPDATE_SYSTEM"]

# Summarize worknotes for each group
summaries = []
for name, group in grouped:
    # Skip missing worknotes and coerce the rest to strings before joining
    combined_notes = '. '.join(group['RTSK Worknote'].dropna().astype(str).tolist())
    # Filter out specific lines
    filtered_notes = filter_comments(combined_notes, unwanted_phrases)
    if not filtered_notes:
        continue  # nothing left to summarize for this group
    # Generate summary using transformers; truncation=True keeps long inputs within the model's limit
    summary = summarizer(filtered_notes, max_length=130, min_length=30, do_sample=False, truncation=True)[0]['summary_text']
    # Customize summary
    customized_summary = prioritize_comments(summary)
    summaries.append({
        'RTSK Short Desc': name[0],
        'Automation Execution Status': name[1],
        'Summary': customized_summary
    })

# Convert summaries to DataFrame
summary_df = pd.DataFrame(summaries)
# Save to CSV
summary_df.to_csv('summarized_worknotes.csv', index=False)
print("Summarization complete. Check the summarized_worknotes.csv file.")
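The script joins every worknote in a group into one string, which can grow longer than the model accepts in a single call. One possible workaround, sketched here under assumptions, is to split the text into smaller pieces and summarize each piece separately. The helper name chunk_text and the 400-word budget are hypothetical and not part of the original script; the model's actual limit is measured in tokens, so a word count is only a rough proxy.

```python
def chunk_text(text, max_words=400):
    """Split text into consecutive chunks of at most max_words words.

    max_words is an illustrative budget (an assumption), not a value taken
    from the original script or from the model's configuration.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [' '.join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


# Tiny demonstration with a deliberately small budget
chunks = chunk_text("one two three four five", max_words=2)
print(chunks)  # ['one two', 'three four', 'five']

# With the summarizer from the script, each chunk could then be summarized in turn:
# partial = [summarizer(c, max_length=130, min_length=30, do_sample=False)[0]['summary_text']
#            for c in chunk_text(filtered_notes)]
# summary = ' '.join(partial)
```

Splitting on word boundaries can cut a sentence in half, so a sentence-aware splitter may give cleaner partial summaries.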
Explanation:
Transformers: The transformers library provides a pretrained summarization model. Here, we use the BART model (facebook/bart-large-cnn).
Prioritizing Comments: The function prioritize_comments moves sentences containing the specified keyword to the top of the summary.
Filtering Specific Lines: The filter_comments function removes sentences containing unwanted phrases before summarizing.
Summarize and Customize: The script combines, filters, and summarizes the worknotes, then customizes the output based on the presence of specific comments.
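To make the filtering and prioritizing steps concrete, here is a small self-contained sketch that runs both helpers on a made-up worknote, without loading any model. It is a lightly adapted copy of the functions: the split_sentences helper is an addition for this illustration that trims whitespace and drops empty pieces, and the sample text is invented.

```python
def split_sentences(text):
    """Naively split on '.' and drop empty or whitespace-only pieces."""
    return [s.strip() for s in text.split('.') if s.strip()]


def filter_comments(text, unwanted_phrases):
    """Keep only sentences that contain none of the unwanted phrases."""
    kept = [s for s in split_sentences(text)
            if all(phrase not in s for phrase in unwanted_phrases)]
    return '. '.join(kept)


def prioritize_comments(text, keyword="Automation Failed"):
    """Move sentences containing the keyword ahead of all other sentences."""
    sentences = split_sentences(text)
    prioritized = [s for s in sentences if keyword in s]
    remaining = [s for s in sentences if keyword not in s]
    return '. '.join(prioritized + remaining)


# Made-up worknote text, for illustration only
notes = ("Ticket opened. Integrator Record Response received. "
         "Automation Failed at step 3. Retried manually.")

filtered = filter_comments(notes, ["Integrator Record Response"])
print(filtered)
# Ticket opened. Automation Failed at step 3. Retried manually

print(prioritize_comments(filtered))
# Automation Failed at step 3. Ticket opened. Retried manually
```

Splitting on '.' is a simplification: it also breaks on decimals and abbreviations, so a proper sentence tokenizer may be a better fit for real worknotes.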
Because BART is an abstractive model, this approach can produce more fluent and condensed summaries than extractive methods such as Sumy's LSA summarizer. If you have any more specific requirements or need further assistance, feel free to let me know!