Summarization_transformers

Absolutely! Let's use the transformers library from Hugging Face, which provides pretrained transformer models for text summarization. The script also imports Sumy, which offers additional extractive summarization methods such as LSA, although only the transformers pipeline is actually used below. Here's how you can do it:

Updated Script with Transformers and Sumy

python
import pandas as pd
from transformers import pipeline
# Sumy imports: not used below, kept for optional extractive summarization (e.g. LSA)
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer

# Load dataset
df = pd.read_csv('path_to_your_dataset.csv')

# Function to prioritize specific comments in the summary
def prioritize_comments(text, keyword="Automation Failed"):
    # Split on periods, stripping whitespace and dropping empty fragments
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    prioritized = [s for s in sentences if keyword in s]
    remaining = [s for s in sentences if keyword not in s]
    # Sentences containing the keyword come first
    return '. '.join(prioritized + remaining)

# Function to filter out specific lines
def filter_comments(text, unwanted_phrases):
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    # Keep only sentences that contain none of the unwanted phrases
    filtered = [s for s in sentences if all(phrase not in s for phrase in unwanted_phrases)]
    return '. '.join(filtered)

# Initialize summarization pipeline from transformers
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Group by "RTSK Short Desc" and "Automation Execution Status"
grouped = df.groupby(['RTSK Short Desc', 'Automation Execution Status'])

unwanted_phrases = ["Integrator Record Response", "Function Name: UPDATE_SYSTEM"]

# Summarize worknotes for each group
summaries = []
for name, group in grouped:
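    # Drop missing worknotes and cast to str so join() cannot fail on NaN values
    combined_notes = '. '.join(group['RTSK Worknote'].dropna().astype(str).tolist())
    # Filter out specific lines
    filtered_notes = filter_comments(combined_notes, unwanted_phrases)
    if not filtered_notes:
        continue  # Nothing left to summarize for this group
    # Generate summary using transformers; truncation=True clips inputs that
    # exceed the model's maximum input length instead of raising an error
    summary = summarizer(filtered_notes, max_length=130, min_length=30, do_sample=False, truncation=True)[0]['summary_text']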
    # Customize summary
    customized_summary = prioritize_comments(summary)
    summaries.append({
        'RTSK Short Desc': name[0],
        'Automation Execution Status': name[1],
        'Summary': customized_summary
    })

# Convert summaries to DataFrame
summary_df = pd.DataFrame(summaries)

# Save to CSV
summary_df.to_csv('summarized_worknotes.csv', index=False)
print("Summarization complete. Check the summarized_worknotes.csv file.")

Explanation:

  1. Transformers: The transformers library provides pretrained summarization models. Here, we use BART fine-tuned on the CNN/DailyMail dataset (facebook/bart-large-cnn).

  2. Prioritizing Comments: The prioritize_comments function moves sentences containing the specified keyword ("Automation Failed" by default) to the front of the summary.

  3. Filtering Specific Lines: The filter_comments function removes sentences containing unwanted phrases before summarizing.

  4. Summarize and Customize: For each group, the script combines the worknotes, filters them, summarizes the result, and then reorders the summary so that keyword sentences come first. The summaries are written to summarized_worknotes.csv.
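
To make points 2 and 3 concrete, here is a minimal sketch that runs both helper functions on a made-up worknote string. The helpers are restated so the snippet runs on its own; the split_sentences helper and the sample text are assumptions added for illustration, not part of the original script.

```python
def split_sentences(text):
    # Naive split on periods; strip whitespace and drop empty fragments.
    return [s.strip() for s in text.split('.') if s.strip()]

def prioritize_comments(text, keyword="Automation Failed"):
    sentences = split_sentences(text)
    prioritized = [s for s in sentences if keyword in s]
    remaining = [s for s in sentences if keyword not in s]
    return '. '.join(prioritized + remaining)

def filter_comments(text, unwanted_phrases):
    sentences = split_sentences(text)
    kept = [s for s in sentences if all(p not in s for p in unwanted_phrases)]
    return '. '.join(kept)

# Invented sample worknote, for illustration only.
sample = ("Job started. Integrator Record Response received. "
          "Automation Failed at step 3. Ticket reassigned.")

filtered = filter_comments(sample, ["Integrator Record Response"])
prioritized = prioritize_comments(filtered)
print(filtered)     # Job started. Automation Failed at step 3. Ticket reassigned
print(prioritized)  # Automation Failed at step 3. Job started. Ticket reassigned
```

Note that splitting on '.' is a rough heuristic: version numbers, decimals, and abbreviations in the worknotes would be broken into separate "sentences".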

Because BART generates abstractive summaries rather than only selecting existing sentences, this approach can produce more fluent and potentially more accurate summaries than simpler methods. If you have more specific requirements or need further assistance, feel free to let me know!
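
One practical caveat: facebook/bart-large-cnn accepts a limited input length (1024 tokens), so very long combined worknotes cannot be summarized in a single call. A common workaround, sketched below, is to split the text into chunks, summarize each chunk, and join the partial summaries. The names chunk_words and summarize_long, the 400-word chunk size, and the stand-in summarizer are all assumptions for illustration; they are not part of the script above.

```python
def chunk_words(text, max_words=400):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [' '.join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def summarize_long(text, summarize_fn, max_words=400):
    """Summarize each chunk with summarize_fn and join the partial results."""
    parts = [summarize_fn(chunk) for chunk in chunk_words(text, max_words)]
    return ' '.join(parts)

# Stand-in summarizer so the sketch runs without downloading a model:
# it simply keeps the first five words of each chunk.
def fake_summarizer(chunk):
    return ' '.join(chunk.split()[:5])

long_text = ' '.join(f"word{i}" for i in range(1000))
chunks = chunk_words(long_text)
result = summarize_long(long_text, fake_summarizer)
print(len(chunks))  # 3 chunks: 400 + 400 + 200 words
```

With the real pipeline, summarize_fn could be a small wrapper such as lambda chunk: summarizer(chunk, max_length=130, min_length=30, do_sample=False)[0]['summary_text']. Word count is only a rough proxy for token count, so the chunk size may need tuning.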
