Summarization with Transformers

 This post uses the Transformers library by Hugging Face for text summarization. Its pipeline API loads a pre-trained transformer model in one call, makes it easy to swap models, and can yield high-quality summaries.

Updated Code with Transformers

  1. Install Transformers if you haven’t already:

    bash
    pip install transformers torch pandas
  2. Updated Python Code:

    python
    from transformers import pipeline
    import pandas as pd

    # Assumes `df` is an existing DataFrame with 'RTSK Number' and 'RTSK Worknote' columns.

    # Load the summarization pipeline using a BART model (or a T5 model for a smaller size)
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

    # Function to generate a summary
    def summarize_text(text):
        try:
            # Adjust max_length and min_length (counted in tokens) as needed
            summary = summarizer(text, max_length=50, min_length=10, do_sample=False)[0]['summary_text']
            return summary
        except Exception as e:
            # Fall back to the original text if summarization fails
            print(f"Error summarizing text: {e}")
            return text

    # Apply summarization to each individual 'RTSK Worknote'
    df['RTSK Worknote Summary'] = df['RTSK Worknote'].apply(summarize_text)

    # Group by 'RTSK Number' and concatenate the worknotes into one text block per RTSK
    grouped_worknotes = df.groupby('RTSK Number')['RTSK Worknote'].apply(lambda x: ' '.join(x))

    # Apply summarization to the grouped worknotes
    grouped_worknotes_summary = grouped_worknotes.apply(summarize_text)

    # Add the grouped summaries back to the dataframe.
    # rename() sets the new column name; passing columns= to pd.DataFrame would not rename the Series.
    summary_df = grouped_worknotes_summary.rename('Grouped RTSK Worknote Summary').reset_index()
    df = pd.merge(df, summary_df, on='RTSK Number', how='left')

Explanation of Key Changes:

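  1. Using the pipeline object: pipeline("summarization", ...) loads a pre-trained summarization model (here facebook/bart-large-cnn) and returns a callable that summarizes text blocks.
  2. max_length and min_length Parameters: These set the upper and lower bounds on the summary length, measured in tokens rather than words or characters.
  3. Error Handling: If the summarizer raises an exception, the function prints the error and returns the original text unchanged.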
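
The grouping and fallback behavior described above can be tried without downloading a model. The following is a minimal sketch, not the code from this post: fake_summarize is a hypothetical stand-in for the real pipeline call, make_safe_summarizer is an illustrative helper name, and the sample rows are invented for the example.

```python
from collections import defaultdict


def make_safe_summarizer(summarize):
    """Wrap a summarizer so that any failure falls back to the original text."""
    def safe(text):
        try:
            return summarize(text)
        except Exception as exc:
            print(f"Error summarizing text: {exc}")
            return text
    return safe


def fake_summarize(text):
    """Hypothetical stand-in for the model: keeps the first five words."""
    if not text.strip():
        raise ValueError("empty input")
    return " ".join(text.split()[:5])


# Invented rows shaped like the 'RTSK Number' / 'RTSK Worknote' columns.
rows = [
    ("RTSK001", "Restarted the service and confirmed the queue drained."),
    ("RTSK001", "Customer reported no further errors."),
    ("RTSK002", "   "),
]

# Collect the worknotes per RTSK number, mirroring the groupby step.
grouped = defaultdict(list)
for number, note in rows:
    grouped[number].append(note)

# Join each group's notes into one text block and summarize it.
safe_summarize = make_safe_summarizer(fake_summarize)
group_summaries = {
    number: safe_summarize(" ".join(notes)) for number, notes in grouped.items()
}
```

Here the blank worknote for RTSK002 makes the stand-in raise, so the wrapper prints the error and returns the input unchanged, which is the same fallback summarize_text applies to the real model.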

This modified code generates a summary for each individual RTSK Worknote and for the combined worknotes of each RTSK Number.
