sumy summarization

To summarize RTSK Worknote data for each unique RTSK Number, you can group the rows by RTSK Number and run the Sumy library on each group. In this example, we concatenate the worknotes for each RTSK Number into a single text block, summarize it, and store the result in a summary column.

Step-by-Step Code

1. Install Sumy

First, install sumy if it’s not already installed (the leading ! runs the command from a notebook cell; drop it when installing from a terminal):

python
!pip install sumy

2. Import Libraries

python
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer
import pandas as pd

3. Define the Summarization Function

This function uses Sumy’s LSA Summarizer (you can also choose other algorithms from Sumy) to summarize the concatenated worknotes for each RTSK Number.

python
def summarize_text(text, sentence_count=3):  # Adjust the number of sentences as needed
    try:
        # Parse the raw text into a Sumy document using the English tokenizer
        parser = PlaintextParser.from_string(text, Tokenizer("english"))
        summarizer = LsaSummarizer()
        summary = summarizer(parser.document, sentence_count)
        # Join the selected sentences back into a single string
        return " ".join(str(sentence) for sentence in summary)
    except ValueError:
        return text  # Return original text if it's too short to summarize
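The remark about choosing other algorithms can be made concrete. Below is a minimal sketch, assuming Sumy's usual module layout (sumy.summarizers.lsa, lex_rank, luhn, text_rank) and the stemmer and stop-word setup shown in Sumy's README. The ALGORITHMS table and the build_summarizer and summarize_with helpers are names made up for this illustration, not part of Sumy.

```python
from importlib import import_module

# Hypothetical lookup table: short name -> (Sumy module, summarizer class).
ALGORITHMS = {
    "lsa": ("sumy.summarizers.lsa", "LsaSummarizer"),
    "lexrank": ("sumy.summarizers.lex_rank", "LexRankSummarizer"),
    "luhn": ("sumy.summarizers.luhn", "LuhnSummarizer"),
    "textrank": ("sumy.summarizers.text_rank", "TextRankSummarizer"),
}


def build_summarizer(algorithm="lsa", language="english"):
    """Return a configured Sumy summarizer for the given short name."""
    if algorithm not in ALGORITHMS:
        raise ValueError(
            f"Unknown algorithm {algorithm!r}; choose from {sorted(ALGORITHMS)}"
        )
    # Import lazily so only the chosen algorithm's module is loaded.
    module_name, class_name = ALGORITHMS[algorithm]
    summarizer_cls = getattr(import_module(module_name), class_name)
    from sumy.nlp.stemmers import Stemmer
    from sumy.utils import get_stop_words

    summarizer = summarizer_cls(Stemmer(language))
    summarizer.stop_words = get_stop_words(language)
    return summarizer


def summarize_with(text, algorithm="lsa", sentence_count=3, language="english"):
    """Summarize text with the chosen algorithm and return a single string."""
    from sumy.parsers.plaintext import PlaintextParser
    from sumy.nlp.tokenizers import Tokenizer

    parser = PlaintextParser.from_string(text, Tokenizer(language))
    summarizer = build_summarizer(algorithm, language)
    summary = summarizer(parser.document, sentence_count)
    return " ".join(str(sentence) for sentence in summary)
```

With this in place, something like summarize_with(text, algorithm="lexrank") could stand in for summarize_text in the later steps. Which algorithm reads best on short, terse worknotes is worth checking on a few real tickets.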

4. Group by RTSK Number, Concatenate Worknotes, and Apply Summarization

python
# Concatenate the worknotes for each RTSK Number
grouped_worknotes = (
    df.groupby('RTSK Number')['RTSK Worknote']
    .apply(lambda x: ' '.join(x))
    .reset_index()
)

# Apply summarization to the grouped worknotes
grouped_worknotes['RTSK Worknote Summary'] = grouped_worknotes['RTSK Worknote'].apply(summarize_text)

# Merge the summary back to the original dataframe, if needed
df = pd.merge(
    df,
    grouped_worknotes[['RTSK Number', 'RTSK Worknote Summary']],
    on='RTSK Number',
    how='left',
)
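To see the group, concatenate, summarize, and merge flow on something small, here is a sketch with made-up sample data. It assumes worknotes may contain missing values, so it drops them before joining. The first_sentences function is a hypothetical placeholder that lets the example run without Sumy; in practice you would pass summarize_text from step 3 instead.

```python
import pandas as pd


def first_sentences(text, sentence_count=1):
    """Placeholder summarizer (hypothetical): keep the first N sentences."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return ". ".join(sentences[:sentence_count]) + ("." if sentences else "")


def add_group_summaries(df, summarize=first_sentences):
    """Group by RTSK Number, join the worknotes, summarize, and merge back."""
    grouped = (
        df.groupby("RTSK Number")["RTSK Worknote"]
        # Skip missing notes and coerce to str so ' '.join cannot fail.
        .apply(lambda notes: " ".join(notes.dropna().astype(str)))
        .reset_index()
    )
    grouped["RTSK Worknote Summary"] = grouped["RTSK Worknote"].apply(summarize)
    return df.merge(
        grouped[["RTSK Number", "RTSK Worknote Summary"]],
        on="RTSK Number",
        how="left",
    )


# Made-up sample data for illustration only.
sample = pd.DataFrame(
    {
        "RTSK Number": ["RTSK001", "RTSK001", "RTSK002"],
        "RTSK Worknote": ["Disk replaced. Server rebooted.", None, "Password reset."],
    }
)
result = add_group_summaries(sample)
print(result[["RTSK Number", "RTSK Worknote Summary"]])
```

Both RTSK001 rows end up with the same summary, because the summary is computed once per group and then copied to every row of that group by the left merge.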

Explanation of the Code

  1. Summarize Each Group: The code groups the data by RTSK Number, concatenates all RTSK Worknote values in each group, and applies summarize_text to generate a summary.
  2. Merge Summary with Original Data: If you want each row to have its RTSK Worknote Summary in the original dataset, merge it back using the RTSK Number.

This will add a new column, RTSK Worknote Summary, containing the summarized worknotes for each RTSK Number. After the merge, every row that shares an RTSK Number carries the same summary.
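One thing to watch in a notebook: if the merge cell is run a second time, df already has an RTSK Worknote Summary column, so pandas adds suffixed duplicates (RTSK Worknote Summary_x and RTSK Worknote Summary_y). A possible alternative, sketched below with made-up data, is to look up each row's summary with Series.map, which overwrites the column instead. The attach_summaries helper and the sample values are assumptions for illustration.

```python
import pandas as pd


def attach_summaries(df, summaries):
    """Write one summary per RTSK Number onto every matching row via Series.map."""
    out = df.copy()
    out["RTSK Worknote Summary"] = out["RTSK Number"].map(summaries)
    return out


# Made-up data: `summaries` stands in for the per-group output of summarize_text,
# indexed by RTSK Number.
tickets = pd.DataFrame(
    {"RTSK Number": ["A1", "A1", "B2"], "RTSK Worknote": ["x", "y", "z"]}
)
summaries = pd.Series({"A1": "summary of A1", "B2": "summary of B2"})

once = attach_summaries(tickets, summaries)
twice = attach_summaries(once, summaries)  # re-running does not add new columns
print(list(twice.columns))
```

With the grouped frame from step 4, the lookup Series could be built as grouped_worknotes.set_index('RTSK Number')['RTSK Worknote Summary'].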
