Pysummarization
Got it! Here’s how you can group by RTSK Number and create summaries for each RTSK Number in your dataframe df using the pysummarization library.
First, make sure you’ve got pandas and pysummarization installed:
pip install pandas pysummarization
Then, here’s a script to get you started:
import pandas as pd
from pysummarization.nlpbase.auto_abstractor import AutoAbstractor
from pysummarization.tokenizabledoc.simple_tokenizer import SimpleTokenizer
from pysummarization.abstractabledoc.top_n_rank_abstractor import TopNRankAbstractor
# Your sample dataframe
data = {
    'RTSK Number': ['001', '001', '002', '002'],
    'RTSK Worknote': [
        'Worknote 1 for RTSK 001.',
        'Worknote 2 for RTSK 001.',
        'Worknote 1 for RTSK 002.',
        'Worknote 2 for RTSK 002.'
    ]
}
df = pd.DataFrame(data)
# Initialize summarization components
auto_abstractor = AutoAbstractor()
auto_abstractor.tokenizable_doc = SimpleTokenizer()
auto_abstractor.delimiter_list = [".", "\n"]
abstractable_doc = TopNRankAbstractor()
def summarize_comments(comments):
    combined_comments = ' '.join(comments)
    result_dict = auto_abstractor.summarize(combined_comments, abstractable_doc)
    summary = ' '.join(result_dict["summarize_result"])
    return summary
# Group by RTSK Number and summarize
summarized_df = df.groupby('RTSK Number')['RTSK Worknote'].apply(summarize_comments).reset_index()
summarized_df.columns = ['RTSK Number', 'Summary']
print(summarized_df)
This gives you a dataframe with one row per RTSK Number and its corresponding Summary. The example joins all worknotes for each RTSK Number into a single string and summarizes that string.
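If a summary looks odd, it can help to inspect the combined text that each group hands to the summarizer. Here is a minimal sketch of just the group-and-join step in plain Python, without pandas or pysummarization. The helper name combine_worknotes and the list of (number, worknote) pairs are assumptions for illustration, not part of the script above.

```python
from collections import defaultdict


def combine_worknotes(rows):
    """Group (rtsk_number, worknote) pairs and join each group's notes with spaces.

    Hypothetical helper: it mirrors the ' '.join(comments) step only.
    """
    grouped = defaultdict(list)
    for rtsk_number, worknote in rows:
        grouped[rtsk_number].append(worknote)
    # One combined string per RTSK Number, in first-seen order
    return {number: " ".join(notes) for number, notes in grouped.items()}


rows = [
    ("001", "Worknote 1 for RTSK 001."),
    ("001", "Worknote 2 for RTSK 001."),
    ("002", "Worknote 1 for RTSK 002."),
    ("002", "Worknote 2 for RTSK 002."),
]
combined = combine_worknotes(rows)
print(combined["001"])
```

Each value in combined is the kind of string that summarize_comments builds before calling the summarizer.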
---------------------------------------------------------------------------------------------------------------
def summarize_comments(comments):
    # Convert all comments to strings
    comments = [str(comment) for comment in comments]
    combined_comments = ' '.join(comments)
    result_dict = auto_abstractor.summarize(combined_comments, abstractable_doc)
    summary = ' '.join(result_dict["summarize_result"])
    return summary
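The string conversion matters because ' '.join raises a TypeError as soon as one item is not a string, and a worknote column with missing values typically holds float NaN or None in those rows. One side effect is that str() turns a missing value into the literal text "nan" or "None", which then ends up in the text being summarized. Here is a minimal sketch of a variant that drops missing values instead. The helper name clean_comments is an assumption for illustration, and it only checks for None and float NaN, not other missing-value markers.

```python
import math


def clean_comments(comments):
    """Return comments as strings, skipping None and float NaN (hypothetical helper)."""
    cleaned = []
    for comment in comments:
        if comment is None:
            continue
        if isinstance(comment, float) and math.isnan(comment):
            continue
        # Non-string values such as numbers are kept, converted to text
        cleaned.append(str(comment))
    return cleaned


raw = ["Restarted the service.", float("nan"), None, 42]
combined = " ".join(clean_comments(raw))
print(combined)
```

Inside summarize_comments, the list comprehension could be swapped for a call to a helper like this one, so that missing worknotes are skipped rather than summarized as text.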