I have dataset table with column names "RTSK Number", "RTSK Short Descr", "RTSK Worknote", "Automation Execution Status".
"RTSK Number" is a task number. "RTSK Short Descr" is the short description of the task or task type. "RTSK Worknote" has the comments from automation and some are manually written comments.
Below are three examples of "RTSK Short Descr"
1. IaaS Provisioning - Unix
2. IaaS Provisioning - Windows
3. IaaS QA & Configure - Windows
I have identified different buckets based on keywords present in "RTSK Worknote" for each task type. So I want to group all the "RTSK Worknote" comments in each bucket based on the keywords mentioned below. Further I want to count the total comments in each bucket and sort it in descending order.
For IaaS Provisioning - Unix
Keywords:
Bucket 1 - "operational_environment: TE2 / Test", "operational_environment: Production", "operational_environment: Development", "operational_environment: TE1 / UAT", "operational_environment: Disaster Recovery", "networkzone",
Bucket 2 - "IaaS API call failed"
Bucket 3- "Timed Out", "troubleshoot"
Bucket 4 - "IaaS VM creation failed", "VISM team""
Bucket 5 - "APAC DataCenter not one of: "
For IaaS Provisioning - Windows
Keywords:
Bucket 1- "Timed Out", "troubleshoot"
Bucket 2- "Mismatch in naming convention"
Bucket 3- "IaaS API call failed"
Bucket 4-"The following LOB not found", "APAC DataCenter not one of: "
For IaaS QA & Configure - Windows
Keywords:
Bucket 1-" has encountered issues", " unable to connect"
Bucket 2- "Automation details:tbd", "DiskInfo from Server"
Bucket 3 - "Automation details:Restricted Zone QA under development. Kindly perform the QA manually"
Help me with the solution in power BI using DAX queries (calculated columns and measure) and also with the python code like text summarization or any other which will give me desired result.
ChatGPT said:
To tackle this project in both Power BI (using DAX for calculated columns and measures) and Python, we’ll approach it as follows:
- Identify the appropriate bucket for each comment based on keywords in the "RTSK Worknote" column.
- Count and summarize the comments in each bucket, per task type.
- Sort the counts in descending order.
Let’s start with the Power BI solution, followed by Python.
Power BI Solution (DAX Queries)
Step 1: Create Calculated Columns for Buckets
Each bucket will be created as a calculated column in Power BI based on the keywords you provided.
- Go to the Modeling tab and select New Column.
- Create each bucket as a calculated column with DAX using nested
IF statements and SEARCH to check if any keywords match.
For example, for "IaaS Provisioning - Unix", create columns:
Repeat similar SWITCH logic for "IaaS Provisioning - Windows" and "IaaS QA & Configure - Windows" tasks based on their respective keywords and task descriptions.
Step 2: Create Measures to Count Comments in Each Bucket
Create DAX measures to count the comments in each bucket:
Go to the Modeling tab and select New Measure.
For example, to count comments in Bucket 1 for "IaaS Provisioning - Unix":
Repeat similar measures for each bucket for each task type.
Step 3: Create a Summary Table to Display and Sort
To display and sort the counts in descending order:
Go to Modeling > New Table and create a summary table with the counts:
Use Sort By Column to sort TotalCount in descending order.
This will give you a table that shows each task type, bucket, and the count of comments in descending order.
Python Solution (for Summarization and Counting)
If you want a more flexible solution that can handle text processing directly in Python, use the following approach:
- Tokenize and Match Buckets: Identify keywords and group by buckets.
- Count Occurrences: Summarize counts in each bucket.
- Summarize Comments by Bucket and Task Type: Generate a summary for each bucket.
Step 1: Define Keywords and Buckets
Define your keywords and task types as dictionaries to categorize comments.
Step 2: Apply Categorization
Create a function to classify each "RTSK Worknote" comment into buckets based on keywords.
Step 3: Summarize and Count by Bucket
Now, group and count the occurrences of each bucket per task type.
This will output a table showing each task type, bucket, and the count of comments, sorted by count in descending order.
Final Summary
- In Power BI: Create calculated columns for buckets using DAX and count comments in each bucket using measures. Display results in a summary table.
- In Python: Use keyword matching to categorize comments into buckets, then group and count each bucket for further analysis.
-----------------------------------------------------------------------------------------------------------
Here's how you can filter and display the result for a specific task type:
------------------------------------------------------------------------------------
Here's how to create the measure in DAX:
No comments