HTML Parser Summarization

Below is a Python script that uses the BeautifulSoup library to strip the HTML markup from the comments and sumy to summarize the remaining text. It aims to produce a condensed version of the content, similar to the summary I provided earlier.

Step-by-Step Script

  1. Install Necessary Libraries:

    • You can install the required libraries using pip.

      sh
      pip install beautifulsoup4
      pip install sumy
      
  2. Python Code:

    python
    from bs4 import BeautifulSoup
    from sumy.parsers.plaintext import PlaintextParser
    from sumy.nlp.tokenizers import Tokenizer
    from sumy.summarizers.lsa import LsaSummarizer
    
    html_content = '''
    <h2>Automation Results</h2><br />====== =================<h3>Parameters:</h3><br/>dbaas_realm: GBLPRD<br
    />db_version: 19c<br/>target_operational_env: PRD<br />==========================<br/><br/><h3>Status of DB Reservation job:
    </h3><br/><br/><p><table border="1"> <tbody> <tr> <th>Job Name</th><th>Status</th> <th>OEM Job URL (SSO)</th> </
    tr><tr><td>IPSOFT_SS_DBReservation_RITM0112482729_QK20Hzlok8</td><td style="color:#008000">SUCCEEDED
    </td><td><a href="https://dbaas-oem-prd.swissbank.com:7301/em/faces/core-jobs-
    procedure ExecutionTracking?execution GUID=22F1DA36F786B2C2E0630685380AC80A&instance GUID=22F1DA36F783B2C2E0630685380
    AC80A&showProcActLink=yes">Link</a></td></tr></tbody> </table> </p>Rsvname: PDECOM6Q.PRD.GBL.UBS.NET
    Tier: BRONZE+
    Primary Host 1: xldn30846por.ubsglobal-prod.msad.ubs.net
    Standby Host 1: xldn30821por.ubsglobal-prod.msad.ubs.net
    AutomationSuccessCode=![{"reservationName":"PDECOM6Q.PRD.GBL.UBS.NET", "primary_1":"xldn30846por.ubsglobal-
    prod.msad.ubs.net", "standby_a1":"xldn30821por.ubsglobal-
    prod.msad.ubs.net", "standby_a2":"n/a","primary_2":"n/a","standby_b1":"n/a","standby_b2":"n/a"}]!
    SUCCESS.
    Task currently being worked on by automation<br /><p style="margin-left: 40px"><table border="1"> <tbody> <tr> <th>Automation
    prod.ldn.swissbank.com/IPradar/update.htm?ticketID=172254162" target="_blank">172254162</a><br/></td> </ tr> <tr>
    <th>Execution</th> <td><a href="https://ipcenter-prod.ldn.swissbank.com/IPautomata/executionDetails.htm?executionID=136408170"
    '''
    
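    def clean_html(html):
        # Parse with Python's built-in parser and keep only the text nodes,
        # joined by single spaces
        soup = BeautifulSoup(html, 'html.parser')
        return soup.get_text(separator=" ", strip=True)
    
    def summarize_text(text, num_sentences=3):
        # Tokenize the plain text into sentences and words
        parser = PlaintextParser.from_string(text, Tokenizer("english"))
        summarizer = LsaSummarizer()
        # Select at most num_sentences sentences from the document
        summary = summarizer(parser.document, num_sentences)
        return ' '.join(str(sentence) for sentence in summary)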
    
    raw_text = clean_html(html_content)
    summary = summarize_text(raw_text)
    
    print("Summary:")
    print(summary)
    

Explanation:

  1. BeautifulSoup: This library is used to parse and clean the HTML content, extracting the text while removing HTML tags.

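  2. Sumy: This library performs the summarization. The LsaSummarizer selects the most representative sentences from the cleaned text using latent semantic analysis. It depends on NumPy, and Tokenizer("english") depends on NLTK's sentence tokenizer data, so both may need to be installed separately.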

  3. Integration: The script cleans the HTML content, summarizes the extracted text, and prints the summary.
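
The cleaning step can also be tried without installing anything. The sketch below is not the BeautifulSoup call used above: it swaps in Python's built-in html.parser module, and the names TextExtractor and clean_html_stdlib are hypothetical, chosen only for this illustration. It approximates what get_text(separator=" ", strip=True) does on a short fragment of the comment.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Keep only chunks that contain something other than whitespace
        stripped = data.strip()
        if stripped:
            self.chunks.append(stripped)


def clean_html_stdlib(html):
    """Hypothetical stand-in for clean_html that needs no third-party package."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    # Join the text chunks with single spaces, as the separator argument does
    return " ".join(extractor.chunks)


sample = "<h3>Parameters:</h3><br/>dbaas_realm: GBLPRD<br />db_version: 19c<br/>"
print(clean_html_stdlib(sample))
```

BeautifulSoup remains the more forgiving choice for malformed markup such as the full comment above, so treat this only as a way to see the idea in isolation.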

Output:

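When you run the script, it prints up to three sentences that the LsaSummarizer selects from the cleaned text. Because the input is log-like and contains few real sentence boundaries, the result may be less structured than the detailed summary I initially provided.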
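
To make "selecting a few sentences" concrete, here is a dependency-free sketch of extractive summarization. It is not sumy's LSA algorithm: it swaps in a much simpler word-frequency score, and the function name summarize_by_frequency and the sample text are made up for illustration. The point is only to show that the summary is built from sentences already present in the input, which is why log-like text with few sentence boundaries summarizes poorly.

```python
import re
from collections import Counter


def summarize_by_frequency(text, num_sentences=3):
    """Pick the highest-scoring sentences and return them in original order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    word_counts = Counter(re.findall(r"[a-z0-9_]+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z0-9_]+", sentence.lower())
        # Average word frequency, so long sentences are not favoured automatically
        return sum(word_counts[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, keep the top ones, restore document order
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)


# Hypothetical sample text, not taken from the comment above
text = (
    "The reservation job succeeded. "
    "The weather was not recorded. "
    "The reservation job created the reservation on the primary host."
)
print(summarize_by_frequency(text, num_sentences=2))
```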



---------------------------------------------------------------------------------------------------------------------------

I understand you want to create a more structured and detailed summary, similar to the one I initially provided. To achieve this, we can manually parse the relevant sections and format them accordingly. Here’s how you can enhance your script to produce a summary with specific sections and formatted details:

Enhanced Script for Structured Summary

  1. Install Necessary Libraries:

    sh
    pip install beautifulsoup4
    
  2. Python Code:

    python
    from bs4 import BeautifulSoup
    
    html_content = '''
    <h2>Automation Results</h2><br />====== =================<h3>Parameters:</h3><br/>dbaas_realm: GBLPRD<br
    />db_version: 19c<br/>target_operational_env: PRD<br />==========================<br/><br/><h3>Status of DB Reservation job:
    </h3><br/><br/><p><table border="1"> <tbody> <tr> <th>Job Name</th><th>Status</th> <th>OEM Job URL (SSO)</th> </
    tr><tr><td>IPSOFT_SS_DBReservation_RITM0112482729_QK20Hzlok8</td><td style="color:#008000">SUCCEEDED
    </td><td><a href="https://dbaas-oem-prd.swissbank.com:7301/em/faces/core-jobs-
    procedure ExecutionTracking?execution GUID=22F1DA36F786B2C2E0630685380AC80A&instance GUID=22F1DA36F783B2C2E0630685380
    AC80A&showProcActLink=yes">Link</a></td></tr></tbody> </table> </p>Rsvname: PDECOM6Q.PRD.GBL.UBS.NET
    Tier: BRONZE+
    Primary Host 1: xldn30846por.ubsglobal-prod.msad.ubs.net
    Standby Host 1: xldn30821por.ubsglobal-prod.msad.ubs.net
    AutomationSuccessCode=![{"reservationName":"PDECOM6Q.PRD.GBL.UBS.NET", "primary_1":"xldn30846por.ubsglobal-
    prod.msad.ubs.net", "standby_a1":"xldn30821por.ubsglobal-
    prod.msad.ubs.net", "standby_a2":"n/a","primary_2":"n/a","standby_b1":"n/a","standby_b2":"n/a"}]!
    SUCCESS.
    Task currently being worked on by automation<br /><p style="margin-left: 40px"><table border="1"> <tbody> <tr> <th>Automation
    prod.ldn.swissbank.com/IPradar/update.htm?ticketID=172254162" target="_blank">172254162</a><br/></td> </ tr> <tr>
    <th>Execution</th> <td><a href="https://ipcenter-prod.ldn.swissbank.com/IPautomata/executionDetails.htm?executionID=136408170"
    '''
    
    def clean_html(html):
        soup = BeautifulSoup(html, 'html.parser')
        return soup.get_text(separator=" ", strip=True)
    
    def format_summary(cleaned_text):
        # Collapse newlines and repeated spaces so the markers match reliably
        text = " ".join(cleaned_text.split())
    
        # (output heading, marker text that opens the section), in document order.
        # The success code marker includes "=" because the value follows it
        # without a space, so it never appears as a standalone word.
        markers = [
            ("Parameters:", "Parameters:"),
            ("Status of DB Reservation job:", "Status of DB Reservation job:"),
            ("Reservation Details:", "Rsvname:"),
            ("Automation Success Code:", "AutomationSuccessCode="),
            ("Current Task:", "Task"),
        ]
    
        # Find each marker after the previous one; skip markers that are absent
        found = []
        position = 0
        for heading, marker in markers:
            start = text.find(marker, position)
            if start != -1:
                found.append((heading, start, start + len(marker)))
                position = start + len(marker)
    
        # Each section runs from the end of its marker to the start of the next
        parts = []
        for n, (heading, _, body_start) in enumerate(found):
            body_end = found[n + 1][1] if n + 1 < len(found) else len(text)
            parts.append(heading + "\n" + text[body_start:body_end].strip())
    
        return "\n\n".join(parts)
    
    cleaned_text = clean_html(html_content)
    structured_summary = format_summary(cleaned_text)
    
    print("Structured Summary:\n")
    print(structured_summary)
    

Explanation:

  1. BeautifulSoup: Parses and cleans the HTML content.

  2. Format Summary Function: Manually processes and structures the text into specific sections such as Parameters, Status of DB Reservation job, Reservation Details, Automation Success Code, and Current Task.

  3. Structured Output: The script formats and prints a more human-readable summary similar to the one I provided.
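
The Automation Success Code section carries its details as JSON, so it can be broken down further instead of being printed as one long string. The following is a minimal sketch, assuming the payload always sits between "AutomationSuccessCode=!" and a closing "!" as it does in the comment above. The function name parse_success_code is hypothetical, and the sample string is a shortened version of the real value.

```python
import json
import re


def parse_success_code(text):
    """Return the list of dicts embedded in AutomationSuccessCode, or None."""
    # Assumption: the JSON list is wrapped as =![ ... ]!
    match = re.search(r"AutomationSuccessCode=!(\[.*?\])!", text, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        # Payload was present but damaged, e.g. broken by line wrapping
        return None


# Shortened sample based on the comment above
sample = (
    'Tier: BRONZE+ AutomationSuccessCode=![{"reservationName":'
    '"PDECOM6Q.PRD.GBL.UBS.NET", "standby_a2":"n/a","primary_2":"n/a"}]! SUCCESS.'
)
for entry in parse_success_code(sample) or []:
    for key, value in entry.items():
        print(f"{key}: {value}")
```

Returning None for a missing or damaged payload lets the caller fall back to printing the raw text of that section.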


