Pattern Matching Using Python, in CSC250

Due Tuesday 11:59pm February 19, 2013 (new due date)

Shannon's new help session is 3-5pm Tuesday.
Note that both 212 and 111 have help sessions from 6 to 7 pm on Tuesdays. It is definitely ok to share the room, but it may be noisy!

You are very much encouraged to work with one or two partners.

Refer to the Pattern Matching Lab held on February 13-15.
Answer the following questions. The whole thing should be stored in a file called relab.py. Include proper documentation at the top of the file, using python's comment character #.
Something like:
# Names: Judy Franklin, Jean-Luc Ponty, and Stanley Clarke
# Class: csc250
# Contents: functions and text answers for relab
# Date: February 20, 2012

import re
Don't forget to put in the import re statement to import the re functions. Use python function definitions to test your regular expressions. This is easier than retyping and editing at the python interpreter prompt. You will submit this file electronically, by 11:59 p.m. Tuesday February 19 (the new due date), by typing
rsubmit relab relab.py

from your 250b-?? account on beowulf. Of course, make sure you have placed your file, relab.py, in your class account by then.
  1. Question 1
    When we left the lab Friday, we used backreferencing to match two html tags (see this web page, http://www.regular-expressions.info/named.html). Write a more complex expression, using two backreferences to match two sets of html tags, one embedded in the other. Get this to produce a match on the string
    >>> text = r'<html><title> The spring 2011 foundations class</title></html>'
    as well as
    >>> text = r'<body><h3> The spring 2011 foundations class</H3></BODY>'
    
    Don't forget to turn off case sensitivity.
    Recall that for a single set of tags we used
    >>> match = re.search(r'<([A-Z][A-Z0-9]*)[^>]*>(.*?)</\1>', text, re.IGNORECASE)
    
    and typed
    >>> print(match.groups(0))
    or
    >>> print(match.group(0))
    and
    >>> print(match.group(1))
    
    to see the results. Do this in a function definition in python, in your file called relab.py.
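As a starting point, the single-tag search recalled above could be wrapped in a function roughly like this. It is only a sketch of the one-backreference case from the lab, not the two-backreference answer; the names SINGLE_TAG and match_single_tag are made up for illustration.

```python
import re

# Pattern from the lab: one pair of tags, where \1 refers back to the tag name
# captured by the first group.
SINGLE_TAG = r'<([A-Z][A-Z0-9]*)[^>]*>(.*?)</\1>'

def match_single_tag(text):
    """Return (whole match, tag name, enclosed text), or None if nothing matches."""
    match = re.search(SINGLE_TAG, text, re.IGNORECASE)
    if match is None:
        return None
    return match.group(0), match.group(1), match.group(2)

# The closing tag differs in case, so this only matches with re.IGNORECASE.
print(match_single_tag('<h3> The spring 2011 foundations class</H3>'))
```

Checking for None before calling group() keeps the function from raising an error on strings that do not match, which is handy when trying several test strings in a row.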

  2. Question 2
    Type all of your answers to this part into the same file, relab.py. Start each line of text with python's comment symbol, #.
    For example:
    # \b is a word boundary
    # \d{1,3} indicates between 1 and 3 digits
    # etc.
    
    In the IP address example on the same examples web site, http://www.regular-expressions.info/examples.html, explain exactly how the three regular expressions work:
    1.
     \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
    2.
    \b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
             (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
    (all on one line)

    3.
     \b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
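Before writing the explanation, it may help to try the expressions on a few strings and see where they disagree. The sketch below compares the first and third expressions and shows the groups captured by the second; the function names and the test strings are assumptions chosen for illustration.

```python
import re

# The three expressions from the examples page, copied as given above.
IP_LOOSE = r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b'
IP_GROUPED = (r'\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.'
              r'(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b')
IP_STRICT = r'\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b'

def compare(text):
    """Return (matched by first expression, matched by third expression)."""
    return (re.search(IP_LOOSE, text) is not None,
            re.search(IP_STRICT, text) is not None)

def octets(text):
    """Return the four groups captured by the second expression, or None."""
    match = re.search(IP_GROUPED, text)
    return match.groups() if match else None

for candidate in ['192.168.1.1', '999.999.999.999', 'version 1.2.3']:
    print(candidate, compare(candidate))

print(octets('192.168.1.1'))
```

The string 999.999.999.999 is accepted by the first expression but rejected by the third, which is a useful case to account for in the written explanation.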
  3. Question 3
    Read http://www.regular-expressions.info/completelines.html, information on using regular expressions to find lines of text.
    We've already started looking at this, with our brief description of negative and positive lookahead in the lab (PatternLab.html). Write a regular expression that matches a complete line of text that contains all of the words
    "melody", "similarity", and "computer", in any order. Use the regular expression and examples within a function definition in your file relab.py.
    Describe in detail how your regular expression works. Again,
    # use python's comments to answer the text
    #   part of this homework.
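To illustrate the lookahead technique on a smaller problem, here is a sketch that finds complete lines containing two words in either order. The words "one" and "two", the sample text, and the names LINE_WITH_BOTH and lines_with_both are placeholders for illustration, not the words this question asks for.

```python
import re

# Each (?=...) is a positive lookahead: it checks from the start of the line
# that the word appears somewhere later on the line, without consuming text.
# The final .* then consumes the whole line. re.MULTILINE makes ^ and $ match
# at the start and end of every line rather than only of the whole string.
LINE_WITH_BOTH = re.compile(r'^(?=.*?\bone\b)(?=.*?\btwo\b).*$', re.MULTILINE)

def lines_with_both(text):
    """Return every complete line of text that contains both placeholder words."""
    return LINE_WITH_BOTH.findall(text)

sample = "two then one\nonly one here\none, two, three\ntwo"
print(lines_with_both(sample))   # ['two then one', 'one, two, three']
```

Because the lookaheads all start from the same position, the order in which the words appear on the line does not matter.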
    

    This web site discussion may be helpful

    Don't forget to submit by 11:59 p.m. Tuesday Feb 19:
    rsubmit relab relab.py