Python @programming.dev learnbyexample @programming.dev 3mo ago

Interactive TUI app with 100+ Python re(gex)? exercises

github.com TUI-apps/PyRegexExercises at main · learnbyexample/TUI-apps

I wrote a TUI application to help you practice Python regular expressions. There are more than 100 exercises covering both the built-in re module and the third-party regex module.

If you have pipx, use pipx install regexexercises to install the app. See the repo for source code and other details.

3 comments

Thanks for sharing this. I took the time to read through the documentation of the re module. Here's my review of the functions.

Useful:

re.finditer returns an iterator over all Match objects

re.search returns the first Match object or None if there are no matches.

r'' use raw strings for patterns so you don't have to worry about backslashes

the optional flags argument modifies the behaviour (case insensitive, multiline)
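A minimal sketch of the four items above. The sample string and variable names are made up for illustration.

```python
import re

text = "Alice: 42, bob: 7, ALICE: 13"

# Raw string, so \w and \d reach the regex engine unchanged.
pattern = r"(\w+): (\d+)"

# re.search returns the first Match object, or None when nothing matches.
first = re.search(pattern, text)
first_pair = (first.group(1), first.group(2)) if first else None

# re.finditer yields one Match object per non-overlapping match.
all_pairs = [(m.group(1), int(m.group(2))) for m in re.finditer(pattern, text)]

# The optional flags argument changes matching behaviour (here: ignore case).
alice_scores = [
    int(m.group(1))
    for m in re.finditer(r"alice: (\d+)", text, flags=re.IGNORECASE)
]

print(first_pair)    # ('Alice', '42')
print(all_pairs)     # [('Alice', 42), ('bob', 7), ('ALICE', 13)]
print(alice_scores)  # [42, 13]
```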

Utility:

re.sub replaces each match in the string

re.split splits a string by a regular expression
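A short sketch of both utilities, using an invented input line:

```python
import re

line = "2024-01-15;  apples , pears;bananas"

# re.sub replaces every match; runs of whitespace collapse to one space.
squeezed = re.sub(r"\s+", " ", line)

# The replacement may reference capturing groups: YYYY-MM-DD -> DD/MM/YYYY.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", line)

# re.split splits on a pattern instead of a fixed separator.
fields = re.split(r"\s*[;,]\s*", line)

print(squeezed)  # 2024-01-15; apples , pears;bananas
print(swapped)   # 15/01/2024;  apples , pears;bananas
print(fields)    # ['2024-01-15', 'apples', 'pears', 'bananas']
```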

The Match object:

match.group(0) returns the portion of text matched by the pattern

match.group(1) returns the first capturing group

match.group(2) returns the second capturing group, and so on
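A small sketch of reading groups from a Match object. Note that the per-group accessor is `match.group(n)`, while `match.groups()` returns a tuple of all capturing groups and its optional argument is only a default for groups that did not participate. The sample strings are made up.

```python
import re

m = re.search(r"(\d+)m (\d+)s", "You have 12m 30s left to wait")
assert m is not None

whole = m.group(0)    # text matched by the whole pattern
minutes = m.group(1)  # first capturing group
seconds = m.group(2)  # second capturing group
both = m.groups()     # tuple of all capturing groups

# The argument to groups() replaces None for groups that did not match.
opt = re.search(r"(?:(\d+)m )?(\d+)s", "5s")
with_none = opt.groups()
with_default = opt.groups(0)

print(whole, minutes, seconds, both)  # 12m 30s 12 30 ('12', '30')
print(with_none, with_default)        # (None, '5') (0, '5')
```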

I don't understand why these exist:

re.match like 'search', but only matches at the beginning of the string. Why not just use '^' or '\A' in the pattern you pass to 'search'?

re.fullmatch like 'search', but only if the full string matches. Why not just use '\A' and '\Z' in the pattern you pass to 'search'?

re.findall returns all matches. It seems like a shitty version of 'finditer'. The function has three different return shapes (a list of strings, a list of single-group strings, or a list of tuples) which depend on the pattern you pass to the function. Who wants to work with that?
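For reference, a sketch of how the three functions behave. The patterns and inputs are invented, and the last part shows one case where hand-written anchors and fullmatch can disagree.

```python
import re

text = "x=1, y=22"

# No capturing group: list of whole-match strings.
no_groups = re.findall(r"\w=\d+", text)
# Exactly one capturing group: list of that group's text.
one_group = re.findall(r"\w=(\d+)", text)
# Two or more capturing groups: list of tuples, one per match.
two_groups = re.findall(r"(\w)=(\d+)", text)
# finditer always yields Match objects, so the shape never changes.
via_finditer = [m.groups() for m in re.finditer(r"(\w)=(\d+)", text)]

# match and fullmatch anchor the attempt without editing the pattern.
number = r"\d+"
starts_with_number = re.match(number, "42abc") is not None
is_only_number = re.fullmatch(number, "42abc") is not None

# Hand-written anchors bind to a single alternative unless the pattern
# is wrapped in a group, which is easy to forget.
word = r"ab|cd"
anchored_by_hand = re.search(r"\A" + word + r"\Z", "abXX") is not None
anchored_by_fullmatch = re.fullmatch(word, "abXX") is not None

print(no_groups, one_group, two_groups)         # ['x=1', 'y=22'] ['1', '22'] [('x', '1'), ('y', '22')]
print(starts_with_number, is_only_number)       # True False
print(anchored_by_hand, anchored_by_fullmatch)  # True False
```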
- I would argue that having distinct match and search helps readability. The difference between match('((([0-9]+-[0-9]+)|([0-9]+))[,]?)+[^,]', s) and search('((([0-9]+-[0-9]+)|([0-9]+))[,]?)+[^,]', s) is clear without the need for me to parse the regular expression myself. It also helps code reuse. Consider that you have PHONE_NUMBER_REGEX defined somewhere. If you only had a method to "search" but not to "match", you would have to do something like search(rf"\A{PHONE_NUMBER_REGEX}\Z", s), which is error-prone and less readable. Most likely you would end up having at least two sets of precompiled regex objects (i.e. PHONE_NUMBER_REGEX and PHONE_NUMBER_FULLMATCH_REGEX). It is also a fairly common practice in other languages' regex libraries (cf. [1,2]). Golang, which is usually very reserved in the number of ways to express the same thing, has 16 different matching methods[3].
  
  Regarding re.findall, I see what you mean, but I don't agree with your conclusions. I think it is a useful convenience method that improves readability in many cases. I found these usages in my code, and I'm quite happy that this method was available[4]:
  
  digits = [digit_map[digit] for digit in re.findall("(?=(one|two|three|four|five|six|seven|eight|nine|[0-9]))", line)]
  
  [(minutes, seconds)] = re.findall(r"You have (?:(\d+)m )?(\d+)s left to wait", text)
  
  [1] https://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html
  
  [2] https://en.cppreference.com/w/cpp/regex
  
  [3] https://pkg.go.dev/regexp
  
  [4] https://github.com/search?q=repo%3Ahades%2Faoc23 findall&type=code
  
  Thank you for the very thorough reply! This is the kind of high-quality stuff you love to see on Lemmy. Your use cases seem very valid.