Ruby Strings

Strings

Common methods (see all at Ruby Doc: String)
  • .length (or .size) => returns the number of characters
  • .count('chars') => returns how many times the given characters occur
  • .include?('sub') => returns true if sub is found, false otherwise
  • .scan(/regexp/) => returns an array of all matches (an array of arrays if the regexp has capture groups)
  • .split(/pattern/) => returns an array of substrings; if the pattern has capture groups, the captured text is included too
  • .sub(/regexp/, replacement) => replaces the first occurrence, returns a new string
  • .gsub(/regexp/, replacement) => replaces all occurrences, returns a new string
  • .match(/pattern/, pos) => searches from position pos, returns a MatchData or nil; .captures on the MatchData returns the captured groups
  • .chars => converts into an array of characters
  • .slice(arg) => arg can be an integer, range, regexp or string; returns the substring or nil
  • .slice!(arg) => deletes the matching part from the string and returns the deleted part
  • .capitalize, .downcase, .upcase, .swapcase => change case
  • .center(n), .ljust(n), .rjust(n) => pad with whitespace to width n
  • .replace(other) => replaces the contents of the string with other
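
A few of the methods listed above in action. This is only a quick sketch: the sample string and the variable names s, t and removed are made up for illustration.

```ruby
s = "hello world"

s.length              # => 11
s.count("l")          # => 3 (how many times "l" occurs)
s.include?("world")   # => true
s.chars.first(3)      # => ["h", "e", "l"]
s.sub("l", "L")       # => "heLlo world"  (first occurrence only)
s.gsub("l", "L")      # => "heLLo worLd"  (all occurrences)
s.upcase              # => "HELLO WORLD"
s.capitalize          # => "Hello world"
s.center(15, "*")     # => "**hello world**"
s.ljust(13, ".")      # => "hello world.."

# slice! changes the string it is called on, so work on a copy
t = s.dup
removed = t.slice!(0, 6)   # removed is "hello ", t is now "world"
```

All of the calls except slice! return a new string and leave s untouched.
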
Examples
Slice a string to isolate the part of interest:
a = "hello hello my dear"
>> a.slice(-7, 7)   # start 7 characters from the end, take 7 characters
=> "my dear"

Split a string using regular expressions patterns:

We are given a string with multiple records of interest. Let’s split each substring using a pattern.

s = "Mon 03:00-9:00 Tue 12:00-24:00 Mon 15:00-18:00"
regxp = /([A-Z]{1}[a-z]{2}\s\d{2}[:]\d*[-]\d*[:]\d*)/

s_split = s.split(regxp)

=> ["", "Mon 03:00-9:00", " ", "Tue 12:00-24:00", " ", "Mon 15:00-18:00"]

s_split.delete(" ")   # remove the separators between records
s_split.delete("")    # remove the empty leading element

s_split
=> ["Mon 03:00-9:00", "Tue 12:00-24:00", "Mon 15:00-18:00"]

 

Regular expression pattern:

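  • one uppercase letter + two lowercase letters => [A-Z]{1}[a-z]{2}
  • one whitespace character => \s
  • two digits => \d{2}
  • colon => [:]
  • zero or more digits => \d*
  • hyphen => [-]
  • zero or more digits, colon, zero or more digits => \d*[:]\d*
  • the surrounding ( ) form a capture group, so split keeps the matched records in its result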

Tip: Test this pattern at Rubular!
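
If only the records themselves are needed, .scan can collect them directly, with no empty strings or separators to delete afterwards. A minimal sketch, assuming the same input string; the names record and records are illustrative, and the pattern is the one described above written without the capture group and without the optional {1} and [ ] wrappers.

```ruby
s = "Mon 03:00-9:00 Tue 12:00-24:00 Mon 15:00-18:00"

# Same pattern as above, without the capture group
record = /[A-Z][a-z]{2}\s\d{2}:\d*-\d*:\d*/

# Without capture groups, scan returns a flat array of matched strings
records = s.scan(record)
# => ["Mon 03:00-9:00", "Tue 12:00-24:00", "Mon 15:00-18:00"]
```
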


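Cleaning Whitespace
s = "There is an extra space  here. A change of line as well. \nThird line here."
# sub and gsub return a new string; s itself is unchanged
s.sub(/\s{2}/, ' ')  # replaces the first run of 2 whitespace characters with one space
s.gsub(/\n/, '')     # removes all newlines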

What counts as whitespace? See the discussion in the Ruby Cookbook.
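
A few other ways to clean whitespace, shown as a sketch on the same sample string. The variable names squeezed, collapsed and trimmed are made up for illustration.

```ruby
s = "There is an extra space  here. A change of line as well. \nThird line here."

# squeeze(" ") collapses runs of spaces only; the newline is kept
squeezed = s.squeeze(" ")
# => "There is an extra space here. A change of line as well. \nThird line here."

# \s+ matches any run of whitespace (spaces, tabs, newlines)
collapsed = s.gsub(/\s+/, " ")
# => "There is an extra space here. A change of line as well. Third line here."

# strip removes leading and trailing whitespace
trimmed = "  padded \n".strip
# => "padded"
```
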


Extracting a substring between tags < and > (see: Stack Overflow)

s = "<ants> <pants>"

Simplest regexp: <(\S+)>

< ( ) > => capture the characters between the tags
+ => one or more of
\S => any non-whitespace character

Another regexp that works too: <([^>]*)>

[^>] => any character except the closing tag
* => zero or more of
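
The two patterns give the same result on this string, but they differ when tags touch each other, because \S also matches < and >. A small sketch of the difference; the strings spaced and adjacent are made-up test inputs.

```ruby
spaced   = "<ants> <pants>"
adjacent = "<ants><pants>"

# With a space between the tags both patterns agree
spaced.scan(/<(\S+)>/)       # => [["ants"], ["pants"]]
spaced.scan(/<([^>]*)>/)     # => [["ants"], ["pants"]]

# Without the space, \S+ is greedy and runs across the tag boundary
adjacent.scan(/<(\S+)>/)     # => [["ants><pants"]]
adjacent.scan(/<([^>]*)>/)   # => [["ants"], ["pants"]]
```
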

.scan returns an array of arrays with all matches:
subs = s.scan(/<(\S+)>/)
=> [["ants"], ["pants"]]
.match returns the first match as a MatchData:
match = s.match(/<(\S+)>/)
=> #<MatchData "<ants>" 1:"ants">
.match returns both occurrences when the regexp contains both tags:
match = s.match(/<(\S+)> <(\S+)>/)
=> #<MatchData "<ants> <pants>" 1:"ants" 2:"pants">
.slice returns the first match, including its surrounding tags:
s.slice(/<(\S+)>/)
=> "<ants>"
.split returns an array of the captured strings together with the text between them (here an empty string and a space):
s.split(/<(\S+)>/)
=> ["", "ants", " ", "pants"]
To retrieve "ants":

subs[0][0]
subs.first.first
match[1]
match.captures[0]

To retrieve "pants":

subs[1][0]
subs.last.first

match = s.match(/<(\S+)> <(\S+)>/)
match.captures => ["ants", "pants"]
match[1] => "ants"
match[2] => "pants"
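
Capture groups can also be given names, which can be easier to read than numeric indexes. A minimal sketch, assuming the same string; the group names first and second and the variables m and missing are made up for illustration.

```ruby
s = "<ants> <pants>"

# (?<name>...) defines a named capture group
m = s.match(/<(?<first>\S+)> <(?<second>\S+)>/)

m[:first]          # => "ants"
m[:second]         # => "pants"
m.names            # => ["first", "second"]
m.named_captures   # => {"first"=>"ants", "second"=>"pants"}

# match returns nil when nothing matches, so guard before calling captures
missing = "no tags here".match(/<(\S+)>/)
missing&.captures  # => nil
```
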

More about MatchData at GeeksforGeeks.