Pattern Matching Interfaces in Ruby

dev.to - Jan 24

Defining a standard for implementing pattern matching interfaces in Ruby, originally from the 2021 document "Pattern Matching Interfaces in Ruby"

Author: Brandon Weaver (@baweaver)

Last Updated: January 20, 2021

Status: RFC

Contributors:

Overview

What does this document intend to achieve?

Pattern Matching is a powerful new syntax in Ruby that allows us more flexibility in retrieving data from nested structures and more power in making assertions about the structure of that data.

This document intends to define a set of best practices for defining pattern matching interfaces in Ruby code.

As it is very easy to add methods to Ruby, but very hard to remove them, it would be in the best interest of the community to agree on a set of best practices in adding these methods to popular gems and standard library features.

Prerequisite Reading

If you are not familiar with Pattern Matching in Ruby, start here

Tools and Utilities

Pattern Matching tools and utilities to make working with the interface easier

  • Dio - Fakes pattern matching interfaces on any class via a wrapper
  • Matchable - Class-method porcelain methods for deconstruct and keys
  • Deconstructable - Simpler version of Matchable, same idea behind it

Pattern Matching Interfaces

Proposed common interfaces and considerations for pattern matching interfaces

While there are some significant commonalities and ideas for common pattern matching interfaces, there can be a lot of variance and nuance depending on the class that the interfaces are being applied to. This section seeks to explore some of these concerns.

Array-Like Structures

Matching against Array-like structures

Implementations

Array-like Types and Tuples

The deconstruct method is the established interface for matching against Array-like structures that respond to methods like to_a, to_ary, each, or other collection-like methods. It allows for the following syntax:

class Cons
  attr_reader :value, :children

  def initialize(value, *children)
    @value = value
    @children = children
  end

  def self.[](...) = new(...)

  def to_a() = [@value, @children]

  alias_method :deconstruct, :to_a
end

list = Cons[1, Cons[2], Cons[3, Cons[4]]]
list in [1, children]
# => true

Given the overlap with common array coercion methods it is recommended to default to to_a for implementations of deconstruct.

Alternate Types

There are cases, such as s-expressions, where this recommendation does not necessarily hold true. Marc-Andre noted this in a pull request to the Whitequark AST gem, using the following implementation instead:

def deconstruct() = [type, *children]

In this representation the structure is flattened to allow for easier access to the child nodes, which could also be applied to our above Cons class.
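
As a minimal sketch of that idea, the hypothetical FlatCons class below (a renamed copy of Cons, so it does not clash with the original) splats its children into the deconstructed array. The class name and sample tree are assumptions made for illustration:

```ruby
# Hypothetical variant of the Cons class above with a flattened `deconstruct`.
class FlatCons
  attr_reader :value, :children

  def initialize(value, *children)
    @value = value
    @children = children
  end

  def self.[](*args)
    new(*args)
  end

  # Flattened: the value comes first, followed by each child in turn.
  def deconstruct
    [@value, *@children]
  end
end

tree = FlatCons[1, FlatCons[2], FlatCons[3, FlatCons[4]]]

# Children can now be matched positionally without an intermediate array.
grandchild =
  case tree
  in [1, FlatCons[2], FlatCons[3, FlatCons[found]]]
    found
  else
    nil
  end

grandchild # => 4
```

The trade-off is that the number of elements now varies with the number of children, so patterns often need a trailing splat to stay flexible.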

Paralleling Constructors

Official documentation also mentions the existence of the class-match syntax:

case person
in Person[/^B/, 20..]
  true
else
  false
end

This makes a case for implementing deconstruct in terms of the arguments to an object's constructor:

class Person
  def initialize(name, age)
    @name = name
    @age  = age
  end

  def deconstruct() = [@name, @age]
end

But it should be noted that this syntax may be confusing for constructors with an arity greater than 3, unless the order carries special meaning, as in the case of a 3D Point (x, y, z).
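
To illustrate the 3D Point case, here is a small sketch in which the positional order is meaningful enough to mirror in deconstruct. Point3D is a hypothetical class, not one from any library:

```ruby
# Hypothetical 3D point whose constructor order (x, y, z) is well understood.
class Point3D
  def initialize(x, y, z)
    @x = x
    @y = y
    @z = z
  end

  # Mirrors the constructor: Point3D.new(x, y, z) <-> Point3D[x, y, z]
  def deconstruct
    [@x, @y, @z]
  end
end

# Capture the height of any point sitting directly on the z axis.
height =
  case Point3D.new(0, 0, 5)
  in Point3D[0, 0, z]
    z
  else
    nil
  end

height # => 5
```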

Considerations

When Order is Important

Array-like entities rely on order for potential matches meaning that deconstructed entities may require sorting before being matched against.

This is a form of connascence in which the data is coupled to its order, meaning that array-like matches do not make sense unless order is irrelevant or the data can be safely sorted before comparison.

Array-like data types are recommended for this type of matching to be effective. Notable exceptions to this rule would be Tuple-like types and paralleling class constructors with positional arguments.

When Order is Not Important

Arrays themselves do not require consistent order, and as such can be very expensive to match against with find patterns. Larger arrays compound this problem.

Hash-Like Structures

Matching against Hash-like structures

While array-like matches have minimal potential variance, hash-like matches introduce substantial possibilities, and with them substantial considerations that must be accounted for in defining what an interface should look like.

The deconstruct_keys method for pattern matching against a hash-like structure also introduces arguments into the mix, further complicating potential solutions.

⚠️ Warning: Pattern matching with no arguments or **rest supplied will have nil passed for the keys argument. In this case all keys should be returned.
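
A quick way to see this for yourself is to record what deconstruct_keys receives. The KeyRecorder class below is a hypothetical probe written for this sketch:

```ruby
# Hypothetical probe that records every `keys` argument it is given.
class KeyRecorder
  attr_reader :seen

  def initialize
    @seen = []
  end

  def deconstruct_keys(keys)
    @seen << keys
    { name: 'Brandon', age: 30 }
  end
end

recorder = KeyRecorder.new

# Explicit keys only: Ruby passes exactly the keys named in the pattern.
case recorder
in { name: String }
  :explicit
end

# With **rest Ruby cannot know which keys are needed, so it passes nil.
case recorder
in { name: String, **rest }
  :with_rest
end

recorder.seen # => [[:name], nil]
```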

Implementations

Hash-like Type

If a type can be coerced to a Hash through either to_h or to_hash it can use this as a method to match against:

class Person
  attr_reader :name, :age

  def initialize(name, age)
    @name = name
    @age = age
  end

  def to_h() = { name: @name, age: @age }
end

This provides a viable hook to add the deconstruct_keys method as we already have a hash-like interface exposed for the class.

The simple solution would be to use alias to point deconstruct_keys at to_h, but this would not account for the keys passed in. Instead, we should extract the keys from the to_h method result:

class Person
  def deconstruct_keys(keys) = keys ? to_h.slice(*keys) : to_h
end

ℹ️ Note: There is a check for the presence of keys here, as keys can be nil. Passing **rest or no arguments to a match will result in keys being nil, causing potential issues. The intended behavior here should be to return all possible keys rather than nothing.

As the hash-like interface already exposes all properties at once we can potentially avoid some of the performance considerations mentioned below in loading expensive methods.

Public Interfaces

The next implementation would rely on styling deconstruct_keys after the public interface of a class. This could be defined as the attr_* methods, or potentially all public zero-arity (or n-arity with defaults) methods defined on the class itself:

class Person
  attr_reader :name, :age

  def initialize(name, age)
    @name = name
    @age = age
  end

  def adult?() = @age >= 18
  def me?()    = @name == 'Brandon'

  def deconstruct_keys(keys)
    deconstruction = {}

    deconstruction[:name]   = @name  if keys.nil? || keys.include?(:name)
    deconstruction[:age]    = @age   if keys.nil? || keys.include?(:age)
    deconstruction[:adult?] = adult? if keys.nil? || keys.include?(:adult?)
    deconstruction[:me?]    = me?    if keys.nil? || keys.include?(:me?)

    deconstruction
  end
end

As with the above case, if keys is nil we want to return all possible keys, making this interface potentially very cumbersome.

The below Constant Guard may help to alleviate some of this while also being more explicit.

Instance Variables

The next potential is to rely on instance variables to define the interface:

def deconstruct_keys(keys)
  ivars = instance_variables.to_h { [_1.to_s.delete('@').to_sym, _1] }
  # `keys` can be nil, in which case every instance variable is returned
  valid_keys = keys ? keys & ivars.keys : ivars.keys

  valid_keys.to_h { [_1, instance_variable_get(ivars[_1])] }
end

This may, however, expose parts of a class's API which may not make sense.

Constant Guard

Using a constant to define valid keys to guard against loading the entirety of the deconstructable keys is recommended when working with very large APIs with many keys:

class Time
  VALID_KEYS = %i(year month day wday)

  def deconstruct_keys(keys)
    # If `keys` is `nil` return back all valid keys
    valid_keys = keys ? VALID_KEYS & keys : VALID_KEYS
    valid_keys.to_h { [_1, public_send(_1)] }
  end
end

This prevents loading keys that are not required by the match, but requires each key to be tied to a method. In the case of an instance variable with a corresponding attr_reader or attr_accessor method this is a non-issue, but should be considered.

In the case of most applications the performance implication of using public_send should not be enough to cause concern, but there are tricks to dynamically compile against this using eval in case performance becomes an issue.
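
One possible reading of that eval trick, sketched under the assumption that every valid key maps to a public reader method. CompiledPerson and its keys are hypothetical, and this is only one of several ways such a method could be generated:

```ruby
# Hypothetical class that generates `deconstruct_keys` once, at load time.
class CompiledPerson
  VALID_KEYS = %i[name age].freeze

  attr_reader :name, :age

  def initialize(name, age)
    @name = name
    @age = age
  end

  # Build one `result[:key] = key if ...` line per valid key, so the generated
  # method calls each reader directly instead of going through `public_send`.
  assignments = VALID_KEYS.map do |key|
    "result[:#{key}] = #{key} if keys.nil? || keys.include?(:#{key})"
  end

  class_eval <<~RUBY, __FILE__, __LINE__ + 1
    def deconstruct_keys(keys)
      result = {}
      #{assignments.join("\n")}
      result
    end
  RUBY
end

person = CompiledPerson.new('Brandon', 30)
person.deconstruct_keys([:name]) # => {:name=>"Brandon"}
person.deconstruct_keys(nil)     # => {:name=>"Brandon", :age=>30}
```

Whether this is worth the loss in readability should be measured rather than assumed.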

Manual

The most likely correct case for more complicated classes is to manually decide between a combination of the above approaches. While the public interface of a class may be the best base, some methods may not make sense for pattern matching due to their arity, requiring more discretion.

Considerations

Keys can be nil

The deconstruct_keys method can receive keys as a nil rather than an Array. This is done in cases of no arguments being passed or **rest being passed, and should result in all potential keys being returned rather than no keys.

Expensive Methods

As keys are passed into the deconstruct_keys method it gives a unique chance to not calculate unnecessary and potentially expensive keys. Consider the retrieval of a body from an HTTP request that has not been realized. Calculating this on each match would be expensive, and should be avoided unless the key is explicitly required.
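
A rough sketch of the idea, assuming a hypothetical LazyResponse class whose body is only loaded on demand. The body_reads counter exists purely to make the behavior observable:

```ruby
# Hypothetical response whose body is expensive to realize.
class LazyResponse
  attr_reader :status, :body_reads

  def initialize(status, &body_loader)
    @status = status
    @body_loader = body_loader
    @body_reads = 0
  end

  def body
    @body_reads += 1
    @body ||= @body_loader.call
  end

  # Only touch `body` when the match asks for it (or asks for everything).
  def deconstruct_keys(keys)
    result = {}
    result[:status] = status if keys.nil? || keys.include?(:status)
    result[:body]   = body   if keys.nil? || keys.include?(:body)
    result
  end
end

response = LazyResponse.new(200) { 'Hello' }

status_matched =
  case response
  in { status: 200 } then true
  else false
  end
reads_after_status = response.body_reads # => 0, the body was never requested

body_matched =
  case response
  in { status: 200, body: /Hello/ } then true
  else false
  end
reads_after_body = response.body_reads # => 1
```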

Arity

As mentioned above, zero-arity or n-arity with defaulted arguments is required to add a method as a potential value to match against.

Boolean / Predicate Methods

Boolean methods introduce an interesting conundrum. They are within the public interface and have the correct arity, however do we want to allow matching against them? I would say yes as this could be a very useful signal to capture:

http_response in { ok?: true, body?: true, text?: true }

Though we could also match directly against the underlying attributes as well.

String Keys

There are some cases in which a hash with String keys is returned, introducing another conundrum: Do we conflate Symbol and String keys to be able to match against this? Rely on things like symbolize_names or with_indifferent_access?

This will require careful thought: officially treating the two as interchangeable is likely to cause confusion down the road.
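
For context, hash patterns are written with Symbol keys, so a String-keyed hash will not match until its keys are converted. The sketch below shows explicit conversion at the boundary as one conservative option. The payload and the named? helper are made up for illustration:

```ruby
require 'json'

payload = '{"name": "Brandon", "age": 30}'

string_keyed = JSON.parse(payload)
symbol_keyed = JSON.parse(payload, symbolize_names: true)

# Hypothetical helper: does the hash have a String under the :name key?
def named?(hash)
  case hash
  in { name: String } then true
  else false
  end
end

named?(string_keyed)                          # => false
named?(symbol_keyed)                          # => true
named?(string_keyed.transform_keys(&:to_sym)) # => true
```

Converting explicitly keeps the Symbol and String distinction visible to the reader instead of hiding it inside deconstruct_keys.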

Missing Keys and Key Validation

There are current implementations in which missing keys will result in no match, but there could be an argument for raising an exception on unknown keys for matching. Some cases of matching will use constants for known keys, which can be useful for finding expected versus unexpected values.

Testing in Older Versions

How to test Pattern Matching in gems and applications below Ruby 2.7

Pattern Matching introduces a new syntax to Ruby, and with it a break in syntax for all previous versions. This means that testing it in any gem or application that can still plausibly run on earlier versions of Ruby will cause issues.

Eval Guards

Using eval and rescue to avoid syntax errors

The eval method can be used to avoid the parser finding syntax errors in code. It is recommended to also rescue SyntaxError afterwards to avoid the need for version locks which can be error-prone. Consider:

begin
  instance_eval <<~RUBY, __FILE__, __LINE__ + 1
    response in [200, *]
  RUBY
rescue SyntaxError
  # Not Ruby 2.7+
end

While not the most elegant code, it prevents older Ruby versions from crashing on the new syntax. It is also the least invasive method to a gem or application's testing structure.

Versioned Requires

Loading version specific testing files to prevent syntax errors

The more invasive technique would be to add all pattern matching tests to a separate file that is locked behind version checks and not loaded unless the current Ruby version is MRI-equivalent of 2.7+.
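
A minimal sketch of such a version check, assuming the pattern matching tests live in a separate file. Both the helper name and the file name in the comment are hypothetical:

```ruby
# Hypothetical helper: true when the running Ruby reports a language
# version of 2.7 or newer via RUBY_VERSION.
def pattern_matching_supported?(ruby_version = RUBY_VERSION)
  Gem::Version.new(ruby_version) >= Gem::Version.new('2.7.0')
end

# In a spec helper the guarded require might look like this
# (the file name is an assumption for illustration):
#
#   require_relative 'pattern_matching_spec' if pattern_matching_supported?

pattern_matching_supported?('2.6.9') # => false
pattern_matching_supported?('2.7.0') # => true
pattern_matching_supported?('3.1.2') # => true
```

Because the file is never loaded on older versions, the parser never sees the new syntax at all.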

Proposed Best Practices

Best Practices in implementing pattern matching interfaces

As there is a significant amount of variance in Ruby code and how it is written, this document also proposes a set of best practices addressing both array and hash-like pattern matches as separate entities.

General Best Practices

Best Practices for matching against any type

Least Needed Power

Avoid using items like expressions when another check can be used first. Consider first using the Ruby concept with the least power that still solves the same problem.

Nest Patterns Sparingly

While you can indeed nest pattern matches several layers deep, do so sparingly as it decreases the readability of code very quickly after more than 3 layers. If nesting cannot be avoided, use multiple lines to express the pattern:

data = { a: { b: 1, c: { d: { e: 6, f: 7} } } }

# Bad
data => { a: { b: Integer, c: { d: _ => last_node } } }
# => nil (a failed match would raise NoMatchingPatternError)

# Good
data => {
  a: {
    b: Integer,
    c: {
      d: _ => last_node
    }
  }
}
# => nil (a failed match would raise NoMatchingPatternError)

Avoid Mutation

Pattern matching should not mutate the data under match through the methods it calls, as this will lead to very confusing results. As such, bang! methods, along with any other methods that mutate the underlying data, should be kept out of pattern matching interfaces:

class Person
  def birthday!
    @age += 1
  end

  # AVOID - Do not add `birthday!` as it causes mutation
  def to_h() = { name: @name, age: @age, birthday!: birthday! }

  def deconstruct_keys(keys) = keys ? to_h.slice(*keys) : to_h
end

Avoid Shadowing Variables

Shadowing an outer variable with a pattern match will result in it being overwritten, leading to potentially confusing results:

v = 1        # => 1
h = { v: 3 } # => {:v=>3}
h in { v: }  # => true
v            # => 3
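
If the intent was to compare against the outer variable rather than rebind it, the pin operator (^) does that. A small sketch with made-up variable names:

```ruby
expected = 1
h = { v: 3 }

# `^expected` compares against the existing value instead of capturing.
pinned_match =
  case h
  in { v: ^expected } then true
  else false
  end

pinned_match # => false
expected     # => 1, left untouched by the match
```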

Name Captures Well

Captured variables should be named well to reflect their intention. Pattern matching syntax may be difficult to read with single-letter capture names or abbreviations, which will lead to greater confusion by those reading the code.

This is especially true in the case of right-hand assignment using the rocket operator:

Card = Struct.new(:suit, :rank)

# Bad
Card['S', 'A'] => Card['S' => s, r]

# Good
Card['S', 'A'] => Card['S' => suit, rank]

Prefer Underscore to Asterisk

Underscores (_) serve the purpose of matching any one value. An asterisk (*) also matches here, but it is a splat that matches any number of values rather than exactly one, which makes the pattern looser and more work to evaluate. Only use it if you intend to have a relative position you're matching against:

# Bad
Card['S', 'A'] in Card[*, r]

# Good
Card['S', 'A'] in Card[_, r]

Prefer One-Line Matches for Boolean Queries

If your match is meant to return true or false based on a single branch consider using a one-line pattern match instead:

data = { a: 1, b: 2, c: 3 }

# Bad
case data
in a: 1..3, b: 2..5
  true
else
  false
end

# Good
data in { a: 1..3, b: 2..5 }

# Good
case data
in a: 1..3, b: 2..5
  true
in a: 1..3, c: 3..10
  true
else
  false
end

# Bad

return true if data in { a: 1..3, b: 2..5 }
return true if data in { a: 1..3, c: 3..10 }

false

Whitespace is Free - Use It

Avoid dense pattern match stanzas that push the limits of how much code you can put on one line. Prefer to break up patterns, especially nested ones, into multiple lines:

# Bad
response in { status:  200, body: /Hello/, version: /^1\.\d/, headers: [] }

# Good
response in {
  status:  200,
  body:    /Hello/,
  version: /^1\.\d/,
  headers: []
}

Array-Like Best Practices

Best Practices for matching against Array-like data

Avoid Types Without Array Interfaces

Array-like pattern matching should not be added to classes which cannot be cleanly represented as an array first. Consider using Hash-like matching instead in these cases. There is one exception, the next item.

Constructor Parallels Work for Non-Array-like Classes

The exception to the above rule is that paralleling class constructors is a valid use for deconstruction, as the following is an array-like match:

class Person
  def initialize(name, age)
    @name = name
    @age  = age
  end

  def deconstruct() = [@name, @age]
end

case Person.new("Brandon", 30)
in Person[/^B/, 20..]
  true
else
  false
end

It should be noted that constructors with several positional arguments are not recommended, either for pattern matching or in general, as positional arguments whose order carries no meaning are confusing.

Be Conscious of Order Dependency

Be conscious and aware of any order dependencies in your matches to reduce the number of necessary matches to work with the data under match. Consider sorting before matching, or on insertion or creation of the instance:

class Hand
  def deconstruct() = @cards.sort
end
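
To make the benefit concrete, here is a fuller sketch using a hypothetical SortedHand class that holds plain Integer ranks instead of card objects. Sorting in deconstruct means one pattern covers every insertion order:

```ruby
# Hypothetical hand of Integer ranks, sorted whenever it is deconstructed.
class SortedHand
  def initialize(*ranks)
    @ranks = ranks
  end

  def deconstruct
    @ranks.sort
  end
end

# Hypothetical helper: a single pattern, regardless of how cards were dealt.
def three_four_five?(hand)
  case hand
  in [3, 4, 5] then true
  else false
  end
end

three_four_five?(SortedHand.new(5, 3, 4)) # => true
three_four_five?(SortedHand.new(4, 5, 3)) # => true
three_four_five?(SortedHand.new(4, 5, 6)) # => false
```

Without the sort, matching the same hand would take one pattern per permutation or a more expensive find pattern.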

Use Find-Pattern Sparingly

While find-pattern is indeed powerful, it comes at a speed cost. Consider the above point on order dependency and whether sorting may solve the same problem. It should be noted that this is the find pattern:

value in [*, match_we_want, *]

...and this is not:

value in [match_we_want, *]
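
The difference between the two is easier to see side by side. In this sketch, which uses made-up sample data, the find pattern searches the whole array while the splat pattern only inspects the head:

```ruby
values = [1, 42, 'x', :sym, 3.0]

# Find pattern: scans the array for the first String, wherever it sits.
found =
  case values
  in [*, String => first_string, *] then first_string
  else nil
  end

# Splat pattern: checks only the first element and ignores the rest.
head =
  case values
  in [Integer => first_value, *] then first_value
  else nil
  end

found # => "x"
head  # => 1
```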

Hash-Like Best Practices

Best Practices for matching against Hash-like data

Use Constants for Valid Keys

Constants can help to clarify which keys are valid to be passed to the pattern matching method deconstruct_keys:

class Person
  attr_reader :name, :age

  VALID_KEYS = %i(name age)

  def initialize(name, age)
    @name = name
    @age = age
  end

  def to_h() = { name: @name, age: @age }

  def deconstruct_keys(keys)
    # If `keys` is nil, return all valid keys here, otherwise intersect
    # provided keys with valid ones
    valid_keys = keys ? VALID_KEYS & keys : VALID_KEYS

    to_h.slice(*valid_keys)
  end
end

Return All Possible Keys when keys is nil

keys can be nil in the case of no arguments being passed as well as a **rest style argument being present. In both cases all possible keys should be returned to the interface rather than raising an error or returning nothing:

def deconstruct_keys(keys) = keys ? to_h.slice(*keys) : to_h

Maintain Public Interfaces

Pattern matching should not violate or break encapsulation for the sake of matching against data.

def deconstruct_keys(keys)
  return {} unless keys

  # AVOID - `send` bypasses method visibility and can expose private
  # methods; prefer `public_send`
  keys.to_h { [_1, send(_1)] }
end
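
By contrast, a sketch that stays within the public interface might combine a key constant with public_send. The Account class and its pin attribute are hypothetical:

```ruby
# Hypothetical class with one public and one private reader.
class Account
  VALID_KEYS = %i[owner].freeze

  attr_reader :owner

  def initialize(owner, pin)
    @owner = owner
    @pin = pin
  end

  # Only whitelisted keys are considered, and `public_send` would refuse
  # to call a private method even if one slipped into the list.
  def deconstruct_keys(keys)
    wanted = keys ? VALID_KEYS & keys : VALID_KEYS
    wanted.to_h { |key| [key, public_send(key)] }
  end

  private

  attr_reader :pin
end

account = Account.new('Brandon', 1234)

owner_matches =
  case account
  in { owner: 'Brandon' } then true
  else false
  end

pin_exposed =
  case account
  in { pin: Integer } then true
  else false
  end

owner_matches # => true
pin_exposed   # => false
```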

Add Slowly, Avoid Removal

More potential keys can always be added, but removing them will become very difficult once someone uses them in their code. Consider slowly adding the most obvious keys to match against until a reasonable case has been made to add more.

Minimize Exposed Key Hooks

As with the above item about adding slowly, avoid overburdening your pattern matching interface with every possible key. Prefer a minimal subset of well tested and documented keys that is sufficient to match against effectively.

Be Cautious of Expensive Keys

Evaluation of some requested key values may be expensive, such as an unevaluated HTTP response body, and as such should not be generated unless that key is specifically requested.

Use Expressions Sparingly

While expressions are powerful and open up substantial flexibility in pattern matching they should be used infrequently as they can slow down matches substantially. This is especially true when done in tight loops.

Avoid Complicated Expressions

Expressions should be kept short and concise. If you've added more than a few method calls the expression should be broken out into a variable or a method instead to minimize the complexity of the match.

Be Explicit About Keys

As much as possible be explicit about what keys are allowed to match against for documentation and validation reasons of the match itself. Implicit keys are hard to quantify, cache, and error-check against. They can also lead to confusing behavior.

Right-Hand Assign to Clarify Names

When a Hash key does not properly reflect the intention of a capture, rename it by binding the value to a clearer name (key: name), or with the rocket after a pattern (key: pattern => name), to clarify the intended purpose of the capture in relation to the key name:

case response
in status: 400.., body: error
  Failure(error)
else
  Failure('Unknown')
end

One-Liner Best Practices

Best Practices for One-Line Pattern Matching

Rocket for Assignment, in for Truthiness

While in will expose captures from a pattern match it should only be used for boolean expressions. Prefer the rocket (=>) for assignments in which you need to access these values later:

data = { a: 1, b: 2, c: 3 }

# Bad
data in { a: 1..3 => a, b: 2..10 => b }
[a, b]

# Good
data => { a: 1..3 => a, b: 2..10 => b }
[a, b]
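
One practical reason for this split is that the rocket raises when the pattern does not match, so later code never runs with missing captures. A short sketch with sample data invented for illustration:

```ruby
data = { a: 1, b: 2 }

# Successful rightward match: `a` is bound and safe to use afterwards.
data => { a: Integer => a }

# Failed rightward match: raises instead of silently leaving `c` unset.
error =
  begin
    data => { c: Integer => c }
    nil
  rescue NoMatchingPatternError => e
    e
  end

a                                     # => 1
error.is_a?(NoMatchingPatternError)   # => true
```

With in, the same failed match would simply return false and leave the capture as nil.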

Whitespace is Free

Despite the one-liner name these cases can be broken into multiple lines, and should be preferred for readability. The name may be better served as "one-branch" pattern match instead to reflect this:

# Bad
response in { status:  200, body: /Hello/, version: /^1\.\d/, headers: [] }

# Good
response in {
  status:  200,
  body:    /Hello/,
  version: /^1\.\d/,
  headers: []
}

Names Still Matter

Even in a one-liner you should still name things appropriately and express intent clearly.

Implementations of Interface

What would these interfaces look like applied to popular gems and STDLIB?

More examples will be added over the next few days to demonstrate the potential of this interface. Currently planned gems and techniques to evaluate are:

  • ActiveRecord
  • ActionController Params
  • ActiveSupport Duration
  • Nokogiri / Oga
  • GraphQL
  • Protobuf
  • Date / Time / DateTime
  • Ripper / RubyVM::AbstractSyntaxTree
  • Set
  • CSV
  • TracePoint
  • File / Dir / IO
  • Net::HTTP and other HTTP clients / servers

Pattern matching interfaces can be simulated using gems such as Dio for testing purposes. It does not act as a replacement, but rather a stand-in to test interfaces without full implementations.

HTTP.rb

Popular HTTP client

I had submitted a PR against this repo, but I believe the two most interesting types to match against are responses and requests:

# Response

def to_h
  {
    version:       @version,
    request:       @request,
    status:        @status,
    headers:       @headers,
    proxy_headers: @proxy_headers,
    body:          @body
  }
end

# `keys` can be nil, in which case all keys are returned
def deconstruct_keys(keys) = keys ? to_h.slice(*keys) : to_h

response in {
  status:  200,
  body:    /Hello/,
  version: /^1\.\d/
}

They fit the spirit of pattern matching, especially from an Elixir point of view.

Rack

Standard Ruby Server

Much like HTTP.rb I believe classes such as response make ideal candidates for matching:

def to_h
  { status: @status, body: @body }
end

# Hash Pattern Matching interface:
#
#     case response
#     in status: 200..299, body:
#       Success(body)
#     in status: 400.., body: error
#       Failure(error)
#     else
#       Failure('Unhandled code')
#     end
def deconstruct_keys(keys)
  # `keys` can be nil, in which case all keys are returned
  keys ? to_h.slice(*keys) : to_h
end

Regexp MatchData

Core Ruby Regular Expressions Implementation

Note: This was merged into Ruby: https://github.com/ruby/ruby/pull/6216

MatchData from regular expressions, especially those with named captures, have an interface very similar to both arrays and hashes, making it an ideal type to target:

class MatchData
  alias_method :deconstruct, :to_a

  def deconstruct_keys(keys)
    captures = named_captures.transform_keys(&:to_sym)
    keys ? captures.slice(*keys) : captures
  end
end

IP_REGEX = /
  (?<first_octet>\d{1,3})\.
  (?<second_octet>\d{1,3})\.
  (?<third_octet>\d{1,3})\.
  (?<fourth_octet>\d{1,3})
/x

'192.168.1.1'.match(IP_REGEX) in {
  first_octet: '192',
  fourth_octet: '1'
}
# => true

This allows for skipping intermediate variables when checking capture groups for relevant data.

OpenStruct

Quick Struct-like class initializer

OpenStruct has a very similar interface to Struct, meaning that it can implement a similar pattern matching interface:

class OpenStruct
  def deconstruct_keys(keys) = keys ? to_h.slice(*keys) : to_h
end

me = OpenStruct.new(name: 'Brandon', age: 30)
me in { name: /^B/ }
# => true

Matrix

Ruby implementation of mathematical matrices

Matrix is an inherently array-like interface, making it an ideal candidate for aliasing to_a:

class Matrix
  alias_method :deconstruct, :to_a
end
# => :deconstruct

Matrix[[25, 93], [-1, 66]] in [[20..30, _], [..0, _]]
# => true

This allows for very powerful queries against matrices.

Final Thoughts

As mentioned above, this is a (mostly 1-1) transcription of Pattern Matching Interfaces in Ruby from 2021, but the content is still valuable today for those wanting to learn new pattern matching ideas, and find ways to implement interfaces in their own programs.

Some of these things have even been merged into the language since the doc, such as regexp, and some may still be at a later date.
