Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds compilation/evaluation support for UNION/INTERSECT/EXCEPT ALL/DISTINCT #1430

Merged
merged 12 commits into from
Apr 24, 2024

Conversation

johnedquinn
Copy link
Member

@johnedquinn johnedquinn commented Apr 12, 2024

Relevant Issues

  • None

Description

  • Adds compilation/evaluation support for UNION/INTERSECT/EXCEPT ALL/DISTINCT
  • I have used Substrait's modelling of Set Operations as a guide.

Specification and RFC Insight

  • This PR does NOT pass some of the conformance tests in regards to coercion of scalar values. I'm unsure of what to do here. According to the PartiQL Specification, Section 5.1.1, "Mistyping Cases":

In the following cases the expression in the FROM clause item has the wrong type. Under the type checking option, all of these cases raise an error and the query fails. Under the permissive option, the cases proceed as follows:

Iteration over a scalar value: Consider the query:
FROM s AS v AT p
or the query:
FROM s AS v

where s is a scalar value. Then s coerces into the bag << s >>, i.e., the bag that has a single element, the s. The
rest of the semantics is identical to what happens when the lhs of the FROM item is a bag.

  • The above essentially states that, in strict mode, the query will FAIL when the RHS of a FROM is a scalar. In permissive, it'll be coerced to a bag holding the scalar.
  • As I was implementing RFC-0007, I remembered the above scenario since I recently implemented the scalar coercion of FROM. I looked to RFC-0007 as a guide, however, the wording isn't clear when it comes to typing modes:

A scalar value is coerced into a singleton bag; a list is coerced to a bag by discarding ordering; NULL and MISSING are coerced into the empty bag.
...
... PartiQL has chosen to produce heterogeneous bags in permissive mode and homogenous bags or error in conventional typing mode.

  • For this PR, I am assuming the SAME rules of coercion apply for both the RHS of a FROM and the operands of the OUTER UNION.
  • ALSO, for naming of output bindings, from RFC-0007:

If the i-th column of T1 and T2 does not have the same name, then the name of i-th column of TR is implementation-dependent and unique among all other column names.

Examples

  • If you'd like some insight into the ops, feel free to play with this DB Fiddle Example.

Other Information

  • Updated Unreleased Section in CHANGELOG: NO
  • Any backward-incompatible changes? YES -- plan representation now more closely matches Substrait's model
  • Any new external dependencies? NO
  • Do your changes comply with the Contributing Guidelines
    and Code Style Guidelines? YES

License Information

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@johnedquinn johnedquinn force-pushed the v1-union branch 2 times, most recently from 2e82c38 to b6b0dc7 Compare April 16, 2024 22:39
@johnedquinn johnedquinn marked this pull request as ready for review April 16, 2024 22:43
Comment on lines 269 to 285
type: [
UNION_ALL,
UNION_DISTINCT,
INTERSECT_ALL,
INTERSECT_DISTINCT,
EXCEPT_ALL,
EXCEPT_DISTINCT
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've gone back and forth for this modeling. I believe I did "set" initially, but PIG modeled as (union ...) (intersect ...) ... etc. hence them being their own operators.

I'm keen on keeping them separate mostly for staying similar to PartiQL 0.14, but this is entirely preference. However, I would like the set quantifier to be split.

   union::{
      setq: set_quantifier,
      lhs: rel,
      rhs: rel,
    },

    intersect::{
      setq: set_quantifier,
      lhs: rel,
      rhs: rel,
    },

    except::{
      setq: set_quantifier,
      lhs: rel,
      rhs: rel,
   }

Interestingly, substrait uses a list of inputs rather than lhs, rhs. Seems like an optimization

override fun visitRelOpExcept(node: Rel.Op.Except, ctx: Rel.Type?): Rel {
TODO("Type RelOp Except")
// Compute Output Schema
val size = max(lhs.type.schema.size, rhs.type.schema.size)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this not depend on both the OUTER boolean and the operator type?

Like OUTER UNION size may not be computable with open tuples.

Comment on lines +264 to +269
set::[
union::{
quantifier: quantifier,
lhs: rel,
rhs: rel,
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. This seems like a good middle ground.

Copy link
Contributor

@RCHowell RCHowell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor change, otherwise let's get this in.

Comment on lines 18 to 20
override fun close() {
_next = null
next = null
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, but it would be better to have the abstract method hold the close control so that inheritors don't need to call super.close(). In some languages you can do something like @mustCallSuper but we can do better with a concrete close that does

override fun close() {
     close1()
     next = null
}

abstract fun close1()

It's a nit, but I wouldn't want to track down a bug because an inheritor didn't call super. Perhaps we just add a comment to keep the change small.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I was going to do that in this PR, but I didn't want the diff to affect implementers of RelPeeking. Seeing that you've reviewed the rest of the PR, I'll update. Thanks!

@johnedquinn johnedquinn merged commit d6e8471 into partiql:v1 Apr 24, 2024
7 checks passed
@johnedquinn johnedquinn deleted the v1-union branch April 24, 2024 18:45
alancai98 pushed a commit that referenced this pull request Jul 8, 2024
Adds compilation/evaluation support for UNION/INTERSECT/EXCEPT ALL/DISTINCT
alancai98 pushed a commit that referenced this pull request Jul 10, 2024
Adds compilation/evaluation support for UNION/INTERSECT/EXCEPT ALL/DISTINCT
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants