Relational Algebra
- constrained algebra of set theory for querying relations [tuple calculus?] or [APL is to arrays as RA is to tuple sets]
- uses sets of tuples (rows) in a relation (table) which share attributes (columns) [relation is tuple set over attribute]
- Symbols:
- ∪ union
- − set difference
- × cartesian product
- σ selection (
WHERE) - π projection (
SELECT) - ρ rename (naming is important, after all lambda calculus is just about naming things)
- ∩ intersection
- ⋈ join
- γ for grouping
- ∪ − × σ π ρ are minimally relationally complete as
R ∩ S = R − (R − S)andR ⋈ S = σ cond (R × S). You can also doR ∩ S = R −(R − S)
SQL diverges from RA
- duplicate rows while relations are sets. Tables are multisets because removing duplicates is computationally expensive
- RA lacks order
- null/3-valued logic - RA is 2-valued logic and assumes a closed world (no unknowns)
- aggegation and group - Codd’s RA only works on wholes (but many use γ)
RA resembles APL
- declaratively compose bulk ops
- RA works on unordered structural records (like maps), APL uses ordered multi dimensional grids
- Tuple (row) resembles an immutable map. Relation is set/bag of these maps. (In first normal form) RA uses joins, not nested structures, which we can see as precomputed joins!
| RA | APL | FP | Note |
|---|---|---|---|
| σ | ⊃/ | filter | By predicate |
| π | indexing | map | Implicit in APL |
| ∪ | ∪ | concat | |
| × | ∘., or just , | mapcat | |
| ρ | ← | def or let | |
| ⋈ | ⌿ | mapcat then filter | |
| − | / or `~ | filter | By neg. predicate |
Also see Relational Thinking in J
Prolog emerges from RA
- state facts/relations and query
- Prolog can create new facts with rules or infer them from rules on facts
- Datalog is Prolog without funcs, so all querries resolve [are finite]
- Set theory defines set explicitly (enumerate or comprehension). RA defines set with operations on existing sets. Prolog intensionally defines sets with rules, describing the properties of members.
- extensional - list set’s actual members
- intensional - list rules members must satisfy by relationship to others
Functional Programming Pipelines/Higher Order Functions
- Different paradigm but same goals?
- Query plan is tree, steps/transducers are a pipeline
- Functional programming is operative via composition [you write how to transform, not describing what relation/answer you want. FP is like writing an execution plan, which the query planner does…] RA describes the result while FP describes transformations to reach the result (what the query planner does)
WHEREis like filter,SELECTlike map- “collection programming” without loops declaring operations on set or composing traversal patterns like map, called “whole-meal programming” like APL too. But HOF are sequential and nested unlike APL’s. [IDK how to phrase this yet]
- in RA reorderinng ops is semantically preserving (algebra has rewrite rules)
- APL is set at a time like RA and has global rewrite laws
- While FP pipelines work element by element, RA works on whole/arrays sets enabling algebraic rewrites and APL works on entire arrays (i.e. on whole relations)
Questions
- Is idiomatic APL declarative?
- Is it possible to write all programs via map/reduce/filter with ops/predicates inside? You do need mapcat
- Is the minimum
map,filter,mapcat,reduce,zipcolandcomb?
- Is the minimum
- To what extent is this true: FP pipelines are element by element, RA is row by row of elements, APL is table by table of rows of elements.
- Can we abstract them all into 1 paradigm, operating on different dimensions?
- Can we enrich any of the paradigms with operations from the others?
Look into
- Wholemeal programming - working with the entire datainstead of element sequence or one path
- Projective programming - solve a general problem, then transform solution into specific program
- https://www.rntz.net/datafun/
- LogicBlox and LogicQL “combine transactional, analytical, graph, probabilistic, and mathematical programming”
- MonetDB
- Dedalus: Datalog in Time and Space formalizes Datalog to include time, specifically to handle async behavior. https://www.youtube.com/watch?v=R2Aa4PivG0g or https://taylor.town/reactive-relational-algebra which independently found it?
- People mention differential data flow like Missionary:
- LINQ - RA in func host, with query trees vs. execution pipelines - discusses them together
- stream fusion ??? describes database algebra like whole-meal programming …fusion is vectorized execution?
- dataparallel/array sql