|
It seems like a normal graph problem. Single hierarchical data are trees, each node have only one parent multiple hierarchical data can have more than one parents. Don't know if that is the technical term to refer to it.
My initial data structure: each node contain a list of it's direct parent nodes and direct child nodes.
There are a few operations. 1. Select all nodes that's are child/parent node of A & C | B... 2. Select all nodes that's are direct child/parent node of A & C | B... 3. Select all highest parent(that is not root)/lowest child of A & C | B... 4. Is a node child/parent node of A & C | B... 5. Insert a new node provides it's direct parents and direct child.
But even the simplest operation can take a long time with my algorithm. If one ask "is Z a child of Y and X" It will start by locate Z(a hash table pointing to each node, so finding a node is O(1)). P(x) is a function return a set of x's direct parent, where x can be a set of nodes. Do P(Z). If Y and X are in it, then return true. If not, it do P(P(Z)) until it reach the root(all element are the child of the root), which will return false, or find X and Y.
Insertion is pretty fast, O(n) where n is the number of parents+child.
But we can see that if a node has 2 direct parents, and each direct parents have 2 more direct parents... the time complexity is exponential.
So did some improvement, and the current data structure: Each node maintains list of ALL parents and ALL children and mark which ones are direct parents/children.
then it seems everything can be done much faster. Costing some more storage space though.
Does anyone else know better data structure for this?
Or is there already libraries does the exact same job out there?
|