这是一个段的向量
class Segment
{
public:
size_t left;
size_t right;
char ID;
Segment(size_t a, size_t b, char c):left(a), right(b), ID(c){assert(left<right);}
};
std::vector<Segment> A = {{3, 10, 'A'}, {7, 22, 'B'}, {14, 17, 'C'} , {16, 19, 'D'}, {25, 31, 'E'}, {28, 32, 'F'}, {34, 37, 'G'}, {34, 37, 'H'}, {46, 49, 'I'}, {52, 59, 'J'}}
向量a
基于属性left
进行排序。 可以将该向量的内容绘制为
------- 'A'
--------------- 'B'
--- 'C'
--- 'D'
------ 'E'
---- 'F'
--- 'G'
--- 'H'
--- 'I'
------- 'J'
我想创建对象b
,它包含来自a
中那些大段的不重叠的小段。 要进入b
,我们必须收缩彼此重叠的段,并为所有重叠的地方创建一个ID为x
的新段。 矢量b
也需要基于left
属性进行排序。
对于上面的示例,预期的输出是
std::vector<Segment> B = {{3, 7, 'A'}, {7, 10, `X`}, {10, 14, 'B'}, {14, 19, 'X'}, {19, 22, 'B'} , {25, 28, 'E'}, {28, 31, 'X'}, {31, 32, 'F'}, {34, 37, 'X'}, {46, 49, 'I'}, {52, 59, 'J'}}
---- 'A'
--- 'X' (overlap between 'A' and 'B')
---- 'B'
-------- 'X' (overlap between 'B', 'C' and 'D')
--- 'B' -> Note that 'B' is now split in two
--- 'E'
--- 'X' (overlap between 'E' and 'F')
- 'F'
--- 'X' (overlap between 'G' and 'H')
--- 'I'
------- 'J'
有人能帮我一把吗?
注意,与上面的示例不同,a
中的两个段实际上可以具有相同的ID(无论它们是否重叠)。 还要注意,a
不是const
,可以在操作过程中修改。 出于性能考虑,请注意向量a
通常相对较短; 长度在1到大约100(或几百)段之间。 值left
和right
通常非常大(在0到大约1e9),并且这些段中只有很少部分会相交。 通常,当有很少的段时,那些段会相当宽(当size为1时,单个段的宽度通常约为1e9)。 最后,您可以用
void print(std::vector<Segment>& v)
{
for (auto& elem : v)
{
std::cout << "{" << elem.left << ", " << elem.right << ", " << elem.ID << "} ";
}
std::cout << "\n";
for (auto& elem : v)
{
for (size_t i = 0 ; i < elem.left ; ++i)
std::cout << " ";
for (size_t i = 0 ; i < elem.right - elem.left ; ++i)
std::cout << "-";
std::cout << " " << elem.ID << "\n";
}
std::cout << "\n\n\n";
}
尝试
为了显示努力,这里有一个尝试,1)是错误的,2)将是一个相对缓慢的实现。让我们调用“断点”,在下面的b
向量中的任何左边的右边。其思想是通过系统地在前面和后面的段中搜索潜在的下一个断点,从一个断点跳到下一个断点。在这样做的时候,它应该跟踪应该在新的段中给出什么ID(如果断点之间的距离与a
中的至少一个段匹配)。
std::vector<Segment> foo(std::vector<Segment>& A)
{
if (A.size() <= 1) return A;
std::vector<Segment> B;
B.reserve(A.size());
size_t A_index = 0;
size_t currentPos = A[A_index].left;
while ( A_index < A.size())
{
auto nextPos = A[A_index].right;
//std::cout << "currentPos = " << currentPos << "\n";
//std::cout << "nextPos before search = " << nextPos << "\n";
bool isIntersection = false;
// Search in preceding Segments
for (size_t i = A_index - 1 ; i < A.size() ; --i)
{
if (A[i].right > currentPos && A[i].right < nextPos )
{
nextPos = A[i].right;
isIntersection = true;
//std::cout << "Found " << nextPos << " in preceding segment\n";
}
}
// Search in following Segments
for (size_t i = A_index+1 ; i < A.size() ; ++i)
{
if ( A[i].left > currentPos && A[i].left < nextPos)
{
nextPos = A[i].left;
//std::cout << "Found left of " << nextPos << " in following segment\n";
break;
}
if ( A[i].right > currentPos && A[i].right < nextPos )
{
nextPos = A[i].right;
isIntersection = true;
//std::cout << "Found right of " << nextPos << " in following segment\n";
break;
}
}
// create new Segment
if (!isIntersection)
{
B.push_back({currentPos, nextPos, A[A_index].ID});
} else
{
B.push_back({currentPos, nextPos, 'X'});
}
if (nextPos == A[A_index].right)
{
++A_index;
nextPos = A[A_index].left;
}
currentPos = nextPos;
}
return B;
}
int main()
{
std::vector<Segment> A = {{3, 10, 'A'}, {7, 22, 'B'}, {14, 17, 'C'} , {16, 19, 'D'}, {25, 31, 'E'}, {28, 32, 'F'}, {34, 37, 'G'}, {34, 37, 'H'}, {46, 49, 'I'}, {52, 59, 'J'}};
print(A);
auto B = foo(A);
print(B);
}
下面的并不是最有效率的,但它产生了预期的产出。 战略如下:
为了方便起见,我使用了
struct Overlap {
std::vector<char> IDs;
Overlap() {}
void add(char id) {IDs.push_back(id);}
};
最后一步的实现需要删除构造函数中的left<;right
要求。
int main() {
std::vector<Segment> A = {{3, 10, 'A'}, {7, 22, 'B'}, {14, 17, 'C'} , {16, 19, 'D'}, {25, 31, 'E'}, {28, 32, 'F'}, {34, 37, 'G'}, {34, 37, 'H'}, {46, 49, 'I'}, {52, 59, 'J'}};
// dissect
std::map<size_t,Overlap> over;
for (const auto& s : A) {
for (size_t i = s.left; i < s.right; ++i) {
over[i].add(s.ID);
}
}
// assign new segments
std::map<size_t,char> pieces;
for (const auto& o : over) {
if (o.second.IDs.size() == 1) {
pieces[o.first] = o.second.IDs.front();
} else {
pieces[o.first] = 'X';
}
}
// glue them
std::vector<Segment> result;
auto it = pieces.begin();
Segment current(it->first,it->first,it->second); // here left==right !
++it;
for ( ; it != pieces.end(); ++it) {
if (it->second == current.ID) continue;
current.right = it->first -1;
result.push_back(current);
current = Segment{it->first,it->first,it->second};
}
print(result);
}
输出:
{3, 6, A} {7, 9, X} {10, 13, B} {14, 18, X} {19, 24, B} {25, 27, E} {28, 30, X} {31, 33, F} {34, 45, X} {46, 51, I}
--- A
-- X
--- B
---- X
----- B
-- E
-- X
-- F
----------- X
----- I
活生生的例子
实际上唯一昂贵的部分是1(O(段数x它们的宽度)。我相信可以通过使用两个容器来提高效率,一个根据left
排序,另一个根据right
排序,以便更容易地检测重叠。然而,上面的简单实现是我要开始的。另外,如果宽度相对于段数来说相当小,那么对于排序来说,O(段数x宽度)可能比O(段数x段数)更好。