
Merging rows with matching keys

Python

一只名叫tom的貓 2021-03-30 12:43:11

I have a text file with the following structure:

    ID,operator,a,b,c,d,true
    WCBP12236,J1,75.7,80.6,65.9,83.2,82.1
    WCBP12236,J2,76.3,79.6,61.7,81.9,82.1
    WCBP12236,S1,77.2,81.5,69.4,84.1,82.1
    WCBP12236,S2,68.0,68.0,53.2,68.5,82.1
    WCBP12234,J1,63.7,67.7,72.2,71.6,75.3
    WCBP12234,J2,68.6,68.4,41.4,68.9,75.3
    WCBP12234,S1,81.8,82.7,67.0,87.5,75.3
    WCBP12234,S2,66.6,67.9,53.0,70.7,75.3
    WCBP12238,J1,78.6,79.0,56.2,82.1,84.1
    WCBP12239,J2,66.6,72.9,79.5,76.6,82.1
    WCBP12239,S1,86.6,87.8,23.0,23.0,82.1
    WCBP12239,S2,86.0,86.9,62.3,89.7,82.1
    WCBP12239,J1,70.9,71.3,66.0,73.7,82.1
    WCBP12238,J2,75.1,75.2,54.3,76.4,84.1
    WCBP12238,S1,65.9,66.0,40.2,66.5,84.1
    WCBP12238,S2,72.7,73.2,52.6,73.9,84.1

Each ID corresponds to a dataset that the operators analyze several times: J1 and J2 are operator J's first and second attempts. The measures a, b, c and d come from 4 slightly different algorithms that estimate a value whose true value is given in the column true.

What I would like to do is create 3 new text files comparing the results of J1 vs J2, S1 vs S2, and J1 vs S1. Example output for J1 vs J2:

    ID,operator,a1,a2,b1,b2,c1,c2,d1,d2,true
    WCBP12236,75.7,76.3,80.6,79.6,65.9,61.7,83.2,81.9,82.1
    WCBP12234,63.7,68.6,67.7,68.4,72.2,41.4,71.6,68.9,75.3

where a1 is measure a for J1, and so on. Another example, for S1 vs S2:

    ID,operator,a1,a2,b1,b2,c1,c2,d1,d2,true
    WCBP12236,77.2,68.0,81.5,68.0,69.4,53.2,84.1,68.5,82.1
    WCBP12234,81.8,66.6,82.7,67.9,67.0,53,87.5,70.7,75.3

The IDs will not be in alphanumeric order, nor will the operators be grouped together for the same ID. I'm not sure how best to approach this task: with linux tools, or with a scripting language like perl / python.

My initial attempt using linux quickly hit a wall. First, find all the unique IDs (sorted):

    awk -F, '/^WCBP/ {print $1}' file | uniq | sort -k 1.5n > unique_ids

Then loop through these IDs and sort J1, J2:

    foreach i (`more unique_ids`)
        grep $i test.txt | egrep 'J[1-2]' | sort -t',' -k2
    end

This gives me the sorted data:

    WCBP12234,J1,63.7,67.7,72.2,71.6,75.3
    WCBP12234,J2,68.6,68.4,41.4,68.9,80.4
    WCBP12236,J1,75.7,80.6,65.9,83.2,82.1
    WCBP12236,J2,76.3,79.6,61.7,81.9,82.1
    WCBP12238,J1,78.6,79.0,56.2,82.1,82.1
    WCBP12238,J2,75.1,75.2,54.3,76.4,82.1
    WCBP12239,J1,70.9,71.3,66.0,73.7,75.3
    WCBP12239,J2,66.6,72.9,79.5,76.6,75.3

I'm not sure how to rearrange this data to get the desired structure. I tried adding an extra pipe to awk inside the foreach loop:

    awk 'BEGIN {RS="\n\n"} {print $1, $3,$10,$4,$11,$5,$12,$6,$13,$7}'

Any ideas? I'm sure this can be done in a less cumbersome way with awk, although a proper scripting language may be the better choice.


3 Answers

紅糖糍粑

Contributed 1815 experience points, received over 6 likes

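You can use the Perl CSV module Text::CSV to extract the fields and then store them in a hash, where the ID is the primary key, the second field (the operator) is the secondary key, and the whole row is stored as the value. With that structure in place, the comparisons you want become straightforward lookups. If you want to keep the original order of the lines, you can also push the IDs onto an array inside the first loop.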

    use strict;
    use warnings;
    use Text::CSV;

    my %data;   # $data{ID}{operator} = reference to the full row

    my $csv = Text::CSV->new({
        binary => 1,    # safety precaution
        eol    => $/,   # important when using $csv->print()
    });

    # Read rows from the files named on the command line (or from STDIN)
    while ( my $row = $csv->getline(*ARGV) ) {
        my ($id, $J) = @$row;     # first two fields
        $data{$id}{$J} = $row;    # store line
    }
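For readers who prefer Python, the nested lookup described in this answer could be sketched roughly as follows. This is not the answerer's code: the `load_rows` helper and the `SAMPLE` string are hypothetical names introduced here, the sample rows are copied from the question, and the sketch assumes the first line of the input is a header.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample input (rows copied from the question);
# in practice this would be an open file handle.
SAMPLE = """\
ID,operator,a,b,c,d,true
WCBP12236,J1,75.7,80.6,65.9,83.2,82.1
WCBP12236,J2,76.3,79.6,61.7,81.9,82.1
WCBP12234,J1,63.7,67.7,72.2,71.6,75.3
"""

def load_rows(handle):
    """Return {id: {operator: full_row}}; IDs keep their first-seen order."""
    data = defaultdict(dict)
    reader = csv.reader(handle)
    next(reader, None)          # assumption: the first line is a header
    for row in reader:
        if len(row) < 2:        # skip blank or malformed lines
            continue
        row_id, operator = row[0], row[1]
        data[row_id][operator] = row
    return data

data = load_rows(io.StringIO(SAMPLE))
print(data["WCBP12236"]["J2"])
```

Because Python dictionaries preserve insertion order, iterating over `data` visits the IDs in the order they first appear in the input, so no separate array is needed for that purpose.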

2021-04-05

SMILET

Contributed 1796 experience points, received over 4 likes

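I didn't use Text::CSV the way TLP did. You could if you needed to, but since the fields in this example contain no embedded commas, I did a simple split on ','. Also, the true field is listed for both operators (instead of just once), because I thought special-casing the last value would complicate the solution.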

    #!/usr/bin/perl
    use strict;
    use warnings;
    use List::MoreUtils qw/ mesh /;

    my %data;   # $data{ID}{operator} = reference to the list of values

    while (<DATA>) {
        chomp;
        my ($id, $op, @vals) = split /,/;
        $data{$id}{$op} = \@vals;
    }

    # The three operator pairs to compare
    my @ops = ([qw/J1 J2/], [qw/S1 S2/], [qw/J1 S1/]);

    for my $id (sort keys %data) {
        for my $comb (@ops) {
            # "@$comb.txt" interpolates to e.g. "J1 J2.txt"; opened in append mode
            open my $fh, ">>", "@$comb.txt" or die $!;
            my $a1 = $data{$id}{ $comb->[0] };
            my $a2 = $data{$id}{ $comb->[1] };
            # mesh interleaves the two lists: a1[0], a2[0], a1[1], a2[1], ...
            print $fh join(",", $id, mesh(@$a1, @$a2)), "\n";
            close $fh or die $!;
        }
    }

    __DATA__
    WCBP12236,J1,75.7,80.6,65.9,83.2,82.1
    WCBP12236,J2,76.3,79.6,61.7,81.9,82.1
    WCBP12236,S1,77.2,81.5,69.4,84.1,82.1
    WCBP12236,S2,68.0,68.0,53.2,68.5,82.1
    WCBP12234,J1,63.7,67.7,72.2,71.6,75.3
    WCBP12234,J2,68.6,68.4,41.4,68.9,75.3
    WCBP12234,S1,81.8,82.7,67.0,87.5,75.3
    WCBP12234,S2,66.6,67.9,53.0,70.7,75.3
    WCBP12239,J1,78.6,79.0,56.2,82.1,82.1
    WCBP12239,J2,66.6,72.9,79.5,76.6,82.1
    WCBP12239,S1,86.6,87.8,23.0,23.0,82.1
    WCBP12239,S2,86.0,86.9,62.3,89.7,82.1
    WCBP12238,J1,70.9,71.3,66.0,73.7,84.1
    WCBP12238,J2,75.1,75.2,54.3,76.4,84.1
    WCBP12238,S1,65.9,66.0,40.2,66.5,84.1
    WCBP12238,S2,72.7,73.2,52.6,73.9,84.1

The resulting output files are as follows.

J1 J2.txt

    WCBP12234,63.7,68.6,67.7,68.4,72.2,41.4,71.6,68.9,75.3,75.3
    WCBP12236,75.7,76.3,80.6,79.6,65.9,61.7,83.2,81.9,82.1,82.1
    WCBP12238,70.9,75.1,71.3,75.2,66.0,54.3,73.7,76.4,84.1,84.1
    WCBP12239,78.6,66.6,79.0,72.9,56.2,79.5,82.1,76.6,82.1,82.1

S1 S2.txt

    WCBP12234,81.8,66.6,82.7,67.9,67.0,53.0,87.5,70.7,75.3,75.3
    WCBP12236,77.2,68.0,81.5,68.0,69.4,53.2,84.1,68.5,82.1,82.1
    WCBP12238,65.9,72.7,66.0,73.2,40.2,52.6,66.5,73.9,84.1,84.1
    WCBP12239,86.6,86.0,87.8,86.9,23.0,62.3,23.0,89.7,82.1,82.1

J1 S1.txt

    WCBP12234,63.7,81.8,67.7,82.7,72.2,67.0,71.6,87.5,75.3,75.3
    WCBP12236,75.7,77.2,80.6,81.5,65.9,69.4,83.2,84.1,82.1,82.1
    WCBP12238,70.9,65.9,71.3,66.0,66.0,40.2,73.7,66.5,84.1,84.1
    WCBP12239,78.6,86.6,79.0,87.8,56.2,23.0,82.1,23.0,82.1,82.1

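Update: to get only 1 true value, the for loop could be written as:

    for my $id (sort keys %data) {
        for my $comb (@ops) {
            local $" = '';   # join array elements with nothing: "J1J2.txt"
            open my $fh, ">>", "@$comb.txt" or die $!;
            my $a1 = $data{$id}{ $comb->[0] };
            # Work on a copy so that popping does not alter the stored row
            my @a2 = @{ $data{$id}{ $comb->[1] } };
            pop @a2;         # drop the second operator's true value
            # mesh pads the shorter list with undef; grep removes the padding
            my @mesh = grep defined, mesh(@$a1, @a2);
            print $fh join(",", $id, @mesh), "\n";
            close $fh or die $!;
        }
    }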

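Update: added 'defined' to the grep expression as the test, because that is the correct way to do it (rather than testing just '$_', which could be 0 and would then be wrongly excluded from the list by grep).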
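As a rough Python counterpart to the interleaving with a single true value, the idea can be sketched like this. The `merge_pair` helper is a hypothetical name introduced for illustration, and the sketch assumes both value lists have the same length and end with the true value; the two rows are taken from the question's data.

```python
def merge_pair(first, second):
    """Interleave two value lists, keeping the trailing true value once.

    Assumes both lists have equal length and end with the true value.
    Neither input list is modified.
    """
    *measures_1, true_value = first
    measures_2 = second[:-1]            # slice instead of pop: no mutation
    merged = [v for pair in zip(measures_1, measures_2) for v in pair]
    return merged + [true_value]

# Values for WCBP12236 from the question (J1 and J2 rows)
j1 = ["75.7", "80.6", "65.9", "83.2", "82.1"]
j2 = ["76.3", "79.6", "61.7", "81.9", "82.1"]

line = ",".join(["WCBP12236"] + merge_pair(j1, j2))
print(line)
```

Since the values stay as strings throughout, a measurement of 0 would not be dropped here either, which is the same concern the `defined` test addresses in the Perl version.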

2021-04-05

慕哥6287543

Contributed 1831 experience points, received over 10 likes

The Python way:

    info = ["WCBP12236,J1,75.7,80.6,65.9,83.2,82.1",
            "WCBP12236,J2,76.3,79.6,61.7,81.9,82.1",
            "WCBP12236,S1,77.2,81.5,69.4,84.1,82.1",
            "WCBP12236,S2,68.0,68.0,53.2,68.5,82.1",
            "WCBP12234,J1,63.7,67.7,72.2,71.6,75.3",
            "WCBP12234,J2,68.6,68.4,41.4,68.9,80.4",
            "WCBP12234,S1,81.8,82.7,67.0,87.5,75.3",
            "WCBP12234,S2,66.6,67.9,53.0,70.7,72.7",
            "WCBP12238,J1,78.6,79.0,56.2,82.1,82.1",
            "WCBP12239,J2,66.6,72.9,79.5,76.6,75.3",
            "WCBP12239,S1,86.6,87.8,23.0,23.0,82.1",
            "WCBP12239,S2,86.0,86.9,62.3,89.7,82.1",
            "WCBP12239,J1,70.9,71.3,66.0,73.7,75.3",
            "WCBP12238,J2,75.1,75.2,54.3,76.4,82.1",
            "WCBP12238,S1,65.9,66.0,40.2,66.5,80.4",
            "WCBP12238,S2,72.7,73.2,52.6,73.9,72.7"]

    def extract_data(operator_1, operator_2):
        """Interleave the values of the two given operators for each ID."""
        operator_index = 1
        id_index = 0
        wanted = {operator_1.strip().upper(), operator_2.strip().upper()}
        data = {}
        for line in info:
            conv_list = line.split(",")
            if len(conv_list) <= operator_index:
                continue    # not enough fields
            if conv_list[operator_index].strip().upper() not in wanted:
                continue    # some other operator
            row_id = conv_list[id_index]
            values = conv_list[operator_index + 1:]
            if row_id in data:
                # Second row for this ID: interleave, later row's values first
                pairs = zip(values, data[row_id])
                data[row_id] = [v for pair in pairs for v in pair]
            else:
                data[row_id] = values
        return data

    ret = extract_data("j1", "s2")
    print(ret)

Output (dictionary key order may differ):

    {'WCBP12239': ['70.9', '86.0', '71.3', '86.9', '66.0', '62.3', '73.7', '89.7', '75.3', '82.1'],
     'WCBP12238': ['72.7', '78.6', '73.2', '79.0', '52.6', '56.2', '73.9', '82.1', '72.7', '82.1'],
     'WCBP12234': ['66.6', '63.7', '67.9', '67.7', '53.0', '72.2', '70.7', '71.6', '72.7', '75.3'],
     'WCBP12236': ['68.0', '75.7', '68.0', '80.6', '53.2', '65.9', '68.5', '83.2', '82.1', '82.1']}
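In the output above, which operator's value comes first within each pair depends on the order of the rows in the input (for WCBP12239 the J1 values lead, for WCBP12236 the S2 values lead). A possible variation, sketched here under the assumption that a fixed column order is wanted, always places the first-named operator first and skips IDs that lack either operator. The `compare` function is a hypothetical name, and the three rows are copied from the `info` list in this answer.

```python
def compare(lines, op_first, op_second):
    """Return {id: interleaved values}, always op_first before op_second.

    Assumes every line has at least an ID and an operator field.
    """
    by_id = {}
    for line in lines:
        row_id, operator, *values = line.strip().split(",")
        by_id.setdefault(row_id, {})[operator.upper()] = values

    result = {}
    for row_id, ops in by_id.items():
        first = ops.get(op_first.upper())
        second = ops.get(op_second.upper())
        if first is None or second is None:
            continue    # this ID lacks one of the two operators
        result[row_id] = [v for pair in zip(first, second) for v in pair]
    return result

rows = [
    "WCBP12239,S2,86.0,86.9,62.3,89.7,82.1",
    "WCBP12239,J1,70.9,71.3,66.0,73.7,75.3",
    "WCBP12238,J1,78.6,79.0,56.2,82.1,82.1",
]
merged = compare(rows, "j1", "s2")
print(merged)
```

Grouping all rows first and pairing afterwards costs one extra pass over the data, but it makes the result independent of the order in which the operators appear in the file.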

2021-04-05
