I have a set of .gprobs files that I need to import into Plink. However, I keep getting the same error -- a problem in a specific line, even after I removed that line and the lines around it.

The data: I concatenated all 22 chromosome .gprobs files. To do so, I replaced the '---' at the beginning of each line of the individual .gprobs files with the corresponding chromosome number (so each line now starts with CHR SNP BP A1 A2...). I also removed the SNPs that weren't imputed well (INFO scores below 0.7).


plink --gen data_chrALL.gprobs_chrcol_below0.7inforemoved --sample data_chr1.sample --out data_chrALL.gprobs_plink

The error message:

--data: 13404k variants converted.
Error: Line 13404781 of .gen file has fewer tokens than expected.

As I said above, I removed that specific line and reran it, and got the same exact error message. I tried removing the lines above and below (in case the numbering was off by a header or something?) but again, same exact error.

Any thoughts or suggestions would be greatly appreciated!!! I'm not sure if this is the best place to post this, but I'm in desperate need of help.

  • How did you remove the SNPs that weren't imputed well? You could probably use QCTOOL to remove those SNPs and then try again. – Peter Chung Dec 13 '18 at 8:21

Plink is telling you that it expects a fixed number of items on each line (3N+5 fields, where N is the number of samples) and that at least one line has fewer. So,

(1) First of all, I would compare the lines causing errors with lines that load correctly, to check that the number of tokens/columns is the same and is correct, and that there are no extra spaces or special characters which could cause the line to be escaped or misread. I would also check which variants are causing trouble: maybe they are multiallelic, or indels, or something else that Plink doesn't know how to handle. Or maybe there are no minor-allele homozygotes at all for that variant and this is encoded incorrectly.

(2) I would check the format specifications for both input files, .gen and .sample, to see that your files conform to them. As the files originate from IMPUTE2, there might be some subtle differences from what Plink expects.

(3) I would also update Plink. From the command it seems that you are using either version 1.07 or 1.9. The 1.x versions cannot represent probabilities and will make hard calls, so you lose a lot of information. Plink 2.0 can use the probabilities and should also have better support for them. You will still be able to use hard calls if you want.
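
For the check in (1), here is a minimal Python sketch that counts the tokens on every line and reports the lines that do not have 3N+5 of them. The function names (`count_samples`, `find_bad_lines`) are made up for illustration. It assumes the .sample file has the usual two header lines followed by one line per sample, and that the .gen file has five leading columns (CHR SNP BP A1 A2) as described in the question. The file is read in binary mode so that stray carriage returns or non-ASCII bytes do not change the line numbering.

```python
def count_samples(sample_path):
    """Return N for an Oxford-style .sample file.

    Assumes two header lines followed by one line per sample;
    blank lines are ignored.
    """
    with open(sample_path, "rb") as fh:
        rows = [line for line in fh if line.strip()]
    return len(rows) - 2


def find_bad_lines(gen_path, n_samples, leading_cols=5, limit=20):
    """Return (expected_tokens, [(line_number, token_count), ...]).

    A line is reported when its whitespace-separated token count
    differs from 3 * n_samples + leading_cols. Scanning stops after
    `limit` offending lines so a badly broken file does not flood
    the output.
    """
    expected = 3 * n_samples + leading_cols
    bad = []
    with open(gen_path, "rb") as fh:
        for lineno, raw in enumerate(fh, start=1):
            n_tokens = len(raw.split())  # splits on any ASCII whitespace
            if n_tokens != expected:
                bad.append((lineno, n_tokens))
                if len(bad) >= limit:
                    break
    return expected, bad
```

With the files from the question, this would be called as `find_bad_lines("data_chrALL.gprobs_chrcol_below0.7inforemoved", count_samples("data_chr1.sample"))`. If the reported line numbers shift every time a line is deleted, the problem is probably not one bad variant but something systematic, for example a truncated final line or a chromosome whose files contain a different number of samples.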
