Data mining of public SNP databases for the selection of intragenic SNPs
Aerts, Jan × Wetzels, Yves Cohen, Nadine Aerssens, Jeroen #
Human Mutation vol:20 issue:3 pages:162-173
Different strategies to search public single nucleotide polymorphism (SNP) databases for intragenic SNPs were evaluated. First, we assembled a strategy to annotate SNPs onto candidate genes based on a BLAST search of public SNP databases (Intragenic SNP Annotation by BLAST, ISAB). Only BLAST hits that complied with stringent criteria according to 1) percentage identity (minimum 98%), 2) BLAST hit length (the hit covers at least 98% of the length of the SNP entry in the database, or the hit is longer than 250 base pairs), and 3) location in non repetitive DNA, were considered as valid SNPs. We assessed the intragenic context and redundancy of these SNPs, and demonstrated that the SNP content of the dbSNP and HGBASE/HGVbase databases are highly complementary but also overlap significantly. Second, we assessed the validity of intragenic SNP annotation available on the dbSNP and HGVbase websites by comparison with the results of the ISAB strategy. Only a minority of all annotated SNPs was found in common between the respective public SNP database websites and the ISAB annotation strategy. A detailed analysis was performed aiming to explain this discrepancy. As a conclusion, we recommend the application of an independent strategy (such as ISAB) to annotate intragenic SNPs, complementary to the annotation provided at the dbSNP and HGVbase websites. Such an approach might be useful in the selection process of intragenic SNPs for genotyping in genetic studies.