00:00:10.1 I: And we are recording. Just so you know.
00:00:15.01 2: Already now?
I: Yes. Yeah, just audio.
00:01:04.01 (laughter)
00:01:54.24 2: So, I made a pre-selection. I've actually never seen such a file.
00:02:01.2 I: Ok, interesting. That's fine. I was aware that was one danger, that I may find files and people "What's this?" (Mumbling)
00:02:27.2 I: "I'm not sure" is also a totally acceptable answer.
00:02:32.2 2: I'm not sure of any of these, so I just try my best to find something. (Mumbling)
00:02:53.1 2: I take this, but I'm not sure for all three.
00:02:54.18 I: Ok. Just to read out, this is the file Homo sapiens GFF from NCBI and we are looking at the identifier/dataset/database and name cards for 100287102 HGNC37102 DDX1101. Ok, thank you very, very much. So I have two more files.
00:04:32.0 2: Again, I've never seen that GAF file. Whatever this is. I just hope GO is gene ontology term and FB FlyBase. Some sort of database again, I think.
00:04:14.1 I: Yup, fantastic. So, annotating verbally, we have FBGN identifier/label/dataset database and we have a GO term, GO:0048149, labeled as GO term. Thank you very much. One last file and then we are on to the second task.
00:05:22.2 2: So is this like five different things?
I: Ah yes, you don't have to annotate them all if you're not sure. Actually this is all one single row.
00:05:41.0 2: I would say this is a database, but then it could be identifier... I'll put this but I'm not sure. A1BG. Maybe a symbol for A1BG? Feature type? I have no clue about the numbers.
00:06:07.1 I: I just annotate that quickly. So, we have the ENSEMBL labelled as a database or identifier and A1BG was labelled as a symbol in the NCBI Homo sapiens gene info file. So, that's the taxon ID, that's actually Homo sapiens - human - so that's fine. I think this is a gene identifier, gene "1". That's ok, that's fine. The main thing is, we're looking to see whether people can comfortably map columns from files to databases.
So we can make it easier to do, if that makes sense. Let's say you're uploading one of your files to a database and we wanted you to say, well, this is the column with the genes. So, it's getting an idea whether it actually is as easy as we were hoping or not. You did really well, and you didn't even know the files.
00:06:58.1 2: No, I do proteomics (laughter).
00:07:03.2 I: So, that was the first task. Thank you so much. The second task is using our pink cards. So, what I would like to see is basically take the cards and organise them. Now it is completely up to you how you organise them. There are absolutely no right or wrong answers, 'cause I wanna know what you think how they should be organised.
00:07:25.1 2: That requires a lot of place.
00:07:28.2 I: If you feel like you're missing any cards or you wanna make any notes, you can use the yellow cards. Talking about it would be helpful as we actually don't have the video on.
00:07:48.2 2: So I saw a lot of biological things in there. So I maybe put them somewhere. P53, PUBMED ID - so we have a lot of ID and identifier, they belong to a certain database as such. I would say symbol goes the same way. And chromosome and transcript I would put together. Ahm, so we have organism somehow on top of this. GO term I would also put here to organism. Accession, DOI, hmmm, disease. Ahhm, mhmmm, mhmmm, mhmmm. Something like that, identifier maybe. Uniprot is a database. Publication gets a DOI. Expression. You want a transcript gets expressed at some point. This is Diabetes, GO term, here's an example for GO term. There's another organism. Publication has also an author. Pathway, databases, not a DOI... molecular weight. It's now very specific. (Laughter) So, one is probably the symbol and one the identifier. Pubmed ID, this is an example. Protein is expressed here. What is P53 doing here? Asthma - another disease. Length - everything has a length. Homolog. The gene homolog, mhmm, mhmm, mhmm.
(Someone popped in) So, I made several pairs of things, but not everything pairs up. I mean this looks similar, so this could be the same. I would like to get the publication over here, because then part of it can collaborate to protein. We have a publication with an author and a DOI and a DOI example. The publication maybe links to dataset which is stored in a database, such as pubmed with pubmed ID, maybe GO term. This goes more to a single gene/protein whatever, but can also be part of a database.
00:12:54.0 I: You can create multiple links, that's fine.
2: Yeah, length is here - So I put it in between maybe?
I: Yeah, sounds good.
2: Pubmed ID - goes up here to the publication thingy, and this links to the database, and use all these identifiers or however you call this. And here's an example. And here we are - identifier, symbol. Okay.
00:14:00.2 I: If you're not sure just put it aside. That's fine.
2: This doesn't fit into my image. (unclear which card they are referring to)
00:14:21.0 2: Organisms, and some good examples - here. And then this consists of chromosomes, genes, transcripts, protein. Pathways belong to everyone.
00:14:34.2 I: Yeah, yeah. It's all making sense, it's all good.
00:14:42.1 2: So this I put here. Organism can get a disease. Put this on the other side, towards gene and transcript.
00:14:59.0 2: That's maybe the name of a chromosome. And P53 is a protein. Ah, and the GO term belongs to all of them, and is also linked, I would say. So, and the expression, no, I don't know.
00:15:30.1 I: So, I just try to summarise and can you correct me if I get something wrong. I just want to make sure that I get the audio record.
00:15:32.1 2: Ok, but you can also take a picture of this.
I: I will do that as well. That's why I've both.
00:15:38.2 I: On the left we have disease, and underneath we have diabetes and asthma. And that connects to organism, chromosome and gene. There's a large block of biological things.
As well as gene we have homolog touching gene, transcript touches gene, ahm, protein is near to transcript and has a molecular weight to one side, and P53 on the other side. Pathway is also associated with the large biological block. We have XY which is beside chromosome, we have GO terms and an example of a GO term beside pathway, again in the large biological block. And beside organism, which is part of the biological block, we have D. melanogaster and H. sapiens. Then we have a separate block of things, which has publication, author, DOI, example of a DOI, pubmed and an example of a pubmed ID. This links to UNIPROT, which links to database. And underneath uniprot and database, we have accession, name, identifier, symbol, and we have some examples of a symbol, BRCA1, and we have identifier, which is Q9H4C3_HUMAN and BRCA1_HUMAN. Two items didn't fit in, that was length and expression.
00:16:56.1 2: I would just like to correct here. So the PUBMED should not link to the UNIPROT. So, publication to the dataset and database.
00:17:08.0 I: Ok, that makes sense. I'm glad that you told me I'm wrong because otherwise I would have taken a photo, probably I would have assumed. (Taking photos)
00:17:27.2 2: This takes up a lot of space.
I: Yeah, maybe I should get some business card sized ones. (Taking photos)
00:18:47.1 I: Ok, one more question. It's just a very quick and easy question. Which of the cards are more important or interesting to you?
2: Interesting.
I: Yeah, so like if you say you're looking at biological data, that's the one I would look at first.
2: I don't get it.
I: If I was in a library, I might look at fantasy and SciFi first, 'cause that's what interests me.
2: So is it about my personal interest? I mean, I would take something which identifies things very quickly, like accessions, identifier, DOI, PUBMED ID.
00:19:40.1 I: Ok, yeah. That's fantastic.
2: Some index.
I: Perfect. Did you feel like any data was missing? From the set... not just like any data.
00:20:01.0 2: Not really.
I: Ok, I will stop recording then. Do you have any questions you would like to ask?
2: No.
*** END OF RECORDING at 00:20:13.01 ***