0:00:00.0 Y: It's recording.
0:00:06.2 Participant 1: Okay, let's just check first what kind of... mmhmm, okay. [Participant 1 sorts cards on the table into piles]
0:01:06.8 Participant 1: Still on?
0:01:09.7 Y: Still on, it's still recording.
0:01:09.5 Participant 1: No, I meant is it still on the screen.
Y: Yes, it's just on the edge, but that's okay.
0:01:16.1 Participant 1: So the final results are the important thing, or the process?
0:01:20.1 Y: The process, the final result - I may ask some questions once it looks like you have settled a bit.
0:01:41.7 Participant 1: Seems to be some categories: Gene and Transcript, Chromosome. Let's see.
0:02:07.2 Participant 1: There is a disease category. This fits (places unidentifiable card in disease pile).
0:02:18.4 Participant 1: This card somehow goes here as well. (puts a card in the publication pile)
0:02:34.2 Participant 1: Don't know yet.
0:02:38.9 Y: You can put them to the side if you have any you're not sure about.
0:02:43.4 Participant 1: I'll save them here in the "scrap" pile. (continues to sort cards into piles)
0:03:22.0 Participant 1: Those are more or less guessing which is which.
0:03:27.2 Participant 1: Okay. So those two could be homologues. Yeah, this one here.
0:03:42.7 Participant 1: So at the moment I'm just looking for hierarchy.
0:03:48.5 Participant 1: So there was some term, GO term, we had here... protein binding. Mmhmm...
0:04:00.0 Participant 1: So molecular weight somehow doesn't fit anywhere to me.
0:04:03.4 Y: That's fine.
0:04:08.2 Participant 1: And length as well.
0:04:14.0 Participant 1: So no, those won't fit anywhere for me.
0:04:17.7 Y: That's fine.
0:04:21.6 Participant 1: Ahh... protein... so. Protein expression, Homologues... So, those are obviously some kind of accessions or identifiers.
0:04:46.2 Y: Mmmhmm.
0:04:51.0 Participant 1: Mmhmm, put them here. Either/or. GO Terms - don't know yet.
0:04:52.9 Y: Okay, so, can I, I have some ideas about what the groupings are, but can I get you to explain why it's grouped like it is, roughly?
0:05:06.6 Participant 1: Okay, so let's just start here: publication. So those terms are all related. Not all of them, but directly are related to publications. So they have an author, they have a pubmed ID, and they have a DOI. And this is a specific DOI. But thinking about it more, other things can have a DOI, like a dataset. So we could refine this and put DOIs kind of behind publications and datasets or databases. So, uniprot is a specific database containing proteins, so there we could connect also proteins to the databases, to get some more connections. And proteins are the result of expressions. I'm running out of space, it's hard to see.
0:06:08.0 Y: Maybe I should try some smaller cards.
0:06:10.8 Participant 1: No, that's fine. Expressions which are the result of some transcript. Many proteins together and other transcripts can be part of a Pathway if you count RNAs and ummm... okay.
0:06:24.7 Participant 1: So then accessions, identifiers, names are more or less - are often synonyms, to me at least. So this, P53 and Q9 something - I would take this as accessions. They're more technical, often, just a technical name, which often appears random, whereas identifier and name I guess is something you can actually read, and contains more information than a random string. Therefore I would say those are accessions, and those don't have anything. We have some things with name, like the Chromosomes, Genes. Genes actually go there, which is the basis of the transcripts. And then we have GO Terms. GO Terms are a kind of accession and this is a specific one. So, homologues, this is a term that I would apply only to Genes. Those two could be homologues - so the BRCA1 gene and the homologue in human.
0:07:52.2 Y: Okay, yep, that makes sense (laughs)
0:07:57.2 Participant 1: Okay, so what else do we have, Organisms - I remember just two specific organisms, the Homo sapiens one, so we could also make a connection here to the human gene. Homologue... so do I miss something?
0:08:13.3 Participant 1: Let's come back to the missing ones. Symbol, length and molecular weight.
0:08:18.12 Participant 1: So Genes, Transcripts... Genes and Transcripts have a length. Symbols and molecular weight... So all the, all the specific things like chromosomes, genes, transcripts, proteins - they have a weight. And the other ones are more the names for things like GO terms and identifiers, these are just terms, they don't have a weight. But at least, not a molecular weight.
0:09:01.0 Participant 1: Publications, also not. They don't have a weight.
0:09:04.9 Y: Okay, um, I think the ones we haven't touched on so far: we have the disease pile and the chromosome pile.
0:09:09.4 Participant 1: Mmmhmm, yeah. Then I would put the chromosomes here, because genes they're located somewhere on the chromosomes. And the disease pile - what do we have here, diabetes and asthma. Probably closest to expression and pathways, because there something goes wrong, at least. It's more or less guessing because I'm not entirely sure about the sources of diabetes and asthma. But many diseases are caused by errors in expression... gene expression. Could be any level, could be. Maybe not on the gene level. Ahh, they're hereditary diseases, I guess - not sure.
0:10:07.1 Y: Ahh, that's an interesting - it hadn't occurred to me that we can have, uh, environmental diseases, for example. Yeah, so you got the angle I was going for.
0:10:18.1 Participant 1: So could be somewhere here, expression, transcript, maybe gene? And also, maybe, I guess proteins are involved for both of them, some of them, for Asthma, I don't know what kind of substances actually initiate... the... don't know how to put this in English.
Yeah.
0:10:46.5 Y: Okay, so that's really fantastic, that's been really useful. I have a couple more questions. Do you think any of these are more important or interesting than the others, any of the piles, the cards?
0:11:01.4 Participant 1: Mmhmm, okay. Mmmm. For me personally, or?
0:11:12.8 Y: Yeah, for you, and in general. What's interesting, what might be an exciting entry point, for example? The things you'd focus on first.
0:11:22.9 Participant 1: Homologues. Because I think evolution is the key to everything. And homologues, somehow, are the closest to evolution in these terms, I think, because they actually name an evolutionary - not process, but it's the result of some evolutionary process. And, uh, and if you want to understand any of the other things, then you need to study evolution, which is done not in the extent that it should be at the moment.
0:12:04.0 Y: (laughs) Fair enough. Okay, did you feel like there were any data or cards missing from this set that you would have added?
0:12:12.0 Participant 1: Mmmm, if so, then more evolutionary processes. Like, ah, mutations. Maybe I'll just put some terms (takes yellow card and starts writing).
Y: Yeah, that's fine - you can just scribble them down as notes, if that's easier.
0:12:24.5 Participant 1: Mutation, duplications of genes, I'll put this here (off camera). What else is missing? Mmmm, environment (writes "ENVIRONMENT" on yellow card). And I'm just using capital letters to keep in the theme of the other ones. Environment - all those organisms live together and influence each other, in particular the small ones, bacteria and so on. I guess also the chemistry somehow here comes into play (writes "CHEMISTRY" on yellow card), but it's more of an outsider in these terms. Then we have something with here molecular weight and connection with molecular weight. That's an even lower level than the proteins then, so there's some interactions with everything.
Y: Okay, so.
I'm just going to take some quick photos. And, ah, I'm going to take some verbal notes. So we have GO Term, with the GO:0005515 protein binding term here, connected to accession, uh, and to protein accessions P53 and Q9H4C3_HUMAN. That's connected to identifier, which is connected to name. And we have length connected nearby to transcript, expression, protein, gene, chromosome, XY, and we branch off from the chromosome, to homologue, BRCA1, BRCA1_HUMAN, symbol and molecular weight, and in this area we have an additional card, mutation, duplications of genes. From the transcript/expression/protein card area, we also have a branching towards disease, with examples of specific diseases, asthma and diabetes. Uh, and we have pathways. And between, um, after the disease, branching off from diseases, we have environment and organism, and examples of organisms, D. melanogaster and H. sapiens. Then over back on the cluster of protein, expression, transcript, gene, we have chemistry, and branching off we have dataset, database, uniprot as an example of a database, DOIs, which span between database and publications, specific examples of a DOI, and under publications, we have pubmed ID, author, and a specific example of a pubmed ID. Okay, I hope that's going to be easy to transcribe later. I figured it would be faster than trying to photograph it since there's piles. So, there's one final task and then we are done. Thank you so much for this slightly bizarre task.
0:16:06.3 Y: I'm going to give you some files, or some print outs of examples of files. So I just want to know if you can match these cards to examples of the cards in blue - and "No" is a perfectly acceptable answer.
Participant 1: Okay, yeah.
Y: So, there's three files I think and there's two more here.
Participant 1: So, there are multiple things highlighted, and I should assign to each of those single ones?
Y: A card, yeah, and it may be that you can say no single card matches.
Participant 1: Okay, (thumbing through cards) I have to find it again. Should I say it? Okay, 9606 goes to organism. Id... okay, the 1 goes to ID. Symbol - A1BG goes to symbol... and then we have... Okay, that's a more complex one.
Y: You can separate by pipe if it's easier. It's also okay to re-use cards if you need to.
0:18:14.0 Participant 1: Okay, looking for database... database (hunts through piles). Database for the 4th one, and seems to me that this is gene and database, somehow there's cross-links between databases I would guess.
0:18:41.0 Y: And that's for MIM 13860 on the rest of that line on the Homo sapiens gene file. And so I've two more files. Similar task.
0:19:13.1 Participant 1: The second one is easy. It's a GO term.
0:19:35.0 Participant 1: GO term... and the first one, I could guess it's a gene ID?
0:19:39.1 Y: Sounds good. Ok, last file. Thank you very much. That was the GO term drosophila (???) file.
0:20:24.0 Participant 1: So we probably have here a gene name. The last one, the DDX11L1. (Sorting through cards). And the other ones are again database cross-references, ahm, to two databases. Another gene ID, so we could look for identifier. Crosslink to some database identifier, gene identifier (puts identifier card on table). And HGNZ already appeared in the file and I still have no idea. H probably is human, some human something, g could be gene identifier just from another database.
0:21:30.2 Y: Ok, and that was the Homo sapiens .GFF. Ok, I think that's everything. Any questions or comments?
0:21:45.1 Participant 1: Reading the files was slightly different.
0:21:55.3 Y: Yes, I agree.