Name: Nextflow Summit Boston Day 2
Uploaded: 2026-05-01T18:01:18.748Z
Duration: 4 h 7 min 44 s
Description: Nextflow Summit Boston Day 2

Transcript for "Nextflow Summit Boston Day 2": Okay. Let's get started. Thank you very much, everybody, for starting with us again bright and early on this lovely sunny Boston morning. I hope you all had a good time last night at the social and had plenty of time to maybe speak to some old friends or former colleagues or meet some new faces. I know I had a great time. I had some fascinating conversations last night, so I'm still buzzing from that. We've got a great lineup of talks for you this morning. Sorry, I should say to start off with, probably the most important thing is the Tamagotchi competition. I know you're all desperate to know who the winner is. You have a little bit of time left if you haven't yet done your artwork. I mean, you should really be concentrating on the talks, but there's coffee breaks. So you've got a bit more time to put your artworks together and post them on socials or on Slack with the hashtag x rays on it. We've had some amazing entries so far, so, no pressure. And I think the winner will be announced in the wrap-up talk right at the end of today. We've also had one small change in the program. One of the talks this morning we're gonna have to miss, unfortunately, but that talk has been recorded and will be available online afterwards. But first up, we have a talk from Geraldine and Marcel. You're both very well-known figures and faces in the Nextflow community. They wear many hats. Geraldine was previously known for working with GATK at the Broad, in that community. And many, many people have been started off in bioinformatics with her kind words and help. And when she joined Seqera, she took one look at our training material and said, we can do better. So we're gonna hear some of the latest news from the Nextflow community with Geraldine and Marcel. Thank you, Phil, for that very kind introduction.
Phil knows about wearing the many hats of the community team because he used to run the community team. I got his job by fighting him gladiator style. That would have been fun, but he's so tall. Anyway, no, I did not fight him, but he very kindly let me take over his job and it has been a fantastic journey. So yes, Marcel and I are going to talk about some of the activities of the community team. I'm not going to give you a laundry list of all the things that we do. Today, we are going to focus on two things specifically. Hello? Yes. Nope. The next one. Ah, there we go. We're going to talk about training, and Marcel is going to talk about the ambassador program, which is a really cool program. So this is kind of split half and half between for the community and by the community, to some extent. So let's talk about training. As Phil mentioned, I'm quite passionate about training. I do believe that one of the best ways we can make people successful in their work with Nextflow is to offer quality training that helps you get started, that helps you learn what you need to be successful with the tools. And so, over the past couple of years, we've been putting in a lot of effort, and it is very much a team effort. It's a collaboration between our little community team and the in-house bioinformatics team, which you might have heard referred to as the scientific development team. You saw several of them give demos yesterday. You saw Florian's amazing talk about his work using co-scientists for protein prediction. Those folks are all on that team. And several of their members have been especially helpful in helping us update the training materials. So what I'm going to do is give you an overview of the resources that we provide, and I will talk a little bit about how we keep them up to date and so on.
There are some lessons learned here that might be of interest for some of you who yourselves might be involved in maintaining and developing training materials and training users of software you're involved in developing or maintaining. Okay. So this is our training portal. Everything I'm going to show you is freely accessible all the time. It's open source under a Creative Commons license. So anybody can use that for self-training, but also to teach others and to use as a starting point if you want to develop your own materials. Now, the goal, in the spirit of not reinventing the wheel as much as possible, is to provide a resource that hopefully caters to the needs of many, so that if you need to train users at your institution or if you want to help a colleague learn Nextflow or something like that, you don't have to develop your own materials. You can if you have your way of teaching it, and that's totally fine. That's great. But these resources exist so that you don't have to if you don't want to. So that's our training portal in general. I'll walk you briefly through the main courses that we provide. They're primarily developed to be used as a self-service thing, but they can be used for classroom teaching. We do that regularly. Those of you who took the trainings earlier this week will have worked through some of that. They use GitHub Codespaces, so we provide a training environment that's fully loaded, that's got everything installed. You just open it; it's freely accessible. It's run by GitHub. You have some quota that is typically sufficient for working through the trainings, and you can spin that up. Or you can spin it up locally by yourself. We have some instructions if you wanna do it locally. But the idea is that we provide this training environment so that you don't need to worry about installing anything.
You can just go ahead, get started, focus on learning Nextflow, and that's really the point. And I'll show you what that looks like in a minute. I also always want to mention that all of this is available in 11 different languages, so English plus 10 others. There were supposed to be flags on my slide, but they've all been replaced by the country code, except for the Catalan flag, because the Catalan flag does not exist as a preloaded emoji. You can't get that one. So for that one, I had to go and copy-paste the picture. And so the Catalans have the privilege of getting their flag when everybody else just has their country code. So there you go. Congratulations. As a reminder, Seqera is based in Barcelona, which is in Catalonia. So I guess it's fitting. Alright. Yes. That's the training portal. What's in it? Okay. Here. Come on. Do I need to do some kind of invocation to make it progress? This is very bioinformatics. Do I need to turn it off and on again? Oh, no. No. Far too far. Far too far. Alright. Oh, look at it. You have a sneak preview. Alright. Alright. No hiccup there. Sorry. It's Friday morning. Alright. So we have a number of courses. We have two main tracks; we loosely organize these things as an intro and an advanced track. But, really, the resources that are the most mature at this point are in Nextflow for Newcomers. That is where we have our flagship intro course, Hello Nextflow. Six parts. It takes about a day to teach in a classroom. You can go through it faster if you're doing it on your own, of course. It walks you through all the basics, the key things you need to know about Nextflow, with a gentle introduction to all the pieces, but in a way that's very goal oriented. You learn to develop a workflow step by step. Every step builds on the next one... the previous one, like a workflow. So that is really our flagship course for beginners.
Please recommend it to your friends and colleagues if you have people who are interested in Nextflow. We have Nextflow Run, which is a short version, kind of a bridge: less focus on code, more focus on running things, but still with some explanation of how it works under the hood. Because I believe in looking under the hood. Even if you're not ever going to develop code yourself, it's useful to understand how it works and what the main principles are. Because when something goes wrong, it helps for debugging if you understand what a process is in the first place. So it's based on that kind of philosophy. We have a few courses, a small but growing portfolio of courses, in the Nextflow for Science category. The idea is that all the newcomer materials are domain agnostic. It's all designed to be usable for someone who doesn't have any domain-specific background. So it's not specific to genomics or RNA-seq or anything like that. We have people using Nextflow for material science, for astronomy. So this is appropriate for anyone from any background. But then Nextflow for Science shows, okay, this is how you apply the principles you learned in the basic intro course to a specific use case, like genomics, like RNA-seq. We have a bioimaging one. We have a metagenomics one that's in development, thanks to some contributions from the community. So that's something that we're going to develop further. On the advanced side, many of you already know Nextflow or already have some familiarity. You're already developing with the tools. So there we have some resources that go a little deeper. Hello nf-core is the introduction to nf-core. We know that getting to grips with nf-core can be challenging. There's a lot going on in there. So this is a course that, in the same spirit as Hello Nextflow, walks you through gradually how to work with that.
And we've had people who have been using Nextflow for years come to us and say, okay, the nf-core course actually really helps, because now I understand how nf-core pipelines are structured and so on. So even if you're already very familiar with Nextflow itself, you might be interested in checking that out. We also have more advanced training that goes deep into specific topics, like metadata handling, like composing workflows, debugging. There's a ton of topics. They're called side quests, and you can take them in no particular order. If there's a particular topic that you need to level up on to progress in your project, you can browse that, and hopefully that will help you. Anyway, a bit long-winded. Sorry. But this is the catalog of training that we offer at this time, and we are very actively continuing to develop that further. And we're very happy to hear feedback, like, what's missing? What are the topics that you feel we need to cover? Okay. In practice, what it looks like: these are two browser tabs shown as one within Chrome. This is to show you how you would work through it. You have the instructions on the web page that detail what to run, that show you expected outputs, etcetera. It's very step by step. And then on the right side, that's the GitHub codespace that's loaded by default. It gives you a VS Code interface, and you have everything that you need. You have example code. You have starter scripts and so on. And you just basically run through the instructions in the codespace. Everything is designed so that you can work through the exercises as you go. Alright. So that's what it looks like.
In addition, for Hello Nextflow specifically, Phil has recorded an amazing series of walkthroughs, basically showing in practice how to work through all of these courses, and it's an amazingly valuable resource. So thank you, Phil, very much for putting that effort in. It takes a huge amount of time to do that, so it's very useful. And there's a lot of additional little commentary and tips in addition to what's written in the instructions. People seem to be pretty happy with the course. This is the kind of survey that we take for Hello Nextflow. This was last year. This year is shaping up to be even better with the latest changes we've made. People are generally really thrilled. I'm not saying that to brag. I'm saying that to say, please do use these resources. Please do recommend them to your friends and colleagues, because we do think that this is a way to get people leveled up with a minimum amount of pain and a maximum amount of results. Okay. Yep. Okay. So a really important part of this is, as you can imagine, it's a fairly sizable amount of material. It's a lot of material. And as you might have noticed yesterday, the pace of development in Nextflow is pretty unrelenting. Right? Paolo and Ben keep coming up with new features every six months. It's quite fast. If you feel like it's fast for you, it's fast for us too. So we have to update the documentation. And not just documentation, but training materials. And because of how they're built, step by step, with everything building on the previous step, you change one thing, you add, for example, workflow-level outputs, you add that to the starter script, and now, boom, domino. You have to update every single snippet of code across dozens and dozens of pages. And have I mentioned it's available in 11 languages? It's a lot.
So this year we have started using Claude for maintaining and updating materials, and it has been transformative in terms of enabling us to very quickly adapt the materials. There's still some work that we need to do ourselves in terms of design, deciding, for example, at what point we introduce a new feature. But there's a lot of work in just going through and updating all the code snippets, checking that everything still works, regenerating expected results in the solutions directory. All of that we can now handle with Claude skills. And I wanna shout out Jonathan Manning from the scidev team, who has done a really cool amount of work to set us up with this. Because now I can go in, I can make some changes to the tutorial and just ask Claude to rerun everything and update all the solutions. It is phenomenal. I cannot overstate how useful this is. And if you are maintaining any kind of training materials for users or something like that, I really recommend you look into that. It turns things that would normally take hours into a task that takes minutes. It's days to hours. It's really transformative. So I'm not going to go too far into detail, because I'm eating up all the time. But I know we have a little bit of buffer time, so I'm taking advantage of that. I hope you don't mind that I'm holding you all hostage, but I really wanna get this across. The other part of our use of AI for the training material updates is the translations. And that is Phil's work. When I said he was irrepressible, that was one of his side projects: while he was doing all that Rust stuff, he still found the time to set us up with an AI-powered translation harness. So now, okay, we work on some development branches. We make a bunch of updates to the courses. Now we're ready to release all the updates. Do we have to go and update all of the translations? No. We do not. Because we just run a GitHub action.
There's literally a button that I go click that says run workflow, and it'll work for, I don't know, a few dozen minutes. It takes less than an hour. It looks at the diffs for what has changed, and then it will go update translations across all 10 other languages. It's fantastic. The key principle on that is really clever in terms of saving on tokens and saving on processing time. There's two things. Right? One is that it just retranslates the areas with the diffs. Don't retranslate everything. And part of it is also you don't fix the prompt... sorry, it's the opposite. You don't fix the file itself. So if you notice something is a little off in a translation somewhere, you don't make a fix to the translation itself. You fix the prompt so that next time, every time, it'll be automatically correct. And by that, I mean, the kind of things that you can do are improve the grammar, improve the style. For example, my native language is French. So I was looking at the French translations, and I realized we did still have a lot of cases of, you know, we're saying the developer can do this, the developer can do that. In French, that'll be automatically translated with male pronouns, just because that's the default in the language. But French has a way of using inclusive notation so that you can have a gender-inclusive form for that. And so we just had to add that to the prompt, rerun the translation, and boom, all of our training materials now are gender inclusive, which is really cool. And you can do that in any language that supports that. So we do this with the help of people in the community who are native speakers of those languages. The biggest part is adapting the per-language glossary so that we can adapt technical vocabulary to have the correct terms for, like, a channel or something like that. Anyway, so that is very powerful.
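The two principles described here (retranslate only what changed, and fix the prompt rather than the translated file) can be sketched roughly as follows. This is a minimal illustration in Python, not the actual translation harness used for the training materials: the function names, the prompt wording, and the paragraph-level granularity are all assumptions made for the example, and `translate` is a stand-in for whatever model call a real pipeline would make.

```python
from typing import Callable, Dict, List


def split_paragraphs(text: str) -> List[str]:
    """Split a document into non-empty paragraphs on blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def build_prompt(language: str, glossary: Dict[str, str],
                 style_rules: List[str], paragraph: str) -> str:
    """Assemble a translation prompt (hypothetical wording).

    Corrections go into `glossary` and `style_rules`, not into the
    translated files, so every later run picks them up automatically.
    """
    glossary_lines = [f"{en} -> {tr}" for en, tr in sorted(glossary.items())]
    return "\n".join(
        [f"Translate the following text into {language}."]
        + ["Glossary:"] + glossary_lines
        + ["Style rules:"] + style_rules
        + ["Text:", paragraph]
    )


def update_translation(old_source: str, new_source: str,
                       old_translation: str,
                       translate: Callable[[str], str]) -> str:
    """Reuse existing translations and only translate changed paragraphs.

    Assumption: the old translation has one paragraph per old source
    paragraph, in the same order.
    """
    cache = dict(zip(split_paragraphs(old_source),
                     split_paragraphs(old_translation)))
    updated = []
    for paragraph in split_paragraphs(new_source):
        if paragraph in cache:
            updated.append(cache[paragraph])      # unchanged: keep as-is
        else:
            updated.append(translate(paragraph))  # new or edited: translate
    return "\n\n".join(updated)
```

In a setup like this, `translate` would wrap `build_prompt` plus a model call, so adding a rule such as "use gender-inclusive forms" to `style_rules` and rerunning is enough to apply it everywhere, which mirrors the French example above.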
Huge thanks to Phil for doing a fantastic job on that and enabling us. With that, I invite you, if you would like to level up your Nextflow skills some more, please join us in two weeks for Nextflow training week. It's online. It's self-directed. You choose when you work on courses. You can get the certificates. You can get all sorts of cool stuff. But please sign up. Check it out if you're interested. And with that, having consumed plenty of time, I am going to hand the mic over to Marcel, who is going to tell you about the ambassador program. Thank you very much. Hi, everyone. It's great to be here again. I was just watching Gigi's talk and looking at the audience here and thinking about what an amazing community we have. Right? This month is a very special month for me because it marks twenty years of running open source communities. The first time I ran an open source community was in April 2006. And I can confidently say that in the past twenty years, the Nextflow community is the best community I've ever worked with. Now, we had the hackathon before, the training, the summit here. Nobody comes for an award at the hackathon or a certificate of hours in the training, or, I don't know, something very industry focused in a summit. It's a lot about learning more and talking about Nextflow and getting to know more people who use Nextflow and all that, right? I'm talking about these things because that's what we are always thinking about in the community team. What an honor we have of being here in the Nextflow community. And about two and a half years ago, we were discussing this thing, you know, look at this amazing community we have. So many people who are eager to talk about Nextflow all over the world, in their fields of expertise, with their friends, with their work colleagues. They talk about Nextflow not because they're paid to do so, but because they like it. You know, they use Nextflow. They are happy. It helps them.
They're really happy about it. And very quickly, we identified what I call the two profiles of very active community people, right? One of them, we just call them ambassadors. They already talk about Nextflow whenever they can. They go to conferences, they give talks on Nextflow, they train their friends, they use it at work, they use it at university. Right? So they're ambassadors already. They just don't have the title, but they already work as ambassadors. Sometimes they even already call themselves ambassadors. And some people, maybe they're a bit shy, maybe they don't know they can talk about Nextflow. They don't have the confidence. Right? But they could do a great, great job. And that's what I call the potential ambassadors. Right? So we were looking at the community, looking at these two profiles and thinking, maybe we can help them. Maybe they need our help. We'd like to help them go further, right? So for the ambassadors, if they're doing this great job, let's make sure they don't stop. They are happy doing that. Let's help them in any way we can. And for those who have the potential to do so, let's make sure they feel confident. Let's help them spread the word about Nextflow, enjoy Nextflow more, you know, these things. So we were thinking about ambassador programs, right? I ran ambassador programs before. And when I joined the community team at Seqera, it was like, you know, this is the project we need to do. Since the first day I was at Seqera, I was always talking about this. Let's create an ambassador program. Or talking to Phil: we have to create an ambassador program. This will be amazing. But I mean, the company was too small. We were still organizing a few things, lots of work to do, but eventually my persistence won and we decided to set up the program. Right? So this happened in late 2023. We already knew these ambassadors who already behaved like ones.
So it was just a matter of inviting them and talking to some people to see if they were interested. So we started the first cohort of the ambassador program in 2024. We had 45 people and it quickly grew, right? Currently we have 143 ambassadors. Some of them have been with us from the very beginning, since the first cohort. Some of them join, then they leave for some reason: they change jobs, they change countries, or they get too busy. And some join for the first time and continue in the program, right? At the beginning, we had 16 countries of residence. Right? The 45 ambassadors we had were in 16 countries, but that also quickly grew to the 40 countries we have today. In all these two years and a few months, the ambassadors have conducted almost 700 activities. This is almost like 25 activities per month. That's a lot of work they've been doing. We don't count activities more than once. If three ambassadors worked on one activity, here I'm counting it only as one. So for the past two years or so, it's a bunch of work they've been doing. And this is the map of where we currently have ambassadors living. This changed across the cohorts. Of course, we had people in different countries then, and now we have them in different countries. But the thing that I want to highlight is that this is where they live, not where they act. So let's take the example of Kubra. She's in Germany, but she's originally from Turkey. She has been doing lots of stuff in both countries. We have people who are in the UK but are from Iraq, or in Brazil but are from Japan, or vice versa. So the countries where they act as ambassadors are actually many more than what we have on the map. But still, it's a very nice geographical representation. It's important to highlight, though, that even though the name ambassador is very strongly associated with geographical... oh my god, I'm missing the word.
Location. It can also be a field of expertise. Right? It doesn't mean that you have to be the ambassador in Brazil or in China. You could be the ambassador in spatial omics, for example. It's not only about the geographical location, but also your field of expertise. Right? So, some numbers about the program. These are the categories of the activities our ambassadors have done. As you can see, lots of blog posts, trainings, events that they organized like hackathons, talks they've given and so on. And one question we usually get is, okay, but what portion of that is in person? Because we do value these in-person interactions. Like here we have the hackathon, the training, the summit, right? And when it comes to in-person interactions, in many of these categories, a bunch of them are in person, like events, training, talks. So this is very nice. You see our ambassadors really active in the real world, not only online, right? So in 2025, which is the last year of the program we completed, we got some numbers about how active our ambassadors were. We usually ask for a commitment of two activities per semester. So you have to do two things. It could be two talks, two trainings, two posters, two blog posts; choose something related to outreach and Nextflow, right? And it's not always easy. Sometimes troubles happen during the semester, you change jobs or something like this. So it's understandable that not all of them can do that. But we had over half of our ambassadors doing at least two activities. Then 32 of them doing five or more, up to four of them doing 10 or more activities in the year. So five times or more what we asked. And I would like to take this opportunity to thank our top 10 ambassadors, who got their certificate of excellence. We have James, Julia, Matthias, Jose, Firdas, Evangelos, Mahesh, Maxine, Kubra and Antoine.
They're mostly in Europe, except for Firdas, who is in Tunisia, but they've done amazing work in multiple different places of the world, in multiple different fields, with different types of contributions. So we are really thankful to the whole community, of course, and all the ambassadors. But these top 10, they are not only the most active. If you want to join the program, if you feel a bit lost, if you don't know how to contribute, if you'd like to ask for some suggestions, these guys know a lot about the program, a lot about the community, a lot about Nextflow. So they're the best people to ask. I mean, you could talk to me too, of course, but there are great people in the community that you can talk to and learn more from about what it's like to be an ambassador. So James and Julia combined had 62 activities in 2024 and 2025. That's almost 10% of all the ambassador activities. So they are really active. And again, they do a lot of different stuff. It's not the same type of activity every time. They do a bunch of stuff. They're great ambassadors. So sometimes you do something that doesn't fit in an outreach category, like, you know, you wrote a pipeline. This doesn't count as an ambassador activity, but if you talk about it, it does, right? If you do a blog post about it, if you do a talk about your pipeline, if you record a video on YouTube teaching how to use it. All these things are outreach and they count as ambassador activities. So usually we ask people to write blog posts about these things, but not only that. Sometimes you organize a training, like Lara did. That is an outreach activity, but she also wrote a blog post about the experience, so that other people who want to run a similar thing can just read it and learn from her experience, right? So here we have a few blog posts we had last year about the ambassador activities.
We had ambassadors bringing information from COP16 that was related to computational workflows. We had people who wrote pipelines, like Zohaib in the virology field, who went to events to give talks about it and wrote materials about it. We have Firdas, who was traveling across Africa to give talks about Nextflow, run trainings, write content online, record videos. So here are just some examples of the activities we had in 2025. And I'd like to talk a bit about hackathons, because it's a big thing we do, like the one we had a few days ago. We also have a hackathon at the summit in October or November, depending on the year, and also one in March, which is the hybrid hackathon. I like to say it's the largest one we have every year because it's hybrid. So we have people all over the world participating. We have sometimes reached over a thousand people participating in this hackathon. And what's the connection between the hackathons and ambassadors? Very often ambassadors are spearheading lots of hackathon sites and stuff like this. For the hackathon we had here, some of the group leaders were Nextflow ambassadors. Right? When we have the hybrid hackathons with local sites around the world, 70% of local sites are always hosted by ambassadors. It's not a fixed number, but for some reason every year, 70% of the local sites we have around the world are hosted by Nextflow ambassadors. And a bunch of those that are not hosted by ambassadors are supported by ambassadors. So ambassadors are helping in almost any activity we can think of when it comes to community. So I think after saying all these amazing things about the program, what they do and so on, why not join the program, right? Maybe you think you're not ready. Maybe you think you don't know Nextflow enough. Maybe you don't think you speak well or write well.
Well, if you really want to help and you are already part of the community, you have contributed to the community, you know Nextflow, you're more than welcome. A lot of people, we help them write better. We review their texts. We help them with slide decks and all these things. So if you want to help and you like the Nextflow community, you're welcome. I'm going to show the instructions to apply, but basically you're more than welcome to try to apply, right? So why join the program? If you like the Nextflow community and you like the impact it has, why not help increase the impact with your contributions? Again, maybe in your geographical location, or in your field of expertise, Nextflow is not well known. So why not bring a bit of Nextflow there, right? And it's a two-way thing. We can also help increase your impact. If you're sharing something on social media, we can repost. We can mention you in our posts so we can increase your reach. You get more followers, increase your reach and your impact on social media and all these things. Right? You also get some behind-the-scenes insight about how to run a community. But not only that. Very often when we have a new thing at Seqera, before we release it to the public, we actually release it to our ambassadors. Right? It was the same thing with Seqera AI and with so many projects we have had so far. For the VS Code extension, for example, we first shared it with ambassadors so they could try it and give their feedback to us. We can improve the technology before it's released to the public. So you get this behind-the-scenes information on not only how to run a community, but also the technologies we have. We are always looking for nice events, right? Events where it would be good to have Nextflow. We can let you know. We can help you go there. We can keep you posted about news in the topics associated with Nextflow and nf-core. You get training.
So if you don't feel confident enough, we can help you get more confident with Nextflow. We can help you do a training. We can help you understand the training materials. Anyone can ask questions about Nextflow, of course, but there's a private ambassador channel where we talk and you can get some special support, let's say, over there. It's by me, but I mean, it's an extra level of support we can offer. In terms of recognition, you get a certificate at the end. If you've done your two activities, you get a certificate of recognition for participating in the program. And if you reach an excellence level of contribution, you get a certificate of excellence, a second certificate that some people like. And all these things that you were doing or want to do, we can support you in doing. Again, we have this profile that I like to call the original ambassadors. They're already doing great work, but maybe they can do more. And the reason they can't is maybe because they have to travel. They don't have the money or they don't want to spend money on another event in that year. We can help you with that. So we have a budget for supporting events with snacks, and sometimes travel and accommodation for some strategic events. For all these things, we have a budget. Oops. So far funding went mostly to discounts for official events, swag, snacks for hackathons or trainings if you want to give them to the audience, and travel and accommodation for some specific events. So we have had multiple ambassadors traveling, like Zohaib, whom I mentioned with the virology pipeline. He went to Portugal. We paid everything for him to go there and talk about his pipeline, Nextflow and nf-core and so on. So if you want to join the program, you can just go to the Nextflow website, nextflow.io. You go to resources. There's a Nextflow Ambassadors button, with a description and a summary of the program.
At the bottom, you're going to have a link to the Nextflow Ambassador handbook, with lots of information about what it means to be an ambassador, what you should know, what you should do, how we can help you and so on. And there's also the Apply Here button that you can click to apply to the next cohort, which is going to start in July. Our cohorts are six-month cohorts. People apply during one semester to join the next one. We have a kickoff call with the new ambassadors at the beginning and so on. So thank you very much. I'd be more than happy to answer questions if you have any, and thanks again for making this community amazing. You're a part of that. Thank you very much, both of you. You can probably tell that we're all pretty excited about the community. So if we get the opportunity to talk for a really long time, we'll take it. Yeah. Any questions in the audience before I kick off on a couple of my own? Hi. Sorry. You said that ambassadors are on a six-month rotation, but are people admitted on a rolling basis, or is it at the beginning of each six-month period? So you start in the next cohort after you apply. Right? And you can stay. So every semester, when it gets to the end of the six months, I ask, you know, which of you guys want to stay? Most people say, yeah, I wanna stay. Some people say, you know, I changed jobs, I'm in a different country, I'm too busy, I wanna leave, I wanna come back later. That's totally fine. So we have people who have stayed for six months and left. And there are people who joined in 24.1 and are still here today. So it's up to you. As long as you want to contribute and you're still there, you can stay. I think you might not have mentioned that we have a call in January and in June. So we have two fixed time points of entry in the year, if that's... The January is right. Yeah. Oh, that was the thing about the rolling. Sorry. Sorry. Yeah. I think so. Maybe.
So if you apply in June or January, it's the same thing. You're gonna start in July. Thank you. There was one thing. You were talking about the translation work on the training, and I just wanted to give credit to the community of the FastAPI docs, because I basically stole the idea from them. But, yeah, it's been fascinating seeing people come together with these translations. Especially languages which are so completely foreign to me, like Korean. I love seeing all these Nextflow terms pop out amongst these walls of characters. And definitely, if there's a language in there that you are competent in and you'd be interested in seeing it represented, if you know that it would make a difference to a community that speaks that language. Because we know that it can be challenging if you have the double cognitive challenge of having to learn something technical that's very difficult, in a language that's not your native language. I'm very familiar with that problem. It's like a double problem. So if we can take away half of that by making things available in your language, or in the language of the people that you're in contact with, we're more than happy to. We just don't wanna make a dump of all possible languages. We wanna have at least one native speaker, or somebody who is very fluent, who can help us gauge that, yes, this is correct, this is not outrageous, you know, we don't sound like gangsters or something. You hear about translations based on not quite the right style. Anyway, yeah. We'd love to get help with that. So thank you. So, Alexia, a few words about this. In 2022, I think, it was around Save Venice Summit. And I talked about basically localization of training materials. I was just remembering that.
And the thing is, people think, okay, everyone speaks English. Right? Well, in some places that's not the case. Even people doing master's degrees and PhDs, even though they can understand English, don't feel comfortable. It's not so easy for them to read content in English or to listen to content in English. So it really helps to have content in different languages. In 2022, we already started trying to fix this, and it's such a huge amount of work, because first you translate everything manually. Right? And when you're done, something has changed. Right? And then, I speak Portuguese. Geraldine speaks French. I speak a bit of French, and then we have English and Spanish. But what about languages we don't master? We rely on someone else volunteering to proofread that. Not only proofread, but translate it. And at some point we had, I think, the basic training material in Italian, in French, in Spanish, in Portuguese, in English. And for a few days, it was so nice, until everything got updated. Right? So at some point, it wasn't worth having it in different languages because the quality was not good. We would rather not have anything in a different language than have broken code, broken instructions and all that. So I was particularly sad, because that's what we had to do. We couldn't have so many languages because it was just broken in different languages. When Phil came with the AI thing, I was so happy. And did he show you all the skills we have now with Claude and all these things, smartly fixing only the translation for the diff in the recent commit and so on? This makes everything so good and easy, because we can fulfill this goal of having it in different languages, but also with quality, correct and up to date. It's just amazing. I don't know if I can convey to you guys how happy I am about this thing, but it's just amazing. It's a dream come true.
And I will say those skills are in the training repo too. So if you wanna do that for your project, you're more than welcome to just copy it. Yeah. It's Creative Commons. You just give credit and, boom, you're done. You can set that up. And if you need help setting it up to actually work with GitHub Actions and so on, ask us and we will point you to Phil, because he's the only one who understands how it works. But anyway, oh, I have an nf-core duck for the winner of our training quiz from earlier this week. So if you were the winner of the quiz, come and talk to me and claim your duck. Otherwise, we'll contact you. Sorry. Thank you. Thank you both. Next up on the agenda, we were hoping to have Vipava from, Gail, but unfortunately she's had to duck out due to personal circumstances. She has very graciously recorded her talk for us, and we will put that up on YouTube, along with all the other talks, after the event. So I do encourage you to check back in a few days when the talks are online and listen to her talk. It's a shame, because the abstract just sounded really interesting. So we will push on. And next up, I'm delighted to have Rima come up and speak to us about her pipeline for doing downstream analysis of RNA-seq data. Reema is based at Northeastern University and formerly had a stint at Novo. Is that right? Novo Nordisk. And we had a great conversation earlier. She's also presenting here in Boston at BioIT, so I'm sure many people in this crowd will be there as well. So, yeah. Welcome, Humera. Thank you so much, everyone, and good morning. Today, I'm excited to present my work, which I initiated and developed during my time at Novo Nordisk, on a Nextflow downstream analysis pipeline. So I will start by briefly introducing the motivation behind this pipeline and the upstream foundation for the RNA sequencing analysis.
Then I will cover the main scientific questions we want to answer with the count matrices, and the downstream bottlenecks for the analysis. Then I will cover the DSL2 architecture for the custom pipeline development and the key takeaways from my workflow. So, the basic question of the downstream analysis, the question that motivated me to build this workflow, is this: while many workflows and methods for upstream RNA sequencing processing have become available and standardized, there are still bottlenecks in downstream analysis, for differential expression and pathway analysis, which introduce variability in results, affect reproducibility, and make scaling difficult. So I started with the questions that come after the count matrices. This is the nf-core upstream analysis pipeline. Everyone is familiar with this pipeline, and many of you have used it for the preprocessing steps. After the preprocessing steps, the count matrices are mainly used for differential expression and pathway analysis. So once the upstream analysis is done, the focus is mainly on differential expression. Multi-factor analyses and different study designs sometimes require different downstream strategies. So the questions are: what are the main biological effects we want to resolve, and does the experiment have sufficient power for the analysis? At the same time, we want to understand how experimental factors such as treatment, donor variability, and cell type contribute to the gene expression changes, particularly in a multi-factor study design. And in more complex experiments, repeated measures or correlated samples introduce additional modeling challenges that require an appropriate statistical approach. These questions are critical because they determine the reliability and biological relevance of the results.
And in terms of the downstream RNA sequencing bottlenecks, these factors play a critical role in the analysis. First of all, the study design: hybrid and multiplexed designs can help to balance signals across batches, repeated-measures designs improve the power for detecting temporal changes, and group-comparison designs capture the differences between subjects. Alongside the study design, power analysis becomes an important part, to estimate the sample size and evaluate the power of the study. The second major issue is biological and technical variability. Biological variation refers to the natural fluctuation in the analysis, comparing within-subject and between-subject variability, while technical variability consists of the technical artifacts that are introduced during library preparation and sequencing. Along with this, proper metadata harmonization and standardization of results is also a bottleneck in the downstream analysis. So to address this problem, I started by introducing ComBat-seq in my workflow. ComBat-seq is helpful for batch-effect correction, and it is based on the negative binomial framework. One of its advantages is that it preserves the count matrix by generating integer-like counts, making it compatible with DESeq2, limma, edgeR, and other methods which require count matrices as integer values. ComBat-seq is useful in specific scenarios, because other models can also include the batch variation in their models. But in some situations, when studies have different batch effects, including ComBat-seq as a separate method is useful for capturing the true biological effect. And another question is RNA-seq power. Sometimes, after the study is designed, the analysis runs perfectly.
But if we want to validate our study, power analysis is also a good approach. For RNA-seq power, the RNASeqPower package in particular is available. So the user can use that to evaluate the study design in the workflow. It helps to determine the effect size, sample size, biological variability, sequencing depth, and significance level of the analysis. The other question is about the tool types. DESeq2 and limma are useful for different approaches. DESeq2 models raw count data using the negative binomial distribution, which directly captures the mean-variance relationship in RNA-seq data. In contrast, limma transforms counts into log2 counts per million. DESeq2 requires raw, unnormalized integer counts and internally corrects for sequencing depth using size factors, while limma, on the other hand, requires preprocessed data, typically using TMM normalization followed by the voom transformation to model variance. DESeq2 applies shrinkage to log fold changes, which helps to reduce noise in low-count genes. limma instead applies empirical Bayes inference, which estimates the variance and improves stability, especially for smaller sample sizes. Both methods rely on replicates for reliable inference. DESeq2 strictly requires biological replicates, while limma is more flexible in certain designs, such as time-course data, where trends can be modeled. DESeq2 is often preferred for smaller sample sizes when working directly with the count data, and limma-voom is more powerful for larger datasets or when library sizes vary significantly. So I built this modular DSL2 multi-factor RNA sequencing workflow following the nf-core standards. It is based on the tools I have described earlier, which I have included in my workflow.
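To make the contrast between the two starting points concrete, here is a minimal Python sketch of median-of-ratios size factors (the depth correction DESeq2 applies internally) next to a log2 counts-per-million transform with the small offsets voom uses. It is an illustration only, not the speaker's code: the pipeline described in the talk runs these steps in R, and the function names and the toy count matrix below are made up for this example. TMM scaling, dispersion estimation, and the voom precision weights are deliberately left out.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: list of genes, each a list of raw integer counts per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has non-zero counts in every sample")
    # Log of the per-gene geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / len(row) for row in usable]
    n_samples = len(counts[0])
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def log2_cpm(counts):
    """log2 counts per million with offsets of 0.5 (count) and 1 (library size)."""
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        [math.log2((row[j] + 0.5) / (lib_sizes[j] + 1.0) * 1e6) for j in range(n_samples)]
        for row in counts
    ]


# Toy matrix (hypothetical): sample 2 was sequenced exactly twice as deep as sample 1.
toy_counts = [[10, 20], [100, 200], [50, 100]]
sf = size_factors(toy_counts)
lcpm = log2_cpm(toy_counts)
```

In the toy matrix both approaches remove the twofold depth difference: the second size factor is twice the first, and the log2 CPM values of each gene are nearly identical across the two samples.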
So the main workflow starts with the configuration and input layer, where the user defines the parameters for the analysis through the configuration file, along with the Nextflow configuration file for resource allocation and execution profiles. This enables flexible execution across environments such as HPC, Docker, or Singularity. The pipeline is orchestrated through the main entry point, the main.nf file, which loads the parameters, builds channels, and connects the different subworkflows. These subworkflows represent the key analytical steps, including batch correction, power analysis, differential expression, and pathway analysis. Each subworkflow calls modular processes that execute specific tasks using R scripts, where the core statistical analysis is performed. Data is passed between them using Nextflow channels, with a clear separation between data handling and computation. There is also the API config. If the user wants to load a dataset from a database or another platform, the API config can help to load the metadata and count data without a manual download. The other option is manual: the user can supply their count data and metadata to Nextflow locally. So two types of options are available for data input. During execution, Nextflow manages the CPU and memory resources and generates structured output, including result tables and log files, for traceability and debugging. Overall, this architecture is based on the nf-core standards. It is designed for reproducibility, flexibility, and clear organization of the results across the workflow. And if the user wants to extend the analysis, it generates plots and figures for the differential expression and pathway analysis in PDF form. So yeah. Moving to the next slide. This is the metro map based on the workflow I have designed.
The green line shows the core workflow steps. It starts with the metadata, configuration parameters, and count matrices. From there, it moves to gene filtering, which is the first and mandatory step for RNA sequencing analysis. After that, batch correction is an optional step. If there are significant library differences or batch effects, the user can apply ComBat-seq after the filtering, which generates a corrected count file that can be used in the further steps of power analysis or differential expression analysis. The differential expression analysis is divided into two branches, DESeq2 and limma. In this pipeline, limma is the main core method, while DESeq2 is an optional alternative depending on the dataset. The analysis uses the corrected count matrices if batch correction is applied. Otherwise, it uses the filtered counts in the differential expression. It performs normalization, applies the contrast definitions from the configuration file, and generates the DEG lists, which are saved for each contrast. Finally, the last step is pathway and enrichment analysis. It is also an optional step, and it generates the ranked gene list file for the enrichment, using fgsea and the MSigDB pathway collections. And these are the outputs generated through the differential expression and pathway analysis. So, moving to the next slide. The strength of this workflow is that it is based on a modular DSL2 pipeline, which has reusable components, parameterized YAML-driven execution, standardized outputs and log files for traceability, and containerized, reproducible runs across environments. As for testing and validation, the core modules are functionally validated, while cross-design dataset validation and testing is underway. Runtime and resource profiling is also ongoing.
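The branching described on the metro map can be sketched as a small planning function. This is a hedged illustration in Python, not the pipeline's actual Nextflow code: the parameter keys (batch_correction, power_analysis, de_method, pathway_analysis) and the step names are hypothetical, chosen only to mirror the steps named in the talk.

```python
def plan_steps(params):
    """Return the ordered steps and the count matrix fed to differential expression.

    params: dict of run options (all key names are assumptions for this sketch).
    """
    steps = ["gene_filtering"]          # first and mandatory step
    de_input = "filtered_counts"

    if params.get("batch_correction", False):
        steps.append("combat_seq")      # optional batch correction
        de_input = "corrected_counts"   # downstream steps then use corrected counts

    if params.get("power_analysis", False):
        steps.append("power_analysis")  # optional power analysis

    method = params.get("de_method", "limma")  # limma is described as the core method
    if method not in ("limma", "deseq2"):
        raise ValueError(f"unknown differential expression method: {method}")
    steps.append(f"differential_expression_{method}")

    if params.get("pathway_analysis", False):
        steps.append("pathway_enrichment")  # optional enrichment on ranked gene lists

    return steps, de_input


default_plan = plan_steps({})
full_plan = plan_steps({
    "batch_correction": True,
    "power_analysis": True,
    "de_method": "deseq2",
    "pathway_analysis": True,
})
```

With no options set, the plan is gene filtering followed by limma on the filtered counts. Turning on batch correction switches the differential expression input to the corrected counts, which is the one dependency between steps that the talk calls out explicitly.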
And the next step is to optimize scalability and resources, and to make the container profiles portable. Overall, the architecture is stable and based on the nf-core standards, and the next step is more focused on validation and optimization. My final key takeaways are that metadata harmonization and quality control are the most critical steps, because errors at this stage directly impact the downstream analysis. Strong study design matters more than tool choice alone, and statistical rigor is what transforms raw count data into reliable biological conclusions. There is no perfect tool selection, no one-size-fits-all method. That means tool selection always depends on the experimental design and sample size. Replicates are essential for accurate variance estimation in methods like DESeq2, limma, and edgeR. And finally, modular containerized workflows like Nextflow ensure reproducibility and scalability in RNA-seq analysis. Overall, the goal is not just to run the analysis, but to generate results that are reliable, reproducible, and biologically meaningful. I would like to acknowledge my mentor from Novo Nordisk for the guidance and support, and the Nextflow and nf-core community, who provide the trainings, advanced trainings, and resources for the community. Thank you. Thank you so much. It was a great talk. The topic is very close to my heart, this one. I remember before we started nf-core, with our RNA-seq pipeline, we had endless discussions about whether we should sort of ship scripts for differential gene expression analysis, and it's a really difficult thing to do. Any questions from the crowd? Yeah. There's a couple over here. Hi, Rima. Thank you for your talk. I really enjoyed learning about this pipeline.
It also takes me back, like Phil, because this is the one thing that I did related to bioinformatics back in grad school, gene expression analysis. And we also tried lots of different methods. And I'm curious, forgive me if I misunderstood: it looks like in your pipeline you have the two methods and you choose one or the other for a given run. Have you looked at all at trying to use multiple methods and fuse them to come up with some sort of composite gene ranking, or does that not make sense? So a combined gene list from both methods, limma and DESeq2, right? For example, yes. Yeah. No. The results are generated separately, not combined. It runs one method at a time and generates a result based on the method selected in the initial configuration file. Right, you choose one over the other depending on your experimental setup. Yes. So based on the sample size, basically, yes. Some other questions? I have seen the batch correction. How about, like, you know, bad sample filtering? How do you make that kind of decision? Normally, from sample to sample, it varies, you know. There is no streamlined way of doing it, because a human has to make a decision. How do you handle that kind of quality filtering? You know, a decision to exclude a certain sample. A decision to include, sometimes, a certain covariate. For example, in mouse, if you combine sexes, you know, you get different things, and you have to make a decision about which line to take depending on that specific dataset. How do you make those kinds of decisions when you, you know, just use a streamlined workflow? Yeah. So it is basically at the starting point. I also tried a user-defined approach with a Python script.
When the user starts the analysis, multiple options pop up in a window, asking what the user wants to perform, or which samples or contrasts the user wants to select for the analysis. I'm still working on that portion, the user-defined script. But otherwise, the user can manually select the parameters and contrast type, and exclude samples which have low gene counts, or select the specific samples they want to include in the analysis. So that happens in the parameter script and the YAML it generates. Yeah. That's something we've discussed before. It's like having a config generator to walk you through it. Yeah. Question here. Hi. So I noticed that for your pathway enrichment step, you've maybe baked in MSigDB as the reference for GSEA. And so I'm wondering, does that mean that this pipeline is limited to human and mouse references? And is there any way you would see in the future for someone to be able to bring in their own reference for GSEA and the whole pipeline, if they had non-human and non-mouse models they wanted to use this pipeline for? So, currently, I have tested with human samples only. That's why I've used the MSigDB pathway collections. In the future, we can elaborate and extend the pipeline once it properly executes with many complex human datasets. Just a call-out there: there's an nf-core pipeline called differential abundance, which is slightly more generic, and I think that should work on any species. And we were talking yesterday about whether we can try and come together and, you know, take some of the excellent work you've done and see how we can fit this into the ecosystem together. I have one more question, related to the topic of customization. Differential abundance has, like, an R Shiny interactive app, which kind of gets bundled with the results. So my question was gonna be whether you'd thought at all about interactivity.
It sounds like you have a little bit, with, like, a Python wrapper. Is that something that would kind of be bundled with the pipeline somehow? Or do you have any thoughts about that? About interactivity? Or the visualization? Yeah. Visualization and configuration of the experimental setup. Yeah. So, can you... Maybe I'll rephrase the question. Could you tell us a little bit more about how your wrapper works to help people configure their experimental setup? Oh, yeah. So it is based on the user-defined setup, which I was trying to implement in Python. In that way, it first generates the user-defined parameters. According to the pipeline, it shows which contrasts or which experimental setup the user wants to perform and which model they want to use. Otherwise, the pipeline selects the model by itself based on the conditions in the parameters. Good stuff. Thank you. Thank you very much, Yumi. Yeah. We can just set that again. Okay. Next up, we have a familiar face, who will join us again and tell us a little bit about what's been happening this week with the hackathon. Log title C. Okay, everyone. So as you're probably aware, we had the Nextflow hackathon a few days ago here, before the summit, right? We have the hackathon every March, which we briefly talked about here. We have one at the Boston summit, we have one at the Barcelona summit, and every once in a while people also run their own hackathons in different locations. Right? Sometimes, quite often, I would say, I hear people saying, yeah, no, I didn't participate in the hackathon because I'm not good enough, or I don't know Nextflow enough, I don't have a project, I think people won't like me because I'm not expert enough in Nextflow. And it hurts me, because this makes no sense at all for any of the hackathons. Everyone should be able to come.
Of course, we have the training opportunity for anyone who wants to learn. If you want to learn, great, you have this opportunity here. But the hackathons themselves are also a great opportunity to learn more about Nextflow, nf-core and related topics. Right? The thing is, I don't know about your background, but at least for me, whenever I think about a hackathon, the first thing that comes to my mind is a competition. I don't know why. Maybe it's because in the past, that's how most hackathons were. People were fighting for an award or a title or something. There were teams battling each other around the same topic. So, well, I would feel threatened by joining this thing too, because I'm just learning Nextflow and I have to fight these people, I have to battle them. I'm gonna get the last position in the ranking. No, I'm not going. The thing is, the nf-core hackathons are nothing like this. We have changed in multiple different ways throughout the years, but never in this direction. It's not a competition. So, I'm sorry, the first thing I would like to say is, well, there is one competition, which is the quiz. Right? So there's a quiz. When we did Boston, there were some questions about Boston and some questions about Nextflow and nf-core, but it's a game, you know. It's just a game. You get to learn more about Nextflow and Boston or Barcelona, depending on where you are. And that's it. It's part of the thing. So it's the complete opposite of a competition. The nf-core and Nextflow hackathons in general are for fun. They're a way for you to get to know more people, to have fun, to learn Nextflow, to do something that you already wanted to do or something someone else is working on, and that's it, right? So it's a collaboration opportunity. And it's a very flexible one too.
Maybe you want to just work on your own project, or maybe you don't have anything to work on and you want to see, you know, who has a project that I'm interested in collaborating on. Or, like Harrison, you make this huge AI agent that is doing a bunch of stuff. It's not even him anymore. It's just the AI. So I don't know. Maybe we should count the AI as participants in the hackathons. What's bigger? Because there are so many agents doing lots of interesting stuff, and it's an amazing project by Harrison. So I would say these are the three things that should come to your mind when you think of an nf-core hackathon: it's fun, it's collaborative and it's flexible. You have a personal project you want to work on? Great. You want to work on something in an nf-core project? Great. You want to work on someone else's project? Whatever. It's just about getting together, getting to know more people, having fun, learning more Nextflow and nf-core. So we have some awards, not prize awards or money awards, but we have these nf-core ducks, which are extremely valuable. In the near future, they're gonna be worth billions of dollars on Amazon or something like this. How do you call it? Not Amazon. eBay. They're gonna pay billions on eBay for these things. Believe me, these ducks, I got one more for my daughter. She's gonna pay for university in the future with this duck. So believe me when I say people would kill for that. We've probably had, so far, I don't know, 40 maybe. I don't know. But it's really valuable. So congrats to the people who got one: Benji got it for the quickest and most correct answers in the quiz, and Joe was randomly selected to get one. So, I mean, you don't even have to win the quiz. You can be randomly selected, like a lottery. So it's good. Right? So if you guys get rich with the ducks, don't forget me.
So, a brief recap of the March hackathon, not the one we had a few days ago. The March hackathon, which was hybrid, had some interesting new things that I want to highlight here. It was hybrid, so we had people online and people in person, like we also had last year. We had 41 projects registered in five different categories, 29 local sites around the world, 600 participants in 19 time zones, lots of PRs and so on, blah, blah. Great. The thing is, last year we had a bit of a bitter experience. Let's say that some people were kind of lost. Like, okay, I want to work on this project, but it's not very active online. Apparently, it's people in Australia working on that, and I feel a bit disconnected. Right? So we heard that. And one thing we did in the March hackathon is that all the projects made it very clear whether they were going to be mostly online or mostly in person. And well, if you really want to work on something which is mostly in person, great. You're more than welcome. But you know that maybe there won't be a lot of activity online, right? So we gave this heads-up to people so that they knew what they were getting involved in, right? And so far the feedback has been great. People loved it. We had people working on multiple different projects, just like at the one a few days ago. You don't have to work the whole hackathon on the same project. On the first day you can work on something. Maybe the second day, you wanna try something new. Right? And then you go to another project. There's no problem at all. Actually, that's quite amusing, because you have new ideas coming to different projects every day. So it was an amazing hackathon. We have the nf-core statistics page, which I forgot to put an image of here, but you can see the bumps in the charts on the days of the hackathon. So lots of activity going on.
Lots of people happy with their personal projects, nf-core projects, work projects, whatever they bring to work on at the hackathon. People get together to help. So now let's go to the Boston hackathon. I'm gonna talk a bit about some of the projects that we worked on during the hackathon. It's in person in Boston. Now, it's not an easy time to come to the US, so we didn't have a huge number of people, but we still had about 30 people, 30 non-Seqera ones plus Seqera ones. So it's maybe 40-something people working on these projects. One of the projects that we worked on was the topics migration project. It's something that we've been working on since three hackathons ago, like the one in Barcelona last year. Basically, it is to adapt the nf-core modules to this new functionality we have in Nextflow, which is topic channels. Right? So it's much cleaner code, easier to read, easier to understand. It makes much more sense to have the versions of the tools in the pipeline be shared through this topic channel. Instead of passing information between processes, you just say, you know, this information goes to a topic channel, and this one goes to this topic. It's an easier way for different processes to share information in a single place, which is a topic channel. So we have the versions topic channel, right? I'm actually quite happy, because as you can see, we merged, I think, 30 or 40 PRs, so it was a very productive project. And the best thing is that almost everyone was doing this for the first time. So in the next nf-core hackathon, if you don't feel confident enough, if you don't feel you're an expert on Nextflow or nf-core, just remember we have these, first good... good first issues, which basically means that if it's your first time there and you don't feel very confident, you can work on those, because they are issues that we have organized that are great for people starting.
Not only because they are not hard, but because they're quick to merge, so that at the end of the hackathon you can have your PR merged. So we have people who were here for their first hackathon ever, and sometimes they got their PR merged on the first day. I don't know how you feel about it, but for me, I mean, I had this experience in the past, it's amazing to know you were able to contribute something. There's something now out there being used by people all over the world, and there's a piece of you that contributed to that. I think that's amazing. Some projects take a bit longer to get merged. I mean, the topics migration one, the first time we did it last year, was actually only merged days after the hackathon. It's fine. But when you are starting and you see something you did merged during the hackathon, that's amazing. People are using something you did now. It's a contribution. You're officially part of the community as an active contributor. I think that is great. So this is the topics migration. We've merged a bunch of PRs converting nf-core modules to use the versions topic. And for every project, this is a rule, a very important rule: projects only exist if they have a meme. So if your project doesn't have a meme, you're out. You need a meme. And on a good meme, you're evaluated. Okay. This is a competition. This is the competition part of the hackathon. And every day you have a new meme. So this is the best thing I found for the topics one. Because, you know, when you test your modules locally with all the test files, the versions and everything else, it takes a while. Then when it's done, you push it to the GitHub repository. And then GitHub Actions gets triggered and also runs tests. And there are, I don't know, 30 tests or something like this. So sometimes you're chatting and people are like, you're not working? Like, you know, it's testing.
You have to wait until the testing is done so you can go back to work. So this means we have lots of moments where you have a chat about life and random things. For the second project, we have this massive beast, which is Sarek, having something else added. So, Gary and Joe... oh, actually, sorry. So we had a bunch of people in the topics project. We had me, Eamon, Chris, Dylan, Harrison, Annabella, Louie, and all the people who joined throughout the days. Right? For the GPU variant caller being added to Sarek, we had Gary and Joe. So Sarek is getting bigger and better. Right? You probably saw in talks and discussions recently how we are speeding up some tools with GPUs and other stuff. So this was an attempt to add some GPU variant callers to Sarek, which is this pipeline you have in nf-core. I think it's one of the most famous ones, right? I think it's rnaseq first, and maybe Sarek second or third. And again, it's a big chunk of contribution. So it's amazing. We are really happy about it. But not everything is quick. If you don't need your PR merged on the first day, you can go to a more challenging project like this one. And here we have the meme. Right? And oh my god, I think probably everyone who was in the hackathon kind of felt the same thing. Like, you have your tests, they're failing. And when you finally make it work and all the tests pass, you realize the output is not there, and that's why you're getting a successful test. And then you go back to work again. Eventually, you manage to make it work. So here, no VCF was being generated, but they managed to fix it. For the cell painting one, so cell painting is an nf-core pipeline. Cell painting is a high-throughput image-based assay used in drug discovery and biology to analyze cellular changes by staining cells with fluorescent dyes. I'm reading. Yes. So that's cell painting, and there's an nf-core pipeline for that. Ken has been spearheading this for a long time.
His personal project, I would say. Very nice pipeline, lots of progress. Every hackathon we have new things. A bunch of progress happened here in this hackathon as well, and many people got involved. There's Ken, as I said, Rachel, Riley, Ronali, Jake, Faye, and also other people who here and there did some contributions. So lots of progress, lots of PRs being merged, and I personally love this meme. The pipeline too, but this is amazing. Right? The nf-core linting can break some hearts. Then we have the gene [unclear] pipeline, also an nf-core pipeline. Nina was working on this on one of the days. This pipeline uses codon-based methods to detect natural selection, given multiple sequence alignments of protein-coding genes. So if you'd like to learn genetics, it's a great pipeline to work on. She managed to make some progress, and she was mostly alone. So she has these memes to share her happiness. But again, amazing pipeline. And as you've seen, some of these projects here are mostly nf-core projects, nf-core assets, but you can also work on personal projects you have, as you're going to see in the next projects. So we have this work by Michael, Edmond and Ben on bringing some DuckDB integration to Nextflow. This is a discussion that started in 2025. Very often you have a discussion happening, and then you do something and you stop. You don't know what to do or how to progress, and then you come back to it. Then there's a hackathon, and you start working on it. There's another hackathon, and you finish it. This happens quite often, believe me. And in this example, it's a discussion that started on a GitHub issue. Eventually Michael created a tool for that. And now they're trying to make it an nf-core module for, I think it's Eider, or maybe Aider. I don't know how to pronounce that. E-I-D-E-R, I think.
So it's a way to bring some integration with DuckDB to Nextflow so that you can write Parquet files from your Nextflow processes and so on. So amazing work and, well, Parquet all the things. I think a lot of people agree with that. And this is the beast project by Harrison. Right? It was Harrison alone, or maybe Harrison and the AIs. Basically, what he did was use Antigravity and a bunch of different agents. He wanted to find some cleanup tasks. Right? Something that you have in your repository that is a bunch of work, but manual work. You don't want to do that all manually. So he thought of asking these agents to look for those things and find a way to fix them all automatically. It's a bit, not threatening, what's the word? It's a bit massive, because all of a sudden you had, I don't know, 40 PRs to review. But one very nice thing that I think Harrison learned, and that I also learned by following this process, is that you can teach your AI to do better. Right? Of course. So if you do a massive PR and that's not nice, why don't you split that into smaller PRs? And then his agents did that. And then sometimes, when someone is reviewing, you see you did something wrong. So you fix that in this PR, but why not make the agents fix that in all the PRs? So in the end, one review is making you learn and is making all these other PRs evolve based on that. So I saw Harrison throughout the day working on the topics migration project and other projects, and also in parallel on this one, which he called the agentic swarm repo cleanup. And I think it was very nice to see not only Harrison learning, because this is the goal of the hackathon, right? But also the agents, they were learning and changing and doing better.
And for example, there's one annoying thing that we have to do manually right now, which is that when we run nf-core lint, it removes some comments from some stuff that we want to keep. So we go there manually and put them back. We are fixing nf-core lint so that it doesn't do that. But in the meantime he taught his agents, you know, after nf-core lint removes that, bring it back, like the TSV comments, the file formats. Right? So I saw these PRs evolving not only because the human was manually doing that, but also because the agents were doing that. And I think that some tasks are so manual, right? They change so little that it would be good if agents did them, because then we have more free time to work on other things. And eventually, when these agents learn and get better, the review is going to be so simple. So there are these manual tasks that are annoying to do, and we have almost 2,000 modules. In the topics migration, we worked on 40. That's a bunch of people working for two or three days on that. Why not have agents do that? It's so simple. It's just replacing some strings, right? And dealing with the errors that are going to appear. And it's going to free us so that we can work on more interesting things like fixing pipelines and writing new pipelines, not only changing syntax. Right? So I think at the beginning this massive PR was a bit scary, but it's worth it, you know. Eventually, the agents are going to evolve so that we have more free time to work on nice things. Then we have the last project, by Lucas, Chris, Chase, Rima. Oh, sorry. There was this meme that I accidentally didn't change for the next project, but I love it. Like, this here is called AI. You ask the questions and it confidently lies to you. So I love this meme. I accidentally forgot to replace it with the meme of Lucas's project. I'm sorry for that.
But then we have this project on writing an nf-core-compliant, let's say, single-cell CRISPR pipeline, basically for pooled CRISPR perturbation data with high-throughput single-cell sequencing. A bunch of people got involved. I saw the project evolving a lot. It was already a big pipeline, so it's a lot of work to bring new features to it. And they were able to do that throughout the days. So very nice pipeline, lots of progress. Congrats, everyone. And I think that after all that, maybe some of you are a bit excited about joining the next hackathon, or maybe organizing a hackathon. Like, okay, Marcel, you convinced me. I never did it, but I did a training this time. I feel more confident. Now I'm going to give it a try next time. But, you know, next time is in the second semester, and you want to do something now. So why not host a hackathon in your class, in your lab, in your company, in your city? Why not host a hackathon? But Marcel, that's way more threatening than just participating in one. So the thing is, we created a Nextflow hackathon handbook, which some people call hackathon in a box. I think that's a nice name also. And this is a public document with all the instructions on what a Nextflow hackathon is, heavily inspired by these amazing nf-core hackathons, right? So how they are, what they consist of, what things happen during the hackathon, what can I pick? Maybe I don't want to do all these things, all these socials, but maybe just the quest, or maybe the socks, the sock hunting, right? And all these things. So you have all these options: how to plan, how much in advance, some documents to make it easy for you to manage the hackathon, whether you want to have local sites or not, how to organize the projects. It's an evolving, almost 30-page document with lots of information. You don't have to use everything, but it makes you confident about hosting your hackathon. Maybe you are 10 people. It doesn't matter.
A lot of work can be done by 10 people. Right? So again, a hackathon in your city, in your lab, in your class, at your work. If you need help with that, you can reach out to us at community.seqera.io. You can look for ambassadors to help you. We had a hackathon last month in the UK, in London, and people got our hackathon in a box, the Nextflow hackathon handbook. They read it. They felt confident, but they still wanted help. So we had Nextflow ambassadors in the area, and they went there to help conduct the hackathon. So you can also do that. Are there any ambassadors in my region? Can I call them to help me out? Of course. Right? And a great tip is to do a pre-hackathon training so that people feel more confident. Because again, you're going to say all the things I said to you here, and some people are still going to say, I don't feel confident enough. So let's do a Hello Nextflow basic training before that. And then we have the hackathon the next week or something like this, right? Again, no problem about starting small. A 10-person hackathon is way better than no hackathon at all. It's a great opportunity to learn and so on. So thank you all for your attention, and I hope you join our next hackathon and host your own. Thank you. Thank you, Marcel. We're running a bit over time, so I'm going to say, if you have any questions for Marcel, please do go and find him. And, yeah, to echo his sentiment, it would be great to have more hackathons happening in the States especially. So, hopefully, there are some keen people here. As the last talks before the coffee break, we have the remaining lightning talks for the poster session. So, first up is Marcus. Good morning, everyone. My name is Marcus Sujansky, and I'm a bioinformatician at the MDI Biological Laboratory in Bar Harbor, Maine. For those of you that are not familiar with MDIBL, first things first, don't feel bad.
We are a very small research institution. But the main focus of our research is regeneration and aging, which we study using a variety of model organisms including the axolotl, the African turquoise killifish, and the zebrafish. However, with such a wide variety of species all across the phylogenetic tree, all working towards asking and answering the same questions, it's important to have a robust and reliable way to compare cells and genes across these organisms, which unfortunately brings me to our problem. Especially in evolutionarily distant organisms, the ability to identify the same cell types or genes across species in a comparative study, which we will often want to do, is heavily reliant on mammalian cell type annotations and gene names. When you're primarily dealing with non-mammals, like we do at MDIBL, that adds a ton of unneeded bias into the data and fails to capture enough of the biological variation for our purposes. So, to help solve this problem, I designed and developed scSAMap, which is a Nextflow pipeline wrapping the existing SAMap algorithm written by Alex Tarashansky and Bo Wang from Stanford. I don't want to spend too much time on the math behind the algorithm, but very briefly, the algorithm constructs a shared feature space between two species and iterates between using BLAST scores and the single-cell expression data in order to statistically identify similar populations between the two datasets. However, the pipeline adds crucial pre- and post-processing steps that format the data correctly beforehand and perform novel downstream analyses, including homolog detection, cross-species cell type annotation, excuse me, not annotation, cross-species differential expression analysis, and a more robust cell type to cell type mapping. All of these additions transform the already powerful SAMap algorithm into a ready-to-use, plug-and-play single-cell comparison tool.
All in all, scSAMap capitalizes upon the industry-leading single-cell cross-species comparison tool, SAMap, in order to extract biologically meaningful results without sacrificing accuracy. If you have any questions, please see me afterwards. Thank you. Fantastic. Thanks. Hi. My name is Praveen, and I'm a final-year PhD student at the City University of New York. Our lab studies venom, particularly the venom of cone snails and cephalopods. The traditional RNA-seq pipelines to identify venom are usually quite manual. They lack the ability to integrate proteomic and genomic data, and the outputs are not usually very easily shareable or accessible to everyone who would need access to them. So to address this, I developed two paired pipelines. One is called VenomFlow, which is based on a previous pipeline published by our lab that was a Galaxy web server based pipeline. So it's a modified and updated version of that. The second one is an analysis pipeline that integrates all the outputs of the first pipeline, including proteomic data if it's available, and then gives the user a list of putative annotated venom transcripts that can then be used for phylogenetic analysis or for downstream functional tests. It also gives the user a set of HTML reports. So for each sample, you get a set of HTML reports that look something like this, with a main menu page and all these different subpages that let the user explore all of the outputs and also download any of the figures or any of the tables very easily. There are also interactive components to make it more enjoyable to look through the data and everything. And the little QR code shows you an example of one of these samples. Thank you. Yeah. Next up, Sean. Hello, everybody. I'm not going to hold you back for long because I know we are all waiting for coffee, I guess.
But before I start, I just want to ask everybody a quick question. How many of us have some kind of RNA-seq data sitting around on our computer, from a collaborator, from a project from a decade ago, and have never thought about looking for circRNAs in there? Can I see a raise of hands? You don't have any RNA-seq projects lying around? Wow. You are lucky, I should say. Okay. So I just want everybody to do an activity with me, which is: all hands up, and whoever has this RNA-seq dataset, raise your hands and put your index finger to your thumb, I guess. Even if you don't have one, you can just raise your hands for the sake of the activity. Wow. We have to make circles. Wow. I love this. So, essentially, what I am showing you today is this pipeline, which helps in the identification of circRNAs from an RNA-seq dataset. And looking at the raise of hands, I feel that my job is safe, because none of you think about circRNAs so much, which is pretty cool for that matter. So what happens is that in this pre-mRNA molecule a non-canonical splicing event happens, so it forms this closed covalent loop that we just made with our thumb and index finger. A very cool physical model of the circRNA. But, essentially, what it is is, oh, I have it over here also. Sorry. So what happens is that, just like any other quantification pipeline, it gets a FASTQ file as input and it produces a count matrix where the circRNA identifiers are on the rows and the columns are the samples. Now, what makes it special is the speed of it. What I show you over here is the speed of this pipeline, which I call 10 out of 10 because it's very efficient, I think. And it's compared to the benchmark of the fastest circRNA pipeline that we have in the field so far, CIRI3, and it's very fast.
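As a rough illustration of the output format just described (circRNA identifiers on the rows, samples on the columns), here is a minimal Python sketch. It is not taken from the pipeline in the talk: the function name `build_count_matrix`, the identifier strings, and the per-sample input dictionaries are all hypothetical, and a real pipeline would derive the counts from back-splice junction reads in the FASTQ files.

```python
from typing import Dict, List, Tuple


def build_count_matrix(
    per_sample_counts: Dict[str, Dict[str, int]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Arrange per-sample circRNA counts as a rows-by-columns matrix.

    per_sample_counts maps sample name -> {circRNA identifier -> read count}.
    Returns (row identifiers, column sample names, matrix), where
    matrix[i][j] is the count of circRNA i in sample j. A circRNA that
    was not detected in a sample gets a count of 0.
    """
    samples = sorted(per_sample_counts)
    circ_ids = sorted(
        {cid for counts in per_sample_counts.values() for cid in counts}
    )
    matrix = [
        [per_sample_counts[sample].get(cid, 0) for sample in samples]
        for cid in circ_ids
    ]
    return circ_ids, samples, matrix
```

Calling it with two hypothetical samples that each detect a different set of circRNAs yields one row per identifier seen in any sample, with zeros filling the gaps.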
And if you attended Phil's talk yesterday, fast is useless unless it's accurate. Right? If you haven't seen it, you should go back and watch the recording of Phil's talk. Very interesting talk from yesterday. So we checked the accuracy of the pipeline as well. For accuracy, we just compare the counts of the circRNA with the qPCR cycles that are needed to make the circRNA reach the quantifiable level. So a good correlation means the molecular validation matches the computational quantification. And what I show you over here is that it matches very well with the published benchmark of CIRI3 and also outdoes many of the other pipelines that the field has. So it's not just fast, it's also very accurate, and it comes at a cost of very low CPU and low run time as well, which makes it very powerful and useful, I think. If you want to know more about it, you should come to my poster in the next room. And also, I have a good circular surprise for all of you who visit me at the poster. So see you in the next room. Thank you. Thanks very much. And I missed your bio as well, but I've wanted to ask you about getting coffee with a slight side of ice, by the way. And so repels both mosquitoes and bugs. Fantastic. Thank you very much to all the lightning speakers there. Alright. Thank you very much, everyone. If you can all file in and take your seats, that would be great. We're going to kick off the final session of the summit. It's a bittersweet moment. But we've got some fantastic talks lined up for you in this next session. We try to roughly group the talks thematically a little bit, and you'll see that come out in this section. First up, we have Colby Ford, who's going to talk to us. For those of you who are repeat attendees or watchers of the Nextflow Summit, you might recognize him. This is not his first summit talk.
And I remember previously, I mean, I'm not very familiar with Azure myself, but I remember sitting in the audience and learning a lot last time. So I've been looking forward to his talk a lot. So, yeah. Without any further ado, Colby, thank you very much. Hello. Hello. It's always rough being the person that goes after the break. Right? So two years ago, I gave a talk on this stage about scaling bioinformatics workflows in the cloud using high-performance computing technologies like Azure Kubernetes. And in the last two years quite a bit has changed. We've got AI now. So today, the idea is that we'll be talking about building multi-agent workflows for our bioinformatics processes using Microsoft Foundry and Nextflow. What we'll go through is this: I'll give you a little bit about me, just so you can see where I'm coming from, and give a little bit of an introduction on where we stand today with bio AI. Then I want to get everyone on the same page on what a model is versus an agent versus a workflow. Then I'll give an introduction to Microsoft Foundry. And then I'll give a demo. Assuming the video works, we'll have a pretty cool demo. So, two seconds about me. I'm a computational biologist by training, but I've spent most of my career building cloud architecture for bioinformatics. I am the owner of Tuple. We're a Microsoft partner consulting firm that builds cloud architecture for the biotech and pharma space. Recently, we started a spin-off company that's doing AI-based protein design, called Silica Biosciences. I'm also a faculty member at UNC Charlotte, in the CIPHER center, which is basically our infectious disease research center. I'm the author of Genomics in the Azure Cloud, a book that came out a couple of years ago.
And then my new book, which just came out on Amazon this morning, is about building agentic solutions with Microsoft Foundry, and some of this content is based on what's in that book. I'm a three-time Microsoft MVP awardee and a Microsoft Certified Trainer. But despite saying Microsoft 15 times in one slide, don't run away if you don't use Azure or Microsoft, because I think there are still things here that you might learn. So, some questions and goals. What are agents, and what can they do in the science world today? What is Microsoft Foundry, and how does it enable enterprise-grade AI? And then how can we implement those workflows for bioinformatics, using our favorite tool, of course, Nextflow? So what is a model? I think we all understand what a model is individually. This is a large language model that can accept a user's textual input and give a textual output. I think we're all aware of that. And you can provide a system prompt that guides it into a particular type of behavior. But by itself, a model isn't connected to anything. It's limited in its knowledge to its training data and whatever information you've given it as part of its context window. It has no memory between different conversations. It's pretty standalone. Some examples of models are foundation models like GPT 5.4 or Claude Opus 4.7, and also open-source models like Microsoft's Phi-4 models and the millions of others that are on Hugging Face. But when we start connecting those models to other sources and other capabilities, this is what we call an agent. Agents utilize that model and perform a particular task as defined by their instructions. So the agent can use tools and connect to other platforms through services like MCP. They can search the web, or they can also query data from a RAG-based architecture.
Also, they can be connected to a memory store that allows them to remember things in between chat sessions. So think about how, if you use Claude or ChatGPT, it kind of remembers a little bit about you between different sessions, even between different chats. You can enable this for enterprise workloads as well. And then also, agents are subject to guardrails. That way you can prevent them from doing something you don't want them to do. For example, you could block an input: a user can't, say, ask your agent to make political content. Or you can also block an output: for example, you could stop an agent from writing malicious code. Some examples of these instructions, or purposes for agents, are: you are a research agent, you're an expert Python programmer, or you're a famous Italian chef named Paolo. Is Paolo a good cook? I don't know. Yeah. Of course. So if we zoom out a little bit and attach multiple agents together, this is what we call a workflow. Workflows work really well whenever there's a more complex task that may require multiple steps to get to the solution. A workflow contains multiple agents that have different purposes, and we may also have some logical steps that help to either throw an error or loop back to try again. And so the very basic diagram that I'm showing on the screen here has two agents in a sequential flow, where the first agent can reason on a problem, break that problem up into more manageable chunks in a more consistent way, and then hand that task over to a coding agent that is more specialized for the coding task. And I'll show the Nextflow demo that follows this. There are other architecture options, including a concurrent architecture, where multiple agents can work at the same time on different tasks, or group chat, where multiple agents work at the same time on the same task.
And then another architecture that's pretty popular is human in the loop, where agents work autonomously, but if there's some sort of action where we need a human to approve it, like some sort of destructive task, then there's a human-in-the-loop pause where the human can say yes or no. There are a million different architectures, but these are just some examples. So now that we're on the same page with that, where are we with bio agents in the field today? You may have seen that with OpenClaw, some people have formed some wrappers around it, ClawBio and BioClaw. Of course, two different platforms with almost the same name and a very similar purpose, for sequence analysis and retrieving data. There's also a lot in protein design. So, like, 310.ai has a proprietary protein design platform, and Blatant Y is an open-source one that came out recently. And then also the big players are starting to put out their own life-science-specific things like GPT Rosalind or Claude for Life Sciences. But all in all, these have some very common utilities like sequence analysis, structural biology, data visualization, and literature search. And recently some of them are coming out with more quality control automation and regulatory prep stuff, which is great because that's the part that most of us don't like doing. And if you want to run some of these tools locally, it's cool. Right? You can run this on your laptop and it works, but it's great until it isn't, until you hit some sort of limit. Some of the limits are what I show here. One is scalability. If I wanted to run a very, very large model, can you run that on your laptop? Or can you run that if you have a Mac Studio or a pretty decent GPU at home? Maybe. But can you run multiple of them? Can you run a big workflow with multiples? Right?
And if you have things running but now you want to connect to other data sources, do you know how to configure that? And if you can configure that, are you doing it well? Are you doing it in a way that works optimally for your agent workflow? And then let's say you get all that set up. Now you've got the security question. How do you control who can use your AI agent, and how do you control what they can do with it? This is where it's really difficult to string together, to duct-tape these things together, when you're doing this locally. And this is the only marketing slide in the whole deck, but this is where Microsoft Foundry comes in. Microsoft Foundry was announced in November at Ignite. It is the generative AI platform in the Microsoft Azure stack. And it includes various capabilities that I think are really cool, and they're really easy to use. The first is the model section. Internally at your organization, you can deploy your own version of, say, Claude or a GPT model. You can also deploy a series of open-source models, and then over, I think it's like 14,000 different models from Hugging Face. When you deploy these models, they're internal to your network, to your security, to everything in your cloud environment, which means that you can achieve the security and compliance requirements that you need for the privacy of your data. Once you deploy these models, you can wrap them up as agents, and those agents can then connect to two other services in Foundry. The first is Foundry IQ, which allows you to build knowledge bases on your own data. It's super easy. So if you have data in a data lake, for example, or other databases, Foundry IQ can do the vectorization and the embedding process, and you can connect to the knowledge bases there. And then also, Foundry Tools includes tons of tools that are already preconfigured for other cloud services.
So that could be Azure SQL DB or Databricks, or also things that are not in the Azure stack, like Supabase, for example. And there's also a generic MCP tool connector, which is what I use for the demo that you'll see in just a minute, where you can connect your agents to these external services. There's also a local Foundry. It's an open-source project called Foundry Local that's sort of like Ollama, in that it allows you to download and run models locally, like on your laptop. The difference between Foundry Local and other tools like Ollama is that it will re-optimize the models for your specific hardware setup. So you don't have to worry about picking the right model that can run on CUDA, or quantizing it for your specific memory restrictions. Foundry Local does that automatically. And then all of this is wrapped up in a control plane that allows you to achieve the security, governance, and compliance that you need. So you can keep an eye on who's doing what, token usage, how much it is costing, put those guardrails in place, etcetera. When you spin up a Foundry project, you get a home screen that looks like this, where you get a project endpoint. This is an OpenAI-compatible endpoint. So if you use the OpenAI Python library, this endpoint works for that. It's kind of funny because this endpoint also works regardless of what kind of model you're using. So you can deploy a Claude Opus 4.7 and call it from the OpenAI endpoint. Seems a little scandalous. But also, you get a similar endpoint for all of your agents and models. So I'll show a little bit of a demo. In Microsoft Foundry, the workflow builder tool is this nice DAG, or nice GUI-based interface, but it abstracts out to a YAML file, which I'll show you at the end. What we'll build in this workflow is two agents. The first agent will be a bioinformatics evaluator agent.
It is prompted to analyze a specific scientific question and break it down into consistent steps. So for example, what datasets are needed? What type of analysis is this? And it will give a more structured way for the next agent, the coding agent, to handle that process. The next agent is a coding agent that is connected to my Seqera MCP endpoint. It can also write Python code and Nextflow code and actually perform the analysis. And then we wrap this up in a workflow that has some logical steps to tell us: do we need to loop this back, try again, or fail and tell the user? So again, there are two agents for two purposes. The one on the left here is the bioinformatics evaluator agent. It's been prompted to just say, you're a super great scientist, break up the problem into this consistent format. And you can notice that it's connected to a tool. The tool is the web search capability, so this agent actually has the ability to search the internet. And then the second one is the Seqera agent. This one is: you're a coding agent that's an expert in Nextflow. The tools it has connected are a code interpreter, which can write and execute code on your behalf, and my Seqera account, through a token, through the MCP endpoint. And the way this works, I know this is a little small to see, but maybe I can point to things. What will happen is we start at the top here by asking the user a question like, how can I help you today? We set a couple of variables. One is the loop counter, so this doesn't just loop infinitely. We start with loop count one. And then we store the initial question as a variable. Then that first agent takes that scientific problem and breaks it down. If it's not successful in breaking it down, it takes this left path, where it will say it's complete, but sorry, I'm not able to actually help, please ask another question.
If it's successful, it takes that right path and turns that problem over to the Seqera agent. That agent can then take one of three paths. If it's successful, it will complete and give the output. If it's unsuccessful, it will loop back a few times to the bioinformatics evaluator agent. And then if it hits the limit count, which I've set to five here, it will say, sorry, I've maxed out my retries. And hopefully this video works. Maybe. Yeah. Okay. So in this, and don't worry, I'll zoom in a little bit on this, I asked this agent to help me perform an analysis on a BioProject from the Short Read Archive. This is a Short Read Archive project from some Plasmodium vivax samples that I produced, like, six, seven years ago. I gave it no other information other than help me do this. And the evaluator agent is able to break down that problem. It actually went out and found my paper on this, detected that it was Plasmodium vivax, detected that it was short-read RNA-seq samples, and then handed it to the Seqera agent, which was then able to go out and grab the individual samples. I'll show you that. It's able to go grab the individual samples and then execute, or sorry, go to nf-core, pick the appropriate pipeline, and then execute the code. So again, here's where you see: help me design a bioinformatics workflow for this SRA project. I didn't tell it anything about Plasmodium. I didn't tell it anything about RNA-seq. It actually found it from the PLOS article that I wrote. It detected that it's Plasmodium vivax, and then it said what the data type is. And then the next part is where the Seqera MCP tool is called, where it says, okay, go grab the constituent datasets. I think there are 10 of them here. It knows what kind of data it is, and then it goes and selects the appropriate pipeline. Then it starts actually writing the Nextflow code.
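The control flow of the two-agent workflow described in the demo (an evaluator agent, a coding agent, a loop counter starting at one, and a retry limit of five) can be sketched in a few lines of Python. This is only an illustration of the branching logic, not the Foundry workflow YAML shown in the talk: the function names, the `Plan` and `Result` types, and the two stub agents are hypothetical stand-ins for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

MAX_LOOPS = 5  # retry limit the speaker says he set in the demo


@dataclass
class Plan:
    ok: bool          # did the evaluator manage to break the question down?
    steps: str = ""   # structured breakdown handed to the coding agent


@dataclass
class Result:
    ok: bool          # did the coding agent finish successfully?
    output: str = ""


def run_workflow(
    question: str,
    evaluator: Callable[[str], Plan],
    coder: Callable[[Plan], Result],
    max_loops: int = MAX_LOOPS,
) -> Tuple[str, int]:
    """Return (message, number of loops used)."""
    loop_count = 1               # the workflow starts with loop count one
    initial_question = question  # stored once and reused on every retry
    while True:
        plan = evaluator(initial_question)
        if not plan.ok:
            # Left path: the evaluator could not structure the problem.
            return ("Sorry, I'm not able to help. Please ask another question.", loop_count)
        result = coder(plan)
        if result.ok:
            return (result.output, loop_count)
        if loop_count >= max_loops:
            return ("Sorry, I've maxed out my retries.", loop_count)
        loop_count += 1          # loop back to the evaluator agent


def make_flaky_coder(failures: int) -> Callable[[Plan], Result]:
    """Hypothetical stub: fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def coder(plan: Plan) -> Result:
        state["calls"] += 1
        if state["calls"] <= failures:
            return Result(ok=False)
        return Result(ok=True, output=f"ran: {plan.steps}")

    return coder


def simple_evaluator(question: str) -> Plan:
    """Hypothetical stub: rejects empty questions, otherwise echoes a plan."""
    return Plan(ok=bool(question.strip()), steps=f"plan for {question!r}")
```

Calling `run_workflow("analyze project", simple_evaluator, make_flaky_coder(2))` exercises the retry path: the stub coder fails twice, the counter advances, and the third attempt returns the output along with the number of tries.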
And then if I allowed it, it will create a cloud execution plan and execute this in the cloud, in the Seqera Cloud platform. Then it'll give me the results and outcomes, and also give me any sort of output information, like how many tries it took, etcetera. So all in all, this is where tools like Microsoft Foundry allow you to achieve that enterprise-scale AI. It gives you the security, compliance, etcetera, but also gives you the flexibility to build agentic workflows that work really well for Nextflow just out of the box, because we have an MCP endpoint, but that can also point to your internal data and help you process and automate these things at scale. If you're interested in this code, the GitHub link is at the top here. You can spin up your own Foundry project and just drag and drop my YAML files into your project and recreate this whole thing, assuming you get your own MCP token. And then the QR code at the bottom here is the O'Reilly Amazon page. The book just came online this morning. It's not quite available for preorder, so favorite that link and check back tomorrow on that. And feel free to connect with me online. Thank you. Thank you very much, Cor. That's a really impressive talk. That was fascinating. Any questions from the audience? I can start one off. With the demo at the end, because we were watching it go through it, you showed it selected an existing pipeline from nf-core, but then it also started writing some Nextflow code. Do you know what the Nextflow code it was writing was? Yeah. Because I didn't actually tell it to execute in my Seqera Cloud account, for demo purposes, it sort of has both options. So that's something where the prompt can be changed a little bit to say, if you found one, don't write a new one, but if you didn't, then help. So that's what's going on there. Okay. So it was writing a new workflow based on what it found. Yeah.
Yeah. It was more of a demo workflow, sort of showing how that pipeline should be used, as opposed to what it was actually running. Okay. Cool. Oh, yeah. Matos. What's your experience with, or what's your feedback on, using the big models like GPT or Opus on Nextflow? Do you think that they are there yet and they write Nextflow like a good Nextflow programmer, or are there things they could do better? The short answer is I'm not really sure. I think the challenge is the cost trade-off right now. So for example, the book: I just finished the rest of the book and sent it off for peer review. And to get through the examples in the book, it was, like, $600 just to get through the examples. And I'm not running anything huge. You may have seen it, I was running GPT-4o, so not the latest and greatest models. So in terms of, is this better than or as good as a certain level of Nextflow programmer, it depends on how much you're paying them. Because at some point, you should just pay for the Nextflow programmer, to be honest. But I do think that some of the open source models are doing really well, especially for particular coding purposes. I was having a conversation with someone last night about fine-tuning some of these models for Nextflow-specific or bioinformatics-specific use cases, and you can just swap those into the model. So swap out GPT-whatever for that fine-tuned model, and that's probably a better use case. Awesome. Can I... Yeah. The only weird thing that it does is that the MCP endpoint itself doesn't have any concept of where the data is stored. And so some of the pathing is a little off.
So that's where, again, making the prompt a little bit better, or being a little bit more verbose in what I'm asking it to do or in the instructions, would have improved that. But otherwise, the code looks great to me. It's just that some of the pathing is a little weird for certain samples. You took my question, Matus. So, final one from me. I really enjoy seeing real-life examples of people using the Seqera MCP in different environments. I think it's a common theme for us: we try not to ever have any kind of walled gardens or any lock-in. And I love that people can now take our tools to where they're already working. What was your experience like with the Seqera MCP? I mean, it's fairly new. We're still feeling out a little bit how it should work and what the best interaction model is. Did you find that anything was missing with it? Putting me on the spot here not to say bad things. No, it was actually really easy. With the generic MCP tool that's in Foundry, all I had to do was figure out how the authentication token worked, which wasn't inherently obvious, but I got it working. Otherwise, the rest of the tool stuff worked really well. One thing about Foundry that's really nice is that when you're playing with this, you can get a demo chat window up on the side, so you can continue to play while you're building, to see, like, oh, I've changed the prompt, or I've changed the model, or I'm testing this tool. And it was really easy. One thing with MCP, in the way that Foundry calls MCP, is that it authenticates and asks permission for each individual action. So that's a little bit annoying, and that's not a Seqera problem. That's a security thing, where it's saying, oh, I need you to grant permissions to list the datasets.
I need you to grant permissions to actually go query your data. I need you to grant permissions to actually execute. And so you had to do that multiple times before the thing would finally actually work for the first time. But otherwise, it was very easy. And sorry, I said final one before, but I have one more. Yeah. As we move more to this model of using agents, how do you find the determinism aspect of things? I mean, agents are inherently nondeterministic. Is there a way we can make them more reliable? Yeah. I think using agents to execute is a lot better. You think about it in your design process, right? This is the same conversation I had last night: when an agent is having to rewrite the same code every time to execute something, that's a poor use of tokens, and it also gives you variability in what's being run. Whereas if you just had a run script that's sitting there, and all the agent has to do is call that same run script every single time, then that's way more consistent. So I think a design paradigm is: don't have your agents redo work, just like you wouldn't have a human employee redo work. Have the agent, or an employee, or whatever, follow a standard operating procedure, and that gives a higher level of determinism to the process. Great. Thanks very much. Thank you. Fantastic. Right. Next up, we have Joe from Pfizer. He's going to come and talk to us about his platform, iFOX. Joe works just down the road in Cambridge. And, yeah, welcome. Thank you very much. No, thank you. So, yes. Hi, everyone. I'm Joe. I'm at Pfizer. I'm in the machine learning and computational sciences group, which is part of the medicine design research unit. Our group sits in between the research units and our digital infrastructure team.
So our team is a mixture of bioinformaticians, cheminformaticians, and data engineers. And today, I'm going to talk a bit about how we're updating the computing infrastructure that supports our research at Pfizer. So first, just a little bit about how we use omics data at Pfizer. As you see on the slide, we use it to inform our R&D decision making. On the slide, we have this nice linear path for how our research is done, but, you know, it's always a bit of a jumble. If we can impose some order on it, it looks a bit like this. Typically, it starts with using genetic approaches to identify new targets. Once those druggable targets are validated in some way, it moves into the preclinical space. Those preclinical model systems typically generate lots of omics data, and we use that omics data to push the programs into the clinic. If our therapeutic then makes it into clinical studies, biomarkers and omics data are used to enhance our understanding of the patient populations and the disease itself. And then what we've learned in the clinic ultimately can come back to the top here and inform our selection of new targets again. If we then look at who is doing this work at Pfizer, we have a group called Integrated Biology, which actually spans all of our research units. Each research unit has an embedded computational biology team. I'm showing the research units that are part of Integrated Biology here. It includes oncology, inflammation and immunology, internal medicine, vaccines, drug safety, biostats, and then our group, machine learning and comp sci. So, you know, this is a pretty big topic, and it's one that we've thought a lot about at Pfizer, which is the promise and reality of artificial intelligence in omics data.
And so a couple of years ago now, all these groups in IB got together and we started thinking about what's going to be our path forward to benefit from these new large language models and the agentic capabilities that come with them now. I think everyone has a similar concept of what this could be, this sort of desired state that we're going to get to, where a researcher can ask a question and that question can then trigger an LLM-based workflow that has access to data and access to tools, and can return synthesized results that can lead to new insights. And then as that keeps moving forward, you get to this next stage where you can actually use these models to basically go straight from your experiment to new insights. All that is, for a company the size of Pfizer, much more easily said than done. So before we were going to get to that state, we needed to review the ways in which we're currently registering our experiments and tracking them. Where's our data? How are we accessing it? What are our storage practices? Where is all of our compute running, and what methods are we using? So for that, we began this initiative, and to make this into a reality, we started to invest in four key areas: generating annotated data at scale, creating nice data and analysis environments, providing sufficient computing power, and then attaching to it all of these fit-for-purpose AI models and methods. We call this project iFOX, which stands for Infrastructure for Omics Across Functions. And what this group is, it's actually a very tightly knit group of both our R&D colleagues and our information technology colleagues in Digital. It only works if we have buy-in from both sides, because we need this infrastructure to work for this type of project. It is helpful, though, to give you a picture of where we started with all this.
We had a very heterogeneous landscape across all these different research units in terms of where our data was stored, the actual storage solutions for each of these. We had lots of different ways and platforms for people to access compute. We had multiple HPCs, cloud environments, people running work on Windows versus Mac. For our pipelines, we had very few joint efforts across the research units. We had no joint standards. And lots and lots of Shiny apps that probably are not really working anymore. So this was going to be very difficult to bring together into any sort of common framework if we were going to start making our data available for these types of agentic tools. And so we also asked, you know, what's the future state that we're trying to get to? Essentially, it's just making time for science. So getting rid of all of these blockers that are preventing people from getting their data quickly, getting access to the compute they need, the literature and sources, and really just making time for science. One of the first components of this is organizing our data and pipelines. So making our data easy to access and share, having that data be well annotated with curated metadata, having an up-to-date source of reference data, and standardized pipelines that are shared across the groups. For our compute and AI, we really needed to get this set up for scale. We also needed to make it easy for people to get access to different machine learning and AI models, make our interactive workspaces very configurable, and then also set up a solution for batch compute. But we also want to turn this into a community effort, where we're actually trying to make our IB community stronger together by learning from each other.
So training and development of all our colleagues on the latest and greatest tools, providing documentation and support, and then also standing up some self-service infrastructure so they can easily go and get things started. And so this is sort of our vision of what that looks like. We think of it in terms of going from sequencer to insight. A lot of our omics data, you know, starts from some type of sequencer. And now as we move into cloud, we need to make it very, very simple for the data to end up where we're going to be accessing it from. So all of our sequencers now are connected to S3 gateways, so it all ends up in S3. And that can get registered inside of our data lakehouse, which is built on this platform called Lamin, for these different types of datasets: sequencing, genetics, proteomics. It's all stored there in the cloud. And then to attach and get that data, we need our compute solution, so batch compute on AWS, our interactive analysis environment, and then our pipelines, and then these downstream visual analytics tools. So think of it like a platform of apps for interacting with your data. I want to start just by going over what this data lakehouse is, because it really does underlie a lot of the work we're doing. This is an open data framework for biology from Lamin Labs. And the way we think about it is, Lamin is to data as GitHub is to your code. It's a way to basically track and trace the full lineage of your data as you're working on it. So I'll go through some examples of its capabilities. You can directly attach the inputs and outputs of any transform, which is a Python script, an R Markdown file, a Nextflow pipeline, etcetera. It's a very generic, abstract way of viewing code that you're running.
And any dataset that's coming into that transform and coming out of it is going to have a full lineage of who did it and when, and then all of those artifacts can be versioned as well. And it's something you can actually work with interactively. Inside a Jupyter notebook, for instance, you could load the Lamin package along with the Bionty ontology and actually start connecting to an instance. So in this case, you are no longer blocked by S3 policies or different file systems with different access control. All you need is your API key, and then you have access to anything that's in that data lakehouse. Once you've connected, you can also search. In this case, you could search for a given tissue type and find any artifact that's associated with that metadata tag. And then, instead of just downloading the file and trying to load it, you can immediately load that object directly into a Python variable and start working. This is what the front end of it looks like. It makes it easy to attach metadata, in a controlled way, for all of our relevant entities and ontologies. And then you can also search. It's got a nice faceted search. So if you want to search for any XAR file that's for a given tissue type or cell type, you can find it here, and then easily get the unique ID of that artifact and pull it into your analysis. So, yeah, I'm going to move on to something that I assume is a common problem, because it's very common for us, which is getting people to attach the metadata to their experiment. Generally, this needs to be as easy as possible. And so Rebecca Weiss, who's actually here today, has been working on this tool. At the time the slide was made, we called it Register Agent. Now it's actually called the Laminator. I think we're still deciding what we're going to call it.
But this is an easy way to basically create a set of schemas that people can then use LLMs to correct, and quickly get a sample sheet set up for one of their pipeline runs. And the underlying back end of this is actually flexible enough that it could work within an interactive analysis too. You just use your defined schema, and then you can curate the metadata that's going back into Lamin, and then we have a more harmonized set of metadata people can use. Our pipeline strategy looks a bit like this. We leverage as much as we can from the open source community. The reason we do that is because there are a lot of good standards and best practices, especially within nf-core, and there's always new innovation happening within these repositories. It also lets us dedicate more time to working on the pipelines we have to make from scratch, and gives us a bigger breadth of pipelines that we can supply to the R&D groups. And like I said, this is a partnership between R&D and Digital. The way we've set up our relationship is that R&D is responsible for figuring out what pipelines we need and how they need to run, and Digital is in charge of setting up the infrastructure that allows us to run them. So this is all running on AWS Batch, and, yeah, it's been running on the Seqera Platform. And to bring these pipelines back into Lamin, we can actually use Lamin as a way to set up our pipelines and register our data, and then the pipeline becomes a transform in which we can track the inputs and outputs. We actually tested yesterday to see if the new Nextflow lineage works with this, and they actually work together pretty well. So that's pretty cool. And this is the output from one of Chi Chi's pipelines. He's also here today. He's in charge of all of our functional genomics pipelines. For interactive computing, we've been using the Data Studios from Seqera, which have been great.
We did a comparison among a number of different interactive platforms, and the thing that really drew us to Data Studios was how configurable they are. What this lets us do, beyond letting the researcher pick the resources and attach the data that they want, is provide our own containers. Within those containers, we can provide access to all the different Pixi environments for the topics that the researchers are working on, each of them with Lamin installed in it. We can also ship a curated set of skills for, like, GitHub Copilot. So there's a Lamin skill that anyone can use and talk to, to learn how to quickly use this new tool and then laminate their data. So, yeah, it's been a very popular platform for us. We also want a nice, curated set of reference data, and for that we have this iFOX database. It runs on an Airflow DAG, and the reason we set this up is that everyone had their own client set up for connecting to these public data sources. This provides a single entry point where you can access all this reference data, access it through an R package or Python package called loop, and then serve it up into this visual analytics platform called Evidence, which I'll show next. But, yeah, we have a lot of datasets ingested so far. These are running on a trigger, so anytime there's a new dataset uploaded, that triggers the workflow and we get the latest reference set added. And so Evidence, this is our visual analytics. There are a lot of apps in here now. This is with our partner, DataVision. They're based in Austria. What you're seeing here is a single-cell viewer app that Rebecca also worked on. And there are a lot of apps in there: pharma pipelines, checking to see what all the different clinical trials are. Yeah, there's a lot in Evidence. We have a lot coming up still. There's always more to do.
But, yeah, some of that includes more MLOps. We have a beta version of our machine learning model registry that's available now for people to access any of the Hugging Face models they're interested in that have been approved by legal, and some work on setting up a regulated environment at TRE. There are always new pipelines coming in. With Lamin, we're trying to set up, sort of like GitHub with code, a pull request for data. If you're working on a reference dataset that other people are going to use, then there's a way for the owner of that dataset to approve any changes to that artifact. So, yeah, we have a lot of lessons learned. We've been doing this for about a year and a half now. Adopting new systems takes time for a community. You really need documentation and tutorials, but nothing replaces face-to-face mentoring for a lot of these things. And, really, for any of this to work, the research and IT organizations have to work closely together. And we see our data products as living assets. It's not something you just set up one time and walk away from. You've got to keep improving them as you go. So, yeah, a lot of people are involved in all this. Two members are here today, so if you'd like to talk to them, Shiji and Rebecca are right over there. And we'd like to thank all of our partners: Lamin, Data Vision, Data Intuitive, Seqera. You guys have been very, very helpful, and we really appreciate everything you've done. And I just want to end by saying we are hiring. So if there's anyone who's looking for a data engineering or pipeline engineer role, please reach out and take a look at the description here. It's not officially posted yet, so once it is, you can find it there. But please feel free to reach out to me early and let me know if you're interested. Thanks. Thanks very much. That's a fantastic talk. It's a very comprehensive ecosystem you have.
It's really impressive. Any questions? I have some. Oh, go ahead. I'll take one, even though I've got the mic. First of all, I'd love to cast my vote for Laminator. That's a way better name than registration. But the second thing is, there's been a bunch of changes in the Nextflow language, particularly the workflow outputs and the data lineage. Have you explored Nextflow automatically registering that metadata in Lamin as an automated process? Because I agree that getting people to annotate that data is the silliest but the hardest part of this whole thing. So, I wouldn't say, not quite, within the pipelines themselves. But with the lineage tracking now inside the language, it actually adds more information that we weren't getting before, in terms of the modules that were being run. So it's a cool addition, I think, to what we have. But we still need this standardized set of terms that we have to enforce on our metadata. Otherwise, it's going to end up with too much variety and be too hard to actually integrate across different datasets. And so we do really rely on that upstream step of registering your experiment first. From there, that metadata can propagate into the downstream data products. Yeah. One of the things we built into lineage was the ability to add custom labels. It was a bit of a stab in the dark, because we weren't quite sure how people would use it or what people would use it for. But I had always envisaged people passing in custom metadata and then it being tracked through the lineage. I mean, we're open to that too. So more metadata is good. Just a second, Sai. I just see that you are cloud native, so I'm wondering, you run everything on the cloud, so how about the computational cost for that? Great question.
So, yes, all of these capabilities and platforms are running on AWS right now. And so the cost is, you know, we're shifting from on-premises HPC computing to off-prem cloud. You can save money by committing to a savings plan upfront, and then the actual money you pay because of that upfront savings plan is a lot less. But, yeah, we have to monitor costs all the time. And I'm always looking and wondering whether people are using resources that they really need, and keeping track of that. Because, yeah, cost becomes something you actually have to be much more active about. One question from me. There are two things you touched on which are also topics close to my heart. One is shared reference data. We heard in the talk yesterday from GSK about using shared genomes, and that's a problem that I think every organization has. So for reference genomes, for example, do you have one consistent set for the whole company? Yeah. Actually, Tatiana, who's in our group, has just finished a new Nextflow pipeline that we use for making custom genomes, and anytime there's a new reference genome released, that workflow will get run. But we often have groups that have very special model systems for their mouse genomes, where there's a new gene added, or if it's oncology, it's a patient-derived xenograft. So getting all of these different flavors of reference data added can now all be tracked with this genome manager pipeline. And then those automatically get registered in Lamin, so it stays the source of where those genomes are found. Fantastic. So, I set up a shared AWS iGenomes resource, like, years ago. And I've been desperately trying to get rid of it ever since, and we have a project in nf-core, which is still ongoing, called nf-core references.
We have a references channel, which may have overlapping ambitions there. The idea is you have a YAML metadata file, and you rerun the pipeline whenever you change it, and it regenerates any assets. So I'd love to compare notes on that and see if we can... Yep. Sounds good. Yeah. Okay. Thank you. Thanks very much. Right. Last talk of the day before the panel. I'd love to welcome on stage Wolfgang. Wolfgang is a principal product manager at Microsoft, and sounds like the man to go to for all our storage needs. We were just talking about cloud cost, and it's something that's very close to the hearts of everyone working on cloud. So welcome. Thank you. So first of all, thank you to Seqera for hosting our session, and also thank you to all the attendees for being here listening. I'm Wolfgang. I work in the Azure Storage engineering organization. As a product manager, I follow Azure Managed Lustre. But in general, whatever is an HPC workload is something that I tend to influence on Azure in terms of storage products. And in my experience, I've also had the opportunity to interact with many large-scale customers running HPC workloads on Azure. Many of them are also working in the genomics domain, incidentally using Nextflow. So the idea of this session is to share with you some perspective and high-level learnings that have made our customers successful in the cloud and also helped them optimize both performance and cost at the same time.
Just a disclaimer: if you're not a user of Azure specifically, this session, from my point of view, even if it's talking about specific Azure products in some parts, still contains underlying concepts that you can apply pretty horizontally, not only in the cloud, but even in on-prem environments, when you are trying to optimize your architecture and more specifically the storage architecture for your pipelines. Looking at the agenda, first of all we will have a very high-level introduction. Maybe for many of the experts in this room this is just a repetition, but it's a little bit of level setting about what it means to move data inside a pipeline, in terms of input data and output data, and then talking specifically about how this can be optimized. Another disclaimer I would like to make before we start is that this presentation will be talking a lot about the details behind the scenes in the Nextflow open source code, but many of these optimizations and that orchestration are, in the case of Seqera Platform, carefully managed and simplified thanks to the work that Seqera Platform does on top. So you will see a lot of details to understand all the specifics of the data movement, and we will be referring to the open source Nextflow code. Having said this, I want to go relatively quickly here. When we have a task, a process in Nextflow, at a very high level, you will have a script, the core of your task, that will need some input files to perform some operation on. And, of course, there will be output files that will be the output of your final workflow or will be intermediate outputs consumed by other processes.
When you have a workflow that is composed of multiple processes, you usually need an executor, which can take different forms, from local to cloud executors, that will execute all your tasks in sequence and with the proper dependencies, making sure that for each task the script is able to consume its inputs and then deliver its outputs to the subsequent stages or as an output of the pipeline. At a high level, if we want to visualize a pipeline in a serial way (of course, we know this is a very simplified case of sequential processing), the main idea is that there will be a continuous consumption of input files by the different tasks and a production of output files in a working directory, which will be consumed by the subsequent tasks in the workflow. You can see that there are also scratch directory options, and all of this sums up in saying that for every specific executor inside Nextflow, there are what are defined as file copy strategies. File copy strategies determine how the data are made available to the task and how, in specific staging scenarios, as we will see in one of the next examples, the data are staged for processing in intermediate storage like locally attached SSD or NVMe to get the most in terms of performance, but, of course, with a stage-in and stage-out burden that you need to balance. Having said this, looking at the specifics on Azure, the executors, in relation to their file copy strategy, are usually divided into two big categories. First, the Batch executor. Azure Batch is a managed service on Azure that orchestrates high-throughput workloads using a specific Azure REST API, through which Nextflow integrates with the Azure Batch executor. And this executor has a specific copy strategy relying heavily on staging. Again, we are talking about the open source case.
We are not talking about what Seqera Platform does very nicely, for example, with Seqera Fusion. And then we have the whole category of orchestrators that have a simple file copy strategy, where there is more reliance on a shared POSIX file system accessed by all the nodes running the tasks as the entry point. In this case, we have the classical HPC schedulers, like Slurm and PBS, but even LSF, that on Azure are usually managed with Azure CycleCloud, which is the service that, sitting at the side of a classical HPC scheduler, is able to infuse flexibility into the infrastructure based on the load. And there is, of course, Azure Kubernetes Service, where you can integrate with the Kubernetes executor. But why are we looking into this? Because depending on the executor, you have different possibilities for how you can optimize the cost and the performance of your infrastructure on Azure. But this is generally true also in other scenarios. Usually when we have a customer, and this has been my personal experience, there is a strong shift in mindset that needs to happen to find value and to deliver value through the cloud. The underlying concept is that we come with a mindset of: I have this computational budget, how many petabytes or how many cores am I able to purchase? When we are able to shift that mindset to say, what is the highest amount of science that I can produce from my computational budget, which is finally what a scientist wants in terms of added value, because nobody takes any value in owning cores or owning petabytes (the real value is what you can deliver in terms of science thanks to that budget), that is where, of course, we go in the direction of success on a cloud architecture. Of course, this is a super high-level conversation. There are details that need to be discussed in every specific scenario, but this is an important concept. And here come in all the optimizations that you see on the slide.
When you have that addressed, usually it's super important that you have a correct assessment and inventory of the data so that you can decide the specific tier in the cloud that gives you the lowest cost for storing this data depending on its life cycle. And at the same time, when you do data staging, which we've seen happens in several of these scenarios, you need to optimize that, understanding the trade-offs and the inefficiency it can introduce. Going ahead, networking is also something often neglected in the cloud. There are performance considerations and cost considerations in what happens between the compute resources and the storage solution you use in the cloud. And last but not least, as compared to an on-prem environment, in the cloud you can have tens of different possibilities in terms of products, and tiers inside a single product, that need to be mixed and matched to get the best in terms of cost and performance for your architecture. To give some examples: when we talk about Azure Batch, the Azure Batch implementation in Nextflow is, again if we consider that we're using the standard open source code (Seqera Fusion is able to streamline this even further), super Azure Blob Storage centric. Azure Blob Storage in the standard tier is the most efficient storage in terms of TCO that you can have, but you need to be aware that you have an object interface. So you will need to stage data in to local NVMe during processing and then stage it out. And this can be used, and is also the default, for the working directory in this scenario. There are other solutions, like Azure file shares, that can be used, at least for input and output, mounted directly to the compute nodes. And as we said, there are local ephemeral disks that are used heavily for staging in this scenario.
But you can see here, for example, that for every step of your pipeline, you will need to stage the data in to the local SSD if you're not using a FUSE driver. And that, of course, depending on the data usage of your pipeline, may have different cost and performance implications. If we look at the other domain that we talked about, where we go in the direction of the simple file copy strategy, you have, not better, but a different approach to the storage in the pipeline, because you can rely on a shared working directory that is a POSIX-compliant file system, while object storage is usually still used for input and output but, at least if we are still talking about the open source part of Nextflow, is not used as a working directory. This means that in this scenario, you are able to plug in different storage options that are shared file systems, which can go from super high-performance file systems like Azure Managed Lustre to high-performance NAS like Azure NetApp Files, and even to Azure Files mounted over NFS and so on. So all of this, as you can see, has a broad spectrum of different possibilities that need to be carefully tuned for the specific scenario. But beyond these architectural examples, let's go into what it effectively means to optimize the cost and performance of a pipeline in several of the scenarios we have seen.
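The per-step staging cost described here can be made concrete with some back-of-the-envelope arithmetic. The following is a minimal sketch and not anything taken from Nextflow or Azure: the function names, the simple model (copy time plus processing time), and every number are assumptions chosen purely for illustration.

```python
def staged_runtime_s(input_gb, output_gb, copy_gb_per_s, local_process_s):
    """Estimated wall time when a task stages data to local disk and back.

    Hypothetical model: stage-in copy + processing on local NVMe + stage-out copy.
    """
    stage_in_s = input_gb / copy_gb_per_s
    stage_out_s = output_gb / copy_gb_per_s
    return stage_in_s + local_process_s + stage_out_s


def staging_pays_off(input_gb, output_gb, copy_gb_per_s,
                     local_process_s, shared_process_s):
    """True if staging to local disk beats processing directly on shared storage."""
    staged = staged_runtime_s(input_gb, output_gb, copy_gb_per_s, local_process_s)
    return staged < shared_process_s


# Illustrative numbers only: 100 GB in, 20 GB out, 2 GB/s copy throughput.
print(staged_runtime_s(100, 20, 2.0, 600))          # copy overhead plus 600 s of work
print(staging_pays_off(100, 20, 2.0, 600, 900))     # slow shared file system
print(staging_pays_off(100, 20, 2.0, 600, 650))     # fast shared file system
```

With a model like this, the same task can come out either way depending on how fast the shared file system is, which is why the balance has to be checked per pipeline rather than assumed.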
In many cases, we have seen customers that, for the input files, are staging in samples from outside the cloud environment for each pipeline they run, or they are reading from a Blob Storage account, staging in the same file several times. Going in the direction of optimizing that transfer for cost, by reducing transactions or, if you're using a private endpoint, the private endpoint cost on the storage account, and using solutions that are file based, like, for example, Azure Managed Lustre, where you can store the files that are used by every pipeline in a way that is performance and cost efficient, is usually a good success. Another example is the use of staging. Staging in some cases is mandatory, like in the Azure Batch case, but in other cases you can trade off, depending on the I/O pattern or the processing you do on the data, whether it's worth taking the time to stage in to local NVMe for the performance gain that you can get, or whether it's better to use a shared file system. And this goes in the direction of the working directory. It's always a trade-off in understanding whether object storage is really what is giving you the best result, and whether the fact that many of the scripts, tools, and applications that run in a pipeline need a FUSE layer to interact with object storage is something you can live with, or whether introducing a POSIX file system that is flash based and shared among the nodes gives you, despite the cost, a worthwhile uplift in performance. This goes in the direction of the last two points. We talked about this in different sessions: for example, having an inventory and having a lineage of your data.
So always being sure that the intermediate artifacts that you have in the working directory are kept only for the time that is strictly needed for your troubleshooting or for your scientific research, but at the same time, for long-lasting data, going in the direction of using the best tiering option that you have on the storage solution. This, for example, on object storage on Azure can mean defining lifecycle policies or using smart tier, which automatically manages the tiering on a best-effort basis, based on your data life cycle. All of these are things that we have seen and needed to optimize in every case and every pipeline that we see for our customers. And it is something that, of course, can reduce the cost by an enormous amount. Going into Azure specifics, one thing that we have seen happening more and more is a storage paradigm that we define as the core storage and accelerator storage duality. Some of the Azure storage solutions are POSIX-compliant file systems with SSD performance that sit at the front end interacting with the compute, and are seamlessly integrated with the core storage layer, which is usually object storage. And when I say seamless, I mean that you can see, from a namespace and metadata perspective, a continuity between the two, with Azure storage movers that are able to move data to Blob Storage. And, for example, in the case of Azure Managed Lustre, where you keep having direct access to the Blob Storage, you can not only tier from Azure Managed Lustre down to Blob Storage, but within Blob Storage you can also set up the tiering that we talked about. So you can have the cold tier, you can have the cool access tier, and you can manage that depending on your needs.
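The two ideas above, short retention for intermediate artifacts and age-based tiering for long-lived data, can be sketched as a simple decision rule. This is a hedged illustration only: the tier names mirror the Azure Blob Storage access tiers mentioned in the talk, but the day thresholds, the 14-day retention window, and the function names are assumptions, and a real setup would express this as a lifecycle policy on the storage account rather than in application code.

```python
from datetime import date

# Hypothetical thresholds in days; real values depend on access patterns and pricing.
TIER_THRESHOLDS = [(30, "hot"), (90, "cool"), (180, "cold")]


def choose_tier(days_since_last_access: int) -> str:
    """Pick an access tier for long-lived data from its time since last access."""
    for limit, tier in TIER_THRESHOLDS:
        if days_since_last_access < limit:
            return tier
    return "archive"


def is_expired_intermediate(created: date, today: date, retention_days: int = 14) -> bool:
    """True if a work-directory artifact has outlived its (assumed) retention window."""
    return (today - created).days > retention_days


print(choose_tier(5), choose_tier(100), choose_tier(400))
print(is_expired_intermediate(date(2026, 1, 1), date(2026, 1, 20)))
```

Keeping the thresholds in one table makes it easy to revisit them as usage patterns change.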
All of this has great value because it allows users to optimize the layer above in terms of size and performance, to deliver just what is needed for the working data set, while tiering down to object storage whatever needs longer-term retention. Another aspect that is usually pretty neglected, but sometimes very significant, is how we connect to resources from a networking perspective. Azure Blob Storage accounts can be accessed from multiple regions, and you can access the storage account through different kinds of endpoint. For example, if you don't really need a private endpoint for your security and compliance, and a service endpoint with VNet integration is okay, that gives you the possibility to avoid the extra network cost that you would have in that scenario. Similarly for cross-region transfers: again, you need to understand whether it is a one-off, or whether it's better to create a copy in an additional region to avoid, for performance and cost, reading from another region. Other things that are always relevant for performance, for solutions like Azure Managed Lustre and NetApp, are making sure that accelerated networking and hardware offloading are enabled, and at the same time trying to co-locate the data and the compute in the same availability zone on Azure, in a way that gives you the lowest latency. All of this seems very complicated and overwhelming, but one of the things that has worked very well with our customers is the monitoring that comes for free in many scenarios in the cloud and that is usually hard to get on prem. You can have full telemetry of what is happening on the storage and full telemetry of what is happening on the compute, and you can understand if you have a bottleneck, where you are oversized, and where you are undersized.
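As a rough illustration of how that telemetry can be turned into a sizing decision, the sketch below compares observed peak throughput with the provisioned limit. It is only a sketch under stated assumptions: the function name, the 30% and 90% cut-offs, and the sample values are hypothetical, and real telemetry would come from the cloud provider's monitoring service rather than a hard-coded list.

```python
def classify_utilization(observed_mbps, limit_mbps, low=0.3, high=0.9):
    """Classify a storage resource from throughput samples against its limit.

    The `low` and `high` cut-offs are assumed values, not provider guidance.
    """
    if not observed_mbps:
        raise ValueError("need at least one throughput sample")
    peak_ratio = max(observed_mbps) / limit_mbps
    if peak_ratio >= high:
        return "undersized"   # peaks are hitting the limit: likely a bottleneck
    if peak_ratio <= low:
        return "oversized"    # far from the limit: paying for unused throughput
    return "ok"


print(classify_utilization([100, 200, 250], 1000))
print(classify_utilization([500, 950], 1000))
print(classify_utilization([400, 600], 1000))
```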
If you're very far from the throughput limit of your storage, you can evaluate whether you need to optimize that. So wrapping up the session, I would like to leave you with some key concepts. First of all, there is no solution that is one size fits all. Depending on your workflow and on the different scenarios that you're managing in your organization, you can have an architecture, but there could be deviations that you need to make for the specific cost and performance optimization of each of the pipelines you have. This goes for both compute and storage. Second, tuning is always essential. You can, of course, do a high-level tuning to get your project going, but if you don't keep reviewing that, and you don't keep validating how your workload is changing and how the infrastructure needs to change as a consequence, you will for sure go outside the boundary of the best cost and performance optimization. And last point, granularity is always the key. On prem, we are used to having one or two storage solutions at most in many scenarios. In the cloud, you need to deal with the fact that, to get the most in terms of cost and performance, you need a plethora of solutions that you plug in and manage so that you get the most out of your budget in terms of the science you can produce. Thank you so much for your time, and I'm really happy to answer if there are any questions. Thank you very much. That was a really comprehensive talk. I was sat there kind of scribbling notes and thinking maybe we should just put this YouTube recording on the training portal for next week. We've got maybe time for one question. Otherwise, I've got one. Obviously, there's a huge amount of flexibility and ability to customize and different options that can be chosen here. You said yourself it could be a little bit overwhelming.
On the overarching theme of this particular summit, is there any way that AI can help people make these decisions? Absolutely. One thing that we've seen, and that is amazing from my point of view, is that we have many MCP servers. One of them is the Azure MCP server, which allows you to connect directly to the Azure control plane. And, for example, you can give it a specific job or a specific time frame and ask it to analyze the resources and tell you what it would suggest in terms of optimization. And we have seen very impressive results. Sometimes it's doing way better than me, I can tell you that honestly. So absolutely, it is something that we have seen happening and that we're also suggesting customers do. Thanks very much. Thank you so much. Okay. With that, I'm gonna welcome Evan onto the stage. And for the final part of today, we're gonna have a really exciting panel. But, Evan, I think I'll let you do the introductions if that's okay. Awesome. Hey. Thanks a lot, Phil. So we're gonna have a discussion. I'm not sure what time we've got. Maybe, I think, twenty, twenty-five minutes or so, and then we're gonna wrap up. So today, we're gonna have a panel, which is gonna be discussing agentic bioinformatics. And the idea is to really get some different perspectives from folks, and really just try and learn a little bit more about what people are doing. So first... background in microbiology and biochemistry. Awesome. Thanks a lot. Yes. Hi. I'm Maria Derezita La Inje. I've been working at CEDS for almost five years, and I'm working on agentic AI in bioinformatics, but I also have a background in developing pipelines and running bioinformatics analysis. Amazing. Hi, everyone. My name is Isha. I have been at Seqera for about three, three and a half years. I lead our scientific engagement team.
We're a team of bioinformaticians working closely with pharma and biotech teams, really across pipelines, infrastructure, and now AI, and making all three of these components work together. So... Awesome. Great. Well, I can just start off with the first question that we have here. And this is really around thinking about the day-to-day work of bioinformaticians and computational biologists. How are you seeing their work change, day to day? Well, when I think about when I started in bioinformatics in the late nineties, they didn't even have bioinformatics programs, and it was all kinda self-taught. I started in Perl, of course, like everyone back then. And today, if I had these same tools, obviously, my trajectory would have been very different. Yeah. It's amazing how quickly you can, you know, get almost production-ready pipelines and capabilities. You know, it's, yeah, it's extremely exciting. Yeah. Awesome. Yes. And I feel like the role of the bioinformatician has changed a lot. So from being, like, a hands-on executor, now it's mainly an orchestrator or an architect of solutions. But I still see that the role of bioinformaticians is really critical. First, they need to define a relevant problem and a relevant question that they want to answer. But then, you know, also setting up the tools and setting up the environment and, with their domain expertise, seeing what would be necessary. And then it's also really needed that they review the outputs and, with their domain knowledge, validate them. Yeah. It's interesting. I think the role of the bioinformatician or computational scientist now, as we've seen from the talks, spans such a wide surface area beyond just pipelines. It's the infrastructure. It's the tools themselves, using containers.
And as that surface area has grown, it's gotten a bit more difficult to wrangle all of that and tackle the problems that come from each when you're just trying to get to the results. So I think AI in a way has made wrangling all those components a lot easier. So less time thinking about those problems, more on orchestrating AI to solve them, and then more time spent taking a look at the results, which is where the humans are really needed. I think if we think about those problems more broadly, some of them can be classified in a kind of similar way to code. And, like, the last four months have seen huge advances in terms of coding. What are some of the tasks that are maybe not so shaped like coding tasks? What are the things that scientists have to do day to day that are not necessarily well adapted for the kind of Codex or, you know, Claude Code-like analysis? Yeah. Well, as we know, biology is really hard, and often not exactly deterministic. So I think for any kind of interpretation, there's still some room for improvement there. And, obviously, currently with the tools, there are still issues around hallucinations. And so, you know, you wanna trust the outputs, but you need to verify. And so... And how are folks thinking about experiment tracking here? Because we've seen a little internally, when you can run 100 pipelines in parallel, how are folks managing all of, you know, the huge increase in ability to execute? Yeah. That's certainly a problem. Now, you know, today, the cost of execution and coding is almost zero. And so I think people do struggle with this, you know, this fire hose of data that we can generate today. I think if you have good practices, certainly if you came from a software engineering background and a reproducible research background, or adopt those practices, that helps.
But, yeah, certainly, the Seqera Platform could help with a lot of managing that. Yeah. That's why I think it's really important that governance is at the beginning of the process and not at the end. Because if not, you end up generating a lot of results, and validation takes more time than producing them. Running the analyses may take less time than evaluating the whole amount of results generated. So that is something we need to be careful about. And, yes. I think we've heard a lot this week about the ability to verify. So if you really know what an output was, I mean, Phil showed this earlier, and if you can tie all of your experiments together in a way that can be very quickly verified, is that helpful? Yes. I feel like it's also difficult in biology to define correctness. It's not the same as in software development, where you would run the code and get a result and you would have test datasets and everything. In some problems in biology, it's more difficult to define what is correct. For example, if you add a module to a pipeline, then you already have the tests and you have something to verify against. But if you're trying to include a new tool, where you don't have anything to verify against, well, that's the difficult part, and that's where humans are really relevant. I think, Becca, I really like that example, because in software development, you write a tool, you write an application. If it doesn't behave as expected or it crashes, you're able to go back and, I guess now with agents and AI, pinpoint exactly where that might be coming from. And you also then include unit tests to ensure that functions are working as expected. But I think the ground truth in science and biology is so much fuzzier than that.
So taking a look at the results of an analysis or a workflow and trying to pinpoint why it's not looking the way you expect it to is not as easy a problem to tackle with AI and agents. And if we can free up more time for, say, computational biologists or scientists to focus on that part rather than fighting Docker container issues, that's kind of the value you wanna unlock. And what are you seeing in terms of the ability to bring in tools, and maybe take something which you try out once, like an AI tool, and bring it into an organization, and the changes that are required there? Well, there's certainly lots of dabbling that I'm seeing happening, which is great. You know? Shadow IT, kind of... Yeah. Shadow AI now, for sure. And now I feel I have a lot of empathy towards IT departments, because I think they're now facing this issue of how to govern and review, and there's all kinds of issues around security. So, yeah, I think, you know, AI is just accelerating these things. So I think there are still good techniques and practices that you can follow through on and establish, the same as if you didn't have AI. But, yeah, it's a really exciting time. Yeah. I mean, you mentioned ad hoc analysis, and I think that is something where AI is bringing another kind of value, because people who weren't able to run an analysis, like, for example, our biologists, are now able to do it. And in terms of bringing that into the company, well, that's kind of a separate question, because it has to have governance. It has to meet compliance. What I've mostly seen, for example with all of the bioscientists and co-scientists that are coming, is that sometimes it's more difficult to connect them to internal tools than to use a more general system that is already approved, already in the company, and already connected to the internal tools. So yeah. Yeah.
I think it's nice to try out new tools in a sandbox and your local environment. Where it gets really sticky, and what we've seen with folks that we've been working with, is when they start to roll out some of those solutions across the board, because you're now touching several other systems beyond just workflows. It might be LIMS systems, ELNs. You're touching identity providers. You're touching audit logs. And there are also teams behind those systems and humans behind those systems. So trying to understand how to get those in place and then making sure everything is also still governable, I don't think I've seen a clear solution to it yet. I think guardrails are also really important, and having a human in the loop to approve or disapprove these decisions is really important. And also having a sandbox environment, as you mentioned. Yeah. Yeah. And it's also really easy now to, you know, deploy really functional tools or apps. But then you don't think about support and adding features, and that's something I'm seeing a lot of, for sure. So I think, related to that, are you seeing that some tools become more valuable in this new AI era, where things are connected by APIs more? Are you seeing folks putting more value on that in terms of a buying decision, you know, across flagship? Or... Yeah. Well, certainly skills and MCP servers and things like that are really powerful. And then, I guess going back to our previous question or thought, for sure, I think when you think about deployment in enterprise environments, what Seqera is building, and other similar solutions, I think that's where you wanna kinda lean in. So you're not recreating the same app in a couple of different groups, and you have a better governance layer and sort of support for organizations. Yes.
I think actually bioinformatics, and more specifically nf-core, has done this with their pipelines: they follow certain standards, they all have tests, and they have a lot of documentation, which is making it easier to develop skills for agentic AI systems. And, yes, I feel like it's more a matter of working on the skills, and connecting them to the tools that are important for solving problems, than of the agentic system itself. Yeah. It sounds like AI agents will work best in structured systems where there are guardrails, there's governance, auditing, but also access to tooling beyond that, and the standardization that we've been building on for years in the nf-core community. Do you have any kind of, maybe, outside-the-box things that have really worked well, or something that you've applied AI to that, maybe, folks in the room are not necessarily thinking of and might find helpful? Well, I don't know many people who like writing documentation. So it's obviously been very good for that. And certainly debugging: like, I've used, you know, Seqera AI for debugging some pipelines in the cloud. That's been really, really powerful. Yeah. I've also seen a lot of value in creating datasets or creating metadata. There's a lot of value there because that is also what the agentic AI system is going to use. So it's important to have a strong foundation in the datasets, metadata, and tools. And I've seen the LLMs and the agentic AI systems improve that a lot. Definitely documentation. No one wants to write documentation most of the time, but in a way it allows us to surface a lot of the tribal knowledge you're building in teams. LLMs make it really economical to capture a lot of that, and then also to be able to search through it. So almost building knowledge bases internally for experimental design, analysis, things of the sort.
I know at Seqera, several folks have been using Obsidian, which is kind of like a Markdown... Maybe you wanna describe it a little bit more, Isha? Yeah. Yeah. It's actually in the same vein. We've been using Obsidian almost as our kind of second brain. It's where we dump all of our chicken-scratch notes and how we're thinking about things. And because they're essentially just Markdown files behind the scenes, it's a great way to plug in LLMs and be able to search across our thoughts, across different categories of development, how we're talking to teams, in a way that doesn't introduce a lot of mechanical overhead, because we've been doing that already, and it keeps our workflows the same. And how are folks thinking about evaluation within organizations? I mean, we've seen, you know, the prices of some AI-specific tools get very, very high. Is it kind of traditional ROI? Have you been part of any of those evaluations or seen them play out? Yeah. Right now, for sure, I think OpenAI had an early advantage with, you know, ChatGPT. But, obviously, we're looking at Claude and using that. So I think we're still at a phase where it's very fluid. Mhmm. And tracking those costs, sometimes, you know, whether you're using Bedrock for doing some of your work, or Claude Code, or Codex, we're still trying to wrap our heads around that overall investment. Any kind of AI-specific software that's outside of models? Anything specific? Outside of models, it's a good question. I think, certainly, just in day-to-day activities, like you talked about with Obsidian, we also use Notion, which is a similar platform, for, obviously, transcribing meeting notes and things like that. That's been a huge help as well. But, yeah. Yeah. I feel like the tools and what they use vary a lot across companies.
There's not, like, a standard yet. But what I've seen them take into account when deciding whether or not to include a tool is mainly, first, the usefulness. Like, is it going to be useful for something, or what problem is it going to solve? Then the consistency: making sure that when it runs the analysis, or whatever it does, it gets a consistent result, and you get all of the provenance and all of the logs and references backing the results. And then, I guess, governance is also important. So having guardrails, including a human in the loop for important decisions, having noise. And also, I think something important in the implementation of these tools is having a system where there's ownership. So you know who's owning what. There's someone maintaining the tools, someone maintaining the system. So I feel like all of that is really important in the adoption of new tools. Yes. Yep. It's interesting you mentioned ownership, because there's a hidden cost with using AI and agents: code creation and generation is extremely cheap now. So people almost start solving the same problems, with similar solutions. And maybe we lack the same discipline that we have with handwritten code to have ownership and make sure you maintain it. So just the token consumption for solving the same problem over and over again definitely can pose an issue. One way we're trying to address that is we have this internal shared repo across the portfolio. And so we're contributing skills and other tools in a repo, and making people aware of that. And I think that's a little bit helpful to address that issue. Yeah. I'm not sure how much visibility you've got into some of the larger, you know, pharma organizations, customers, etcetera. But what I've seen is it's very varied in terms of adoption and just the kind of tooling that's available to scientists in those larger organizations.
Is that something you've seen as well, or any thoughts on where this is gonna go? For sure, there are early adopters, people that, you know, when ChatGPT came out, were using it within days. And then there's probably a long tail of people who are skeptical. Certainly, more experienced software engineers in the early days were definitely skeptical of these tools, but it's clear it's gotten much, much better. Yeah. I think it's a typical thing in any adoption of new technology. It follows that same pattern. So Yes. I think that something that's a problem with adoption, I guess, is probably, first, that people need to be trained and need to be informed of what the tool does and what they can do with it. And then I think that there are a lot of trust issues as well. So, I guess, trust, for example, begins with knowing what you don't know. And we've seen a lot with some AI systems that they maybe try to answer a question where they don't have any data or any tools that would answer that question. So I think something that would help in the adoption of the systems is also to work on that trust. So being able to be transparent, not only with what you don't know, but also with what you know. So being able to show the versioning of the pipelines that you are running, show the parameters that you are running with the tools, show the thinking method, what steps they are running to solve the questions. Yeah. In a lot of the teams that we work with, there is generally a concerted effort to start implementing AI. It's unclear exactly, tactically, in what specific area, but I think the general strategy has been to start with a focused part of their teams or workflows, build that trust, understand what is governable, what is not, what is missing, and justify that trust before going full steam ahead. That's generally the strategy we've seen so far. So it's still varied. Yes. 
I think sometimes it also helps to maybe start with just using AI to assist you and then go upwards. So maybe start with it assisting you, then start using it under supervision, so maybe just accepting or rejecting the plan. And then, after you are sure of the guardrails and security and governance, maybe lean towards a more autonomous system. And I also think it's great to have evangelists throughout your organization kinda doing show and tell, and that can really amplify AI capabilities. So Yeah. We have internally a Slack channel called build with AI. Yeah. And it's just really, you know, highlighting things, because there is so much to learn, to get inspiration from, you know, approaches that you haven't seen before as well. Okay. Does anyone want to make any predictions for this time next year around agentic science? I know it's very difficult. I feel like in science, where domain context is really important, even if there are advances in the models, the domain context and having the expertise, the biological expertise, is so important that the model alone doesn't solve the problems. It's not like in software development. So I think that maybe in the future there will be more systems that are capable of doing results analysis or hypothesis generation, and systems that can really co-work with the scientists. Yeah. So, like, really adding in the context, the biological context, into Yeah. Into the systems. Yeah. Predictions are hard, but I think, actually, the token usage cost will go up. Wow. Yeah. I don't know. Well, you're looking at the investments and, you know, they're raising hundreds of billions. They're gonna wanna recoup it, and they're not gonna get it from subscriptions. Yeah. 
I mean, we're seeing that internally, you know, our token usage, anyway, has really skyrocketed. Yeah. For sure. I don't know if I wanna make any predictions per se and put them out in a public forum, but I think my general prediction is this: there are a lot of conversations right now around the models themselves and which model is preferred for a particular use case. I think, at the end of the day, it's gonna be the system that will win over any particular model on its own. This is a matter of access to a good base model with good reasoning and access to domain context, biological context, and then harnesses on top, as well as being able to retrieve data and tools. Like, the system as a whole will win over any individual model. And I hope the direction follows that kind of way. Amazing. Well, thanks so much to our panelists. We're gonna wrap up the summit now. But first, thank you. Thank you. Okay. Check, check on this one. Let's try this. So, I think we're gonna wrap up here. I've just got a few things that I wanna cover. Firstly, starting off with the training that we had here. We talked about it earlier today, but it's an amazing opportunity to bring folks in. If you've got people in your organization who wanna learn Nextflow, really being able to bring them to events like this, they can get exposed both on the training side as well as the ecosystem. So I would highly recommend that. Also, on the hackathon itself, this is something we do regularly. We often have these as part of nf-core hackathons, which are often online as well. So, yeah, you don't necessarily need to be in person. And then, you know, finally, to all the folks who have been here, it's been fantastic to spend the last days together. We learned a ton, got a huge amount of energy from this community. And, yeah, it's really given us a lot of ideas and a lot of things to think about. And, hopefully, you received the same. 
The absolutely, yeah, necessary partial selfie, which has to go along with all of these. So this is a picture that we had from earlier on. It's, you know, fantastic to get one of those down. And the pixel art: we have a winner, Michael Soria. Hey. I got a prize here. I didn't pick the, it's secret. Can we get an explanation of how? Is it by hand? Woah. Impressive. Incredible. Okay. What's coming up? We've got some things coming up in the next little while that you may want to know about. We've got a Nextflow training, which is online. So this is gonna be a week in May. We've also got Bio-IT World, where a few folks in the area will be. So looking forward to catching up there. We're gonna have a booth, and we've got a couple of talks there as well. There's gonna be a webinar, which may be useful for sharing with your colleagues, really going deeper into Seqera co-scientist, showing some of the capabilities there, if you want to learn about that or share that with colleagues. And then we also have Seqera Sessions, which is typically a one-day or half-day event of half training and some talks, which we're going to be putting on in Kendall Square on June 30. And finally, Doctor Kirkland is gonna be in here, as well as ISMB and also BOSC. So if you're around for that, I know a few folks join there as well. So the final thing I'm gonna leave you with is, don't forget to save the date. We have the Nextflow Summit, which is gonna be virtual for the summit aspect of it, in October, and we also have the training and hackathon, which will be in Barcelona. So if you wanna come for some October sunshine, you're more than welcome. We've got just a quick thanks to our sponsors. So from ZS, and from, where has he gone? Oh, he was just here a second ago. Sorry. What's the name again? It's okay. 
It's fantastic to have those folks there. You know, it really helps us put on the event, and we're also kinda joining forces with people who have really got the same ideas. Also, to the summit team: there's a lot of work that goes into this. It goes on for many months. They're bringing a lot of things together, a lot of work that goes on behind the scenes. And then, obviously, during this week, a ton of people have pulled this together. And also the folks over at AV, the folks in catering, etcetera, I really appreciate that. And a final one from me: really, thanks to all of you for being here. It really makes the event. People have been taking time out of their jobs, and some have traveled a long way to come here. It really makes a difference to have you all here and see you in person. So with that, thank you all. And, yeah, I look forward to seeing you around. Reach out to us in the community as well. Always looking forward to catching up next. Thank you.