# Ants, Bees, Genomes & Evolution @ Queen Mary University London

Published: 10 August 2020

# Genomic architecture and evolutionary antagonism drive allelic expression bias in the social supergene of red fire ants

eLife, 9:e55862

## Abstract

Supergene regions maintain alleles of multiple genes in tight linkage through suppressed recombination. Despite their importance in determining complex phenotypes, our empirical understanding of early supergene evolution is limited. Here we focus on the young ‘social’ supergene of fire ants, a powerful system for disentangling the effects of evolutionary antagonism and suppressed recombination. We hypothesize that gene degeneration and social antagonism shaped the evolution of the fire ant supergene, resulting in distinct patterns of gene expression. We test these ideas by identifying allelic differences between supergene variants, characterizing allelic expression across populations, castes and body parts, and contrasting allelic expression biases with differences in expression between social forms. We find strong signatures of gene degeneration and gene-specific dosage compensation. On this background, a small portion of the genes has the signature of adaptive responses to evolutionary antagonism between social forms.

## eLife digest

Red fire ants (Solenopsis invicta) are native to South America, but the species has spread to North America, Australia and New Zealand where it can be an invasive pest. A reason for this species’ invasiveness types of colonies : one with a single egg-laying queen and another with several queens. However, it is not possible to simply add more queens to a colony with one queen. Instead, the number of queens in a colony is controlled genetically, by a chromosome known as the ‘social chromosome’.

Like many other animals, red fire ants are diploid: their cells have two copies of each chromosome, which can carry two different versions of each gene. The social chromosome is no different, and it comes in two variants, SB and Sb. Each ant can therefore have either two SB chromosomes, leading to a colony with a single queen; or one SB chromosome and one Sb chromosome, leading to a colony with multiple queens. Ants with two copies of the Sb variant die when they are young, so the Sb version is inherited in a similar way to how the Y chromosome is passed on in humans. However, the social chromosome in red fire ants appeared less than one million years ago, making it much younger than the human Y chromosome, which is 180 million years old. This makes the social chromosome a good candidate for examining the early evolution of special chromosome variants that are only inherited.

How differences between the SB and the Sb chromosomes are evolving is an open question, however. Perhaps each version of the social chromosome has been optimised through natural selection to one colony type. Another suggestion is that the Sb chromosome has degenerated over time because its genes cannot be ‘reshuffled’ as they would be on normal chromosomes.

Martinez-Ruiz et al. compared genetic variants on the SB and Sb chromosomes, along with their expression in different types of ant colonies. The analysis showed that the Sb variant is in fact breaking down because of the lack of gene shuffling. This loss is compensated by intact copies of the same genes found on the SB variant, which explains why ants with the Sb variant can only survive if they also carry the SB version. Only a handful of genes on the social chromosomes appear to have been optimised by natural selection. Therefore Martinez-Ruiz et al. concluded the differences between the two chromosomes that lead to different colony types are collateral effects of Sb’s inability to reshuffle its genes.

This work reveals how a special chromosome similar to the Y chromosome in humans evolved. It also shows how multiple complex evolutionary forces can shape a species’ genetic makeup and social forms.

## Introduction

Evolutionary antagonism can emerge when selection favors multiple phenotypic optima within a population. This process can lead to selection for reduced recombination between co-adapted alleles encoding the different phenotypes (Charlesworth, 2016). In turn, reduced recombination will sometimes favor the formation of supergene regions containing tightly linked alleles of up to hundreds of genes. Such regions enable the maintenance of genetic interactions over evolutionary time (Darlington and Mather, 1949; Thompson and Jiggins, 2014). We now know that supergenes controlling ecologically important traits are widespread. Examples include flower heterostyly in Primula (Li et al., 2016), mating type in Mycrobotryum fungi (Branco et al., 2018), Batesian mimicry in butterflies (Joron et al., 2011; Kunte et al., 2014), mating behavior in white-throated sparrows (Zinzow-Kramer et al., 2015; Tuttle et al., 2016) and male sexual morphs in ruff sandpipers (Küpper et al., 2016; Lamichhaney et al., 2016).

The non-recombining portions of sex chromosomes are extensively studied examples of supergenes (Bergero and Charlesworth, 2009). Their evolution is thought to have been shaped by antagonism over sexual phenotypes (Zemp et al., 2016; Vicoso et al., 2013; Khil et al., 2004; Parsch and Ellegren, 2013; Wright et al., 2017; Mank, 2017). However, studies focusing on young sex chromosome systems suggest otherwise (Dufresnes et al., 2015; Nozawa et al., 2014; Alekseyenko et al., 2013; Muyle et al., 2012; Charlesworth et al., 2005; Stöck et al., 2011). Indeed, sex chromosome evolution could be driven mostly by processes resulting from suppressed recombination rather than by antagonistic selection (Branco et al., 2018; Cavoto et al., 2018; Branco et al., 2017; Ma et al., 2020).

The social supergene of the red fire ant Solenopsis invicta is an excellent model for comparing how evolutionary antagonism and the consequences of suppressed recombination shape early supergene evolution. This species includes single-queen and multiple-queen colonies, which differ in behavior, physiology and ecology. This social polymorphism is controlled by a pair of ‘social chromosomes’, SB and Sb, that carry distinct supergene variants (Wang et al., 2013). In single-queen colonies, workers and queens are SB/SB homozygotes. In multiple-queen colonies, egg-laying queens are SB/Sb heterozygotes, but workers can either be homozygous or heterozygous. Sb/Sb queens are rare and their offspring are unviable (Gotzek and Ross, 2007). Furthermore, because of at least three inversions (Stolle et al., 2019; Huang et al., 2018), recombination is severely repressed between SB and Sb (Wang et al., 2013; Pracana et al., 2017a). Similar to a Y chromosome, Sb thus lacks recombination opportunities. Importantly, the fire ant supergene system could be as young as 175,000 generations. It thus offers a rare glimpse into the evolutionary forces shaping the early stages of supergene evolution (Wang et al., 2013).

Suppressed recombination reduces the efficacy of purifying selection through Hill-Robertson interference, which can lead to gene degeneration (Charlesworth, 2016). Accordingly, Sb has accumulated non-synonymous single nucleotide substitutions and repetitive elements (Stolle et al., 2019; Pracana et al., 2017a). Despite Sb degeneration, gene content in SB and Sb is highly similar (Wang et al., 2013; Pracana et al., 2017a), likely because of the system’s young age and purifying selection in haploid males (Hall and Goodisman, 2012). Modifications in the genomic sequence can lead to gene expression changes (Denver et al., 2005; Rifkin et al., 2005), the accumulation of mutations in Sb alleles of some genes could therefore result in SB/Sb expression bias. Changes in expression due to gene degeneration should result in specific expression patterns. For instance, if such mutations go to fixation because of inefficient purifying selection, they would likely result in consistent allelic bias across all tissues. Alternatively, if such mutations go to fixation because they are adaptive responses to antagonistic selection, their expression levels would be likely to require tissue-specific fine-tuning for particular functions. This process should result in tissue-specific allelic bias, as observed across human tissues (GTEx Consortium, 2015).

The accumulation of deleterious mutations in Sb alleles should lead to lower expression levels due to impacts in regulatory sequences. The opposite effect seems to be less likely and may only involve highly specific mutations (Loewe and Hill, 2010). Additionally, selection against the expression of deleterious alleles should result in the downregulation of some Sb alleles (Ma et al., 2020; Xu et al., 2019; Pucholt et al., 2017). If lower Sb expression occurs for dosage-sensitive genes, selection should favor upregulation of the corresponding SB alleles. Such dosage compensation can occur through different mechanisms. For instance, in male Drosophila melanogaster the entire X chromosome is expressed at roughly twice the level of autosomes to compensate for the lack of expression in the highly degenerated Y chromosome (Conrad and Akhtar, 2012). In species with much younger sex chromosomes such as Drosophila miranda, dosage compensation instead occurs only at a gene-by-gene level (Alekseyenko et al., 2013; Nozawa et al., 2018). Because the fire ant supergene is young, we hypothesize that individual genes for which the Sb allele has begun to degenerate would show evidence of dosage compensation. For such genes, the SB allele would be more highly expressed than the Sb allele, while their combined total expression should be similar between SB/Sb and SB/SB individuals.

Because the supergene variants determine social forms in the fire ant, their evolution will also have been shaped by antagonistic selection. We thus expect that the supergene region is enriched for genes with alleles that are beneficial for one of the social forms but detrimental for the other (Zemp et al., 2016; Vicoso et al., 2013; Khil et al., 2004; Parsch and Ellegren, 2013). Genes with such alleles are more likely to also show differences in expression between social forms, resulting in an enrichment of socially biased expression in the supergene. However, such enrichment could instead emerge from lowered expression associated with gene degeneration (Ma et al., 2020), or the fixation of mutations with neutral phenotypic effects (Harrison et al., 2012). Adaptation through lower expression of Sb alleles is indistinguishable from gene degeneration. In contrast, increased expression of Sb alleles is more likely to result from adaptation (Harrison et al., 2012; Pál et al., 2001). This line of reasoning has been used in the analysis of sex chromosomes, in which patterns of high sexually-biased expression for sex-chromosome linked genes is used as a proxy for benefit (Mank, 2017; Mank et al., 2013; Zhou and Bachtrog, 2012). Following this logic and given the expected general pattern of Sb downregulation due to degeneration, genes with high expression of the Sb allele are more likely to be adaptive. Furthermore, because Sb should be enriched in alleles beneficial for multiple-queen colonies, we expect that genes with high expression in multiple-queen colonies will tend to show Sb bias.

We disentangle these evolutionary processes by generating detailed genomic and transcriptomic data: we sequenced genomes of fire ants from their native South American range and combined these with existing genomes from the invasive North American range. This enables us to identify genes with fixed differences between the SB and Sb variants of the social chromosome. To detect differences in expression between SB and Sb alleles, we performed RNA sequencing (RNA-seq) from SB/Sb individuals from the South American range of the species. We used three body parts of queens (head, thorax and abdomen) and whole bodies of workers. We combined this data with published RNA-seq data from SB/Sb individuals from invasive North American and Taiwanese populations. We then compared these expression patterns with those obtained from comparisons between social forms.

We find that most genes in the social chromosome show no strong allelic bias and that there is no clear pattern of supergene-wide expression bias towards either variant. Genes with biased expression tend to show patterns consistent with gene degeneration, such as lower Sb expression with increasing numbers of non-synonymous mutations. Additionally, we find that more genes are SB biased than expected given differences in expression between social forms, a pattern consistent with partial dosage compensation. Accordingly, the accumulation of non-synonymous mutations in Sb alleles correlates with an allelic bias towards SB, consistent with ongoing Sb degeneration. Despite these observations, we find an overrepresentation of Sb-biased genes among genes with higher expression in individuals from multiple-queen colonies. This result indicates that antagonistic selection has also shaped the expression patterns in the fire ant supergene. Given the observed impact of gene degeneration, our results highlight the importance of considering the genomic context of gene expression patterns before making inferences about functionality.

## Results

#### Hundreds of genes have fixed allelic differences between supergene variants

To identify differences between supergene variants, we obtained 408-fold genome coverage from 20 haploid SB males and 20 haploid Sb males. Thirteen within each group (65%) were from the native South American range whereas the rest were from an invasive North American population (Wang et al., 2013). By comparing the two groups of males we identified 2877 single nucleotide polymorphisms (SNPs) with one allele in all SB individuals and a different allele in all Sb individuals, affecting 352 genes (Supplementary file 1). Among the 3.4% of SNPs affecting coding sequence, almost half changed the amino-acid sequence, with one change to a premature stop codon (47.7% non-synonymous vs. 52.3% synonymous changes). The remaining SNPs were in intergenic (36.1%), intronic (58.0%) or in untranslated regions (2.5%).

Because the invasive North American population went through a severe bottleneck in the 20th century (Ascunce et al., 2011), we repeated the analysis after separating populations. We found 252 additional SNPs with fixed differences between SB and Sb individuals in South America, and 23,022 additional fixed differences between SB and Sb in North America. The latter number is 4-fold higher than expected due to differences in sampling size alone and is in line with lower genetic diversity of both supergene variants in North America due to the invasion bottleneck.

#### Seven genes have consistent variant-specific expression patterns in all populations

To understand the impacts of different evolutionary processes on the supergene, we compared the expression of SB alleles and Sb alleles for all genes in the region. For this, we generated RNA-seq data from whole bodies of SB/Sb workers and from abdomens, thoraces, and heads of SB/Sb virgin queens collected in South America. To compare with patterns in other fire ant populations, we additionally incorporated existing RNA-seq gene expression data from pools of whole bodies of SB/Sb queens collected in the USA and Taiwan (Wurm et al., 2011; Fontana et al., 2020). We summarize our hypotheses and results in Table 1.

##### Table 1

Summary of the hypotheses proposed in this study, the tests carried out to explore them, the data used and the results obtained.

##### Figure 1—with 5 supplements

Differences in expression levels between alleles for genes in the fire ant social supergene in heterozygous SB/Sb individuals which exist only in multiple-queen colonies.

Differences in expression (y axis) between social chromosome variants in whole bodies of workers from South America, heads, thoraces and abdomens of queens from South America, whole bodies of queens from North America and Taiwan. We show all 16 genes with significant allelic bias in South American populations, and the corresponding biases from the other populations. Bold gene names highlight when allelic bias occurs in all populations (Benjamini-Hochberg (BH) adjusted p < 0.05 from the joint linear mixed-effects model). Significance in population-specific comparisons is indicated by an asterisks under each boxplot (BH adjusted p < 0.05 from either the joint analysis or DESeq2 Wald tests). Each box shows the distribution of logarithm two expression ratios between SB and Sb in each sample type. Positive values indicate higher SB expression; negative values indicate higher Sb expression. A log2 expression ratio greater than one or smaller than −1 represents a two-fold gene expression difference in either direction. Genes are in chromosomal order.

Figure 1—source data 1
Differences in expression levels between alleles for genes in the fire ant social supergene in heterozygous SB/Sb individuals which exist only in multiple-queen colonies.
The columns in the dataset stand for: refseq_id, gene ID in the RefSeq database; body_part, body part from which the RNA was extracted; median_expression, median DESeq2 normalized expression per gene across all samples per body part; mean_lfc, mean log2 fold changes between Sb and SB allelic expression per body part; sd_lfc, standard deviation of the log2 fold changes between Sb and SB allelic expression per body part; littleb_expression, DESeq2 normalised read counts for the Sb allele, bigB_expression, DESeq2 normalised read counts for the SB allele; sample_id, ID for the replicate; population, population of origin of the replicate, total_expression, combined SB Sb expression per gene per replicate; lfc_SB-Sb, log2 fold changes between SB and Sb DESeq2 normalized read counts; normalized_lfc, Z scores centered log2 fold changes between SB and Sb, the mean log2 fold change per body part is substracted from the log2 fold change per gene per replicate and divided by the standard deviation per body part; significant, whether a gene show significant differences in expression levels between Sb and SB according to either our linear mixed effects model or DESeq2's Wald test, "na" for significance in North American populations, "sa" for South American, "taiwan" for Taiwanese populations and "all" if a geneis significant in all populations.

##### Figure 1—figure supplement 1

Overlapping number of genes with allele-specific expression according to comparisons in each population independently or after combining data from all populations.

The combined analysis detected seven genes with allele-specific expression across both populations, three of which were independently detected using only South American populations, six using only North American populations and three using only Taiwanese populations. The numbers in parentheses indicate how many genes analyzed in each comparison.

##### Figure 1—figure supplement 2

Allele-specific expression for genes in the fire ant social supergene for South American samples (information from body parts of queens and whole bodies of workers merged together).

Differences in allele-specific expression between variants (y axis) for all genes in the supergene with enough expression information (x axis). Significant expression differences (BH adjusted p < 0.05 from Wald test in DESeq2) are in green and the RefSeq ID of the significant gene is outlined in the x axis. Non-significant differences are marked by grey boxes. Within each plot, each box shows the distribution of log2 expression ratios between SB and Sb. Positive values indicate higher expression in SB; negative values indicate higher expression in Sb. The dashed line shows log2 expression ratios = 1, that is a two-fold gene expression difference in either direction. Genes are in chromosomal order, showing that genes with allelic biases in either direction are distributed throughout the supergene, rather than localized in a manner that would be expected by strata of differentiation. Thicker boxes have higher median read counts than thinner boxes.

Figure 1—figure supplement 2—source data 1
Allele-specific expression for genes in the fire ant social supergene for South American samples.
The columns in the dataset stand for: refseq_id, gene ID in the RefSeq database; body_part, body part from which the RNA was extracted; median_expression, median DESeq2 normalized expression per gene across all samples per body part; mean_lfc, mean log2 fold changes between Sb and SB allelic expression per body part; sd_lfc, standard deviation of the log2 fold changes between Sb and SB allelic expression per body part; littleb_expression, DESeq2 normalised read counts for the Sb allele, bigB_expression, DESeq2 normalised read counts for the SB allele; sample_id, ID for the replicate; population, population of origin of the replicate, total_expression, combined SB Sb expression per gene per replicate; lfc_SB-Sb, log2 fold changes between SB and Sb DESeq2 normalized read counts; normalized_lfc, Z scores centered log2 fold changes between SB and Sb, the mean log2 fold change per body part is substracted from the log2 fold change per gene per replicate and divided by the standard deviation per body part; significant, whether a gene show significant differences in expression levels between Sb and SB according to either our linear mixed effects model or DESeq2's Wald test.

##### Figure 1—figure supplement 3

Allele-specific expression for genes in the fire ant social supergene for whole bodies of North American queens.

Differences in allele-specific expression between variants (y axis) for all genes in the supergene with enough expression information (x axis). Significant expression differences (BH adjusted p < 0.05 from Wald test in DESeq2) are in green and the RefSeq ID of the significant gene is outlined in the x axis. Non-significant differences are in gray. Within each plot, each box shows the distribution of log2 expression ratios between SB and Sb. Positive values indicate higher expression in SB; negative values indicate higher expression in Sb. The dashed line shows log2 expression ratios = 1, that is a two-fold gene expression difference in either direction. Genes are in chromosomal order, showing that genes with allelic biases in either direction are distributed throughout the supergene, rather than localized in a manner that would be expected by strata of differentiation. Thicker boxes have higher median read counts than thinner boxes.

Figure 1—figure supplement 3—source data 1
Allele-specific expression for genes in the fire ant social supergene for whole bodies of North American queens.
The columns in the dataset stand for: refseq_id, gene ID in the RefSeq database; body_part, body part from which the RNA was extracted; median_expression, median DESeq2 normalized expression per gene across all samples per body part; littleb_expression, DESeq2 normalised read counts for the Sb allele, bigB_expression, DESeq2 normalised read counts for the SB allele; sample_id, ID for the replicate; population, population of origin of the replicate, total_expression, combined SB Sb expression per gene per replicate; lfc_SB-Sb, log2 fold changes between SB and Sb DESeq2 normalized read counts; significant, whether a gene show significant differences in expression levels between Sb and SB according to either our linear mixed effects model or DESeq2's Wald test.

##### Figure 1—figure supplement 4

Allele-specific expression for genes in the fire ant social supergene) for whole bodies of Taiwanese queens.

Differences in allele-specific expression between variants (y axis) for all genes in the supergene with enough expression information (x axis). Significant expression differences (BH adjusted p < 0.05 from Wald test in DESeq2) are in green and the RefSeq ID of the significant gene is outlined in the x axis. Non-significant differences are in gray. Within each plot, each box shows the distribution of log2 expression ratios between SB and Sb. Positive values indicate higher expression in SB; negative values indicate higher expression in Sb. The dashed line shows log2 expression ratios = 1, that is a two-fold gene expression difference in either direction. Genes are in chromosomal order, showing that genes with allelic biases in either direction are distributed throughout the supergene, rather than localized in a manner that would be expected by strata of differentiation. Thicker boxes have higher median read counts than thinner boxes.

Figure 1—figure supplement 4—source data 1
Allele-specific expression for genes in the fire ant social supergene) for whole bodies of Taiwanese queens.
The columns in the dataset stand for: refseq_id, gene ID in the RefSeq database; body_part, body part from which the RNA was extracted; median_expression, median DESeq2 normalized expression per gene across all samples per body part; littleb_expression, DESeq2 normalised read counts for the Sb allele, bigB_expression, DESeq2 normalised read counts for the SB allele; sample_id, ID for the replicate; population, population of origin of the replicate, total_expression, combined SB Sb expression per gene per replicate; lfc_SB-Sb, log2 fold changes between SB and Sb DESeq2 normalized read counts; significant, whether a gene show significant differences in expression levels between Sb and SB according to either our linear mixed effects model or DESeq2's Wald test.

##### Figure 1—figure supplement 5

Correlation of log2 allele-specific expression ratios between the SB and Sb variants in heterozygous queens from three populations: South American data we generated, North American data (from Morandin et al., 2016), and Taiwanese data (from Fontana et al., 2020).

We show correlations: (a) between Taiwanese and North American populations; (b) between Taiwanese and South American populations; and (c) between South American and North American populations. Positive values represent higher expression of the SB allele, negative values represent higher expression of the Sb allele. Each dot represents a single gene within the supergene region. The sizes of the dots are proportional to the average gene expression level. Red dots represent the genes detected by the linear mixed effects model as significantly differentially expressed between SB and Sb across populations. The correlation r2 between each pair of populations was calculated using the Spearman method, with each gene being weighted by mean expression level (read counts).

Among the 352 genes with fixed differences between SB and Sb in this combined dataset, 122 had sufficient expression for analysis of differences between alleles. We found that seven of the genes (5.7%) had consistent expression differences between variants across all populations (linear mixed-effects model; Benjamini-Hochberg (BH) adjusted p<0.05, Figure 1). Expression bias went in both directions: the Sb variants of ‘pheromone-binding protein Gp-9/OBP3’ (LOC105194481), ‘retinol-binding protein pinta-like’ (LOC105199327) and uncharacterized LOC105193135 were consistently more highly expressed. In contrast, the SB variants of ‘ejaculatory bulb-specific protein 3’ (LOC105199531), ‘carbohydrate sulfotransferase 11-like’ (LOC105193134), ‘calcium-independent phospholipase A2-gamma’ (LOC105203065) and uncharacterized LOC105199756 were consistently more highly expressed.

We repeated our analysis in a population-specific manner. Within each of our three populations, this approach independently confirmed most of the allelic biases we had previously detected (Figure 1—figure supplement 1). It additionally uncovered population-specific patterns of allelic bias within the supergene (Figure 1—figure supplements 2–4, Supplementary file 3).

#### Supergene allele expression differs based on population ancestry but not body parts

If allelic bias evolved in response to antagonistic selection, we expect it to be fine-tuned for specific functions across different tissues. In contrast, if allelic bias resulted from the accumulation of neutral or deleterious mutations, we would expect a consistent bias across tissues.

To test which scenario occurred in the fire ant, we used the South American RNA-seq data. We found no effect of body part or caste on allelic expression for any gene in the supergene region (DESeq2’s Logarithmic Ratio Test and all pairwise Wald comparisons between interaction terms; all BH adjusted p>0.05). This result was unlikely to be due to lack of power because we did find such differences for genes in normally recombining parts of the genome, despite having less power to do so (see Materials and methods).

The general lack of tissue-specific fine-tuning of allelic expression bias in the supergene suggests that most of this bias is due to the accumulation of neutral or deleterious mutations (Stolle et al., 2019; Pracana et al., 2017a) in Sb rather than being adaptive. We further tested this idea by comparing expression patterns between populations. If most changes in Sb were not adaptive, patterns of allelic bias should correlate with similarity of the supergene genomic sequence. We therefore expected a stronger positive correlation of allelic bias between the two closely related invasive populations than to the less closely related South American population (Ascunce et al., 2011). For each pair of populations, we calculated correlations of the log2 ratios between the expression levels of SB and Sb alleles (Figure 1—figure supplement 5). Our findings were in line with our expectation: correlation was stronger between invasive populations (Spearman’s r2 = 0.67), than between either invasive and the native population (Spearman’s r2 = 0.21 and 0.18). This result was further supported by a linear mixed-effects model of all data: population ancestry has a stronger effect on allelic expression bias than geographic proximity (respective interaction terms with the effect of gene: F = 3.42, p<10-15 and F = 0.94, p=0.65).

Together, these results support the idea that the effects of suppressed recombination on genomic architecture explain most of the allelic biases in the supergene region.

#### Overrepresentation of socially biased genes in the social supergene

Social-form specific selection should lead to an overrepresentation of socially biased gene expression in the supergene region. To test whether this pattern occurs, we compared gene expression between egg-laying queens from single-queen and from multiple-queen colonies. There were 293 socially biased genes with known chromosomal locations (Supplementary file 4). Such genes were indeed overrepresented in the supergene region (Figure 2a, 33 out of 474, 12 expected by chance, χ2 = 29.7, p<10-7). Next, we examined the direction of expression bias: we found that most socially biased genes had higher expression in multiple-queen colonies than in single-queen colonies (274 out of 293, i.e., 94%; binomial test, p<10-15). However, this pattern was not specific to the supergene (χ2=1.04, p=0.3, Figure 2b). In sum, more socially biased genes are present in the supergene than in the rest of the genome, but the direction of social bias is similar across the genome. Since the trend of social bias is genome-wide, it cannot be explained by Sb degeneration alone.

##### Figure 2

Distribution of socially biased genes in the genome of the red fire ant within (left bars) and outside (right bars) the supergene region.

(a) The supergene region is significantly enriched in genes with differences between social forms, a pattern that could indicate the effect of antagonistic selection. (b) Most genes with differential expression between social forms are more highly expressed in multiple-queen colonies. This expression bias is observed across the genome and is not a unique feature of the supergene.

Figure 2—source data 1
Genomic location of the analyzed genes in the Solenopsis invicta genome.
The columns in the dataset stand for: genome_region, whether the gene is located in the supergene ("supergene") or in the rest of the genome ("recombining"); gng_linkage_group, the linkage group for the gnG assembly of the Solenopsis invicta reference genome in which a gene is located; refseq_id, gene ID in the RefSeq database; social_bias, whether a gene shows differential expression between social forms according to DESeq2's Wald test (Benjamini Hochberg adjusted p value < 0.05); social_bias_direction, whether a gene is more highly expressed in multiple-queen colonies ("Poly"), single-queen colonies ("Mono") or not differentially expressed ("Non-diff").

#### Genes with no social bias tend to have allele-specific expression biased towards the SB variant

Gene degeneration in Sb could lead to dosage compensation. To test whether this occurs, we compared the differences in expression levels between the SB and Sb alleles within heterozygous SB/Sb individuals from multiple-queen colonies to differences in expression between queens from single-queen (SB/SB) and multiple-queen colonies (SB/Sb). Dosage compensation should lead to a pattern where higher expression of the SB allele does not result in differences in expression between social forms.

We tested whether such a pattern occurs for 294 genes in the supergene region using North American data. We compared the proportion of SB allele expression (PB) for each gene, with its expression in multiple-queen colonies relative to single-queen colonies (PMQ; Figure 3). If Sb degeneration had led to its global downregulation or inactivation, we would have found a stark bias towards SB, irrespective of social bias. Instead, we find that the expression levels of SB and Sb are balanced for most genes without differences between social forms (black line in Figure 3, linear regression passes through the point 0.5, 0.5 with a non-significant deviation of 0.0058; p=0.32). At the extremes of the distribution, however, allelic bias is higher when social bias is higher. We therefore considered two models to explain this pattern. The null model assumes that differences in expression patterns between social forms are due solely to differences in baseline allelic expression (purple line in Figure 3). In this model, the PB and PMQ values change solely as a consequence of the relative expression of the Sb and the SB alleles (see Materials and methods). This model (the purple line) fits the data poorly. The second model (red line), additionally allows for the effect of gene-specific dosage compensation by increasing expression of B-alleles in SB/Sb individuals in multiple queen-colonies. This model fits the data much better and it is significantly different from the null model (analysis of variance p<10-5). This finding confirms that SB expression in SB/Sb individuals is higher than expected given the observed patterns of expression differences between social forms. Further in line with this, we found no relation between the relative expression levels of the SB and Sb alleles in queens from multiple queen-colonies to the total expression of the genes in the supergene in either social form (Figure 3—figure supplement 1;Wilcoxon sum rank test p>0.05). Thus, higher SB allele expression in heterozygous individuals does not imply higher expression in single-queen colonies.

##### Figure 3—with 1 supplement

Relationship between measures of bias in allelic expression (PB) and between social forms (PMQ).

Each point is one of 294 genes within the supergene (North American data). Point size is proportional to the mean expression in queens from multiple-queen colonies. The values were calculated as PB = xB/(xB + xb) and PMQ = xMQ/(xSQ + xMQ); where xB and xb are the expression of SB and Sb alleles, and xSQ and xMQ are the expression in single-queen and multiple-queen colonies. Values of PMQ below 0.5 therefore indicate higher expression in SB/SB queens from single-queen colonies; values above 0.5 indicate higher expression in SB/Sb queens from multiple-queen colonies. Values of PB above 0.5 indicate allelic bias towards SB; values below 0.5 indicate bias towards Sb. The straight black line shows a linear regression. The purple line shows the predicted null relationship if the pattern of expression was due to Sb degeneration alone PMQ = (1/ (2PB+1)). This model is a poor predictor of the data. The red line assumes gene-specific dosage compensation, where a decrease in expression of Sb leads to increased SB expression PB=(1-(PMQ/2))/PMQ. The model including dosage compensation fits the data better than the null model; both models are significantly different (analysis of variance between models p < 10-5). The observed enrichment of multiple-queen genes in Sb is therefore unlikely due to Sb degeneration alone.

Figure 3—source data 1
Social and allelic bias for genes in the supergene.
The columns in the dataset stand for: gene, RefSeq gene ID for each gene; lfcs_ase, log2 fold changes between SB and Sb estimated by DESeq2; lfcs_social_form, log2 fold changes between multiple-queen and single-queen colonies estimated by DESeq2; mean_reads_ase, mean expression across all samples per gene for the allele specific expression analysis; mean_reads_social_form, mean expression across all samples per gene for the social form differences expression analysis.

##### Figure 3—figure supplement 1

Gene expression levels in single-queen and multiple-queen individuals for different levels of SB and Sb expression levels in multiple-queens.

The plot represents the overall expression levels for all genes analyzed in the supergene for which there was allele-specific and social form expression data in North American populations. To account for the potential effects of antagonistic selection, nine genes with significant expression biases towards Sb or with high SB expression in multiple-queen colonies were removed. As a result, 193 genes in total were included in this analysis. Each bar represents the median normalized expression within group, the error bars are the 95% CI interval around the median (estimated from a 5000x median bootstrapping). The expression levels within each expression group (Single-queens, multiple-queens, SB and Sb expression) is normalized by the total number of reads in that group. This normalization allows the comparison across different datasets with different levels. The differences in expression between single-queen and multiple-queen individuals remain non-significant (Wilcox test p>0.05) across varying levels of SB-Sb expression differences. Only when Sb expression is much higher than SB expression (Sb/SB expression ratio >1.5) does it seem to be an increase in gene expression for multiple-queen individuals, but the difference between social forms remained non-significant (Wilcox test p=0.08). Without dosage compensation, we would expect that allelic bias would invariably result in changes in expression between social forms: As the allelic expression levels of SB increases, we would observe an increase in expression of single-queen colonies. Instead, expression levels in single-queen and multiple-queen individuals remain similar. These results are therefore consistent with SB expression compensating for low Sb expression.

Figure 3—figure supplement 1—source data 1
Median gene expression levels in single-queen and multiple-queen individuals for different levels of SB and Sb expression levels in multiple-queens.
The columns in the dataset stand for: expression_group, the samples from which the expression levels were calculated, either whole body queen from multiple-queen colonies ("Poly"), whole body queen from single-queen colonies ("Mono"), SB allelic expression in multiple-queen colonies ("PolySB") or Sb allelic expresion in multiple-queen colonies ("PolySb"); Sb-SB_expression_category, the different levels in which we divided the Sb to SB allele expression levels ratio; median_expression, median raw expression levels per group per category; confidence_level, estimated median of normalized expression across groups and categories after bootstrapping (5000 bootstrap replicates); lower_median_ci, the lower confidence interval at 95% for the median; upper_median_ci, the upper confidence interval at 95% for the median.

To complement these analyses of general trends, we focused on individual genes. Most genes had no significant allelic or socially biased expression (256, i.e., 87%; Figure 4a and Figure 4—figure supplement 1). However, fifteen of the 294 genes showed allelic bias but no differences in gene expression between social forms (5%; Figure 4c). Most of these genes (12) had higher expression in SB, with only three being more highly expressed in Sb (binomial test, p=0.03). In line with this result, the median expression of SB alleles of these 15 genes was 5.9-fold higher than that of Sb alleles (Wilcoxon signed rank test p=0.0008 against equal expression).

##### Figure 4—with 2 supplements

Distribution of differences in gene expression between social forms and between supergene alleles.

X axes indicate ratios of expression between SB/Sb queens and SB/SB queens. Y axes indicate allelic expression ratios in SB/Sb queens. Both ratios use a log2 scale whereby log2 = 0 indicates absence of differences. Colors are proportional to numbers of genes. Double-headed arrows indicate significant expression differences. Panel (a) shows expression patterns for genes showing no difference in either comparison. The remaining three panels summarize expression patterns for: (b) genes with significant expression differences between SB/Sb and SB/SB queens only – these are biased towards higher expression in multiple-queen colonies (13 multiple-queen vs. two single-queen, binomial test, p=0.007); (c) genes with significant expression differences only between SB and Sb alleles within SB/Sb individuals – these are biased towards higher expression in the SB variant, in line with a dosage compensation mechanism (12 SB vs. 3 Sb, binomial test, p=0.03); (d) genes with significant expression differences between SB/Sb and SB/SB queens and between the SB and Sb variants in SB/Sb queens – the genes with higher expression of the Sb allele (y < 0) tend to be more highly expressed in queens from multiple-queen colonies (x > 0), in line with evolutionary antagonism between social forms (5 out of 8 Sb biased genes with bias towards multiple-queen colonies, compared with 1 out of 15 for SB biased genes χ2 = 5.8, p=0.02). The numbers in a) indicate how many genes had no differential expression. In b), (c) and d) the numbers in each quadrant indicate how many genes were significantly differentially expressed in the relevant comparison.

Figure 4—source data 1
Expression differences between allele expression levels and between social forms for genes in the supergene.
The columns in the dataset stand for: gene_id, gene ID in the RefSeq database; lfcs_allele_bias, log2 fold changes between SB and Sb estimated by DESeq2; padj_allele_bias, Benjamini Hochberg adjusted p value obtained from DESeq2's Wald test for differences in expression levels between SB and Sb; lfcs_social_bias, log2 fold changes between multiple-queen and single-queen colonies estimated by DESeq2; padj_social_bias, Benjamini Hochberg adjusted p value obtained from DESeq2's Wald test for differences in expression levels between multiple-queen and single queen colonies.

##### Figure 4—figure supplement 1

Overlap of genes with expression differences between variants and social forms out of all genes within the supergene region with sufficient data in both comparisons in the North American dataset.

##### Figure 4—figure supplement 2

Allelic bias measured as the log2 ratio of expression between the SB and Sb alleles compared to the number of nonsynonymous mutations per gene.

Positive values indicate higher expression in SB; negative values indicate higher expression in Sb. Allelic expression bias was measured in all populations analyzed in this study: South America, U.S.A. and Taiwan. Each dot represents a gene from a sample in each population. Because genes with low expression result in noisy log2 ratios of expression, we excluded genes with a median expression of 5 reads or fewer). A linear regression (dark blue line, ‘log2 ratio’ ~ ‘Population’ + ‘Sample’ + ‘Number of nonsynonymous mutations’) shows that, overall, SB bias increases with the number of nonsynonymous mutations (coefficient = 0.052, p=0.01).

Figure 4—figure supplement 2—source data 1
Effect of non-synonymous mutations in the allelic bias between SB and Sb.
The columns in the dataset stand for: gene, gene ID in the RefSeq database, reads_b, raw read counts per gene per sample for the Sb allele; sample, replicate ID, population, population to which the replicate belongs; reads_B raw read counts per gene per sample for the SB allele; total_reads, combined SB and Sb expression per gene; median_expression, median read count per gene across all samples; lfc, log2 fold changes between SB and Sb expression levels; missense_muts, number of missense mutations, stop_gain_id, number of stop codon gain mutations; stop_loss_id, number of stop codon loss mutations; all_nonsyn_muts, total number of non-synonymous mutations.

The general and gene-specific patterns are hallmarks of dosage compensation: Lower expression of Sb alleles is compensated for by higher expression of SB alleles, resulting in the absence of expression differences between social forms. Importantly, the overall trends and gene-specific patterns both indicate that this dosage compensation occurs in gene-specific manners rather than being a response to global Sb-inactivation.

Dosage compensation would arise if gene degeneration leads to decreased Sb expression. We tested this hypothesis by asking whether genes with higher coding sequence degeneration have a higher SB bias. We used the number of non-synonymous mutations per gene in the Sb allele as a proxy for gene degeneration and compared this measure against allelic bias (Figure 4—figure supplement 2). Indeed, as the number of non-synonymous mutations increases, so does the allelic bias towards SB (coefficient = 0.052, p=0.01). This indicates that coding-sequence degeneration could lead to lower expression, or alternatively that some genes generally degenerate faster than others.

#### Genes with higher expression of Sb than SB alleles are socially biased towards higher expression in multiple-queen colonies

Antagonistic selection should lead to an enrichment of genes in the supergene that are highly expressed in multiple-queen colonies and show allelic bias towards Sb. In contrast, without antagonistic selection all expression differences between Sb and SB alleles should be due to gene degeneration, and lead to lower Sb expression levels (Ma et al., 2020).

From the 294 genes analyzed in the previous section, eight (3%) had both allele-biased and socially-biased expression (Figure 4d). Their expression patterns were strongly directionally biased towards higher expression in Sb and in multiple-queen colonies (5 out 8 Sb biased genes were significantly more highly expressed in multiple-queen colonies Figure 4d; compared to 1 out of 15 for SB biased genes, χ2 = 5.8, p=0.02; Figure 4c). This enrichment of multiple-queen biased genes in the Sb variant is consistent with antagonistic selection. Although unlikely, Sb-specific upregulation could lead to higher expression in multiple-queen colonies without affecting fitness. However, the broad expression patterns described in the previous section (Figure 3) show that differences in expression between social forms are not due exclusively to changes in expression levels between Sb and SB alleles. We additionally showed that the fixation of non-synonymous mutations in Sb alleles correlates with lower expression levels (Figure 4—figure supplement 2). We therefore consider it unlikely that the trend of enrichment in high multiple-queen expression among Sb biased genes would have arisen neutrally. Consistent with the lack of direct correlation between allelic bias and social bias, only 3 of the seven genes with consistent allelic bias in all populations were also differentially expressed between social forms (Supplementary file 2): ‘pheromone-binding protein Gp-9’ (LOC105194481), ‘ejaculatory bulb-specific protein 3’ (LOC105199531) and ‘retinol-binding protein pinta-like’ (LOC105199327). This narrow overlap between allelic and social bias makes these genes candidates for playing roles in phenotypic differences between social forms.

## Discussion

In the fire ant, a supergene system with two variants (SB and Sb) controls whether colonies have one or multiple queens. We compared gene expression patterns between the SB and Sb variants of the social supergene within heterozygote SB/Sb individuals which exist only in multiple-queen colonies. We contrasted these patterns with differences in expression between queens from single-queen and multiple-queen colonies. We find patterns consistent with degeneration of Sb and with dosage compensation in response to this degeneration. We also find that some genes in Sb have patterns consistent with evolutionary antagonism.

#### The effects of gene degeneration on supergene expression patterns

We found that a small proportion of the genes in the supergene region showed consistent allele-specific expression differences between the SB and Sb variants. It is tempting to conclude that such gene expression differences arose through selection, as a consequence of evolutionary antagonism between the single-queen and multiple-queen phenotypes. However, this interpretation may be too simplistic, as it ignores the impacts of supergene degeneration. Several studies have shown that Sb is degenerating (Wang et al., 2013; Stolle et al., 2019; Pracana et al., 2017a). Our observation that roughly half of the fixed differences in coding sequence between SB and Sb impact the protein sequence (where they are likely to have a deleterious effect) is also consistent with degeneration of the Sb variant. Such sequence-level degeneration is a symptom of reduced selection efficacy. By examining gene expression, we revealed three additional symptoms of degeneration and low selection efficacy. First, the absence of tissue-specificity in our study would not be expected if expression differences were adaptive. For some genes, the expression differences between Sb and SB are likely due to mutations that are completely neutral or deleterious. However, for other genes, expression differences could be partially adaptive, but low selection efficacy may have hindered the fine-tuning of their expression during the short timespan since the supergene’s emergence (Wang et al., 2013). As a result, strong selection for a particular level of allele-specific expression in one body part (e.g., in antennae), could result in consistent allele-specific expression patterns across tissues, even if this has mildly deleterious effects (e.g., in the gut).

Second, there was a strong correlation in allelic bias between the invasive North American and Taiwanese populations, despite the data from the two populations being from different studies. Both invasive populations have lower genetic diversity overall (Ascunce et al., 2011), and in the supergene region in particular (Pracana et al., 2017a). The strong effect of ancestry on allelic bias indicates that genomic architecture, rather than gene function, defines most expression patterns within the supergene. Finally, gene degeneration can result in lower expression levels (Xu et al., 2019; Pucholt et al., 2017). For example, gene expression is reduced in genes with sequence-level signatures of degeneration in the mating-type chromosomes of the anther-smut fungus Microbotryum (Ma et al., 2020). We similarly find that Sb alleles with more non-synonymous mutations tend to have lower expression than SB alleles. This pattern could result from the relative inefficacy of selection on Sb, leading to selection favoring the downregulation of alleles accumulating detrimental mutations. We find such patterns of degeneration despite the likely effects of antagonistic selection (Huang and Wang, 2014) and the lack of evolutionary strata (Pracana et al., 2017a), both of which are likely to dampen the degeneration signal (Ma et al., 2020).

#### Dosage compensation in the social supergene

The reduction in expression of the Sb allele in several genes could result in detrimental fitness effects if the genes involved are dosage sensitive, as observed in many sex chromosomes (Mank, 2013). We tested this idea in the fire ant supergene and found that allelic expression bias is relatively balanced, with similar levels of SB and Sb bias across the supergene. However, far more genes have multiple-queen biased expression than single-queen biased expression. This pattern implies that much of the observed SB bias leads to no differences between social forms. Our findings are consistent with dosage compensation, where the higher SB expression effectively counteracts lower Sb expression.

Some of this dosage compensation likely emerged through selection despite Hill-Robertson effects and a short time of divergence (Pracana et al., 2017a). However, some dosage compensation could instead occur automatically, whereby transcriptional machinery is less able to bind to degenerate regions of Sb and thus binds to SB instead (Teufel et al., 2019). Additionally, we speculate that regulatory elements located outside the supergene region could have co-evolved with Sb degeneration, allowing for variant-specific expression regulation. These elements would not be affected by suppressed recombination, allowing for a quicker emergence of dosage controlling mechanism (Lenormand et al., 2020).

Regardless of the mechanisms which mediate gene-by-gene dosage compensation in the fire ant, we show that it can emerge over time scales as short as approximately 175,000 generations (1 million years) (Wang et al., 2013; Pracana et al., 2017a). Furthermore, this is to our knowledge only the second known instance of dosage compensation in a supergene that does not determine sex or mating type. Indeed, a 2–3 million-year-old supergene controlling color morphs of the white-throated sparrow also shows patterns consistent with dosage compensation (Sun et al., 2018). Such findings support the idea that many of the patterns seen in sex chromosomes are representative of supergenes in general. Indeed, rapid evolution of dosage compensation similarly occurred in the 10 million-year-old sex chromosomes of the plant Silene latifolia (Muyle et al., 2012) and in two Drosophila species neo-sex chromosomes that are only a few million years old (Nozawa et al., 2014; Alekseyenko et al., 2013; Nozawa et al., 2018).

#### Supergene expression patterns consistent with antagonistic selection

Most Sb-biased genes are also more highly expressed in multiple-queen colonies, supporting the idea that antagonistic alleles are present in the supergene. In the unlikely case that this pattern had no fitness effects, it could arise neutrally (Harrison et al., 2012). We argue against this possibility because, as discussed above, expression patterns in the supergene do not follow what would be expected if differences between social forms were due exclusively to differences in allelic expression. Additionally, the patterns of expression differences between social forms are similar within the supergene and in the rest of the genome. This observation further suggests that expression differences between social forms are driven by factors unrelated to the genomic architecture of the supergene. Given the patterns of ongoing degeneration in Sb, we conclude that genes with Sb and multiple-queen bias were likely under antagonistic selection. Similarly, the neo-Y chromosome of Drosophila miranda is enriched in male-biased genes in the gonads (Zhou and Bachtrog, 2012).

#### Candidate genes for differences between social forms

The approaches underpinning our analyses are unable to detect allelic differences in genes absent from the reference genome such as OBP-Z5, a putative Odorant Binding Protein exclusive to Sb (Pracana et al., 2017b). However, our analysis does single out candidate genes that potentially contribute to the social polymorphism of the fire ant (lists of all sequence and expression differences are in Supplementary files 1, 2, 3, 4). In particular, three genes stood out because they were differentially expressed between social forms and had variant-specific allele expression in all populations. For the first gene, ‘Pheromone-binding protein Gp-9’ (LOC105194481), also known as OBP-3, the Sb allele was more highly expressed. For decades, this gene has been a candidate effector for social form differences (Pracana et al., 2017a; Keller and Ross, 1998), yet its linkage to hundreds of other genes in the supergene led to doubts that its association to social form is any more than coincidental. We found five fixed differences between SB and Sb for this gene, four of which could affect protein efficiency (consistent with previous findings [Krieger and Ross, 2002]). For the second gene, ‘Ejaculatory bulb-specific protein 3’ (LOC105199531), which also contains an insect odorant binding protein domain (InterPro IPR005055), the SB allele was more highly expressed. Orthologs of this gene are associated with mating (Laturney and Billeter, 2014) in Drosophila melanogaster, sexual behavior in a moth (Bohbot et al., 1998), subcaste differences in bumblebees (Wolschin et al., 2012), venom production in social hornets (Yoon et al., 2015) and caste differences in the termite Reticulitermes flavipes (Steller et al., 2010). Finally, LOC105199327 is likely a Pinta retinol-binding protein. Such proteins help pigment transport and vision in D. melanogaster and the butterfly Papilio xuthus (Pelosi et al., 2018). In sum, all three candidate genes have putative functions related to environmental perception, in line with the complex social phenotype requiring subtle changes in environmental perception or signaling (Favreau et al., 2018).

#### Conclusions

We found differences in expression between the SB- and Sb-linked alleles of genes in the fire ant supergene across three populations. Such strong patterns can naively be assumed to be indicative of adaptive processes emerging from evolutionary antagonism between social forms. However, we show that the evolutionary forces shaping expression patterns in the supergene are complex and must be interpreted with care. In particular, genes with higher expression of the SB allele than the Sb allele tend to either lack expression differences between social forms or have lower expression in multiple-queen than in single-queen colonies. Both patterns are consistent with the idea that suppressed recombination leads to degeneration in Sb and thus lower Sb allele expression. In some cases, a dosage compensation mechanism through higher expression of the healthy SB allele leads to similar expression levels in both social forms. In cases where no dosage compensation occurs, overall expression is lower in multiple-queen colonies than in single-queen colonies.

Conversely, we also show that genes with higher expression of the Sb allele than the SB allele are also biased towards higher expression in multiple-queen colonies. This pattern is consistent with evolutionary antagonism favoring the accumulation of beneficial alleles for multiple-queen individuals in Sb, the supergene variant found only in this social form.

Our study shows that multiple complex evolutionary forces can simultaneously act on a young supergene system. It highlights that allele-specific expression patterns alone are insufficient for inferring whether they are adaptive, deleterious or phenotypically neutral. Instead, putting such expression differences into broader contexts is needed to draw reasonable conclusions. Applying this idea to our data highlights genes which have molecular roles that could affect perception or signaling of the social environment.

## Materials and methods

#### RNA sequencing of fire ants

We used three published and one new RNA-seq gene expression datasets from fire ants. Wurm et al., 2011 obtained whole-body RNA-seq data from six pools of 4 egg-laying SB/Sb queens, each from a multiple-queen colony from Georgia, USA. Morandin et al., 2016 obtained whole-body RNA-seq data from six samples, each being a pool of 3 queens from a single-queen or a multiple-queen colony (three replicates per social form) from Texas, USA. All the queens were mature and egg-laying (C. Morandin, personal communication), thus queens from multiple-queen colonies carried the SB/Sb genotype (Keller and Ross, 1998). Additionally, we manually checked the resulting RNA reads for heterozygous positions at key supergene markers using the IGV gene browser v2.4.19 (Robinson et al., 2011). Fontana et al., 2020 generated RNA-seq data from 4 samples of SB/Sb queens from multiple-queen colonies from Taiwan. Each sample is a pool of whole bodies from two virgin queens (more details in Supplementary file 5).

All three published datasets are from pools of whole bodies from the invasive ranges of S. invicta. The Taiwanese invasive population of red fire ants is derived from that of North America (Ascunce et al., 2011). Because comparisons of whole bodies can be confounded by allometric differences (Johnson et al., 2013) and genetic diversity is reduced among Sb haplotypes in the invasive populations (Pracana et al., 2017a), we generated a new gene expression dataset. We collected samples from the native South American range of this species to obtain molecular data ((collection and exportation permit numbers 007/15, 282/2016, 433/02101-0014449-4 and 25253/16). To obtain RNAseq, we collected six multiple-queen colonies. We confirmed the social form of each colony (Krieger and Ross, 2002) on a pool of DNA from 10 randomly chosen workers. Colonies spent six weeks under semi-controlled conditions before sampling (natural light, room temperature, cricket, mealworm and honey water diet). From each colony, we snap-froze one worker and one unmated queen for gene expression analysis between 12:00 and 15:00 local time. To partly control for allometric differences between genotypes, we separated each queen into head, thorax and abdomen. This was done in petri dishes over dry ice using bleached tweezers. In total, we had 24 samples for RNA extraction: six whole bodies of workers and six replicates of three body parts from queens (more details in Supplementary file 5).

We extracted RNA and DNA from each sample using a dual DNA/RNA Tri Reagent based protocol (https://www.protocols.io/view/rna-dna-extraction-protocol-bi8fkhtn). We applied the Krieger and Ross, 2002 assay on the extracted DNA to identify only individuals with the SB/Sb genotype. Once RNA was extracted, we prepared Illumina sequencing libraries from total RNA using half volumes of the NEBNext Ultra II RNA Library Prep Kit. We checked RNA and library qualities on an Agilent Tapestation 2200; library insert size averaged 350 bp. An equimolar pool of the 24 libraries was sequenced on a single lane of Illumina HiSeq 4000 using 150 bp paired-end reads. This produced an average of 14,848,226 read pairs per sample (maximum: 27,766,980; minimum: 6,015,662. Raw RNA-seq reads for all samples are on NCBI SRA (PRJNA542606).

For all datasets, we assessed read quality using fastQC (v0.11.5; https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Raw reads for all samples were of sufficient quality to be used in subsequent analysis. We removed low quality bases using fqtrim with default parameters (v0.9.5; http://ccb.jhu.edu/software/fqtrim/), and Illumina adapters using Cutadapt v1.13 (Martin, 2011). We then generated a STAR v2.5.3a (Dobin et al., 2013) index of the S. invicta reference genome (version gnG; RefSeq GCF_000188075.1 [Wurm et al., 2011] while providing geneset v000188075.1 in GFF format through the ‘sjdbGTFtagExonParentTranscript = Parent’ option. As recommended by the developers of STAR, we aligned each sample to the reference twice, using the ‘out.tab’ file for the second run, and set ‘sjdbOverhang’ to the maximum trimmed read length minus one, that is 74 for the Wurm et al., 2011 data and Morandin et al., 2016 data, 125 for the Fontana et al., 2020 data and 149 for the South American data we generated here. Alignments were run using GNU Parallel v20150922 (Tange, 2011). All steps and downstream analyses were performed on the Queen Mary University of London’s Apocrita High Performance Computing Cluster (King et al., 2017).

We further assessed aligned reads (i.e., BAM files) using MultiQC v1.5 (Ewels et al., 2016) and the BodyGene_coverage.py script of the RSeQC toolkit v2.6.4 (Wang et al., 2016). We removed one sample from multiple-queen colonies in the Morandin et al. data from subsequent analyses due to poor alignment quality. None of the other BAM files showed markers of technical artefacts that could bias our results.

#### Identifying SNPs with fixed differences between SB and Sb males

To detect allele-specific differences between SB and Sb we first identified SNPs with fixed differences between the SB and Sb variants. Because the patterns of genetic diversity differ between the invasive and South American S. invicta populations (Ross et al., 2007; Ahrens et al., 2005), we estimated allele specific expression differences in the social chromosome independently for each population. For this we used haploid male ants because they can provide unambiguous genotypes. For the invasive populations, we identified fixed allelic differences between a group of 7 SB males and a group of 7 Sb males from North America (NCBI SRP017317) (Wang et al., 2013).

For the South American population, we sequenced the genomes of 13 SB males and 13 Sb males sampled from across Argentina. For each individual, we extracted 1 µg of genomic DNA using a phenol-chloroform protocol. The extracted material was sheared to 350 bp fragments using a Covaris (M220). We constructed individually barcoded libraries using the Illumina TruSeq PCR-free kit. The libraries were quantified through qPCR (NEB library quant kit). An equimolar pool of the 26 libraries was sequenced on a HiSeq4000 at 150 bp paired reads. This produced an average of 17,790,416 pairs of reads per sample, with a maximum of 38,823,285 and a minimum of 7,910,042 (Supplementary file 6, genomic reads of all samples deposited on NCBI SRA (PRJNA542606)). For each dataset, we identified fixed allelic differences between the group of SB males and the group of Sb males. We first aligned the reads of each sample to the S. invicta reference genome (Wurm et al., 2011; gnG assembly; RefSeq GCF_000188075.1) using Bowtie2 v2.3.4 (Langmead et al., 2009). We then used Freebayes v1.1.0 (Garrison and Marth, 2012) to call variants across all individuals (parameters: ploidy = 1, min-alternate-count=1, min-alternate-fraction=0.2). We used BCFtools (Li et al., 2009) and VariantAnnotation (Obenchain et al., 2014) to only retain variant sites with single nucleotide polymorphisms (SNPs), with quality value Q greater than or equal to 25, and where all individuals had a minimum coverage of 1. To avoid considering SNPs erroneously called from repetitive regions that are collapsed in the reference genome, we discarded any SNP with mean coverage greater than 16 for the North American samples or 12 for the South American samples or where any individuals had less than 60% reads supporting the called allele. This last filtering step also acts to remove SNPs called from reads with sequencing errors. We then extracted only the SNPs located within the supergene (based on the genomic locations from Pracana et al., 2017a) and with fixed differences between SB and Sb. This step was performed independently for each population. The two resulting variant call files were inspected using VCFtools v0.1.15 (Danecek et al., 2011) and we manually ensured that all variants had the SB allele as reference and Sb allele as alternative. To test the effect of sample size differences between populations we downsampled the South American dataset to 7 pairs of SB and Sb males, matching the sample size in the North American dataset.

We extracted SNPs shared between South and North American populations using BCFtools isec v1.9 (Li et al., 2009). We then used SNPeff (Cingolani et al., 2012) to characterize the effects of individual SNPs.

#### Estimating read counts from alternate supergene variants in heterozygous individuals

Because the reference genome for S. invicta is based on an SB individual, read mapping could be biased towards the SB variant in heterozygous individuals, resulting in false positive detection of allelic bias (Castel et al., 2015). To overcome this potential artifact, we called BCFtools consensus v1.9 (Li et al., 2009) once using North American Sb males and once using South American Sb males. We then aligned the RNAseq reads from each sample to the regular reference genome (version gnG; RefSeq GCF_000188075.1) and also, independently, to the most relevant of the modified references. For the reads from the Taiwanese population, we used the Sb reference using North American SNPs. For the alignment we used STAR with the same parameters as described above. We merged the two resulting BAM files from each sample using SAMtools v1.9 (Li et al., 2009). We then used the ‘rmdup’ function from the WASP pipeline (Soneson et al., 2015) to generate reference-bias free alignment files. The resulting BAM files can be considered reference bias free alignments. We added a reading group ID to each reference-bias free BAM file using the ‘AddOrReplaceReadGroups’ tool from Picard (v 2.7.0-SNAPSHOT; //broadinstitute.github.io/picard/). We then ran all BAM files through GATK’s ‘ASEReadCounter’ v 3.6–0-g89b7209 (Wright et al., 2017) with default options to obtain read counts for each allele. We performed this step once on each population independently.

We then imported the resulting allele-specific SNP read counts per sample generated by GATK into R v3.4.4 (R Development Core Team, 2017). We used the R packages ‘GenomicRanges’ v1.26.4 (Lawrence et al., 2013) and ‘GenomicFeatures’ v1.26.3 (Lawrence et al., 2013) along with the NCBI protein-coding gene annotation for S. invicta to identify which SNPs are in which genes. We estimated the total expression level for a particular allele (i.e., the SB or Sb variant for any given gene) as the median of all SNP-specific read counts per gene and per variant. For instance, consider a gene with three fixed SNPs between SB and Sb for which the SB variants have support from 12, 15 and 18 reads, and the Sb variants from 5, 8 and 6 reads. In this particular case, we would report that the SB variant for this gene has an expression level of 15 reads and the Sb variant, six reads. If instead of this approach, we randomly select one of the possible SNPs for every gene, we find qualitatively similar results to those reported.

Additionally, to test whether we would be able to detect allele-specific expression changes across body parts and castes in the South American data, we calculated allele-specific expression in the whole genome as a positive control. We used the VCF file containing all SNPs in the 26 males collected from South America. We retained only SNPs with expression data in all samples and a median of at least 1 X RNA coverage in each allele across all samples. After filtering, 1096 SNPs remained for which we were able to test for allele-specific expression. We performed an allele specific expression analysis throughout the whole genome using body part and caste information from South American populations. Unlike the analysis of genes in the supergene region, in the whole genome analysis we cannot ensure that every individual is heterozygous for all SNPs. Indeed, the average frequency in the population for all the alleles analyzed was 0.41 with a standard deviation of ±0.2. This implies that both alleles were not necessarily present in all samples. We therefore had far less power to detect allele-specific expression across body parts using data from the whole genome than using SNPs from the supergene region only. Despite this lack of power, we were able to detect significant (Wald test BH adjusted p<0.05) allele-specific expression changes across body parts of queens and workers in 15 SNPs. These significant SNPs were distributed across nine genomic scaffolds. The significant differences in allele-specific expression were between a queen body part and whole bodies of workers.

#### Identifying expression differences between the SB and Sb variants of the supergene

We imported the estimated read counts generated by Kallisto into R using Tximport v1.2.0 (Soneson et al., 2015) and DESeq2 v1.14.1 (Love et al., 2014). For every sample, read counts for the SB alleles and for the Sb alleles come from the same sequencing library, thus standard normalization methods (Dillies et al., 2013) are not applicable. As recommended by the developers of DESeq2 (Love, 2018), we thus deactivated normalization by setting SizeFactors = 1. For the North American and Taiwanese datasets (Wurm et al., 2011; Fontana et al., 2020), we only considered genes expressed in all samples for downstream analyses, whereas for the South American populations RNA dataset, we only analyzed genes expressed in all replicates of at least one body part.

To have the strongest possible analysis of expression between the SB and Sb variants of the supergene region, we performed a joint analysis of RNAseq data from Taiwanese, South and North American populations. The South American dataset includes body part information, which is absent in the North American dataset. We applied a linear mixed effects model on the log2 of the expression ratios between SB and Sb across populations and body parts, using the R packages lme4 v1.1–18.1 (Bates et al., 2014) and lmerTest v3.1–0 (Kuznetsova et al., 2017). We fitted the log2 expression ratios using a 0 intercept with gene, population and their interaction as fixed effects, and the interaction between gene and body part as random effects (formula: log2 expression ratio ~0 + gene * population + (1|body part:gene)).

We also performed an additional linear mixed-effects model to test the effect of geography and ancestry on the allele-specific expression patterns within the supergene. We grouped the populations by geographical proximity (North and South America vs. Taiwan) or by phylogenetic proximity (Taiwan and North America vs. South America). We then fitted the log2 expression ratios using a 0 intercept, the main effects of gene, ancestry and geographic proximity and the interactions between ancestry and gene and between geographic proximity and gene as fixed effects. As random effects we used again the interaction between gene and body part (formula: log2 expression ratio ~0 + gene * ancestry + gene * geography + (1|body_part:gene)). We then performed an analysis of variance on the model to estimate the size effects of each term.

For both models, the log2 expression ratios were weighed by a function of the total read counts per gene to reduce the impacts of genes with low expression which have extremely high variance. Here we only report the results of the fixed effects per gene after adjustment of p values for multiple testing following the Benjamini-Hochberg approach (Benjamini and Hochberg, 1995). For this joint analysis we only used genes that had fixed differences between SB and Sb in all three populations.

We additionally analyzed the allele-specific expression patterns between the SB and Sb variants of the supergene in each population independently using DESeq2 (Love et al., 2014 following Castel et al., 2015. The model formula used for the South American RNA-seq data used ‘body part’ and ‘colony of origin’ as blocking factors, and allele-specific expression, that is ‘variant effect’, as variable of interest. This analysis allowed us to detect differences in expression between variants specific to body part. Preliminary analyses showed that the interaction between ‘variant effect’ and ‘body part’ had no significant effect in any of the genes, and consequently, only the main ‘variant effect’ was considered as the factor of interest for this analysis. The model formula for both the Wurm et al., 2011 and the Fontana et al., 2020 RNA-seq datasets included only whole bodies of queens. We thus used ‘sample’ as a blocking factor and ‘variant effect’ as variable of interest.

In all analyses, we report gene differences between variants as log2 expression ratios between the SB and the Sb counts. That is, genes with expression biased towards SB will produce positive log2 expression ratios whereas those biased towards Sb will produce a negative value. To check whether there was an overall bias towards either variant, we tested the significance of the deviation from 0 for the median log2 expression ratios between SB and Sb via a Wilcoxon sum rank test.

#### Expression differences between single-queen and multiple-queen colonies

We determined the expression levels for all samples from the North American populations (Morandin et al., 2016) by using the count mode in Kallisto v0.44.0 (Bray et al., 2016) using S. invicta coding sequences. We imported the estimated counts into DESeq2 v1.14.1 (Love et al., 2014) using Tximport v1.2.0 (Soneson et al., 2015). We compared the DESeq2 normalized expression levels between social forms, determining significance of differential expression using the default Wald test for pairwise comparisons between genes. We estimated the proportion of significantly differentially to non-differentially expressed genes within and outside the supergene region based on supergene region coordinates from Pracana et al., 2017a. We then used the R packages GenomicRanges and GenomicFeatures (Lawrence et al., 2013) along with the annotations of S. invicta coding sequences to locate each gene with expression information in the genome. Our analyses are restricted to the 10,481 known S. invicta genes that can be reliably placed within or outside the supergene region; other genes are on scaffolds which lack chromosomal locations (Pracana et al., 2017a).

#### Expression differences between variants and social forms

We fitted a model to test whether there is a significant relationship between allele-specific expression differences between supergene variants in the Wurm et al., 2011 dataset, and gene expression differences between social forms (log2 expression ratios using the Morandin et al., 2016 dataset). We examined the overall trend in allele-specific expression patterns within the supergene (i.e., any bias towards expression of either the SB or Sb allelic variant).

We obtained relative expression levels using DESeq2 for both comparisons: single-queen vs. multiple-queen expression for each gene (XSQ vs. XMQ) from the Morandin et al., 2016 dataset and expression of the SB allelic variant vs. the Sb (XB vs. Xb) within each gene from the Wurm et al., 2011 dataset DESeq2 returned an estimate of log2(XB/Xb) for the differences in expression between alleles and log2(XSQ/XMQ) for the differences among colony types. We first generated a null model in which the PB and PMQ values change solely as a consequence of the relative expression (r) of the Sb allele (xb) and the SB alleles (xB), such that xb = r xB, and therefore:

$\Large{P}_B=\frac{X_b}{X_b+{X}_B}$

and

$\Large{P}_{MQ}=\frac{X_{b}+X_{B}}{X_{b}+3X_{B}}$

Notice that in this case the minimum value of PMQ = ⅓ would occur when there was no expression of the b allele (xb = 0). For greater values of xb, we can solve the pair of equations to obtain the relationship:

$\Large{P}_{MQ}=\frac{1}{2P_{B}+1}$

The second model additionally allows for the effect of dosage compensation by increasing expression of B-alleles in SB/Sb individuals in multiple queen-colonies. To do this, PB is defined by the expression differences between social forms such that:

$\Large{P}_B=\frac{1-\frac{P_{MQ}}{2}}{P_{MQ}}$

Finally, we test whether the two models are significantly different using a standard analysis of variance test using the aov function in R. The linear regressions and statistical tests were performed in R v3.4.4 (Lawrence et al., 2013).

We also explored whether genes with low Sb allele expression had higher SB allele expression, resulting in similar expression between multiple-queen (SB/Sb genotype) and single-queen (SB/SB genotype) individuals. Such a pattern would be consistent with an ongoing process of dosage compensation. To do so, we excluded the nine genes with significant biases towards Sb and high SB-SB/Sb ratios (i.e., SB variant more highly expressed in SB/Sb than SB/SB individuals), since they are more likely to have been subjected to antagonistic selection. We also excluded genes with fewer than three read counts mapping to either allele to remove more noisy estimates. The rest of all analyzed genes were then grouped by relative SB/Sb expression. We then compared the overall expression levels between these groups in multiple-queen and single-queen individuals.

#### Data availability

We deposited the genomic and transcriptomic reads we generated from South American Solenopsis invicta on NCBI SRA (PRJNA542606). All analysis scripts used will be made available at https://github.com/wurmlab/2019-11-allelic_bias_in_fire_ant_supergene (copy archived at https://github.com/wurmlab/2019-11-allelic_bias_in_fire_ant_supergene; Martinez-Ruiz, 2020).

## Data availability

We deposited genomic and transcriptomic reads we generated from South American Solenopsis invicta on NCBI SRA (PRJNA542606). All analysis scripts used are available at https://github.com/wurmlab/2019-11-allelic_bias_in_fire_ant_supergene (copy archived at https://github.com/elifesciences-publications/2019-11-allelic_bias_in_fire_ant_supergene).

The following data sets were generated

Martinez-Ruiz CPracana RStolle EParis CINichols RAWurm Y (2019) NCBI BioProject ID PRJNA542606. Solenopsis invicta Raw sequence reads. https://www.ncbi.nlm.nih.gov/bioproject/PRJNA542606

The following previously published data sets were used

Wang JWurm YNipitwattanaphon MRiba-Grognuz OHuang YCShoemaker DKeller L (2012) NCBI BioProject ID PRJNA182127. Solenopsis invicta Variation. https://www.ncbi.nlm.nih.gov/bioproject/PRJNA182127

Fontana SChang NCChang TLee CCDang VDWang J (2019) NCBI BioProject ID PRJNA542500. The fire ant social supergene is characterized by extensive gene and transposable element duplication. https://www.ncbi.nlm.nih.gov/bioproject/PRJNA542500

Morandin CTin MMAbril SGómez CPontieri LSchiøtt MSundström LTsuji KPedersen JSHelanterä HMikheyev AS (2015) NCBI BioProject ID PRJDB4088. Comparative transcriptomics reveals the conserved building blocks involved in parallel evolution of ant phenotypic traits. https://www.ncbi.nlm.nih.gov/bioproject/PRJDB4088

Wurm YWang JRiba-Grognuz OCorona MNygaard SHunt BIngram KFalquet LNipitwattanaphon MGotzek DDijkstra MOettler JComtesse FCheng-Jen SWu WChin-Cheng YThomas JBeaudoing EPradervand SFlegel VCook EFabbretti RStockinger HLong LFarmerie WOakey JBoomsma JPamilo PYi SHeinze JGoodisman MFarinelli LHarshman KHulo NCerutti LXenarios IShoemaker DKeller L (2011) NCBI BioProject ID PRJNA49629. The genome of the fire ant Solenopsis invicta. https://www.ncbi.nlm.nih.gov/bioproject/PRJNA49629

• Supplementary file 1
Single nucleotide polymorphisms (SNPs) with fixed differences between the SB and Sb variant in present in both North and South American populations of red fire ant.
The columns show, from left to right: the scaffold in the reference genome of the fire ant (version gnG; RefSeq GCF_000188075.1) where the SNP is located, its position within the scaffold, the allele present in all SB males (reference allele), the allele present in all Sb males (alternative allele), the position within the gene were the SNP is located, the gene (or genes) that could be potentially affected by the SNP and the potential effect of the SNP in Sb: ‘HIGH’ implies a change that substantially alters protein sequence (e.g., an early stop codon), ‘MODERATE’ implies a change affecting protein sequence, but without necessarily altering substantially protein structure (e.g., a non-synonymous mutation), ‘LOW’ implies a change with no effect on protein sequence (e.g., a synonymous mutation) and ‘MODIFIER’ are variants outside gene coding regions that could have potential regulatory effects. The last three columns of the table are based on the results from snpEff.

• Supplementary file 2
Names and RefSeq identifiers of the seven genes that are significantly differentially expressed between the SB and Sb variants of the S. invicta supergene.
The significance levels were determined using a linear mixed effect models on the log2 expression ratios between SB and Sb. Population was used as a random effect and the log2 expression ratios were weighted by read count of the gene. The third column in the table shows whether that particular gene is also differentially expressed in the comparison between social forms (using Morandin et al., 2016 data), and if so, in which social form it is more highly expressed.

• Supplementary file 3
Genes with significant differential expression between the SB and Sb variants of the S. invicta supergene in a) South American, b) North American or c) Taiwanese populations.
Significance levels were determined by the Wald test in DESeq2. Significance was established as Benjamini and Hochberg corrected p<0.05. The columns in the tables show the names of the genes, their RefSeq identifiers, their log2 expression ratios for allele-specific expression differences between variants (values greater than 0 are more highly expressed in SB) and in which variant they are more highly expressed.

• Supplementary file 4
Genes with significant differential expression between queens from single-queen and multiple-queen colonies of S. invicta from North American populations (data from Morandin et al., 2016).
Significance levels were determined by the Wald test in DESeq2. Significance was established as Benjamini and Hochberg corrected p<0.05. The columns in the tables show the names of the genes, their RefSeq identifiers, their log2 expression ratios for gene expression differences between social forms (values greater than 0 are more highly expressed in queens from multiple-queen colonies) and in which social form they are more highly expressed. Locations in the supergene are based on the data from Pracana, Rodrigo, et al. Molecular ecology 26.11 (2017): 2864–2879.

• Supplementary file 5
Overview of RNA-seq datasets used in this study.
(a) Accession numbers of the North American RNA-seq datasets. ‘Project’ and ‘SRA’ columns indicate NCBI identifiers. The descriptions provided and the sequencing method used are based on metadata available on NCBI and in the manuscripts. One sample (marked with an asterisk) was discarded because of very low coverage after aligning the reads to the S. invicta genome. b) Details for the South American RNAseq dataset. From left to right, the colony name from where samples were taken, the caste used from these colonies, the body parts extracted, the location of each colony in Argentina, the coordinates from where the sample was taken and finally, whether or not samples from the same colony were used to generate the VCF with fixed differences between Sb and SB.

• Supplementary file 6
Location of the colonies used to estimate single nucleotide polymorphisms (SNPs) between SB and Sb males.
Note that individuals from colonies AR102, AR111, AR112, AR114 and AR28 were also used to extract RNA for RNA sequencing.

• Transparent reporting form

## References

1. Ahrens MERoss KGShoemaker DD (2005) Phylogeographic structure of the fire ant solenopsis invicta in its native south american range: roles of natural barriers and habitat connectivity Evolution 59:1733–1743. https://doi.org/10.1111/j.0014-3820.2005.tb01822.x

2. Alekseyenko AAEllison CEGorchakov AAZhou QKaiser VBToda NWalton ZPeng SPark PJBachtrog DKuroda MI (2013) Conservation and de novo acquisition of dosage compensation on newly evolved sex chromosomes in Drosophila Genes & Development 27:853–858. https://doi.org/10.1101/gad.215426.113

3. Ascunce MSYang CCOakey JCalcaterra LWu WJShih CJGoudet JRoss KGShoemaker D (2011) Global invasion history of the fire ant solenopsis invicta Science 331:1066–1068. https://doi.org/10.1126/science.1198734

4. Preprint Bates DMächler MBolker BWalker S (2014) Fitting linear Mixed-Effects models using lme4 arXiv. https://arxiv.org/abs/1406.5823

5. Benjamini YHochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing Journal of the Royal Statistical Society: Series B 57:289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x

6. Bergero RCharlesworth D (2009) The evolution of restricted recombination in sex chromosomes Trends in Ecology & Evolution 24:94–102. https://doi.org/10.1016/j.tree.2008.09.010

7. Bohbot JSobrio FLucas PNagnan-Le Meillour P (1998) Functional characterization of a new class of odorant-binding proteins in the moth Mamestra brassicae Biochemical and Biophysical Research Communications 253:489–494. https://doi.org/10.1006/bbrc.1998.9806

8. Branco SBadouin HRodríguez de la Vega RCGouzy JCarpentier FAguileta GSiguenza SBrandenburg JTCoelho MAHood MEGiraud T (2017) Evolutionary strata on young mating-type chromosomes despite the lack of sexual antagonism PNAS 114:7067–7072. https://doi.org/10.1073/pnas.1701658114

9. Branco SCarpentier FRodríguez de la Vega RCBadouin HSnirc ALe Prieur SCoelho MAde Vienne DMHartmann FEBegerow DHood MEGiraud T (2018) Multiple convergent supergene evolution events in mating-type chromosomes Nature Communications 9:2000. https://doi.org/10.1038/s41467-018-04380-9

10. Bray NLPimentel HMelsted PPachter L (2016) Near-optimal probabilistic RNA-seq quantification Nature Biotechnology 34:525–527. https://doi.org/10.1038/nbt.3519

11. Castel SELevy-Moonshine AMohammadi PBanks ELappalainen T (2015) Tools and best practices for data processing in allelic expression analysis Genome Biology 16:195. https://doi.org/10.1186/s13059-015-0762-6

12. Cavoto ENeuenschwander SGoudet JPerrin N (2018) Sex-antagonistic genes, XY recombination and feminized Y chromosomes Journal of Evolutionary Biology 31:416–427. https://doi.org/10.1111/jeb.13235

13. Charlesworth DCharlesworth BMarais G (2005) Steps in the evolution of heteromorphic sex chromosomes Heredity 95:118–128. https://doi.org/10.1038/sj.hdy.6800697

14. Charlesworth D (2016) The status of supergenes in the 21st century: recombination suppression in batesian mimicry and sex chromosomes and other complex adaptations Evolutionary Applications 9:74–90. https://doi.org/10.1111/eva.12291

15. Cingolani PPlatts AWang leLCoon MNguyen TWang LLand SJLu XRuden DM (2012) A program for annotating and predicting the effects of single Nucleotide Polymorphisms, SnpEff: snps in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3 Fly 6:80–92. https://doi.org/10.4161/fly.19695

16. Conrad TAkhtar A (2012) Dosage compensation in Drosophila melanogaster: epigenetic fine-tuning of chromosome-wide transcription Nature Reviews Genetics 13:123–134. https://doi.org/10.1038/nrg3124

17. Danecek PAuton AAbecasis GAlbers CABanks EDePristo MAHandsaker RELunter GMarth GTSherry STMcVean GDurbin R1000 Genomes Project Analysis Group (2011) The variant call format and VCFtools Bioinformatics 27:2156–2158. https://doi.org/10.1093/bioinformatics/btr330

18. Book Darlington CDMather K (1949) The Elements of Genetics London: George Allen & Unwin Ltd.

19. Denver DRMorris KStreelman JTKim SKLynch MThomas WK (2005) The transcriptional consequences of mutation and natural selection in Caenorhabditis elegans Nature Genetics 37:544–548. https://doi.org/10.1038/ng1554

20. Dillies MARau AAubert JHennequet-Antier CJeanmougin MServant NKeime CMarot GCastel DEstelle JGuernec GJagla BJouneau LLaloë DLe Gall CSchaëffer BLe Crom SGuedj MJaffrézic FFrench StatOmique Consortium (2013) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Briefings in Bioinformatics 14:671–683. https://doi.org/10.1093/bib/bbs046

21. Dobin ADavis CASchlesinger FDrenkow JZaleski CJha SBatut PChaisson MGingeras TR (2013) STAR: ultrafast universal RNA-seq aligner Bioinformatics 29:15–21. https://doi.org/10.1093/bioinformatics/bts635

22. Dufresnes CBorzée AHorn AStöck MOstini MSermier RWassef JLitvinchuck SNKosch TAWaldman BJang YBrelsford APerrin N (2015) Sex-Chromosome homomorphy in palearctic tree frogs results from both turnovers and X-Y recombination Molecular Biology and Evolution 32:2328–2337. https://doi.org/10.1093/molbev/msv113

23. Ewels PMagnusson MLundin SKäller M (2016) MultiQC: summarize analysis results for multiple tools and samples in a single report Bioinformatics 32:3047–3048. https://doi.org/10.1093/bioinformatics/btw354

24. Favreau EMartínez-Ruiz CRodrigues Santiago LHammond RLWurm Y (2018) Genes and genomic processes underpinning the social lives of ants Current Opinion in Insect Science 25:83–90. https://doi.org/10.1016/j.cois.2017.12.001

25. Fontana SChang NCChang TLee CCDang VDWang J (2020) The fire ant social supergene is characterized by extensive gene and transposable element copy number variation Molecular Ecology 29:105–120. https://doi.org/10.1111/mec.15308

26. Preprint Garrison EMarth G (2012) Haplotype-based variant detection from short-read sequencing arXiv. https://arxiv.org/abs/1207.3907

27. Gotzek DRoss KG (2007) Genetic regulation of colony social organization in fire ants: an integrative overview The Quarterly Review of Biology 82:201–226. https://doi.org/10.1086/519965

28. GTEx Consortium (2015) Human genomics. the Genotype-Tissue expression (GTEx) pilot analysis: multitissue gene regulation in humans Science 348:648–660. https://doi.org/10.1126/science.1262110

29. Hall DWGoodisman MAD (2012) The effects of kin selection on rates of molecular evolution in social insects Evolution 66:2080–2093. https://doi.org/10.1111/j.1558-5646.2012.01602.x

30. Harrison PWWright AEMank JE (2012) The evolution of gene expression and the transcriptome-phenotype relationship Seminars in Cell & Developmental Biology 23:222–229. https://doi.org/10.1016/j.semcdb.2011.12.004

31. Huang Y-CDang VDChang N-CWang J (2018) Multiple large inversions and breakpoint rewiring of gene expression in the evolution of the fire ant social supergene Proceedings of the Royal Society B: Biological Sciences 285:20180221. https://doi.org/10.1098/rspb.2018.0221

32. Huang YCWang J (2014) Did the fire ant supergene evolve selfishly or socially? BioEssays 36:200–208. https://doi.org/10.1002/bies.201300103

33. Johnson BRAtallah JPlachetzki DC (2013) The importance of tissue specificity for RNA-seq: highlighting the errors of composite structure extractions BMC Genomics 14:586. https://doi.org/10.1186/1471-2164-14-586

34. Joron MFrezal LJones RTChamberlain NLLee SFHaag CRWhibley ABecuwe MBaxter SWFerguson LWilkinson PASalazar CDavidson CClark RQuail MABeasley HGlithero RLloyd CSims SJones MCRogers JJiggins CDffrench-Constant RH (2011) Chromosomal rearrangements maintain a polymorphic supergene controlling butterfly mimicry Nature 477:203–206. https://doi.org/10.1038/nature10341

35. Keller LRoss KG (1998) Selfish genes: a green beard in the red fire ant Nature 394:573–575. https://doi.org/10.1038/29064

36. Khil PP, Smirnova NA, Romanienko PJ, Camerini-Otero RD (2004) The mouse X chromosome is enriched for sex-biased genes not subject to selection by meiotic sex chromosome inactivation Nature Genetics 36:642–646. https://doi.org/10.1038/ng1368

37. Report King TButcher SZalewski L (2017) Apocrita—High Performance Computing Cluster for Queen Mary IT Services.

38. Krieger MJRoss KG (2002) Identification of a major gene regulating complex social behavior Science 295:328–332. https://doi.org/10.1126/science.1065247

39. Kunte KZhang WTenger-Trolander APalmer DHMartin AReed RDMullen SPKronforst MR (2014) Doublesex is a mimicry supergene Nature 507:229–232. https://doi.org/10.1038/nature13112

40. Küpper CStocks MRisse JEDos Remedios NFarrell LLMcRae SBMorgan TCKarlionova NPinchuk PVerkuil YIKitaysky ASWingfield JCPiersma TZeng KSlate JBlaxter MLank DBBurke T (2016) A supergene determines highly divergent male reproductive morphs in the ruff Nature Genetics 48:79–83. https://doi.org/10.1038/ng.3443

41. Kuznetsova ABrockhoff PBChristensen RHB (2017) lmerTest Package: Tests in Linear Mixed Effects Models Journal of Statistical Software 82:i13. https://doi.org/10.18637/jss.v082.i13

42. Lamichhaney SFan GWidemo FGunnarsson UThalmann DSHoeppner MPKerje SGustafson UShi CZhang HChen WLiang XHuang LWang JLiang EWu QLee SMXu XHöglund JLiu XAndersson L (2016) Structural genomic changes underlie alternative reproductive strategies in the ruff (Philomachus pugnax) Nature Genetics 48:84–88. https://doi.org/10.1038/ng.3430

43. Langmead BTrapnell CPop MSalzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome Genome Biology 10:R25. https://doi.org/10.1186/gb-2009-10-3-r25

44. Laturney MBilleter JC (2014) Neurogenetics of female reproductive behaviors in Drosophila melanogaster Advances in Genetics 85:1–108. https://doi.org/10.1016/B978-0-12-800271-1.00001-9

45. Lawrence MHuber WPagès HAboyoun PCarlson MGentleman RMorgan MTCarey VJ (2013) Software for computing and annotating genomic ranges PLOS Computational Biology 9:e1003118. https://doi.org/10.1371/journal.pcbi.1003118

46. Lenormand TFyon FSun ERoze D (2020) Sex chromosome degeneration by regulatory evolution Current Biology 30:3001–3006. https://doi.org/10.1016/j.cub.2020.05.052

47. Li HHandsaker BWysoker AFennell TRuan JHomer NMarth GAbecasis GDurbin R1000 Genome Project Data Processing subgroup (2009) The sequence alignment/Map format and SAMtools Bioinformatics 25:2078–2079. https://doi.org/10.1093/bioinformatics/btp352

48. Li JCocker JMWright JWebster MAMcMullan MDyer SSwarbreck DCaccamo MOosterhout CVGilmartin PM (2016) Genetic architecture and evolution of the S locus supergene in Primula vulgaris Nature Plants 2:16188. https://doi.org/10.1038/nplants.2016.188

49. Loewe LHill WG (2010) The population genetics of mutations: good, bad and indifferent Philosophical Transactions of the Royal Society B: Biological Sciences 365:1153–1167. https://doi.org/10.1098/rstb.2009.0317

50. Love MIHuber WAnders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 Genome Biology 15:550. https://doi.org/10.1186/s13059-014-0550-8

51. Website Love MI (2018) Using RNA-seq DE methods to detect allele-specific expression (2017) Accessed May 22, 2018. https://rpubs.com/mikelove/ase

52. Ma WJCarpentier FGiraud THood ME (2020) Differential gene expression between fungal mating types is associated with sequence degeneration Genome Biology and Evolution 12:243–258. https://doi.org/10.1093/gbe/evaa028

53. Mank JEWedell NHosken DJ (2013) Polyandry and sex-specific gene expression Philosophical Transactions of the Royal Society B: Biological Sciences 368:20120047. https://doi.org/10.1098/rstb.2012.0047

54. Mank JE (2013) Sex chromosome dosage compensation: definitely not for everyone Trends in Genetics 29:677–683. https://doi.org/10.1016/j.tig.2013.07.005

55. Mank JE (2017) The transcriptional architecture of phenotypic dimorphism Nature Ecology & Evolution 1:6. https://doi.org/10.1038/s41559-016-0006

57. Software Martinez-Ruiz C (2020) Scripts for “Genomic architecture and evolutionary conflict drive allele-specific expression in the social supergene of the red fire ant”, version 072aace GitHub. https://github.com/wurmlab/2019-11-allelic_bias_in_fire_ant_supergene

58. Morandin CTin MMAbril SGómez CPontieri LSchiøtt MSundström LTsuji KPedersen JSHelanterä HMikheyev AS (2016) Comparative transcriptomics reveals the conserved building blocks involved in parallel evolution of diverse phenotypic traits in ants Genome Biology 17:43. https://doi.org/10.1186/s13059-016-0902-7

59. Muyle AZemp NDeschamps CMousset SWidmer AMarais GA (2012) Rapid de novo evolution of X chromosome dosage compensation in Silene latifolia, a plant with young sex chromosomes PLOS Biology 10:e1001308. https://doi.org/10.1371/journal.pbio.1001308

60. Nozawa MFukuda NIkeo KGojobori T (2014) Tissue- and stage-dependent dosage compensation on the neo-X chromosome in Drosophila pseudoobscura Molecular Biology and Evolution 31:614–624. https://doi.org/10.1093/molbev/mst239

61. Nozawa MIkeo KGojobori T (2018) Gene-by-Gene or localized dosage compensation on the Neo-X chromosome in Drosophila miranda Genome Biology and Evolution 10:1875–1881. https://doi.org/10.1093/gbe/evy148

62. Obenchain VLawrence MCarey VGogarten SShannon PMorgan M (2014) VariantAnnotation: a bioconductor package for exploration and annotation of genetic variants Bioinformatics 30:2076–2078. https://doi.org/10.1093/bioinformatics/btu168

63. Pál CPapp BHurst LD (2001) Highly expressed genes in yeast evolve slowly Genetics 158:927–931.

64. PubMed Parsch JEllegren H (2013) The evolutionary causes and consequences of sex-biased gene expression Nature Reviews Genetics 14:83–87. https://doi.org/10.1038/nrg3376

65. Pelosi PIovinella IZhu JWang GDani FR (2018) Beyond chemoreception: diverse tasks of soluble olfactory proteins in insects Biological Reviews 93:184–200. https://doi.org/10.1111/brv.12339

66. Pracana RPriyam ALevantis INichols RAWurm Y (2017a) The fire ant social chromosome supergene variant sb shows low diversity but high divergence from SB Molecular Ecology 26:2864–2879. https://doi.org/10.1111/mec.14054

67. Pracana RLevantis IMartínez-Ruiz CStolle EPriyam AWurm Y (2017b) Fire ant social chromosomes: differences in number, sequence and expression of odorant binding proteins Evolution Letters 1:199–210. https://doi.org/10.1002/evl3.22

68. Pucholt PWright AEConze LLMank JEBerlin S (2017) Recent sex chromosome divergence despite ancient dioecy in the willow Salix viminalis Molecular Biology and Evolution 34:1991–2001. https://doi.org/10.1093/molbev/msx144

69. Software R Development Core Team (2017) R: A language and environment for statistical computing R Foundation for Statistical Computing, Vienna, Austria. http://www.r-project.org

70. Rifkin SAHoule DKim JWhite KP (2005) A mutation accumulation assay reveals a broad capacity for rapid evolution of gene expression Nature 438:220–223. https://doi.org/10.1038/nature04114

71. Robinson JTThorvaldsdóttir HWinckler WGuttman MLander ESGetz GMesirov JP (2011) Integrative genomics viewer Nature Biotechnology 29:24–26. https://doi.org/10.1038/nbt.1754

72. Ross KGKrieger MJBKeller LShoemaker DD (2007) Genetic variation and structure in native populations of the fire ant solenopsis invicta: evolutionary and demographic implications Biological Journal of the Linnean Society 92:541–560. https://doi.org/10.1111/j.1095-8312.2007.00853.x

73. Soneson CLove MIRobinson MD (2015) Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences F1000Research 4:1521. https://doi.org/10.12688/f1000research.7563.1

74. Steller MMKambhampati SCaragea D (2010) Comparative analysis of expressed sequence tags from three castes and two life stages of the termite reticulitermes flavipes BMC Genomics 11:463. https://doi.org/10.1186/1471-2164-11-463

75. Stöck MHorn AGrossen CLindtke DSermier RBetto-Colliard CDufresnes CBonjour EDumas ZLuquet EMaddalena TSousa HCMartinez-Solano IPerrin N (2011) Ever-Young sex chromosomes in european tree frogs PLOS Biology 9:e1001062. https://doi.org/10.1371/journal.pbio.1001062

76. Stolle EPracana RHoward PParis CIBrown SJCastillo-Carrillo CRossiter SJWurm Y (2019) Degenerative expansion of a young supergene Molecular Biology and Evolution 36:553–561. https://doi.org/10.1093/molbev/msy236

77. Sun DHuh IZinzow-Kramer WMManey DLYi SV (2018) Rapid regulatory evolution of a nonrecombining autosome linked to divergent behavioral phenotypes PNAS 115:2794–2799. https://doi.org/10.1073/pnas.1717721115

78. Tange O (2011) Gnu parallel-the command-line power tool The USENIX Magazine 36:42–47. https://doi.org/10.5281/zenodo.16303

79. Teufel AIJohnson MMLaurent JMKachroo AHMarcotte EMWilke CO (2019) The many nuanced evolutionary consequences of duplicated genes Molecular Biology and Evolution 36:304–314. https://doi.org/10.1093/molbev/msy210

80. Thompson MJJiggins CD (2014) Supergenes and their role in evolution Heredity 113:1–8. https://doi.org/10.1038/hdy.2014.20

81. Tuttle EMBergland AOKorody MLBrewer MSNewhouse DJMinx PStager MBetuel ACheviron ZAWarren WCGonser RABalakrishnan CN (2016) Divergence and functional degradation of a sex Chromosome-like supergene Current Biology 26:344–350. https://doi.org/10.1016/j.cub.2015.11.069

82. Vicoso BKaiser VBBachtrog D (2013) Sex-biased gene expression at Homomorphic sex chromosomes in Emus and its implication for sex chromosome evolution PNAS 110:6453–6458. https://doi.org/10.1073/pnas.1217027110

83. Wang JWurm YNipitwattanaphon MRiba-Grognuz OHuang YCShoemaker DKeller L (2013) A Y-like social chromosome causes alternative colony organization in fire ants Nature 493:664–668. https://doi.org/10.1038/nature11832

84. Wang LNie JSicotte HLi YEckel-Passow JEDasari SVedell PTBarman PWang LWeinshiboum RJen JHuang HKohli MKocher JP (2016) Measure transcript integrity using RNA-seq data BMC Bioinformatics 17:58. https://doi.org/10.1186/s12859-016-0922-z

85. Wolschin FShpigler HAmdam GVBloch G (2012) Size-related variation in protein abundance in the brain and abdominal tissue of bumble bee workers Insect Molecular Biology 21:319–325. https://doi.org/10.1111/j.1365-2583.2012.01142.x

86. Wright AEDarolti IBloch NIOostra VSandkam BBuechel SDKolm NBreden FVicoso BMank JE (2017) Convergent recombination suppression suggests role of sexual selection in guppy sex chromosome formation Nature Communications 8:14251. https://doi.org/10.1038/ncomms14251

87. Wurm YWang JRiba-Grognuz OCorona MNygaard SHunt BGIngram KKFalquet LNipitwattanaphon MGotzek DDijkstra MBOettler JComtesse FShih CJWu WJYang CCThomas JBeaudoing EPradervand SFlegel VCook EDFabbretti RStockinger HLong LFarmerie WGOakey JBoomsma JJPamilo PYi SVHeinze JGoodisman MAFarinelli LHarshman KHulo NCerutti LXenarios IShoemaker DKeller L (2011) The genome of the fire ant solenopsis invicta PNAS 108:5679–5684. https://doi.org/10.1073/pnas.1009690108

88. Xu LAuer GPeona VSuh ADeng YFeng SZhang GBlom MPKChristidis LProst SIrestedt MZhou Q (2019) Dynamic evolutionary history and gene content of sex chromosomes across diverse songbirds Nature Ecology & Evolution 3:834–844. https://doi.org/10.1038/s41559-019-0850-1

89. Yoon KAKim KNguyen PSeo JBPark YHKim K-GSeo HKoh YHLee SH (2015) Comparative functional venomics of social hornets Vespa crabro and Vespa analis Journal of Asia-Pacific Entomology 18:815–823. https://doi.org/10.1016/j.aspen.2015.10.005

90. Zemp NTavares RMuyle ACharlesworth DMarais GABWidmer A (2016) Evolution of sex-biased gene expression in a dioecious plant Nature Plants 2:16168. https://doi.org/10.1038/nplants.2016.168

91. Zhou QBachtrog D (2012) Sex-specific adaptation drives early sex chromosome evolution in Drosophila Science 337:341–345. https://doi.org/10.1126/science.1225385

92. Zinzow-Kramer WMHorton BMMcKee CDMichaud JMTharp GKThomas JWTuttle EMYi SManey DL (2015) Genes located in a chromosomal inversion are correlated with territorial song in white-throated sparrows Genes, Brain and Behavior 14:641–654. https://doi.org/10.1111/gbb.12252

## Decision letter

##### Dieter Ebert

Reviewing Editor; University of Basel, Switzerland

##### Patricia J Wittkopp

Senior Editor; University of Michigan, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

There is increasing evidence that supergenes play an important role in the evolution of diverse traits, including colour patterns, parasite resistance and, as in the current study, social interaction. We know very little about the early evolution of such supergenes. The study by Martinez-Ruiz et al. addresses this topic, studying the social supergenes of the red fire ant with the help of gene expressing experiments. The authors report a complex history of the evolution of this supergene including signatures of adaptive evolution, gene degeneration and gene-specific dosage compensation.

Decision letter after peer review:

Thank you for submitting your article “Genomic architecture and evolutionary conflict drive allele-specific expression in red fire ant’s social supergene” for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Patricia Wittkopp (Senior Editor).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

This is a rich paper with a lot of information, including various sources of information to create a complex picture. The authors use caste, social form, supergene genotype and geography to understand how variation in expression of genes forming the so-called social supergene can be used to unravel the evolution of an ant supergene alleles. This is a powerful and fruitful approach, but results in a complex and long manuscripts. Many of the comments of the reviewers refer to the style the material presented.

Furthermore, the reviewers ask for clarification of terms and concepts. Some assumptions are not well explained or justified.

Reviewer 4 suggests an overview summary table summary table synthesizing the main comparisons, results and conclusions and thus to guide the reader through the complex material. The text (in particular the Discussion) is at places long and windy. Here one may shorten the text and give it more of a red line. The summary table may help here. Below are the detailed comments of the three reviewers.

Reviewer #1:

In this study Martinez-Ruiz et al. investigate the social supergene in fire ants that has become famous as a “Green Beard gene” explaining the polymorphisms in queen numbers in colonies of the fire ant. The study uses a combination of population genomics and gene expression to characterize the differences among the two functional types of the gene. For this, the authors used a combination of different data sets (some new, some earlier published). Consistent with the relatively young age of this supergene, most variation in the super gene haplotypes is not associated with the phenotype. Also expected, the haplotypes suffer from degeneration, a phenomenon related to the absence of recombination and the increased likelihood of accumulation of deleterious mutations. Finally, a relatively small number of genes carry the signature of functional haplotypes, associated with the phenotypes. Interestingly, these signatures are consistent with the idea that the evolution of the supergene is driven by conflict. All in all, this is a solid study and I think it is of interest for many readers of eLife.

The study is well done and technically sound. The combination of different methods and different datasets places particular strength on the analysis. The results are appropriately discussed and placed in the light of other super genes known from a number of very different systems. I have no comments.

Reviewer #3:

This manuscript reports analyses on gene expression in the supergene controlling social structure in the red fire ant to explore whether differences in gene expression between the two haplotypes can be due to antagonistic selection or degeneration of the non-recombining haplotype. The questions are very interesting, the biological model is fascinating and the dataset is important. However, there is some room for improvement, in particular regarding clarity and interpretation, as detailed below.

– The Abstract is frustrating, it would be good to indicate what kind of evidence are provided in favor of each of the two hypotheses. Actually, the Introduction, and even the Results, are not clear either on the specific hypotheses tested and how they can be tested. It only becomes clear in the conclusion how the hypotheses were tested and how the results allow inferring the two types of selection. I would recomment to be clear in the Abstract and the end of the conclusion about what kinds of tests will be run and what results would support what hypothesis. Also in the Results section, help the reader to interpret the raw results. As it stands, the results read like a long list of statistical tests, which one has a hard time to make sense of.

– About the interpretation of degeneration causing some of the differences in gene expression, maybe it could be tested whether the alleles in the Sb alleles that are less expressed than in SB show more footprints of degenerative mutations, as was done in the following paper, that would be relevant to cite : Ma et al., 2020.

Reviewer #4:

This is a very rich paper with a complex analysis of gene expression, in which the authors use caste, social form, supergene genotype and geography to understand how variation in expression of genes forming the so-called social supergene can be used to unravel the evolution of supergene alleles. This is an interesting approach. This paper presents a large amount of data on gene expression, placed in an interesting context of supergene evolution and possibly allele degeneration. I like the idea that degeneration can be studied by exploring the possibility of dosage compensation.

I have two main points to raise regarding how the approach is explained and justified.

The first point is that the authors do not explain well enough why they seem to equate over-expression with adaptive variation in expression. It seems that in many places over-expression of genes in a particular compartment (body part, or cast, or genotype, etc.) is seen and presented as an indication that this gene is advantageous in that context. I can think of many genes whose overexpression is deleterious or lethal, and in many cases of alleles for which repressed expression is advantageous. It is also not clear why under- and over-expression should not be equally likely when it comes to mutations affecting regulatory regions, since transcription factors can also suppress gene expression; therefore, mutations that cannot be purged despite their deleterious effects could equally be mutations with a positive or negative effect on expression. The idea that an unpurged mutation is more likely to have a loss of function effect is also not so obvious, at least in the case of non-coding mutations. This idea that the direction of change in expression due to mutations indicates its adaptive value lingers in the manuscript, and while this may reflect a misinterpretation on my part, I suggest that the authors clearly explain, from the outset, what the expectations are as to what constitutes an expected adaptive response through variation in expression, and test it accordingly.

The second point is that the authors try to disentangle the effect of the social phenotype and supergene genotype, but the situation is not symmetrical because of the lethality of the Sb/Sb genotypes and the dominance of Sb. It may be expected that genes with an overexpression in an Sb genotype are only overexpressed in multiqueen colonies. Again, this may reflect my lack of knowledge of the system, but I think the authors should explain in more detail how they can eliminate the expected correlation between the Sb genotype and social form.

https://doi.org/10.7554/eLife.55862.sa1

## Author response

This is a rich paper with a lot of information, including various sources of information to create a complex picture. The authors use caste, social form, supergene genotype and geography to understand how variation in expression of genes forming the so-called social supergene can be used to unravel the evolution of an ant supergene alleles.

This is a powerful and fruitful approach, but results in a complex and long manuscripts. Many of the comments of the reviewers refer to the style the material presented.

We are grateful for the constructive comments of the editors and reviewers. In light of these comments, we have re-written most of the Introduction and Discussion to make our message clearer and more concise. Due to these efforts to streamline the resulting manuscript, the main text is now also 6.4% shorter. We have paid particular attention to style, which we believe is now also much improved.

Furthermore, the reviewers ask for clarification of terms and concepts. Some assumptions are not well explained or justified.

We apologize for the ambiguities. We have now stated more explicitly our hypotheses in the Introduction and added more context to our results. We have additionally devoted more space to clearly defining our assumptions.

Reviewer 4 suggests an overview summary table summary table synthesizing the main comparisons, results and conclusions and thus to guide the reader through the complex material.

This is indeed a helpful idea. We have added Table 1, where we summarize in a concise manner all of our hypotheses, the tests carried out to address them, the expected results and the observed results.

The text (in particular the Discussion) is at places long and windy. Here one may shorten the text and give it more of a red line. The summary table may help here. Below are the detailed comments of the three reviewers.

Because of our efforts to make the text more readable and to the point the Discussion is now 27% shorter and the whole text is 6% shorter. Adding Table 1 indeed also helped contextualize our results more clearly. Overall, we believe that these efforts have substantially improved the quality of the manuscript.

Reviewer #3:

This manuscript reports analyses on gene expression in the supergene controlling social structure in the red fire ant to explore whether differences in gene expression between the two haplotypes can be due to antagonistic selection or degeneration of the non-recombining haplotype. The questions are very interesting, the biological model is fascinating and the dataset is important. However, there is some room for improvement, in particular regarding clarity and interpretation, as detailed below.

The Abstract is frustrating, it would be good to indicate what kind of evidence are provided in favor of each of the two hypotheses. Actually, the Introduction, and even the Results, are not clear either on the specific hypotheses tested and how they can be tested. It only becomes clear in the conclusion how the hypotheses were tested and how the results allow inferring the two types of selection. I would recomment to be clear in the Abstract and the end of the conclusion about what kinds of tests will be run and what results would support what hypothesis.

We agree that the message could have been clearer. We have now substantially revised the Abstract and Introduction. We have now explicitly stated the hypotheses we tested, how the tests were performed and how the conclusions we reached from these results. Additionally, we have added a summary of the hypotheses, tests and results in Table 1.

Also in the Results section, help the reader to interpret the raw results. As it stands, the results read like a long list of statistical tests, which one has a hard time to make sense of.

The Results section indeed previously included a lot of technical detail for several analyses that we had done as additional controls. This was in part due to merging what was initially in supplementary data into the main text. We now still mention most of these results but have substantially reduced the technical detail with which they are described. Furthermore, we now provide more context and interpretation within each subsection of Results, which should also help the reader make sense of the work.

About the interpretation of degeneration causing some of the differences in gene expression, maybe it could be tested whether the alleles in the Sb alleles that are less expressed than in SB show more footprints of degenerative mutations, as was done in the following paper, that would be relevant to cite : Ma et al., 2020.

These are indeed interesting ideas. We applied them to our system with some caveats. The system studied by Ma et al., 2020 determines mating types; it is thus not thought to be affected by evolutionary antagonism, whereas ours is. In addition, their system is older and contains evolutionary strata, unlike the fire ant supergene. Such differences could reduce potential signals. Furthermore, we lack genomes from sister species to test as many features as Ma et al. did. Despite all this, we were able to test whether there is a gene-level link between sequence degeneration and reduced expression. For this, we now compare allelic bias with the numbers of nonsynonymous mutations for each gene (in Figure 4—figure supplement 2). We additionally perform a linear regression to test whether increased gene degeneration in Sb leads to an overall SB bias. We find that genes with more nonsynonymous mutations are more biased towards SB expression, supporting Sb degeneration. In addition, we now compare changes in expression between social forms across a set of categories of SB/Sb allele bias (Figure 3—figure supplement 1). We find that relative expression between social forms remains constant across different levels of SB bias, adding further evidence for dosage compensation of SB. These results are described in subsection “Genes with no social bias tend to have allele-specific expression biased towards the SB variant”, and discussed in subsection “The effects of gene degeneration on supergene expression patterns”. We describe how these analyses were performed in subsection “Expression differences between variants and social forms”.

Reviewer #4:

This is a very rich paper with a complex analysis of gene expression, in which the authors use caste, social form, supergene genotype and geography to understand how variation in expression of genes forming the so-called social supergene can be used to unravel the evolution of supergene alleles. This is an interesting approach. This paper presents a large amount of data on gene expression, placed in an interesting context of supergene evolution and possibly allele degeneration. I like the idea that degeneration can be studied by exploring the possibility of dosage compensation.

I have two main points to raise regarding how the approach is explained and justified.

The first point is that the authors do not explain well enough why they seem to equate over-expression with adaptive variation in expression. It seems that in many places over-expression of genes in a particular compartment (body part, or cast, or genotype, etc.) is seen and presented as an indication that this gene is advantageous in that context. I can think of many genes whose overexpression is deleterious or lethal, and in many cases of alleles for which repressed expression is advantageous. It is also not clear why under- and over-expression should not be equally likely when it comes to mutations affecting regulatory regions, since transcription factors can also suppress gene expression; therefore, mutations that cannot be purged despite their deleterious effects could equally be mutations with a positive or negative effect on expression. The idea that an unpurged mutation is more likely to have a loss of function effect is also not so obvious, at least in the case of non-coding mutations. This idea that the direction of change in expression due to mutations indicates its adaptive value lingers in the manuscript, and while this may reflect a misinterpretation on my part, I suggest that the authors clearly explain, from the outset, what the expectations are as to what constitutes an expected adaptive response through variation in expression, and test it accordingly.

The reviewer raises many valid points here that we hope to have addressed more succinctly throughout the text now.

Firstly, we would like to clarify that we do not expect selection to be completely suppressed in the supergene region. There is now indeed some evidence that it can occur at low frequency (Ross and Shoemaker, 2018). Hill-Robertson effects due to suppressed recombination make selection less efficient. Consequently, a broader range of mildly deleterious mutations would remain, but highly deleterious mutations will still be selected against. Bearing this clarification in mind, while we agree that some random mutations can increase expression, we disagree with the idea that over- and under- expression are equally likely. It is thought that most de novo mutations are mildly deleterious and therefore one can expect that mutations rather decrease expression due to impacting regulatory sequences, transcription factor binding sites or other functionally relevant sequences. The opposite effect seems to be less likely and may only involve mutations in specific regulatory (suppressor) sequences or need to involve very specific point mutations with transcription-increasing effects (Loewe and Hill, 2010). Genes with high expression generally show markers of adaptive selection such as slower evolution (e.g. Harrison, Wright and Mank, 2012; Pál, Papp and Hurst, 2001; Fraser, Moses and Schadt, 2010). In addition, previous studies show that genes in degenerating regions have lower expression levels (Xu et al., 2019; Pucholt et al., 2017; Ma et al., 2020).

Importantly, we would like to clarify that the patterns we refer to are general patterns – for multiple genes. We do not feel that expression patterns alone are sufficient do demonstrate that changes in any individual gene are adaptive or detrimental. What we mean instead is that, overall, we expect that groups of genes affected by gene degeneration will have general patterns of lower expression. Conversely, in general, highly expressed genes would be more likely to be adaptive than detrimental. We now describe these assumptions in more detail in the Introduction. Additionally, we have now added Figure 4—figure supplement 2, which shows with our data that genes with more nonsynonymous mutations in Sb are more likely to be biased towards SB, indicating that gene degeneration tends to result in lower Sb expression levels (as recommended above). Finally, we now discuss (subsection “Dosage compensation in the social supergene”) the potential mechanisms by which the phenotypic effects of gene degeneration could have arisen in such a young system.

The second point is that the authors try to disentangle the effect of the social phenotype and supergene genotype, but the situation is not symmetrical because of the lethality of the Sb/Sb genotypes and the dominance of Sb. It may be expected that genes with an overexpression in an Sb genotype are only overexpressed in multiqueen colonies. Again, this may reflect my lack of knowledge of the system, but I think the authors should explain in more detail how they can eliminate the expected correlation between the Sb genotype and social form.

Again, the reviewer raises a very good point. We have now re-written the Introduction and Discussion to focus on addressing the issue raised. The data and tests used for Figure 3 (previously Figure 4) show that the observed pattern of differences in expression between social forms are unlikely to be due solely to differences in expression levels between the SB and Sb alleles. We show that patterns of expression differences between social forms are better explained if we allow for an overall higher expression level of the SB alleles when Sb expression is low. Much of the observed SB bias results in no differences in expression between social forms. In other words, social bias cannot be directly linked to allelic bias in the supergene. In addition, in Figure 2B we show that the patterns of expression differences are similar between social forms within and outside the supergene. This result further supports the idea that patterns of social bias are more or less independent of allelic bias within the supergene.

Given these results, we conclude that there is not a direct link between allele expression level differences between SB and Sb and social bias in the supergene. We discuss these results in more detail in subsections “Overrepresentation of socially biased genes in the social supergene” and “Genes with no social bias tend to have allele-specific expression biased towards the SB variant” in the Results section and in “Supergene expression patterns consistent with antagonistic selection” in the Discussion section.

#### References

Fraser, Hunter B., Alan M. Moses, and Eric E. Schadt. 2010. “Evidence for Widespread Adaptive Evolution of Gene Expression in Budding Yeast.” Proceedings of the National Academy of Sciences of the United States of America 107 (7): 2977–82.

https://doi.org/10.7554/eLife.55862.sa2

## Article and author information

#### Author details

##### Carlos Martinez-Ruiz

School of Biological and Chemical Sciences, Queen Mary University of London, London, United Kingdom

Contribution: Conceptualization, Data curation, Formal analysis, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing

For correspondence: c.martinezruiz@qmul.ac.uk

Competing interests: No competing interests declared

##### Rodrigo Pracana

School of Biological and Chemical Sciences, Queen Mary University of London, London, United Kingdom

Present address: Department of Zoology, University of Oxford, Oxford, United Kingdom

Contribution: Data curation, Supervision, Validation

Competing interests: No competing interests declared

##### Eckart Stolle

School of Biological and Chemical Sciences, Queen Mary University of London, London, United Kingdom

Present address: Center for Molecular Biodiversity Research, Zoologisches Forschungsmuseum Alexander Koenig, Bonn, Germany

Contribution: Resources, Writing - review and editing

Competing interests: No competing interests declared

##### Carolina Ivon Paris

Departamento Ecología, Genética y Evolución, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Intendente Güiraldes 2160, Ciudad Universitaria, Buenos Aires, Argentina

Contribution: Resources

Competing interests: No competing interests declared

##### Richard A Nichols

School of Biological and Chemical Sciences, Queen Mary University of London, London, United Kingdom

Contribution: Conceptualization, Formal analysis, Supervision, Validation, Investigation, Visualization, Methodology, Writing - review and editing

Competing interests: No competing interests declared

##### Yannick Wurm
• School of Biological and Chemical Sciences, Queen Mary University of London, London, United Kingdom
• Alan Turing Institute, London, United Kingdom

Contribution: Conceptualization, Resources, Data curation, Supervision, Funding acquisition, Investigation, Visualization, Methodology, Project administration, Writing - review and editing

For correspondence: y.wurm@qmul.ac.uk

Competing interests: No competing interests declared

#### Funding

• Yannick Wurm
##### NERC (NE/L002485/1)
• Carlos Martinez-Ruiz
• Yannick Wurm
• Yannick Wurm
##### BBSRC (BB/K004204/1)

Yannick Wurm The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

#### Acknowledgements

This research was possible thanks to the funding provided by the Natural Environment Research Council (NE/L00626X/1 and NERC EOS Cloud to YW; NE/L002485/1 to CM-R), Deutscher Akademischer Austauschdienst (DAAD) Postdoc Program (570704 83 to ES); European Commission Marie Curie Actions (PIEF-GA-2013–623713 to ES and YW); Biotechnology and Biological Sciences Research Council (BB/K004204/1 to YW). We thank Emiliano Boné for help with ant rearing, Claudia Castillo Carrillo for help during sample collections, Dr. Monika Struebig for preparing RNA-seq libraries, Phillip Howard, Dr. Chloe Economou and Martin Tran for their technical wet lab support, Wurm lab members and colleagues in the Department of Organismal Biology for valuable input and discussions, and the ITS Research Group and Adrian Lärkeryd at Queen Mary University of London for computational support and access to the Apocrita High Performance Computing facility (http://doi.org/10.5281/zenodo.438045) and NERC EOS Cloud.

#### Ethics

Animal experimentation: We snap froze field-collected ants into liquid nitrogen. Ethical guidelines typically do not consider such invertebrates. However, we performed the experiments in a manner that minimized potential harm.

#### Senior Editor

Patricia J Wittkopp, University of Michigan, United States

#### Reviewing Editor

Dieter Ebert, University of Basel, Switzerland