A corpus-based study of self-mention markers in English research articles

This research investigated authorial identity as expressed through explicit self-mention markers (I, me, my, we, us, and our) in English research articles written by Indonesian authors. For this purpose, we employed a mixed-methods research design consisting of two analysis phases. First, in the quantitative phase, we analyzed the frequency of self-mention markers in a corpus of 200 linguistics and applied linguistics research articles using the corpus tool AntConc ver. 3.5.9 (Anthony, 2020). The corpus was compiled from ten journals indexed in SINTA 1 and 2 over the latest five years (2017-2021). Second, in the qualitative phase, we used concordance analysis to interpret the discourse functions of the self-mention markers in use, referring to Hyland's taxonomy (Hyland, 2002). Our findings show that Indonesian authors use self-mention for various functions. This research shows novice authors the extent to which they can exploit self-mention markers in English research articles and how expert authors in reputable national journals use self-mention markers to mark their authorial identity. Thus, this research is expected to inform EAP/ESL courses in encouraging novice writers to construct and represent their identity and to convey their arguments firmly using these self-mention markers.

ARTICLE HISTORY: Received November 30, 2021; Accepted December 18, 2021; Published December 31, 2021


Introduction
Academic writing is often regarded as an objective and impersonal kind of writing. In other words, authors are expected not to include their personal views when conveying their research. On the other hand, Hyland (2001) states that authors cannot entirely refrain from presenting themselves in the text. Thus, the view of academic writing as faceless and dry has shifted. Academic writing has always been seen as a process for sharing knowledge among the discourse community in the same field. More recently, academic writing, particularly research articles (RAs), has come to be regarded as an arena for creating identity in which authors strive for recognition in their academic community (Afsari & Kuhi, 2016). That is what Hyland (2001, 2002a, 2002b) refers to as authorial identity. Authorial identity refers to how authors represent themselves to emphasize their main contribution and credibility. Scholars also refer to authorial identity as authors' authority, authors' visibility, authorial stance, and authorial voice (Dontcheva-Navrátilová, 2013; Garzone, 2014; Hyland, 2002a; Ivanič, 1998; Kuo, 1999; Matsuda & Tardy, 2007). According to Ivanič (1998), there are three aspects of identity. First, the autobiographical self refers to the authors' background experience that they bring to their writing. Second, the discoursal self denotes the authors' identity based on the conventions of the discourse community, adopted to claim membership in that community. Third, the most notable aspect of identity is the authorial self, or self as author. It concerns how authors take a stance and express opinions and beliefs in their writing, and it reflects their ownership of, and confidence in, the ideas they contribute to their discourse community.
Since research articles tend to be written in a more competitive setting, authors should explicitly present convincing arguments and reliable findings in their writing to project their authorial identity. In constructing authorial identity to achieve academic credibility, authors use various linguistic strategies. The fundamental strategy is to use self-mention markers, which are realized by first-person pronouns (e.g., I, me, my, we, us, and our) (Hyland, 2002a; Ivanič, 1998; Kuo, 1999; Tang & John, 1999). Authors feel that it is necessary to link their findings to the academic community by showing the significance and contribution of their research. Hyland & Jiang (2017) found that the use of this feature rose 45% between 1965 and 2015, and that the rise occurred mainly in the soft science domain rather than the hard science domain.
Studies investigating authorial identity through self-mention markers have been conducted widely across various genres, such as theses, dissertations, and student essays (Afsari & Kuhi, 2016; Castillo-Hajan et al., 2019; Daryaee Motlagh, 2021). Some examined self-mention markers across nationalities and disciplines (Al-Shujairi, 2020; Alyouse & Alotaibi, 2019; Hyland, 2002a; Yu, 2021). For instance, in his corpus-based analysis, Hyland (2002a) compared texts from various disciplines (biology, physics, mechanical and electronic engineering, applied linguistics, business studies, philosophy, and sociology) written by non-native novice writers and expert writers. His research suggested that students underused self-mention markers because they are not experts in the field; as a result, they tend to be self-effacing when claiming findings. Meanwhile, when they explain the steps taken in their research, they are willing to exploit self-mention markers. Similarly, Al-Shujairi (2020) and Yu (2021) compared research articles in language studies. They discovered that non-native authors in these fields largely underuse self-mention markers. Yu (2021) argued that the lack of self-mention markers is affected by the authors' culture and by their proficiency and competitiveness in publishing international papers.
Regarding different genres, Afsari & Kuhi (2016) investigated the use of self-mention markers in 20 master's theses by English postgraduates. They found that these English authors frequently use the first-person pronoun I, which is in line with the finding of Rahimivand & Kuhi (2014) that L1 authors are most likely to present themselves with the first-person pronoun I, especially in soft science fields. In contrast with expert writers, Castillo-Hajan et al. (2019) found that the first-person pronoun is the least favored in students' essays. Unlike the result in Afsari & Kuhi (2016), Daryaee Motlagh (2021) found that non-native MA and Ph.D. students of English and Applied Linguistics tend to use third-person markers throughout the main sections of their texts. We believe that these diverse results are mainly caused by the authors' different backgrounds. Writing an appropriate research article is difficult even for native authors, and it is even more so for non-native English authors. Non-native authors also bring their own beliefs and culture to academic writing in a second or foreign language. So when they write in English, they may carry over writing norms that interfere with other conventions. Non-native authors in Asia, as investigated in Aminifard (2020), stick to the traditional view of academic writing. The comparison of authors from three different nationalities showed that most authors are reluctant to reveal their identity through first-person pronouns. They hardly use first-person pronouns to claim their findings. We argue that non-native authors are not yet familiar with the shifting convention, especially in the soft science field. Thus, they still hold to their own academic convention. Meanwhile, Hyland (2002a) sees self-mention markers as essential for successful academic writing. Self-mention is seen as a strategy that allows authors to interact with their potential readers and to negotiate their ideas, the novelty of their research, and the reliability of their findings.
The studies mentioned above have focused on comparing genres, nationalities, and disciplines, and most of their findings suggest that non-native authors of English use fewer self-mention markers in order to preserve the objectivity of their writing. The Indonesian academic writing style likewise promotes objectivity and emphasizes that no personal pronouns should be used, especially in formal writing such as research articles for publication. In addition, Yuliawati et al. (2020) noted that Indonesian authors favor passive constructions in writing research articles. Furthermore, Indonesian academics have been strongly encouraged to publish nationally and internationally to raise the quality of Indonesia's publications (Adnan, 2014). This poses a problem for Indonesian authors, who are most likely to adhere to the traditional convention and to be unfamiliar with English academic discourse.
Although the Indonesian academic convention encourages objectivity, there is to date no clear evidence of how Indonesian authors construct their authorial identity in research articles. Thus, this research investigates how Indonesian authors construct their authorial identity using self-mention markers in ten reputable national journals. We would also like to see whether Indonesian authors in reputable national journals present their identity confidently. For this purpose, our research has two main aims. First, we aim to examine a corpus of English research articles written by Indonesian authors to track down the self-mention markers that construct authorial identity. To do this, we conduct frequency analysis using AntConc ver. 3.5.9 (Anthony, 2020). Second, since self-mention markers perform several functions, we analyze their discourse functions through concordance analysis, categorizing the functions with reference to Hyland's (2002a) taxonomy. This research aims to help Indonesian authors in linguistics and applied linguistics argue and state their findings convincingly in their writing.

Method
As is characteristic of corpus research (Biber & Reppen, 2015; Cheng, 2012), this research adopted a mixed-methods research design. A mixed-methods design is believed to facilitate a deeper understanding of the research objective (Creswell & Creswell, 2018), which in this context is authorial identity. The research was carried out in two phases. In the first phase, we employed quantitative analysis to obtain the frequency of explicit self-mention markers (I, me, my, we, us, our). In the second phase, we employed qualitative analysis to explore the functions of the self-mention markers. Here we referred to Hyland's (2002a) taxonomy of discourse functions. His taxonomy consists of five categories: expressing self-benefits, stating a purpose, explaining a procedure, elaborating an argument, and stating results/claims. However, we used only four categories, since expressing self-benefits appears only in student writing, where students are asked "to reflect on their learning experience," and it mostly does not occur in the RA genre (Lorés-Sanz, 2011; Mur Dueñas, 2007).
Before analyzing the corpus, we set criteria for compiling a corpus that suits our purpose: the journals had to be national journals indexed in SINTA (specifically SINTA 1-2), published during the period 2017-2021, English-medium, and concerned only with linguistics, applied linguistics, or language teaching; and the articles had to be written by Indonesian writers only.
Having specified the data for our corpus, we began to build the corpus by selecting journals in linguistics and applied linguistics. We chose this field because its authors deal with English; we assumed that authors in this field have more expertise in English academic writing such as research articles. These fields commonly use both qualitative and quantitative approaches. The corpus of this research does not separate research articles by approach, since Dobakhti & Hassan (2017) found no significant differences between quantitative and qualitative research articles in the use of self-mention markers.
All the selected journals are English-medium national journals indexed in SINTA 1 and 2. In Indonesia, the credible national journals are those that have been validated by SINTA, which indexes journals by quality. Moreover, journals indexed in SINTA 1 and SINTA 2 hold the highest categories in the national journal ranking. Most previous studies built their corpus from a single journal. To ensure the representativeness of our corpus, we decided to select ten.
Following Ariannejad (2020), Harwood (2005), and Hyland (2001), we limited the number of RAs taken from each journal. Considering also that the corpus needed to be edited and cleaned manually, we decided to take 20 RAs from each journal, which we believed to be a representative number for each group. In total, we gathered 200 RAs published in 2017-2021 from ten journals, which we hope form a representative and balanced corpus.
Regarding the nativity of the research article authors, each journal publishes authors from various cultural backgrounds, so we selected only authors from Indonesia, in keeping with our research purpose. Since it is impossible to verify each author's nativity by asking the authors personally, we followed previous studies (Molino, 2010; Utomo & Suryani, 2019; Wang & Jiang, 2018) and inferred the authors' nativity from their names and affiliations.
Once the data collection was complete, we began the data cleaning process. All of the research articles for our corpus were downloaded in PDF format, which means they had to be converted into plain text and cleaned of any information unnecessary for our analysis, such as journal names, page numbers, footnotes, headings, tables, figures, authors' names, acknowledgments, and references. We also assigned every research article a code in the form ERA-001-1A to ease the identification of the articles.
We used a corpus tool, AntConc ver. 3.5.9 (Anthony, 2020), to build and analyze the corpus. In total, the corpus comprises 969,187 words, in which we traced the occurrences of self-mention markers during 2017-2021. The corpus profile is shown in Table 1. As mentioned earlier, this research involves two steps of analysis. Frequency analysis is usually the first step in corpus analysis; it is a key idea in corpus work and points the researcher to what should be analyzed (Baker et al., 2006). Thus, we began with frequency analysis, using the wordlist feature in AntConc to trace the occurrences of self-mention markers in the corpus.
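The frequency step can be illustrated with a minimal Python sketch. It is not the AntConc wordlist feature itself, only an approximation of what a case-insensitive word list yields for the six markers. The function name `count_self_mentions`, the simple letter-based tokenizer, and the sample sentence are assumptions made for illustration.

```python
import re
from collections import Counter

# The six explicit self-mention markers examined in this study.
SELF_MENTION_MARKERS = ("i", "me", "my", "we", "us", "our")

def count_self_mentions(text):
    """Count raw occurrences of each marker in a plain-text article.

    Assumption: tokens are runs of letters and matching is case-insensitive,
    so "US" (the country) would be counted as "us". Raw counts of this kind
    still need manual checking in context.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t in SELF_MENTION_MARKERS)
    return {marker: counts.get(marker, 0) for marker in SELF_MENTION_MARKERS}

# Hypothetical sample text, not taken from the corpus.
sample = ("In this study we found that our data support the claim. "
          "I collected the data, and we analysed it.")
frequencies = count_self_mentions(sample)
print(frequencies)
```

Counts produced this way are raw hits only: they include pronouns inside quoted data and interview extracts, which is why a concordance check is still needed.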
After obtaining the frequencies, we used concordance analysis to separate exclusive from inclusive self-mention markers in the corpus. Self-mention markers that refer to anyone other than the authors were excluded. Concordance analysis also helped us uncover the most common functions that the self-mention markers realize in the Indonesian authors' writing. Finally, examples from the concordance analysis are provided to illustrate how these authors employ self-mention markers in their research articles.
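A concordance view of the kind used in this step can be sketched as follows. This is a simplified keyword-in-context (KWIC) listing, not AntConc's own concordancer. The function name `concordance`, the `width` parameter, and the sample text are hypothetical. Deciding whether a hit refers to the authors remains a manual judgment that the sketch does not automate.

```python
import re

def concordance(text, keyword, width=30):
    """Return (left context, hit, right context) for every whole-word match."""
    pattern = re.compile(r"\b%s\b" % re.escape(keyword), flags=re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append((left, match.group(0), right))
    return lines

# Hypothetical sample: the first "we" refers to the authors,
# the second sits inside quoted data and would be excluded by the analyst.
sample = 'We collected the data in 2019. One student said: "Tomorrow we have a test."'
hits = concordance(sample, "we", width=20)
for left, hit, right in hits:
    print(f"{left:>20} [{hit}] {right}")
```

Reading each hit with its surrounding context is what lets the analyst tell an author-referring pronoun from one that appears in an extract or example sentence.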

Findings and Discussions
The frequency analysis revealed the various forms of self-mention markers that occur in the corpus. The wordlist feature in AntConc counts the self-mention markers in our corpus and produces the frequency of each marker. In the initial frequency analysis, we found that the first-person pronoun I had the highest frequency with 1791 occurrences, followed by we (856) and the other pronouns my (433), our (302), us (216), and me (176). Based on this initial finding, we assumed that Indonesian authors are firm in projecting their identity in their writing, because they use these self-mention markers in such numbers. We then tested this assumption through concordance analysis, as stated in our procedure. However, when we applied the concordance analysis, we were surprised that the numbers decreased significantly. We found that the self-mention markers in the raw frequency counts mainly occur in extracts, interview transcripts, and abbreviations, for instance "…He uttered: 'erm, at home, I already prepared some words for this part (pointing at the screen), but it was not spoken out.'" and "…e.g., Tomorrow I have a job to send a parcel…". The first-person pronoun I was primarily found in examples or extracts from the object of research rather than referring to the authors. Therefore, we provide the frequency of self-mention markers that the authors actually use to refer to themselves (Table 2). From Table 2, we can see that the pronoun we is now the most frequent with 154 occurrences. Although the frequency of the pronoun I decreased considerably, it sits in second position with 72 occurrences. The possessive pronouns our and my also decreased significantly, but their occurrence may indicate that the authors want to flag ownership of their research. Unlike the pronoun us, the pronoun me is not found in the corpus referring to the authors.
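The size of the drop can be made concrete with a short calculation. The sketch below uses only the counts reported in the running text: raw and author-referring counts for I and we, plus me, which has no author-referring occurrences. The names `RAW_HITS`, `AUTHOR_REFERRING`, and `retained_share` are illustrative assumptions. The filtered counts for my, our, and us are not stated in the prose, so they are left out rather than guessed.

```python
# Raw wordlist counts reported in the text.
RAW_HITS = {"I": 1791, "we": 856, "me": 176}

# Counts that remain after concordance analysis (author-referring uses only).
AUTHOR_REFERRING = {"I": 72, "we": 154, "me": 0}

def retained_share(raw, kept):
    """Percentage of raw hits that actually refer to the authors."""
    return {marker: round(100 * kept[marker] / raw[marker], 1) for marker in raw}

shares = retained_share(RAW_HITS, AUTHOR_REFERRING)
print(shares)
```

On these figures, roughly 4% of the raw hits for I and 18% of the raw hits for we refer to the authors themselves, which shows how far raw frequencies alone overstate self-mention.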
Interestingly, these occurrences may indicate that some authors are willing to use these explicit pronouns, since not all articles in this corpus were co-authored; some were single-authored. The appearance of subjective pronouns in this corpus indicates that the authors marked their authorial stance in arguing for and claiming their work (Khedri, 2016). Thus, it may also indicate that the authors in SINTA 1 and 2 journals are more aware of how to promote themselves using these pronouns, in contrast with authors who refuse to use self-mention markers such as first-person pronouns. In other words, the findings are consistent with the idea that research articles are shifting toward a more personal style. They clearly demonstrate that research articles are no longer as faceless as they used to be.
These self-mention markers can be used in various ways. Thus, we analyzed their functions in detail based on their context in the corpus. As mentioned earlier, we adopted four categories of Hyland's taxonomy in categorizing their functions. Table 3 presents the functions of each self-mention marker employed by the authors in their writing. It appears that the authors employed self-mention markers in both high-risk and low-risk functions. In line with Molino (2010), these self-mention markers are commonly associated with defining the research procedure and emphasizing the data and results. However, most self-mention markers serve a low-risk function, explaining a procedure, which accounts for 43% of the total, while the high-risk function of stating results/claims accounts for 27% of the occurrences. The other high-risk function, elaborating an argument, slightly outnumbered the other low-risk function, stating a goal/purpose. The pronouns I, my, we, us, and our fulfil almost every function; me is the exception, because we did not find this pronoun used by the authors to refer to themselves.

Explaining Procedure
At this point, we describe each function, drawing on samples from our corpus for each category. We start with the most frequent of the four functions. Our findings align with Al-Shujairi (2020) and Khedri & Kritsis (2020) in that the authors used self-mention markers mainly for explaining a procedure. As the name of the function suggests, the authors mainly used these self-mention markers in the method section of their research articles. In explaining the procedure, the authors preferred the pronouns I and we, as seen in the following examples:
1. Further, we calculated the percentage of their occurrences; we counted the number of variants used in each type, divided by total of variants in that type, and multiplied by 100 %.
2. …we employed both member-checking and external review to establish trustworthiness in the data.
3. I utilised AntConc to explore the use of personal pronouns.
4. Although the RTC consists of approximately 75 million Twitter posts, I randomly chose 1000 tweets for each gender using excel RAND formula from 1-gram tweets of RTC.
5. I collected data for one semester (five months) by examining comments from the subject, notes from classroom observations, course materials, and the text written for an assignment. (ERA-040-1B)
These examples show that the authors construct their identity as researchers by willingly using self-mention markers when describing research procedures or steps. They also demonstrate that the authors present themselves to readers as the performers of the research process. Verbs such as calculated, counted, utilized, chose, and collected, with the pronouns I and we as their subjects, indicate that the procedure was carried out by the agent of the process. In this case, the authors simply marked their personal contribution to their research to display their ability to carry out the research procedure and make research decisions, as in examples (4) and (5).
From this finding, we can say that authors in linguistics and applied linguistics project themselves well when explaining a procedure by emphasizing their professional abilities in carrying it out, as seen in example (3). However, this function is categorized as low-risk; Hyland (2002a) stated that a low-risk function simply signposts the text for readers. Such a function carries only a minor threat of rejection.

Stating Results/Claims
In most cases, non-native authors tend to downplay self-mention markers when stating results/claims, as found in Işık-Taş (2018). Interestingly, we found that 27% of the self-mention markers used by Indonesian authors serve to state the findings in their articles. We found not only subjective pronouns but also possessive pronouns used in stating results/claims. The following examples illustrate how the authors make use of these self-mention markers:
1. Similarly, in this study we found that gender played an important role in the students' proficiency in writing essay (ERA-047-1C)
2. Based on the finding we noticed that several flouting maxims happen during the interview.
… English in their EFL classes relatively confidently. (ERA-54-1C)
When authors explicitly state their results and knowledge claims, they promote their unique findings, and this high-risk function can potentially invite objections from readers (Hyland, 2002a). Furthermore, it can also promote their noteworthy findings as they evaluate, interpret, and claim membership of the discourse community. As a result, this function commonly appears in the discussion or conclusion section. We consider these authors assertive in stating their results explicitly, since the pronouns collocate with cognitive verbs when they report their findings (we found and we noticed), convey knowledge claims, or offer an interpretation (I offer and I conclude).
As can be seen in examples (10), (11), and (12), possessive pronouns are also used in stating results. The possessive pronoun our followed by research terms such as research findings and interview data signals to potential readers the authors' ownership of their findings and the originality of those findings. The same applies when the authors use the pronoun my followed by classroom observation: they emphasize their confidence in presenting reliable results and mark their willingness to discuss the results of their observation with readers directly. Hyland (2001) mentioned that these possessive pronouns are also used to market the authors' contribution and flag their involvement in research outcomes. They also imply responsibility for and commitment to the findings (Li, 2021).

Elaborating an Argument
The findings revealed that the authors consciously set out a line of reasoning using first-person pronouns in elaborating an argument. Hyland (2002a) added that only professional academics choose to do this in academic writing. In addition, elaborating an argument is a high-risk function that may be face-threatening.
The authors use the function of elaborating an argument to express their opinion or argument about the theory, their work, their research process, and the method applied in their study. In this function, professional academic writers are required to perform their authorial identity to convey their original interpretations of findings to the discourse community. These examples show the authors' confidence in asserting their argument, whether they express an argument, a suggestion, or a doubt. In examples (17) and (18), the authors even use the pronoun I followed by the verb argue to mark their argumentation. This indicates that the authors do not shy away from using I to stress their authorial identity. Additionally, Hyland & Tse (2005) and Wu & Paltridge (2021) mentioned that a higher frequency of I implies that authors are more comfortable using I to increase authoritativeness.

Stating a Purpose/Goal
The least frequent function in our corpus is stating a purpose/goal. This function refers to flagging the research intention or focus and providing a clear structure for the text. Although this is a low-risk function, the authors still explicitly present their authorial identity and take responsibility for their research decisions. Authors need to show confidence when introducing the research purposes to reflect their certainty in conducting the research (Walková, 2019). In example (19), the authors present what they intend to do in each step of the research process, as indicated by the verb aim. The authors also mentioned their goal in their articles, as demonstrated in example (22).
The functions performed by self-mention markers show that Indonesian authors have adapted to this linguistic strategy despite their collectivist culture, unlike the other non-native authors in Aminifard (2020) and Karahan (2013), who downplayed self-mention markers in their writing. The use of self-mention markers in academic texts does not necessarily make them less objective; rather, in certain sections of research articles, such as the findings section, self-mention markers can project and promote the authors' identity so that they are recognized as credible scholars in their field when claiming their findings. Moreover, authors in applied linguistics have tended to express their stance more subjectively in recent years (Dontcheva-Navrátilová, 2013), as we have also found in our research. Accordingly, Tang & John (1999) mentioned that if there are no explicit self-mention markers in the writing at all, the authors leave potential readers in doubt and confused about the findings. Readers would hesitate to accept the findings if the authors did not convey their argument convincingly (Loan & Pramoolsook, 2015).

Conclusion
This research focused on identifying authorial identity through explicit self-mention markers in a corpus of English research articles in linguistics and applied linguistics by Indonesian authors. As discussed earlier, the findings show that Indonesian authors are not completely impersonal in writing research articles. This implies that, over the last five years, the authors have tended to be aware of how to exploit self-mention markers in academic writing. In other words, we assume that they understand the academic convention of marketing their identity as credible scholars. Furthermore, we think that the authors in these high-ranking national journals are professional and competent authors who consciously use self-mention markers to mark their authorial identity. This assumption is supported by our frequency analysis, which shows that the authors have explicitly projected their identity using various kinds of self-mention markers, particularly the subjective pronouns we and I, followed by other pronouns such as my, our, and us.
Regarding the functions of self-mention markers, our findings are also in line with Alyouse & Alotaibi (2019): Indonesian authors tend to be confident in using the pronouns to explain the research procedure, which accounts for 43% of the occurrences. About 27% of the self-mention markers are used assertively to state results/claims, followed by elaborating an argument (17%) and stating a goal/purpose (13%).
Given that research article publication now takes place in a competitive setting, we expect this research to have a pedagogical impact that contributes to EAP/ESP courses and novice writers, especially in the fields of linguistics and applied linguistics. Writing instructors in Indonesia are encouraged to introduce self-mention markers to novice writers and to encourage their use in presenting convincing arguments, in line with academic writing norms, so that these writers can project their authorial identity as competent scholars. This research also showed how expert authors employ self-mention markers to fulfil particular functions in their writing.
Since our research focuses only on the use of self-mention markers in linguistics and applied linguistics, we suggest that future researchers conduct comparative research on authorial identity through self-mention markers. Future researchers can compare English research articles by Indonesian authors in other fields, such as the hard sciences, to gain deeper insight into how authors in our country, as non-native speakers of English, project their identity through various forms of self-mention markers.