Home / Articles / Volume 17 (2019) / Emoji as Digital Gestures



Emoji (small coloured images encoded like text) went from unavailable outside Japan in 2010 to active use by 92% of the world's online population in 2016. Their sharp rise is often explained by noting that it is difficult to convey emotion in writing without tone of voice and body language, and that emoji fill in this gap. But what exactly is the nature of this gap, and how exactly are emoji filling it? We argue that the most insightful explanation for the function of emoji in digital communication comes by drawing comparisons with existing theoretical literature on gesture. In addition to the obvious similarities between certain emoji and certain gestures (e.g., winking, thumbs up), gestures are commonly grouped into subcategories according to how codified their meaning is and how much they are dependent on surrounding speech. Drawing on individual and aggregate examples of emoji used by English speakers, we show that this same range of functions accounts for how people use emoji.


Informal written communication online has a much-observed problem: Without the additional information provided by tone of voice and body language in face-to-face communication, it is easy for internet users to miss each other's sarcasm, fail to divine humour, and misinterpret a range of emotions (Crystal, 2001; Dresner & Herring, 2010; McCulloch, 2019).

Emoji are one popular solution to this problem, along with emoticons and animated GIFs. Emoji are small images that are encoded with the same protocol as letters, numbers, and other punctuation and orthographic characters. This means that they are lightweight in terms of file size, and can be integrated in-line directly with text, unlike regular images or animated GIFs (Pohl et al., 2017). At the same time, emoji are often multicoloured and can contain a fair amount of detail, more than that which can be represented by repurposing ASCII characters as in emoticons like :). With these advantages, emoji were used by 92% of the world's online population in 2016 (Emogi Research Team, 2016).
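The claim that emoji are encoded in the same way as letters, numbers, and punctuation can be illustrated with a minimal sketch. It assumes Python 3 and its standard `unicodedata` module; the helper name `describe` is our own illustrative choice and does not come from any of the studies cited here.

```python
import unicodedata

def describe(ch: str) -> dict:
    """Summarise how a single character is encoded (hypothetical helper)."""
    return {
        "code_point": f"U+{ord(ch):04X}",           # position in the Unicode code space
        "name": unicodedata.name(ch, "<no name>"),  # official Unicode character name
        "utf8_bytes": len(ch.encode("utf-8")),      # storage size in UTF-8
    }

THUMBS_UP = "\U0001F44D"  # the thumbs up emoji, written as an escape sequence

# A letter, a digit, a punctuation mark, and an emoji are all handled identically.
for ch in ["a", "7", "!", THUMBS_UP]:
    print(describe(ch))

# The emoji sits in-line in an ordinary string, as one more character.
message = "ok " + THUMBS_UP
print(len(message))                  # number of code points in the string
print(len(message.encode("utf-8")))  # number of bytes when stored as UTF-8
```

In this sketch the emoji occupies a single code point and a few bytes, which is the sense in which emoji are lightweight compared with an attached image file.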

This represents a steep rise within barely five years of international availability. The 176 emoji often cited as the first set of emoji were designed by Shigetaka Kurita for the Japanese cell phone carrier DoCoMo in 1999, although it has recently come to light that SoftBank released an earlier set of 90 characters in 1997 (Burge, 2019). While their use was originally limited to Japanese handsets, since 2007 decisions about which emoji are included in the set have been made by the Unicode Consortium, a group that has, for most of its existence, predominantly been responsible for ensuring that orthographic and punctuation characters work across platforms.[1] The inclusion of emoji in Unicode enabled Apple to include emoji with the iPhone when it moved into the Japanese mobile market, and then to make them part of the global release of the iPhone with iOS 5 in 2011. While the Unicode Technical Committee is responsible for decisions on what typographic characters, including emoji, are included in the Unicode Standard, proposals for emoji can be submitted to the Unicode Emoji Subcommittee by anyone.

Given the popularity of emoji among internet users at large, and the centrality of the problem of conveying vocal inflection and physical embodiment to the study of computer-mediated communication (CMC), research has increasingly turned to how, exactly, emoji solve specific aspects of this problem (Dürscheid & Siever, 2017; Miyake, 2007; Wagner, 2016; Zappavigna & Zhao, 2017). After briefly surveying the literature on emoji to date, we provide an introduction to gesture, and show how the structure of emoji reflects similarities to gesture. Finally, we draw an analogy between common categories of co-speech gestures and the emoji that accompany written communication.


In our survey of the literature on emoji we identify three main waves of understanding of the function of emoji: emoji as language proxy, emoji as emotional indicator, and emoji as pragmatic indicator.

A branch of discussion about emoji that is commonly found in popular discourse emphasizes the potential of emoji as a system of small, icon-like pictures which can be used in sequences without accompanying words for complex, extended communication. Danesi (2017) uses screenshots of stories told in emoji sourced from viral humour websites to characterize emoji as a "visual language," while Evans (2017) argues that emoji are not a language but "incontrovertibly the world's first truly universal form of communication" (p. 20). Schnoebelen (2018) documents and criticizes the narrative of the universality of emoji, which he demonstrates is also prevalent in popular culture discourse — in many ways, a new version of what Thurlow (2006) calls a "moral panic" regarding computer-mediated discourse "replacing" language. Emoji lack linguistic features and do not meet the basic definition of language, according to Dürscheid and Siever (2017). However, Ge and Herring (2018) argue that emoji on Sina Weibo are an "emergent graphical language" based on Chinese speakers’ use of emoji sequences.

The earliest literature looked at emoticons, faces made out of ASCII symbols such as :-), highlighting their function as indicators of emotion, linking them to the etymology of "emoticon" as "emotion" + "icon." Many emoji studies continue in this vein (Miyake, 2007; Wagner, 2016; Zappavigna & Zhao, 2017). Several recent studies describe how emoji assume the emotional load in written speech that is usually conveyed by prosody in spoken language (Wagner, 2016; Zappavigna & Zhao, 2017). As early as 2007, Miyake (2007, p. 57) wrote about Japanese school children using these "mobile-phone-specific pictorial signs" as a way to include communicative nuances that are usually delivered by prosody in spoken interactions.

The most sophisticated approach to emoticon and emoji interpretation highlights their functions beyond emotion expression and explores their role as pragmatic markers of speaker intention (Dresner & Herring, 2010; Herring & Dainas, 2018; Na'aman et al., 2017). This research may also be a reflection of increasing sophistication on the part of emoji users as they manipulate these communicative symbols for more than their iconic emotive value. Dresner and Herring (2010) highlight the pragmatic function of emoticons as indicators of illocutionary force, noting that typing "I'm sick and tired all the time :)" does not indicate that the writer is happy about being sick (emotional interpretation) but rather that the writer is trying to soften a statement that might be perceived as a complaint (pragmatic interpretation).

Such analyses sometimes mention that emoji take on some of the role of gesture (Miyake, 2007; Na'aman et al., 2017), although the precise nature of the parallels between emoji and gesture remains underexplored. Schandorf (2012) argues that "nonverbal" features of language are transposed into strategies in computer-mediated communication, which share "embodied cognitive roots" (p. 321). Schandorf draws parallels between different types of gesture and features of online communication including hashtags, non-standard orthography, punctuation, and basic ASCII symbols that were precursor options to emoji. Schandorf notes that some of these, like non-standard orthographies, act more like phonetic features. We specifically look at emoji as used with text, but agree with Schandorf that people seek to return features of gesture to online communication, even in the absence of the physical body.

There are, of course, emoji that are straightforward representations of gestures, such as thumbs up 👍, but beyond this obvious relationship the function of emoji can be clarified by examining them in light of the growing body of literature on co-speech gesture over the last 30 years.

We begin by comparing gesture to a range of other communicative actions that exploit the manual domain, including pantomime and sign languages. We then turn specifically to the literature on co-speech gestures, which offers insight into how meaning is structured differently in communicative resources like gesture and emoji, in comparison to the verbal language with which they co-occur. Finally, we look at the functional categories of gesture and the parallels we see in emoji use, to provide a novel demonstration of the variety of functions that emoji can have when used with written language. We argue that the parallels between gesture and emoji provide a specific insight into the discussion about emoji becoming their own "language" and conclude with a discussion of what it means for the future of emoji when they are considered as digital gestures.

Before getting into the similarities between the functions of gesture and emoji, we note a few obvious and immediate differences between emoji and gesture. The first relates to temporal synchrony. Gestures co-occur with speech, displaying tight temporal synchrony (Breckinridge Church et al., 2014; Habets et al., 2011; Kelly, 2014). This temporal relationship is unavailable in the written domain. To the extent that non-verbal content can be integrated into written text, emoji provide the most stable and lightweight option, in that they are encoded like text characters (Davis, 2016). The fact that emoji generally accompany written text (Medlock & McCulloch, 2016) indicates that just as people see gesture and speech as different channels, emoji often provide a different information channel from text in writing, as we show below. The other difference is that people are constrained to a limited character set of emoji, while the hands are capable of a much more flexible gestural repertoire. However, emoji may contain colour and details that are difficult to represent gesturally (it is unclear how one would distinguish between, say, a red apple and a green apple in gesture, but they are readily distinguished as emoji). Nonetheless, as we demonstrate below, even with the finite set of emoji available, the expressive functionality of emoji still parallels that of gesture. Further, the rapid expansion of the Unicode emoji set since its introduction on Western handsets in 2011, many proposed by people with traditionally no interest in font encodings (such as Jennifer 8 Lee's grassroots emoji organization, EmojiNation[2]), demonstrates a strong interest in continuing to expand the communicative repertoire of emoji.


Data for this article are drawn mostly from our observation of different uses of emoji in online communication, providing an exploratory framework for future quantitative investigation of emoji. Most examples are drawn from Twitter, as a rich source of informal internet language use. Individual tweets have been anonymized by replacing content words with synonyms but leaving the emoji unchanged, per the recommendations in Ayers et al. (2018) and the method in Tatman (2018a). An exception is where the tweet was written by a participant in a popular hashtag game, in which cases the tweet remains unchanged and we cite the original author.

We generally avoid "stunt" emoji examples from humour websites, screenshots, ad campaigns, brands, politicians, and other potentially atypical sources; instead, our focus is on "natural" emoji examples which we can trace back to regular users, whether individually or in aggregate. It is these typical emoji cases which we argue are used like gesture.

Gestures and Other Communicative Uses of the Hands

Not every movement of the body is a gesture. In this section, we outline the way intentionally communicative gestures are approached in the growing field of Gesture Studies. We explore how gestures are distinct from other phenomena that use the hands and body as a communicative medium. Just as not every body movement is a gesture, not all uses of emoji are gestural. For example, the use of the bee as a rebus for the lexical item 'be' is a possible communicative use of emoji, but as Na'aman et al. (2017), Donato and Paggio (2017), and Dürscheid and Siever (2017) have found, the rebus use of emoji is atypical among internet users in general.

Intentionally communicative bodily actions can be distinguished from non-communicative movements. For example, an intentional flirtatious wink is different from a sneeze, although it is possible to perform a sneeze for communicative effect if retelling a story about a sneeze. Similarly, gestures are broadly distinguished from picking up objects or touching one's own body (e.g., writing with a pencil, scratching one's nose), in that gestures usually indicate items or actions that are not present. As with many categorical phenomena in gesture studies, this distinction is more gradient than discrete (cf. Goodwin, 2007). Gestures are further distinguished from phenomena like gaze and posture because they relate much more closely to the propositional content of speech (Kendon, 2004). While both gaze and posture have been studied for their importance in regulating discourse relationships, they do not coexist with speech to produce linguistic utterances (Kendon, 2000).

Gestures are not the only intentionally communicative non-speech channel actions. Intentionally communicative body movements have been conceptualised by David McNeill (1985, 1992, 2005) as sitting along a continuum:

Co-Speech Gestures → Pantomime → Emblems → Sign Language (McNeill, 1992, p. 37)

Co-Speech Gestures

Co-speech gestures generally occur in conjunction with speech and are highly idiosyncratic, both in their content and performance. Co-speech gestures have been demonstrated to be of particular communicative importance. When trying to explain a route to a collaborator who has to draw a map, a person who sits on their own hands will modify their speech to include more spatial information, which could otherwise be represented with gesture (Graham & Heywood, 1975; Jenkins et al., 2018). There is also strong evidence for the importance of gestures in linguistic cognition. People still gesture even if their audience is not present, for example while talking on the telephone (Cohen & Harrison, 1973), and congenitally blind people still gesture while they speak to another blind person (Iverson & Goldin-Meadow, 1998; Özçalışkan et al., 2016). Krauss et al. (2000) argue that we see such effects because gestures are as much a tool of internally accessing words as they are a tool of interpersonal communication. Kita et al. (2017) argue that gesture is actually more cognitively central than accompanying speech, facilitating conceptualisation and paving the way for speech. Co-speech gesture is not only a cognitively important part of our linguistic faculty, it is also important in interaction. Gestures are not redundant: They include information not included in the spoken channel, particularly spatial information (e.g., Fricke, 2013; Kita & Özyürek, 2003). Similarly, even by strict definitions of sentential semantic redundancy, emoji were found to be non-redundant 66.3% of the time in a corpus study by Donato and Paggio (2017). We discuss how co-speech gestures relate to typical emoji use in the next section.


Pantomime

Pantomime is a combination of gestures (sometimes with simplified speech) that provides limited communicative abilities. Pantomime without speech has been codified into parlor games such as charades, but it can also be used in contexts where environmental factors constrain normal communication, such as around loud music, in a noisy factory, or when a glass wall separates would-be interlocutors.

People use emoji as a kind of pantomime in games in which Twitter users create sequences of emoji to illustrate the plots of famous books, such as in the tweet below. This tweet was selected from among those using the Twitter hashtag #emojireads, where people turn their favourite books into an emoji pantomime sequence (Furlan, 2014). (Other Twitter hashtag games, such as #emojispellingbee and #emojimovies, provide similar examples of emoji pantomime.) Just as charades is a bodily game, these tweets are playing a game with emoji. Once a reader knows the context and the plot of Lord of the Flies, it is clear that the skulls in Figure 1 stand for a particular death, and the 'policeman' emoji stands for a British naval officer who arrives at the end.

Figure 1. Lord of the Flies in emoji, by @zamosta ( http://twitter.com/zamosta/status/495440302430515200 )

With extended use, both gesture and emoji can refer to a specific referent (e.g., a character in a story), but only within a given communicative interaction. For example, while the pig emoji above refers specifically to 'Piggy' in the William Golding novel Lord of the Flies, this only holds across this particular short context. For a more extended example of emergent stable meanings for emoji over a single text, see Emoji Dick, a retelling of the classic Herman Melville novel Moby Dick collaboratively "translated" into emoji (Benenson, 2010). We enclose "translated" in quotation marks because any referents or norms that emerge in the text only exist for the duration of the book (and, as Dürscheid & Siever [2017] note, some remain opaque even to a reader with the English text to hand). A translation of the novel into, say, Japanese has no such problem. The use of particular pantomime elements can quickly become encoded in particular interactional contexts (McNeill, 1992) and may be the basis for lexical items in emergent sign systems (Goldin-Meadow, 2002; van Loon et al., 2014).


Emblems

Emblems are culturally established gestures with defined meanings, such as 'thumbs up,' the two-finger 'peace' sign, or nodding to signal 'yes.' These gestures can be understood with or without speech and have standards of well-formedness not expected of co-speech gestures. For example, the 'peace sign' does not retain its meaning if the performance is varied. Removing or adding a finger removes the meaning, and so does bringing the fingers together. Some people are able to rotate the palm to face inwards and still see this as a variant of the peace sign (Morris et al., 1979), but in Britain, Australia, and New Zealand, this rotation changes it into the 'up yours' gesture, which has similar connotations to giving the middle finger, a gesture of offence common across Western Europe and other western countries. The 'peace sign' and 'up yours' form an emblem minimal pair: Slightly different gestures producing very different meanings. Like other stable semiotic entities, emblems are only meaningful to those who are familiar with them. Emblems tend to transcend individual languages and instead have meaning that extends across a cultural area (Morris et al., 1979). This does not, in any way, mean that these emblems are "universal." Perhaps unsurprisingly, given their long history of elegantly signalling meaning, quite a number of emblematic handshapes are encoded as emoji.

There are also non-handshape emoji that take on semantic properties similar to emblematic gestures. Just as the 'ok' sign does not immediately connote positivity (and in Greece, Turkey, and Southern Italy where it means 'asshole,' it certainly does not), the eggplant emoji can refer generically to the vegetable, but for many frequent emoji users it has an additional emblematic function. In one of the earliest widespread emblematic uses of an emoji, it came to be used euphemistically to refer to the phallus (Highfield & Leaver, 2016). It is not possible to use other similarly phallic-shaped emoji to the same effect (such as the cucumber or the corn-on-the-cob): This meaning is only associated with this specific emoji. However, the emblematic meaning of the eggplant emoji is still a nonspecific meaning: Like how the whale emoji can refer to Moby Dick or a generic whale depending on the communicative context, the eggplant emoji can refer to a specific phallus (such as in an emoji pantomime of Fifty Shades of Grey) or to a generic one.

Finally, at the far right of the spectrum described by McNeill we have sign languages, such as ASL and Auslan, which have all the grammatical features of language.

As we move along the continuum from left to right, several interrelated property changes occur. Firstly, speech-dependence declines. Co-speech gestures are dependent on the spoken context for meaning, while sign languages obviously do not require speech because they are fully grammatical languages in their own right. Secondly, as a corollary, language properties increase. Co-speech gesture does not have the kind of grammatical structure that signed or spoken languages have. Pantomime can have 'rules' that emerge through sustained use (like how the policeman emoji in Figure 1 can continue to refer to the naval officers) but only as long as we are in the game of emoji-fying Lord of the Flies. Emblems have stable meanings and do not need accompanying speech, but there is no grammar to make cohesive use of emblems alone. Thirdly, as an extension, socially coded signs replace idiosyncratic gestures. Co-speech gestures can vary greatly among speakers: When relating the story of a car crash, one person may gesture acting out driving the car, another person may gesture demonstrating where the car was on the road, and a third may not gesture at all. The same person could also vary among these options during different tellings of the same story (McNeill, 1992). In comparison, signed languages and emblems are highly codified, much like any other semiotic system. It is possible that in particular communicative environments emoji could become codified to be structured more like language (as Ge & Herring [2018] argue is happening with emergent Chinese usage of emoji). Even pantomime can involve emergent norms which can become codified signs in environments where speech is not possible.

Co-speech gestures structure meaning in a different way from languages, both spoken and signed. In the next section we discuss how the structural properties of gesture can facilitate understanding of how emoji are combined with written text.

Diagnostic Criteria for Whether Emoji have Grammar

In order to ask whether emoji are linguistic or have grammar, we must first ask what the diagnostic criteria are for a communicative system to be a language or to have grammar in the first place. Furthermore, our criteria for "grammar" should be justified by showing how they successfully distinguish between non-controversial examples of grammatical and non-grammatical systems of communication. Previous proposals for grammatical diagnostic criteria as applied to emoji include: emoji in sequences appearing in a consistent order with respect to each other (Ge & Herring, 2018; Schnoebelen, 2018), emoji conveying meaning that repeats or adds to the meaning of the associated words (Ge & Herring, 2018), and emoji conveying meaning that is different in different cultures (Danesi, 2016). While emoji do satisfy many of these criteria (emoji can convey meaning and are sometimes found in consistent orders), it is not clear that these criteria are sufficient to conclude that a symbol system is linguistic or has grammar.

However, the emoji literature is not the first to grapple with the question of what characterizes language. The gesture literature has a set of established diagnostic criteria that characterize the differences between gesture and language (Kendon, 2004; McNeill, 1992, 2004). Crucially, the diagnostic criteria for gesture are modality-independent: Despite the fact that modality is a popular shorthand for distinguishing between language and gesture (the voice versus the hands), sign languages use the hand modality for linguistic purposes, and "verbal gestures" such as ‘tsk tsk’ (Grenoble, Baglini, & Martinović, 2015) use the voice modality for gestural purposes. The modality-independent nature of these criteria makes them ideal for applying to a third, pictorial modality, that of emoji.

The four parameters on which gesture differs from language are summarized in Table 1, based on McNeill (1992, p. 41). In the following subsections, we examine each of these diagnostic criteria and apply them to the question of emoji, showing that for all parameters, typical uses of emoji are more similar to the properties of gesture than those of language.

Properties of co-speech gesture | Properties of language
Global and synthetic | Hierarchical and analytic
Noncombinatoric | Combinatoric
Context-sensitive | Context-insensitive
No standards of form | Standards of form

Table 1. Summary of the properties of gesture and speech

Global and Synthetic

Gesture is said to be "global and synthetic" because the meaning of gestures is derived from the position of the fingers, hands, and arms as a whole, rather than by considering each element individually (McNeill, 1992, p. 41). In contrast, language is readily broken down from sentences into morphemes and from morphemes into phonemes, i.e., it is hierarchical and analytic. From a cognitive perspective, Kita et al. (2017) suggest that this property of gesture arises because co-speech gestures are related to the human ability to think "schematically" about the physical environment.

The question is, are emoji used with synthetic, global meanings or with analytic, hierarchical structure? Consider the sequences of emoji commonly used in messaging someone to wish them "Happy Birthday." Birthday emoji frequently include the cake with candles 🎂, the slice of cake 🍰, the balloon 🎈, the wrapped gift 🎁, the party popper 🎉, or general positive emoji such as hearts, sparkles, happy faces, and positive hand shapes like the thumbs up or fistbump. However, in contrast to English where "Birthday Happy" or "Merry Birthday" are clearly anomalous, the birthday emoji show no such restrictions: Any combination of the above would be a reasonable birthday wish, with no particular change in meaning when a wrapped gift is substituted for a balloon. Medlock and McCulloch (2016) analysed the combinations of emoji sent across SwiftKey's mobile keyboard platform. The five combinations in Table 2 were found among the top 200 combinations of three emoji. Strings of cake, gift, balloon, and party popper emoji are interpreted as a single, synthetic utterance conveying a global meaning of birthday wishes, rather than analyzed for their specific elements.


Ranking (/200)

🎁 🎂 🎈


🎂 🎁 🎉


🎂 🎁 🎈


🎂 🎈 🎉


🎂 🎈 🎁


Table 2. Party emoji in trigrams out of the 200 most common emoji trigrams

Other research has shown that object emoji and attitude emoji do sometimes co-occur in particular linear orders (Ge & Herring, 2018; Schnoebelen, 2018), but what these ngram frequency data show is that such potentially analytic use of emoji is not common. Rather, the typical use of emoji is synthetic and global, just as is the typical use of gesture.
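The logic of this ngram argument can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used by Medlock and McCulloch (2016): the function names are our own, the messages are assumed to be already split into individual emoji, and the toy input simply reuses the five trigrams listed in Table 2 once each, which says nothing about their real frequencies.

```python
from collections import Counter, defaultdict

# Escape sequences for the cake, gift, balloon, and party popper emoji.
CAKE, GIFT, BALLOON, POPPER = "\U0001F382", "\U0001F381", "\U0001F388", "\U0001F389"

# Toy input: the five trigrams from Table 2, one occurrence each (an assumption).
TOY_MESSAGES = [
    [GIFT, CAKE, BALLOON],
    [CAKE, GIFT, POPPER],
    [CAKE, GIFT, BALLOON],
    [CAKE, BALLOON, POPPER],
    [CAKE, BALLOON, GIFT],
]

def count_trigrams(messages):
    """Count contiguous emoji trigrams, keeping their linear order."""
    counts = Counter()
    for emoji in messages:
        for i in range(len(emoji) - 2):
            counts[tuple(emoji[i:i + 3])] += 1
    return counts

def orderings_by_inventory(trigram_counts):
    """Group ordered trigrams by the unordered set of emoji they contain."""
    groups = defaultdict(Counter)
    for trigram, n in trigram_counts.items():
        groups[frozenset(trigram)][trigram] += n
    return groups

groups = orderings_by_inventory(count_trigrams(TOY_MESSAGES))
for inventory, orderings in groups.items():
    print(len(inventory), "distinct emoji,", len(orderings), "attested ordering(s)")
```

If emoji sequences were analytic, each inventory of emoji would be expected to settle on one dominant order. Several attested orderings of the same inventory, as with the cake, gift, and balloon here, are what a synthetic, global reading predicts.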


Noncombinatoric

Gestures are said to be "noncombinatoric" (McNeill, 1992, p. 41) because they do not combine to form larger structures, whereas language is combinatoric because it does. For example, English speakers may express negation using both words/morphemes (‘not,’ ‘un-,’ etc.) and gestures (shaking one's head, making the "cut-off" gesture across the throat). But English negative words and morphemes can be readily combined with each other, even to create multiple complex negations or to negate some parts of an utterance and not others (e.g., "It's not that I don't like you, it's just that I can't trust you" or "undoable" which can mean both "not able to be done" and "able to be undone"). By comparison, negative gestures add a negative meaning to an entire situation and do not combine their meaning with specific other gestures or with each other in a predictable way (e.g., shaking the head while giving the thumbs up is not interpreted as meaning "not good" or as equivalent to "thumbs down"). Signed languages, of course, are perfectly capable of combining signs like negation into larger structural relationships.

When it comes to emoji, there are, in principle, some emoji that are available for combinatoric sequences. Ge and Herring (2018) found that Chinese speakers on Sina Weibo used the crossed hands emoji 🙅 as a way of negating one or more following emoji. McCulloch (2015) found that English-speaking Twitter users tended to use the (not technically an emoji) equals sign = or arrows 🔄 ➡️ when asked to express an "is" relationship in emoji. And, as mentioned in the previous subsection and discussed by Steinmetz (2014) and Ge and Herring (2018), a heart or a face emoji can be used to express a stance towards an object, such as ❤️🍕 to convey "I love pizza" or ☹❄ to convey "Snow makes me sad." Emoji related to methods of transportation, such as the train and airplane emoji, are also sometimes used to convey a verb of motion or transportation, such as 🇨🇦🛫🇦🇺 ('flying from Canada to Australia') (Tatman, 2016a, 2018b), although these transportation emoji can be ambiguous or difficult to interpret depending on the word order of the language they are interpreted with reference to (Gawne, 2015).

In language, the words that are the most combinatorially useful are also the most common: Function words like ‘the,’ ‘of,’ ‘and,’ ‘is,’ and ‘not’ are the most frequent words in any English corpus (Davies, 2008, 2018; Kucera & Francis, 1967), and sequences containing function words are the most common word ngrams across a wide variety of genres (Davies, 2008, 2018). If people are making use of the emoji combinatoric resources available to them, we might expect to see that our potential combinatoric emoji, such as 🙅 or 🚫 for "not," 🔄 or ➡️ for "is," and the representational faces of people (such as man, woman, baby) are among the most common emoji, and that sequences containing functional emoji (such as heart plus an object, or an object with a face) are among the most common emoji ngrams. This is not what we find in our data. In fact, none of the potential combinatoric emoji are found in several lists of the 50 most frequently used emoji (Azhar, 2017; Chalabi, 2014; Joyce, 2019), and there are no potential emoji combinatoric sequences in the top 200 emoji bigrams, trigrams, and quadrigrams examined by Medlock and McCulloch (2016).
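The frequency check described here amounts to asking whether any candidate "function" emoji appears in a top-k list. A minimal sketch follows, assuming a flat list of emoji tokens as input. The function names and the toy token counts are hypothetical and serve only to make the example run; they are not drawn from the frequency lists cited above.

```python
from collections import Counter

# Candidate combinatoric emoji named in the text, written as escape sequences:
# crossed hands, prohibited sign, cycling arrows, right arrow.
NO_GESTURE, PROHIBITED, ARROWS_CYCLE, RIGHT_ARROW = (
    "\U0001F645", "\U0001F6AB", "\U0001F504", "\u27A1",
)
CANDIDATES = [NO_GESTURE, PROHIBITED, ARROWS_CYCLE, RIGHT_ARROW]

# Invented toy token stream (an assumption, not real usage data).
JOY, HEART, THUMBS_UP = "\U0001F602", "\u2764", "\U0001F44D"
TOY_TOKENS = [JOY] * 5 + [HEART] * 4 + [THUMBS_UP] * 3 + [PROHIBITED]

def top_k(tokens, k):
    """Return the k most frequent emoji, most frequent first."""
    return [emoji for emoji, _ in Counter(tokens).most_common(k)]

def candidates_in_top_k(tokens, candidates, k):
    """List the candidate emoji that appear among the k most frequent emoji."""
    top = set(top_k(tokens, k))
    return [c for c in candidates if c in top]

# Print only the count, so the output stays plain ASCII.
print(len(candidates_in_top_k(TOY_TOKENS, CANDIDATES, 3)))
```

With real data, the tokens would first need normalising, for example by stripping variation selectors and skin tone modifiers, so that visually identical emoji are counted together.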

The non-combinatoricity of gesture should not be taken to mean that gestures cannot be found in sequences at all, however. Indeed, gestures are found in several kinds of sequences: pantomime, beats, and co-speech gesture more broadly. Beats are a subtype worth noting independently here, as the typical use of emoji is repetition, which we discuss in detail below.


Context-Sensitive

Language is said to be "context-insensitive" because when listening to someone tell a story without looking at them, it is possible to understand most of what is said through the verbal channel alone. Similarly, radio and telephone conversations are readily interpretable without visuals. By contrast, gesture is said to be "context-sensitive" (McNeill, 1992) because when watching a video of the same story in a spoken language without any audio, it is much more difficult to derive meaning from the co-speech gestures bereft of speech. Similarly, when playing a video on mute, such as in a waiting room, or in the case of a silent film, it is necessary to add text, such as closed captions and banners, in order to make clear what is happening. Signed languages are also context-insensitive and do not require captions to be perfectly understandable by other signers (captions for translation purposes may be used for any language).

When it comes to emoji, determining the referent of a string of emoji is often treated as an enjoyable guessing game, rather than a complete story by itself. Although we all know that a man emoji represents a man in example 1, the context sensitivity of emoji means that without supporting context this string of emoji makes little sense:

  1. ☎️👨🏻⛵️🐳👌

Given the context that this is the opening line of Moby Dick ("Call me Ishmael.") rendered into emoji (Emoji Dick; Benenson, 2010), the pantomime becomes clear, and stable referents that will recur across the narrative are established. But even after having read the section on pantomime above, in which we discuss Emoji Dick, it is unlikely that a reader would have immediately identified the generic male face emoji as the narrator Ishmael. Indeed, Emoji Dick was printed with English equivalents on the facing pages, demonstrating that the reader is not expected to be able to interpret its emoji sequences without context.

No Standards of Form

Language is said to have standards of form because there are some utterances that any speaker of a given language would agree are grammatical in that language (e.g., ‘the cat sat on the mat’) and other utterances that no speaker of the same language would consider grammatical (e.g., ‘cat the mat on the sat’). By contrast, gesture is said to have no standards of form because there is a great deal of idiosyncratic variation in gesture, even when describing the same event, and all of them are considered just as well-formed (McNeill, 1992, p. 41). In fact, an utterance can still be 'well-formed' even without any gesture at all. (Signed languages, of course, do have standards of form.)

If emoji are more like language, we should expect to see that some emoji sequences are rejected by speakers as being ill-formed, and that when speakers try to express the same information, they converge on a limited number of forms. If emoji are more like gesture, we should expect to see that speakers accept a wide range of ways of expressing the same sentiment. As an example of a natural context where speakers are all expressing the same sentiment, consider the tweets in examples 2-6 made in reply to a tweet by the popular American singer and actress Cher announcing that she would be touring in Sydney. The replies below all expressed excitement at the news:

  2. Woo ! can't wait 💖

  3. See u soon in Sydney 💋🌈🌈🌈💋

  4. lucky #mardigras can't wait to see you here in aus 🎶💃🏻💕🌈🏖

  5. Can't wait to see you onstage! ❤️😘

  6. Omg I can't wait. I'll be counting the days till then

Omitting or reordering the words in the excited replies would easily make them no longer adhere to the standards of form for English (e.g., "Woo ! can't" or "u soon in Sydney see"). However, omitting or reordering the emoji in the excited replies would not make them cease to adhere to any sort of standards of form for emoji. For example, omitting the kiss-face in (5) would leave a single heart, which is consistent with the single heart emoji in (2). Various random reorderings, or no emoji at all as in (6), are also fine. Likewise, the SwiftKey dataset cited above showed various orderings of the birthday wishes and alcoholic drinks emoji sequences. We do not expect to be able to reorder and delete words at random and produce licit linguistic sequences; however, this is exactly what is possible with emoji, showing that they pattern with gesture in not having standards of form.

Where gestures do have constraints on how they are performed, those constraints appear to be driven by the grammar of the language associated with them. For example, speakers of English and Turkish gesture about motion events in ways that reflect the encoding of manner and path in verbs (Özyürek & Kita, 1999). English emoji users tend to place a stance emoji before an object emoji, while Chinese speakers do the opposite (Ge & Herring, 2018; although the authors note that this emoji order differs from the word order of Chinese). This finding may be an example of emoji structure being dependent on linguistic structure, just as gesture structure is.

To best understand emoji, we need to appreciate them for their current function for the speakers we draw on in this corpus: They are not necessarily composed of meaningful units, nor do they necessarily build up into more complex units of meaning, like language does. Rather, like gesture, emoji are context-sensitive and have far more flexibility in use than language. We now turn to the different functions emoji can have.

The Different Functions of Gestures and Emoji

The most commonly used gesture categorisation was formulated by McNeill (1992), who also laid out the gesture continuum we introduced above. It builds on earlier category proposals (including Efron, 1941/1972; Kendon, 1980; McNeill, 1985). The categorical system takes into account both the form and function of a gesture. In some cases, function may be sufficient to identify a gesture's category: The extended index finger (a single form) could indicate the function of pointing or illustrating size (that something is the length of the index finger). In other cases, the primary indicator may be form: 'Beat' gestures (introduced below) are identified as much through their repeating 'beat' form as their emphatic function.

The schematisation of gesture in this way is not without criticisms, mostly framed around the attempt to universalise a highly idiosyncratic phenomenon (Bavelas, 1994; Farnell, 1994, p. 929; Feyereisen, 1994; Kendon, 2004, p. 84). Other categorizations of gesture may have features that elucidate particular aspects of emoji use, and we look forward to further work that expands upon the exact range of these parallels.

It should be noted that even though we discuss distinct categories of gesture and emoji, even McNeill acknowledges that most gestures are "multi-faceted" and should be treated as "dimensional and not categorical." A single gesture may manifest multiple semiotic dimensions (McNeill, 2005, p. 38) in its use, and we see the same for emoji.

We begin with a discussion of the four most commonly agreed upon co-speech gesture categories: illustrative, metaphoric, pointing, and beat. We then discuss the growing body of work on illocutionary gestures and their relevance to emoji, as well as the parallels between the use of gesture as part of backchannelling in face-to-face interaction and emoji backchannelling in the written domain.


Illustrative gestures refer to concrete objects. A gesture may be modelled on an object by outlining a property of its shape, use, or movement. The classic ‘and the fish was THIS big’ gesture is an example of indicating the shape and size of an object, two hands grabbing for an imaginary flopping fish is an example of illustrated action, and a flapping hand depicting how the fish got away is a gesture that illustrates movement. Although McNeill uses the term 'iconic' here, we prefer illustrative (Feyereisen & de Lannoy, 1991), which avoids confusion with icons as clickable images (such as the 'save' icon) in the context of CMC.

The majority of emoji by type (although not by usage, as we discuss elsewhere) depict concrete objects. This preponderance of concrete object emoji can be seen by examining the categories used to sort emoji on emoji keyboards, such as Smileys & Emotions, People, Animals & Nature, Food & Drink, Travel & Places, Activities & Events, Objects, Symbols, Flags (Gboard); or Smileys & People, Animals & Nature, Food & Drink, Activity, Travel & Places, Objects, Symbols, Flags (iOS default keyboard). Within these categories, all but smileys, people, and symbols typically depict concrete objects. (Even activities and events are often represented by concrete objects, such as basketball 🏀 or camera 📷, although some are also represented by a human figure performing the action, such as skier ⛷).

Most of these concrete object emoji are typically used to illustrate, whether alone or with associated text. We saw examples of birthday-related emoji above; the sports-related emoji, such as basketball 🏀, soccer ball ⚽, and hockey stick with puck 🏒, frequently illustrate tweets about the associated sport; and the food emoji frequently illustrate tweets about the relevant food (such as ‘I love pizza 🍕’ ‘avocado toast for breakfast 🥑🍞’ ‘[photo of coffee in mug] ☕️☕️☕️☕️’). However, some of the concrete object emoji, such as the eggplant 🍆 and the peach 🍑, are more typically used non-illustratively, as discussed in other sections. Human figures performing sports or professions (such as skier ⛷ or scientist 👩‍🔬) are also typically used illustratively, such as in tweets about skiing or doing science. Some hand emoji are also used illustratively, such as the writing hand ✍️ in tweets about writing or drawing, and the pinching hand emoji to indicate a small thing or amount.

In a study of emoji ngrams (the top 200 sequences of 2, 3, and 4 emoji) by users of the smartphone keyboard app SwiftKey, Medlock and McCulloch (2016) observed with surprise that the popular eggplant emoji and the smiling pile of poo emoji only showed up in homogeneous sequences (like 🍆🍆🍆, 💩💩💩), not in heterogeneous sequences with other emoji (🍆👉👌), despite the fact that heterogeneous euphemistic sequences like 👉👌 show up separately. This reinforces that the functions of the eggplant and poo emoji are predominantly emblematic, since one property of emblems in contrast with illustrative gestures is that they do not combine in sequences with other gestures (McNeill, 1992, p. 38).
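The kind of tally behind this observation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy messages are hypothetical, they are not the SwiftKey data or the method used in that study, and each message is assumed to be already split into individual emoji tokens.

```python
from collections import Counter

def ngrams(tokens, n):
    """Yield every run of n adjacent emoji tokens in one message."""
    return (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def top_ngrams(messages, n, k):
    """Return the k most frequent n-grams across all messages."""
    counts = Counter(g for msg in messages for g in ngrams(msg, n))
    return [g for g, _ in counts.most_common(k)]

def only_homogeneous(emoji, grams):
    """True if every n-gram containing `emoji` consists of that emoji alone."""
    containing = [g for g in grams if emoji in g]
    return bool(containing) and all(set(g) == {emoji} for g in containing)

# Invented toy data for illustration only (hypothetical, not real usage data).
messages = [
    ["🍆", "🍆", "🍆"],
    ["👉", "👌"],
    ["🍆", "🍆", "🍆"],
    ["🎂", "🎈", "🎉"],
]
top_trigrams = top_ngrams(messages, 3, 10)
```

In this toy data, `only_homogeneous("🍆", top_trigrams)` holds while `only_homogeneous("🎂", top_trigrams)` does not, which mirrors the contrast between an emblem-like emoji and one that combines with others.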


Metaphoric gestures generally have the same form as illustrative gestures, but refer to abstract concepts. Metaphoric gestures tend to draw on larger cultural metaphors for understanding abstract concepts. For example, when referring to the abstract concept of 'ideas,' English speakers tend to gesturally represent 'ideas' and 'knowledge' as concrete, bounded entities. This seems entirely natural to English speakers, but it is not the only way to conceptualise knowledge. Turkana speakers from Kenya represent knowledge as a more abstract stream that emanates from the brow (Kendon, 2004, pp. 46-47).

Not everyone agrees with the separation of metaphoric and illustrative gestures (Fischer, 1994) or finds it useful to separate them from other illustrative gestures (Cassell, 2007), but while they often have similar formal properties to the illustrative category, there are sufficient functional differences that are relevant to emoji that we describe them separately. One key functional difference is that metaphors are cognitively pervasive (Lakoff & Johnson, 1980), and metaphoric gestures are good evidence for this persistence even when there is no lexical evidence (Núñez & Sweetser, 2006). This adds a degree of consistency of form to metaphoric gesture use that we may also expect to see in emoji choice.

Some metaphoric uses of emoji draw on the existing metaphors in spoken language. For example, using the Top emoji 🔝 for something that is good (as in example 7, a tweet about the results of a sports game) is an extension of the metaphor that maps 'good' to ‘up’ (Lakoff & Johnson, 1980, p. 16).

  7. Bad luck to lose on penalties but we won't stop 🔝🙌😜

The bee emoji 🐝 has a range of literal and metaphorical uses, being used not just by Beyonce fans ("Queen Bey") and beekeepers, but also by people who like honey, fans of sports teams named Hornets and Wasps, and several businesses named after bees or hives. The writing hand emoji ✍️ is used both literally (illustrating tweets about writing or drawing) and metaphorically (in tweets about a player ‘signing on’ to a new sports team). The Unicode Consortium considers metaphoric potential favorably in evaluating proposals for new emoji (Unicode, 2019).


One of the most salient categories of co-speech gesture is pointing, also called deixis or deictic gestures (McNeill, 1992). Pointing is one of the first communicative skills we learn as infants (Liszkowski & Tomasello, 2011), and it is so ubiquitous we often take its complexity for granted (Cooperrider, 2011). Pointing gestures have the function of referring to locations or objects. What is being referred to may or may not be in the visual field of the interlocutors in the interaction. While many Westerners may think of index-finger pointing as the most prototypical, and indeed it is common cross-culturally, it is far from a universal (Cooperrider, Slotta, & Núñez, 2018). People from other cultures use middle-finger pointing as part of their regular pointing repertoire, for example, Arrernte speakers in Australia (Wilkins, 2003). In some cultures, such as Laos, lip pointing is more common (Enfield, 2001).

Unicode created code points for pointing index fingers going up, down, left, and right in the first version of Unicode in 1991, before the invention of emoji.[3] There is a long history of 'manicules' in printed text since the 12th century, especially to call attention to particular sections of text in the margins, which only decreased around the time that the abstract arrow → became popular (Sherman, 2005). Although the codepoints for these pointing gestures are technically not part of the emoji subset of Unicode, both finger pointing and abstract arrows are included in emoji sets and given the same graphic rendering as other emoji (☝️👇👈👉 ⬆️⬇️⬅️➡️). As with pointing in a face-to-face setting, emoji pointing involves orienting the outstretched finger towards the item of interest. In written digital communication, this requires appreciation of the spatial arrangement of different social media platforms. One common use of pointing emoji is to pair it with the word 'this,' encouraging followers or friends to read a shared post. In Facebook, a comment is located below a post and a reshare above, which influences the orientation of the selected pointing form.

Of course, we can construe pointing more broadly when it comes to spoken conversations: People can use gaze, head nods, or point with an object to guide their interlocutor's attention. Colourful emoji in general can effectively draw attention, such as the police car revolving light emoji 🚨 around a "breaking news" style announcement.


Beat gestures are easily distinguished by their repetitive up-down 'beat' pattern. They are useful for adding rhetorical emphasis, and are easy to spot in videos of political speeches. To return to a point we made at the beginning of this section, gestures can exhibit properties of multiple categories. The fact that a beat gesture is primarily distinguished by this repetitive movement means that it is easy to demonstrate how it can 'blend' with other categories. To emphasise the size of a particularly large cake, the illustrative hands-stretched-out gesture could be repeated as a beat, or to emphasize a 'really great idea,' the metaphoric gesture of the hand grasping could be performed with the same repetitive action.

One use of emoji that is explicitly beat-related is when each word is followed by a clapping hands emoji, as in WHAT👏ARE👏YOU👏DOING👏. This started as an emoji representation of a beat gesture that comedian Robin Thede described as the "double clap on syllables" as part of a Nightly Show segment on "Black Lady Sign Language,"[4] but other African American writers were quick to point out that the gesture has a more extensive history (Brown, 2016). The emoji form spread to mainstream Twitter users unaware of its offline, African American origins starting in 2016 (LaBouvier, 2016).

Understanding the emphatic nature of beat gestures helps explain a fact about the most common strings of emoji that may be initially surprising. McCulloch and Gawne (2018) and Medlock and McCulloch (2016) looked at the most common sequences of two, three, and four emoji. In the top 200 sequences of each length, about half were pure repetition, such as two tears of joy emoji 😂😂, three loudly sobbing emoji 😭😭😭, or four red heart emoji ❤️❤️❤️❤️. The first non-repeating emoji sequences show up at #10 on the bigram list (😍😘) and #23 on both the trigram (😍😍😘) and quadrigram (😍😍😘😘) lists. Within the non-identical sequences, there remains a high degree of semantic similarity and also internal repetition. In the top 200 non-identical trigrams and quadrigrams only, over half contain a partial repetition in sequences such as aab, abb, and aba for trigrams (75.5%), and aabb, abab, aaab, abbb for quadrigrams (67.5%). (Non-identical bigrams were not counted, as they must consist of ab).
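The letter templates used above (aab, abab, and so on) can be made concrete with a short Python sketch. It is a minimal illustration rather than the procedure used in the cited studies: the function names are hypothetical, and each sequence is assumed to be already split into whole emoji tokens, since some emoji (such as ❤️) span more than one code point.

```python
def repetition_pattern(tokens):
    """Map an emoji sequence to a letter template, e.g. 😍😍😘 becomes 'aab'."""
    letters = {}
    template = []
    for tok in tokens:
        if tok not in letters:
            # Assign the next unused letter to each newly seen emoji.
            letters[tok] = chr(ord("a") + len(letters))
        template.append(letters[tok])
    return "".join(template)

def classify(tokens):
    """Label a sequence as pure, partial, or no repetition."""
    distinct = len(set(tokens))
    if distinct == 1:
        return "pure repetition"
    if distinct < len(tokens):
        return "partial repetition"
    return "no repetition"
```

For instance, `repetition_pattern(["😍", "😍", "😘", "😘"])` gives the template `aabb`, which `classify` labels as partial repetition, whereas a bigram such as 😍😘 can only be `ab` and so counts as having no repetition.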

Even within entirely heterogeneous sequences, all of the top 200 non-identical sequences were thematically similar, such as heart eyes and kiss face 😍 😘 or single tear and loudly crying 😢 😢 😭 😭, strings of related objects like birthday 🎂 🎈 🎉 or junk food 🍝 🍕 🍖 🍗, and strings of hearts in different colours or sizes such as 💓 💕 💖 💞. In comparison, even in informal English, only 2.23% of tweets containing the word ‘very’ repeated the ‘very’ (Lamontagne & McCulloch, 2017): higher than formal English, but still much lower than emoji. Repetition is common for both gestures and emoji, while it is rare for words, suggesting that gestures and emoji draw from similar resources.


The co-speech gesture categories described so far all involve gestures that closely relate to the semantic content or structure of an utterance. There is another category of gestures that do not contribute to the propositional content of an utterance, but indicate the type of speech act: the intention of the speaker in saying a particular utterance or its illocutionary force (Austin, 1975). For example, excusing oneself from a meeting with a shrug indicates that it is for reasons beyond one's control (Debras, 2017), and telling someone they must leave the meeting while wiping the hand outwards indicates that the speaker will brook no disagreement (Bressem & Müller, 2014). Of course, gestures are not the only tool that people have to mark illocutionary force; lexical choice and phonetic features like intonation and stress are also used to indicate the illocutionary force of an utterance. These gestures are referred to as "pragmatic gestures" by Kendon (2004), who also includes modal and parsing functions in this category.

Although these gestures are not discussed as a separate category by McNeill (1992, 2005), there is a growing literature on gestures that have a "pragmatic" function, indicating the illocutionary force of an utterance (Bressem & Müller, 2014; Kendon, 2004, 2018; Neumann, 2004). Illocutionary gestures tend to be recurring, with some stable features, although not enough to be reliably categorised as emblems: Kendon refers to them instead as "gesture families" because of their similarities (2004, p. 281). For example, Bressem and Müller (2014) identify several types of gestures that make use of an 'away' hand trajectory and are all used with negative utterances (see also Kendon, 2004, pp. 248-264). Illocutionary gesture “families” have also been observed for precision and force (Kendon, 2004; Neumann, 2004), speech acts offering something to the addressee (Kendon, 2004), and interrogatives (Cooperrider, Abner, & Goldin-Meadow, 2018).

Dresner and Herring (2010) argue that emoticons such as the smiley :) are illocutionary force markers, such as indicating pro-social intention on the part of the speaker. In a sentence like ‘i feel sick and tired all the time :)’ the smiley emoticon is not intended to indicate that the speaker is happy, but instead functions as an indicator that the speaker is attempting not to complain. We see this pragmatic function continue, expand, and evolve with emoji and other emoji-like images such as stickers (Herring & Dainas, 2017): The various smile emoticons and emoji such as :-), :), :D, 😃, 😄, 😊, 😁 could thus be considered an illocutionary emoji "family" (in analogy with gesture families, cf. Fricke et al., 2014; Kendon, 2004) used to indicate positive intent.

A number of emoji illocutionary markers draw their function by analogy with illocutionary gestures. The Thinking Face emoji (🤔) often marks disingenuous or sarcastic questions, or a skeptical stance. Even though there has been a dedicated Shrug emoji (🤷) since 2016, many emoji users have codified the Information Desk Person emoji (💁) that has been part of the emoji set since 2010 or the kaomoji ¯\_(ツ)_/¯ for this function. The Upside-Down Face (🙃) is still a bodily-motivated emoji, but goes a step beyond representation. It is used as a marker of ambivalence that "can effortlessly oscillate between […] seemingly disparate emotions in one compact symbol" (Solomon, 2016, n.p.). Moving away from illocutionary markers that are based on the human body, the Fire emoji (🔥) is frequently used as a positive force marker, from the positive idioms 'on fire' or 'lit'.

It is worth noting that while both gestures and emoji can have illocutionary functions, many other resources can also be used to clarify the intent of the speaker, in both the written and spoken domains. In the domain of informal internet writing, the illocutionary functions of emoji, spelling, punctuation, and social acronyms seem to be in competition with each other: When people use emoji, they employ less nonstandard spelling and punctuation (Pavalanathan & Eisenstein, 2016). An analysis of data from Instagram found that people used tears of joy emoji 😂 in the kinds of sentences where they might otherwise have used lolol, lmao, lmfao, etc.; they used the heart emoji 💓 in the same contexts as strings like xoxoxox, babe, and loveyou; and they used the loudly crying emoji 😭 like they used ugh, wahhhh, omfg, and whyyy (Dimson, 2015).


Backchanneling is the response of someone listening to the speaker, which can include words ('yeah,’ 'really!?'), verbalizations ('uh huh,’ 'hmm'), or actions. Although 'backchanneling' is not usually discussed as a distinct category of gesture, gestures can perform this function (most notably the affirmative head-nod while someone else is talking), along with gaze, posture, and auditory feedback. Just as a nod or thumbs up can signal that the addressee is listening and understanding in spoken communication, or has nothing further to add to the conversation (Sherzer, 1991), emoji can serve as a backchannelling or terminating move in online interaction. Emoji offer a way to indicate active listening in text (Kelly & Watts, 2015). More than just a read receipt in a text program, emoji can acknowledge the topic or the feeling of a particular message. For example, in a Twitter search filtered to include only tweets that are replies to other tweets, a person replied to a tweet containing photos of a marine ecosystem with "Wonderful photos of a spectacular place in our gorgeous world ♥️🐠🦀🐚🐋," using the fish, shell, crab, and whale emoji to acknowledge that the topic of the original tweet is about oceans, even though the original tweet contained photos of fish and a seabird but not shells, crabs, or whales. In another search for replies on Twitter, we found that one user started by tweeting about a dramatic moment from a TV show, to which another user replied "That moment always breaks my heart," and a third user replied to the second tweet with simply "😭😭😭😭😭," acknowledging and sharing the feeling by sending loudly crying emoji. Such results are common in searches for strings of several emoji sent as replies on Twitter, with object emoji tending to indicate acknowledgement of a topic and body emoji tending to indicate acknowledgement of a feeling. 
One promising direction for future research is to examine how backchannelling emoji are used in private chat conversations as well.

Conclusion: The Future of Emoji as Gesture

This article has provided an initial exploration of the parallels between emoji and gesture, an exciting new avenue of theoretical exploration to better understand the role of emoji in computer-mediated interaction. Different uses of emoji have different relationships to written text, which are parallel to the different types of bodily action found on McNeill's continuum; gestures and co-speech emoji are closely integrated into meaning with the accompanying speech/text, while pantomime allows for more structured meaning to emerge, but only in a specific context. The use of emoji alongside text demonstrates similar properties of meaning to gesture: They do not decompose into smaller morphological units, they do not show predictable syntax, their meaning is shaped by context-specific use, and there is accepted variation in form. Finally, the different functions of gestures documented in the literature of that field show similarities to different uses of text-accompanying emoji, which provides a new way of considering the different communicative functions of emoji.

One limitation of this study has been the predominantly qualitative analysis of individual examples. Quantitative studies may show that some functions we describe are more or less frequently used in particular genres or on particular platforms, or that certain emoji are acquiring emblem status in particular internet communities. We have also focused specifically on emoji, but there is no reason to assume that this kind of analysis could not be used to elucidate the function of emoticons, GIFs, or other features of the online communicative environment. When the environment permits, language wants to be multimodal, whether that be via speech and gesture, text and illustrations, or text and graphicons.

We are also aware that this study has focused on a single, commonly used gesture classification schema. We deliberately focused on the tradition that centres on the work of McNeill (1992, 2005) and Kendon (2004), as this is a dominant paradigm in gesture studies, and it takes into account both formal and functional features of gestural communication. This is not to say that other approaches to gesture might not also yield novel and illuminating insights into the use of emoji.

This analysis has also focused specifically on the use of emoji in English. Just as there is cross-cultural variation in gesture, we also anticipate cross-cultural variation in emoji. There is some promising work on this in Chinese; de Seta (2018) argues that emoji are just part of a larger set of resources including stickers, GIFs, and custom images that people remix, while Ge and Herring (2018) argue that Chinese use of emoji sequences is moving further towards the 'language' end of the properties we see in analogy to McNeill's gesture continuum. We also acknowledge that there are some uses of emoji that do not fit our analysis, such as the occasional use of rebus, or Herring and Dainas's (2017) category of 'riffing,’ where people send each other emoji or other image content that relates as a form of play. Just as the hands and the body can be employed for more than gesturing, we look forward to the ever-expanding range of functions for which people will deploy emoji.

Although emoji are more constrained than gesture, we do not see this as a limitation of our theory, but rather as an acknowledgement that different domains have different constraints, and even within those constraints speakers find creative ways to return multimodality to communication. Emoji use is constrained by the limits of what is included in the character set sanctioned by Unicode or offered by a specific platform (although platform-specific sets do not have the flexibility and stability of the standard emoji set).

The paradigm of emoji as digital gesture has potentially important ramifications for the Unicode Consortium's emoji proposal and approval process. The Consortium has a dedicated Emoji Subcommittee,[5] which has been responsible for expanding the number of emoji in Unicode from the original set of 176 in 1999 to 2,784 in Unicode v. 11.0, and the demand for further emoji expansion shows no sign of slowing down. The Unicode 6.0 emoji set contained 20 handshapes, many of them recognisable as emblematic gestures, including the 'Raised Fist' ✊, 'Raised Hand' ✋ and 'Oncoming Fist' 👊. Unicode 11.0, released in June 2018, includes at least 29 hand-only emblems, seven hand-and-body emblems, and a number of facial actions that can be considered emblematic, such as winking or sticking out your tongue. The analysis of emoji as gesture can help inform future emoji encoding by incorporating evidence from names for emblem gestures and even typological surveys of common gestures across cultures.

The parallels between emoji and gesture furthermore provide a new argument against the media hyperbole about emoji criticized by Schnoebelen (2018); that is, the idea that emoji are becoming or replacing language. This article not only illustrates the ways in which the majority of current uses of emoji are more like gesture than verbal language, but also provides clear criteria for how emoji would have to be used to move towards being more language-like, thanks to McNeill's (1992, 2005) criteria for how gestures are different from language.

Finally, the intersection of emoji studies with gesture studies points towards an interesting framework for analyzing other aspects of CMC. It is natural to extend the digital gesture analogy to other graphical additions to text, including plain-text emoticons, platform-specific stickers, and animated GIFs, which are already readily compared with emoji (see, e.g., Herring & Dainas, 2017). But gestures act in concert with two other aspects of speech: individual sounds (phonemic or segmental information) and broader tone of voice (prosodic, intonational, or suprasegmental information). It would be interesting to see future internet linguistic research continue bringing together various subfields of linguistics, such as an acoustic analysis of specifically what is involved when informal internet punctuation is interpreted as tone of voice or intonation (e.g., Lamontagne & McCulloch, 2017 on the phonetic feasibility of repeated letters like ‘nooooo’ or ‘sameeee’), or supplementing the corpus data that is so common in internet linguistics with experimental data (see, e.g., Heath, 2018 on how all-caps affects the interpretation of various kinds of emotions). We look forward to seeing further work on how particular informal written features have specific communicative effects within the broad domain of tone, gesture, and intention.


We would like to thank the two anonymous reviewers for their thoughtful feedback on this paper and Susan Herring for her attentive editorial hand. Thanks to the folks at SwiftKey for data used in this paper and Jeffrey Lamontagne for discussion regarding the beat gesture section. This work was funded by patrons of Lingthusiasm and by La Trobe University.


  1. http://www.unicode.org/consortium/consort.html http://www.unicode.org/reports/tr51/index.html

  2. http://www.emojination.org, accessed 8 March, 2018.

  3. http://www.unicode.org/versions/Unicode1.0.0/

  4. In the March 17th, 2016 segment of The Nightly Show on "Black Lady Sign Language," comedian Robin Thede describes how black women "double clap on syllables" for emphasis. http://www.youtube.com/watch?v=34PjKtcVhVE

  5. http://www.unicode.org/emoji/


Austin, J. L. (1975). How to do things with words. Oxford: Oxford University Press.

Ayers, J. W., Caputi, T. L., Nebeker, C., & Dredze, M. (2018). Don't quote me: Reverse identification of research participants in social media studies. npj Digital Medicine, 1(1), article 30. doi: 10.1038/s41746-018-0036-2

Azhar, H. (2017). Top emojis of World Emoji Day. Emojipedia Blog. Accessed 21 May, 2019 from https://blog.emojipedia.org/top-emojis-of-world-emoji-day/

Bavelas, J. B. (1994). Gestures as part of speech. Research on Language and Social Interaction, 27(3), 201-222.

Benenson, F. (2010). Emoji Dick. Lulu.

Breckinridge Church, R., Kelly, S., & Holcombe, D. (2014). Temporal synchrony between speech, action and gesture during language production. Language, Cognition and Neuroscience, 29(3), 345-354.

Bressem, J., & Müller, C. (2014). The family of AWAY-gestures. In C. Müller, A. Cienki, E. Fricke, S. H. Ladewig, D. McNeill, & J. Bressem (Eds.), Body-language-communication: An international handbook on multimodality in human interaction (pp. 1592-1604). Berlin, Boston: De Gruyter Mouton.

Brown, K. (2016, April 6). Your Twitter trend analysis is not deep, and it's probably wrong. Jezebel. Available at http://jezebel.com/your-twitter-trend-analysis-is-not-deep-and-it-s-proba-1769411909

Burge, J. (2019). Correcting the record on the first emoji set. Emojipedia Blog. Accessed 25 March, 2019 from https://blog.emojipedia.org/correcting-the-record-on-the-first-emoji-set/

Cassell, J. (2007). Body language: Lessons from the near-human. In J. Riskin (Ed.), Genesis redux: Essays in the history and philosophy of artificial intelligence (pp. 346-374). Chicago: University of Chicago Press.

Chalabi, M. (2014). The 100 most-used emojis. FiveThirtyEight. Accessed 21 May, 2019 from https://fivethirtyeight.com/features/the-100-most-used-emojis/

Cohen, A. A., & Harrison, R. P. (1973). Intentionality in the use of hand illustrators in face-to-face communication situations. Journal of Personality and Social Psychology, 28(2), 276-279.

Cooperrider, K. (2011). Reference in action: Links between pointing and language. Unpublished doctoral dissertation, University of California, San Diego.

Cooperrider, K., Abner, N., & Goldin-Meadow, S. (2018). The palm-up puzzle: Meanings and origins of a widespread form in gesture and sign. Frontiers in Communication, 3. doi: 10.3389/fcomm.2018.00023

Cooperrider, K., Slotta, J., & Núñez, R. (2018). The preference for pointing with the hand is not universal. Cognitive Science, 42(4), 1375-1390. doi: 10.1111/cogs.12585

Crystal, D. (2001). Language and the Internet. Cambridge: Cambridge University Press.

Danesi, M. (2016). The semiotics of emoji: The rise of visual language in the age of the internet. London: Bloomsbury Publishing.

Davies, M. (2008-2019). The corpus of contemporary American English (COCA): 560 million words, 1990-present. Available at https://www.english-corpora.org/coca/

Davies, M. (2018-2019). The 14 billion word iWeb Corpus. Available at https://www.english-corpora.org/iweb/

Davis, M. (2016). Unicode & emoji. Talk presented at EmojiCon 1, San Francisco, November 5. Available at https://unicode.org/emoji/slides.html

Debras, C. (2017). The shrug: Forms and meanings of a compound enactment. Gesture, 16(1), 1-34.

de Seta, G. (2018). Biaoqing: The circulation of emoticons, emoji, stickers, and custom images on Chinese digital media platforms. First Monday, 23(9). Available at https://firstmonday.org/article/view/9391/7566

Donato, G., & Paggio, P. (2017). Investigating redundancy in emoji use: Study on a Twitter based corpus. In Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (pp. 118-126).

Dresner, E., & Herring, S. C. (2010). Functions of the nonverbal in CMC: Emoticons and illocutionary force. Communication Theory, 20(3), 249-268.

Dürscheid, C., & Siever, C. M. (2017). Jenseits des Alphabets – Kommunikation mit Emojis. Zeitschrift für germanistische Linguistik, 45(2), 256-285.

Efron, D. (1941/1972). Gesture, race and culture; A tentative study of the spatio-temporal and "linguistic" aspects of the gestural behavior of eastern Jews and southern Italians in New York City, living under similar as well as different environmental conditions. The Hague: Mouton.

Emogi Research Team (2016). 2016 emoji report. Available at https://cdn.emogi.com/docs/reports/2016_emoji_report.pdf

Enfield, N. J. (2001). 'Lip-pointing': A discussion of form and function with reference to data from Laos. Gesture, 1(2), 185-211.

Evans, V. (2017). The emoji code: How smiley faces, love hearts and thumbs up are changing the way we communicate. London: Michael O'Mara Books.

Farnell, B. (1994). Ethno-graphics and the moving body. Man, 29(4), 929-974.

Feyereisen, P. (1994). Hand and mind: What gestures reveal about thought (Review). The American Journal of Psychology, 107(1), 149-155.

Fischer, S. D. (1994). Hand and mind: What gestures reveal about thought (Review). Language, 70(2), 345-350.

Fricke, E. (2013). Towards a unified grammar of gesture and speech: A multimodal approach. In C. Müller, A. Cienki, E. Fricke, S. H. Ladewig, D. McNeill, & S. Teßendorf (Eds.), Body – language – communication. An international handbook on multimodality in human interaction (pp. 733-754). Berlin; Boston: De Gruyter Mouton.

Furlan, J. (2014). 18 books perfectly described using emojis. BuzzFeed. Accessed 21 May, 2019 from https://www.buzzfeed.com/juliafurlan/book-emoji-heart-emoji

Gawne, L. (2015). Emoji deixis: When emoji don't face the way you want them to. Superlinguo. Accessed 15 August, 2018 from www.superlinguo.com/post/130501329351/emoji-deixis-when-emoji-dont-face-the-way-you

Ge, J., & Herring, S. C. (2018). Communicative functions of emoji sequences on Sina Weibo. First Monday, 23(11). Available at https://firstmonday.org/ojs/index.php/fm/article/view/9413/7610

Goldin-Meadow, S. (2002). Getting a handle on language creation. In T. Givon & B. F. Malle (Eds.), The evolution of language out of pre-language (pp. 342-374). Amsterdam: John Benjamins.

Goodwin, C. (2007). Environmentally coupled gestures. In S. D. Duncan, J. Cassell, & E. T. Levy (Eds.), Gesture and the dynamic dimensions of language: Essays in honor of David McNeill (pp. 195-212). Amsterdam: John Benjamins.

Graham, J. A., & Heywood, S. (1975). The effects of elimination of hand gestures and of verbal codability on speech performance. European Journal of Social Psychology, 5(2), 159-195.

Grenoble, L. A., Martinović, M., & Baglini, R. (2014). Verbal gestures in Wolof. In Selected Proceedings of the 44th Annual Conference on African Linguistics (pp. 110-121). Somerville, MA: Cascadilla Press.

Habets, B., Kita, S., Shao, Z., Özyürek, A., & Hagoort, P. (2011). The role of synchrony and ambiguity in speech–gesture integration during comprehension. Journal of Cognitive Neuroscience, 23(8), 1845-1854.

Heath, M. (2018). Orthography in social media: Pragmatic and prosodic interpretations of caps lock. Paper presented at the Linguistic Society of America annual meeting, Salt Lake City, January 4-7.

Herring, S. C., & Dainas, A. R. (2017). "Nice picture comment!" Graphicons in Facebook comment threads. In Proceedings of the Fiftieth Hawai'i International Conference on System Sciences (HICSS-50). Los Alamitos, CA: IEEE.

Herring, S. C., & Dainas, A. R. (2018). Receiver interpretations of emoji functions: A gender perspective. In S. Wijeratne, E. Kiciman, H. Saggion, & A. Sheth (Eds.), Proceedings of the 1st International Workshop on Emoji Understanding and Applications in Social Media (Emoji2018), Stanford, CA, USA, 25 June 2018. Available at http://ceur-ws.org/Vol-2130/paper5.pdf

Highfield, T., & Leaver, T. (2016). Instagrammatics and digital methods: Studying visual social media, from selfies and GIFs to memes and emoji. Communication Research and Practice, 2(1), 47-62.

Iverson, J. M., & Goldin-Meadow, S. (1998). Why people gesture when they speak. Nature, 396(6708), 228.

Jenkins, T., Coppola, M., & Coelho, C. (2018). Effects of gesture restriction on quality of narrative production. Gesture, 16(3), 416-431.

Joyce, G. (2019). The most popular emojis. Brandwatch. Accessed 21 May, 2019 from https://www.brandwatch.com/blog/the-most-popular-emojis/

Kelly, B. F. (2014). Temporal synchrony in early multi-modal communication. In I. Arnon, M. Casillas, C. Kurumada, & B. Estigarribia (Eds.), Language in interaction: Studies in honor of Eve V. Clark (pp. 117-138). Amsterdam: John Benjamins.

Kelly, R., & Watts, L. (2015). Characterising the inventive appropriation of emoji as relationally meaningful in mediated close personal relationships. Paper presented at Experiences of Technology Appropriation: Unanticipated Users, Usage, Circumstances, and Design, Oslo, Norway, September 20.

Kendon, A. (1980). Gesticulation and speech: Two aspects of the process of utterance. In M. R. Key (Ed.), The relationship of verbal and nonverbal communication (pp. 207-227). The Hague; New York: Mouton.

Kendon, A. (2000). Language and gesture: Unity or duality? In D. McNeill (Ed.), Language and gesture: Window into thought and action (pp. 47-63). Cambridge: Cambridge University Press.

Kendon, A. (2004). Gesture: Visible action as utterance. Cambridge: Cambridge University Press.

Kendon, A. (2018). Pragmatic functions of gestures. Gesture, 16(2), 157-175.

Kita, S., Alibali, M. W., & Chu, M. (2017). How do gestures influence thinking and speaking? The gesture-for-conceptualization hypothesis. Psychological Review, 124(3), 245-266. doi: 10.1037/rev0000059

Kita, S., & Özyürek, A. (2003). What does cross-linguistic variation in semantic coordination of speech and gesture reveal? Evidence for an interface representation of spatial thinking and speaking. Journal of Memory and Language, 48(1), 16-32. doi: 10.1016/S0749-596X(02)00505-3

Krauss, R. M., Chen, Y., & Gottesman, R. F. (2000). Lexical gestures and lexical access: A process model. In D. McNeill (Ed.), Language and gesture: Window into thought and action (pp. 261-283). Cambridge: Cambridge University Press.

Kucera, H., & Francis, W. N. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.

LaBouvier, C. (2017, May 16). The clap and the clap back: How Twitter erased black culture from an emoji. Motherboard. Available at http://motherboard.vice.com/en_us/article/jpyajg/the-clap-and-the-clap-back-how-twitter-erased-black-culture-from-an-emoji

Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago: University of Chicago Press.

Lamontagne, J., & McCulloch, G. (2017). Wayyyy longgg: Orthotactics & phonology in lengthening on Twitter. Paper presented at the Linguistic Society of America annual meeting, Austin, TX, January 5.

Liszkowski, U., & Tomasello, M. (2011). Individual differences in social, cognitive, and morphological aspects of infant pointing. Cognitive Development, 26(1), 16-29.

McCulloch, G. (2015). Twitter thread beginning: "New rule: anyone who wants to say that emoji are language must make that assertion entirely in emoji. Should be no problem if they're right." Available at https://twitter.com/GretchenAMcC/status/601077165300523008

McCulloch, G. (2019). Because internet: Understanding the new rules of language. New York: Riverhead Books.

McCulloch, G., & Gawne, L. (2018). Emoji grammar as beat gestures. In S. Wijeratne, E. Kiciman, H. Saggion, & A. Sheth (Eds.), Proceedings of the 1st International Workshop on Emoji Understanding and Applications in Social Media (Emoji2018), Stanford, CA, USA, 25 June 2018. Available at http://ceur-ws.org/Vol-2130/short1.pdf

McNeill, D. (1985). So you think gestures are nonverbal? Psychological Review, 92(3), 350-371.

McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago: The University of Chicago Press.

McNeill, D. (2005). Gesture and thought. Chicago: University of Chicago Press.

Medlock, B., & McCulloch, G. (2016). The linguistic secrets found in billions of emoji. Paper presented at SXSW, March 11-20, Austin, TX. Available at http://www.slideshare.net/SwiftKey/the-linguistic-secrets-found-in-billions-of-emoji-sxsw-2016-presentation-59956212

Miyake, K. (2007). How young Japanese express their emotions visually in mobile phone messages: A sociolinguistic analysis. Japanese Studies, 27(1), 53-72.

Morris, D., Collett, P., Marsh, P., & O'Shaughnessy, M. (1979). Gestures: Their origins and distribution. London: Jonathan Cape.

Na'aman, N., Provenza, H., & Montoya, O. (2017). Varying linguistic purposes of emoji in (Twitter) context. In Proceedings of ACL 2017, Student Research Workshop (pp. 136-141).

Neumann, R. (2004). The conventionalization of the ring gesture in German discourse. In C. Müller & R. Posner (Eds.), The semantics and pragmatics of everyday gestures (pp. 217-223). Berlin: Weidler.

Núñez, R. E., & Sweetser, E. (2006). With the future behind them: Convergent evidence from Aymara language and gesture in the crosslinguistic comparison of spatial construals of time. Cognitive Science, 30(3), 401-450.

Özyürek, A., & Kita, S. (1999). Expressing manner and path in English and Turkish: Differences in speech, gesture, and conceptualization. In M. Hahn & S. C. Stoness (Eds.), Twenty-first Annual Conference of the Cognitive Science Society (pp. 507-512). London: Erlbaum.

Pavalanathan, U., & Eisenstein, J. (2016). More emojis, less :) The competition for paralinguistic function in microblog writing. First Monday, 21(11). Available at https://firstmonday.org/ojs/index.php/fm/article/view/6879/5647

Pohl, H., Domin, C., & Rohs, M. (2017). Beyond just text: Semantic emoji similarity modeling to support expressive communication. ACM Transactions on Computer-Human Interaction, 24(1), article 6.

Schandorf, M. (2013). Mediated gesture: Paralinguistic communication and phatic text. Convergence, 19(3), 319-344.

Schnoebelen, T. (2018, June 25). Emoji are great and/or they will destroy the world. Keynote presented at 1st International Workshop on Emoji Understanding and Applications in Social Media (Emoji2018), Stanford, CA.

Sherman, W. H. (2005). Toward a history of the manicule. Owners, annotators and the signs of reading. London: Oak Knoll Press.

Sherzer, J. (1991). The Brazilian thumbs‐up gesture. Journal of Linguistic Anthropology, 1(2), 189-197.

Solomon, J. (2016, October 2). Upside-down smiley face. Lexical Items. Accessed 27 February 2018 from www.lexicalitems.com/blog/2016/10/02/upside-down-smiley-face

Steinmetz, K. (2014). TIME exclusive: Here are rules of using emoji you didn't know you were following. Time Magazine. Accessed 21 May, 2019 from http://time.com/2993508/emoji-rules-tweets/

Sugiyama, S. (2015). Kawaii meiru and Maroyaka neko: Mobile emoji for relationship maintenance and aesthetic expressions among Japanese teens. First Monday, 20(10). Available at https://journals.uic.edu/ojs/index.php/fm/article/view/5826

Tatman, R. (2016, December 7). Do emojis have their own syntax? Making Noise & Hearing Things. Accessed 6 August 2018 from http://makingnoiseandhearingthings.com/2016/12/07/do-emojis-have-their-own-syntax/

Tatman, R. (2018a). What you can, can't and shouldn't do with social media data. Paper presented at Joint Statistical Meetings, Vancouver BC, July 28. Accessed 6 August 2018 from http://www.rctatman.com/talks/social-media-jsm

Tatman, R. (2018b, July 7). Are emoji sequences as informative as text? Making Noise & Hearing Things. Accessed 6 August 2018 from http://makingnoiseandhearingthings.com/2018/07/07/are-emoji-sequences-as-informative-as-text/

Thurlow, C. (2006). From statistical panic to moral panic: The metadiscursive construction and popular exaggeration of new media language in the print media. Journal of Computer-Mediated Communication, 11(3), 667-701.

Unicode. (2019). Submitting emoji proposals. The Unicode Consortium (updated March 11, 2019). Accessed 31 May, 2019 from http://unicode.org/emoji/proposals.html

van Loon, E., Pfau, R., & Steinbach, M. (2014). The grammaticalization of gestures in sign languages. In C. Müller, A. Cienki, E. Fricke, S. H. Ladewig, D. McNeill, & S. Tessendorf (Eds.), Body – language – communication: An international handbook on multimodality in human interaction (pp. 2133-2149). Berlin: De Gruyter Mouton.

Wagner, M. (2016). How to be kind with prosody. In J. Barnes, A. Brugos, S. Shattuck-Hufnagel, & N. Veilleux (Eds.), Speech prosody 2016, 1 (pp. 1250-1253). Urbana IL: Speech Prosody Special Interest Group (SProSIG).

Wilkins, D. (2003). Why pointing with the index finger is not a universal (in sociocultural and semiotic terms). In S. Kita (Ed.), Pointing: Where language, culture and cognition meet (pp. 171-215). Hillsdale, NJ: Lawrence Erlbaum.

Zappavigna, M., & Zhao, S. (2017). Selfies in 'mommyblogging': An emerging visual genre. Discourse, Context & Media, 20, 239-247.

Biographical Notes

Lauren Gawne [l.gawne@latrobe.edu.au] is a David Myers Research Fellow in Linguistics at La Trobe University, Melbourne, Australia. Her research focuses on grammar and gesture, both in English and in Tibetic languages of Nepal.

Gretchen McCulloch [contact@lingthusiasm.com] is an Internet Linguist. She is a regular columnist at Wired, co-hosts Lingthusiasm, and is the author of Because Internet: Understanding the New Rules of Language (Riverhead Books).


Any party may pass on this Work by electronic means and make it available for download under the terms and conditions of the Digital Peer Publishing License. The text of the license may be accessed and retrieved at http://www.dipp.nrw.de/lizenzen/dppl/dppl/DPPL_v2_en_06-2004.html.