How Long Does It Take to Get Accurate and Clear English Speech?


On average, it takes about 70 days of practice every day to change or form a daily life habit (Frothingham, 2019). The range for daily life habits is 18 to 254 days, according to a 2009 article in the European Journal of Social Psychology by Lally et al. It depends on the habit. For example, it takes fewer days to form the habit of drinking a glass of water with breakfast than to automatically do 50 push-ups before breakfast every day. Notice the phrase in the first sentence: “it takes about 70 days of practice every day.” It goes without saying that the amount of time to acquire a habit also depends upon the person.

Speech that is easily understood by others is called Clear Speech. Clear Speech in American English (AE) has been investigated since the 1920s for native-born AE speakers (Denes & Pinson, 1993; Smiljanic & Bradlow, 2009). Far less investigation has been reported in journals for nonnative-born speakers of American English (Smiljanic & Bradlow, 2009).

      Antonia Johnson put together the research for her dissertation  in 2000 for the mode or style of AE Clear Speech.  That mode or style of speaking includes greater speech volume or loudness,  aiming for accuracy for all words, clear enunciation of consonants and vowels, and not slurring words together.  These strategies are consistent with the Task Dynamic Model of Motor Control and Task Dynamic Model of Speech Production (Kelso & Tuller, 1984, Saltzman et al, 2010, also see Parrell et al, 2018). 

Both Casual Speech (“everyday” speech) and Clear Speech are styles or modes of talking. So both Casual Speech and Clear Speech are coordinated manners of talking. The difference between these modes is comparable to the differences in muscles and movement between human walking and running.

For native-born AE speakers, enunciating speech sounds in the Clear Speech mode or style targets feature enhancement of AE speech sounds (Kelso & Tuller, 1984; Smiljanic & Bradlow, 2018). For instance, native-born speakers of North American English automatically emphasize the longer duration of the SH speech sound compared to the CH speech sound, and they emphasize the longer duration and the two speech sounds of the English long vowel O compared to the short vowel O.

Importantly, when nonnative-born speakers engage in the strategies of Clear Speech, unless they are deliberately taught which English speech sounds require feature enhancement and how to produce accurate AE consonants and vowels (those not produced the same way in their mother language), the result is not greater speech intelligibility or understandability. Instead, the result is only louder words and sentences in which the speech sounds continue to be errors for American English and continue to match the speech sounds from their home or mother language.

Our work at Clear Talk Mastery has found that forming the habit of using the strategies of Clear Speech, along with the requisite feature enhancement (also called accurate enunciation or pronunciation) for the 23 or 25 consonant sounds and 14 vowel sounds, takes 70 days on average. What is learned and made habitual is the dynamic task of the Clear Talk Mode in American English. For intelligibility and understandability, those skills are most important.

       Once those highest priority habits have been acquired, then other skill sets can be added systematically. That’s because proficiency in English intelligibility or understandability and communication includes other skills, notably core skills for pronouncing multiple syllable words. These skills include English written word syllable division rules and patterns.    Crucial also for AE proficiency is acquiring AE word syllable accent stress for multiple syllable words.   American English word syllable accent stress is different than, for example, Spanglish,  Chinglish, Indian English or  South African English. 

     High priority for a wide variety of persons at their workplace is acquiring the American English speech characteristics of voice inflection in sentences ranging from a few sentences needed for talking in a meeting to many sentences in a presentation. The overarching purpose for the dynamic task of the speaking style with voice inflection is to enhance or improve the listener’s memory and understanding of information.   

   The most practical purpose of using the voice inflection patterns is so that your speaking is not monotone and boring.

    The economic  and career advancing purpose is that experts say that voice inflection (and asking questions) are the two skills most important for native-born persons for advancing their career.  We think the same is true for nonnative-born persons for their career.  So we teach that skill in Level Two and higher levels.

Because giving or explaining information in meetings is a minimal requirement for all work situations requiring English, mastering the characteristics of optimal or good presentations is the logical next step after acquiring the core American English speech skills for intelligibility and using the Clear Talk Mode of speaking, which includes accurate enunciation. Fact is, if listeners cannot understand the words you are saying in American English, what good is having voice inflection and mastering the characteristics of optimal presentations?

        How long does it take to form the habits of clear and accurate English?   We break those habits into systematic and ordered skill sets and adhere to the average 70 days to form a habit for that speaking tool set.  For more detail,   http://www.cleartalkmastery.com/blog/2023/03/17/assessment-why-bother/

   Later, we’ll get to the keys and secrets to how to speed up the process of acquiring  accurate and clear American English speech along with other critical English speech skill sets for English speech communication proficiency.  Hint—one is needs assessment and skills assessment.  Another is distributed learning.  Another is purposeful deliberate practice.  And there really are 15 dimensions for successful acquisition of clear English speech.

Assessment– Why Bother?

Article 5 English Speech Assessment? Why Bother?


How important is assessment for successful acquisition of clear American English (AE) speaking? If we didn’t care about efficiency of learning, not important at all. Your money is worth a lot, but your time is worth even more. What matters is determining nonnative-born individuals’ pronunciation of the 23 (some count 25) consonants and the 14 AE vowels (some count more), their knowledge of pronunciation rules, and their current manner of talking. It is easy to recognize that every spoken language has some consonants and vowels that are pronounced the same as in American English and others that are different. If the instructor (teacher, coach, tutor) and the student-learner know the errors for AE speech sounds and pronunciation rules, then instruction and learning can devote disproportionately more time to the errors and deficient skills, which makes acquiring AE pronunciation more efficient and faster. “Thus, you know what to fix and what doesn’t need fixing.” We also know the appropriate Level of Course for each person.

Critical is to assess or test all of the AE speech sounds, the most important pronunciation rules, and the manner of talking. Critical also is to assess or determine sources of the speech errors, including underlying physical differences such as vocal strength, range of speech volume or loudness, and vocal flexibility.

We use the term  “English speech communication and intelligibility.”   Other terms used for decades include “Accent Reduction” or “Accent Modification”  or English Pronunciation.  What is “accent”?  It is a pattern of speaking.  Twenty-three languages of the roughly 7,000 languages in the world’s 196 countries are spoken by more than half of the world’s population, according to Ethnologue and The Intrepid Guide, 2022.  Also there are a multitude of Englishes. The 2018  CIA World Factbook  “Field Listing-Languages” reported that  58 sovereign states and 28 non-sovereign entities use English as their official language.

Fact is, many nonnative-born speakers of English or persons who have English as a Second Language (ESL), or English as an Other Language (ESOL) are using the pronunciation of consonants and vowels from their  mother-tongue  (the language they started speaking at about age one to four and beyond).   Even if the individual is from a country where English is the official language, the pronunciation and other physiological characteristics of speech are not the same as American English speech.

For example, a prevalent and frequent  difference in the  pronunciation of consonants and vowels in other languages compared to American English is the duration of the speech sound.  Specifically 70% of AE speech consonant and vowel sounds are double in duration of time (“slow”) compared to the quick or short in duration consonants and vowels.  Other languages frequently speak the same consonants and vowels in a quicker or shortened duration compared to American English.  For instance, prevalent is nonnative speakers pronouncing V, TH, M or N  much more quickly than American English speech.  Or the first language could make the speech sound more lengthy or slower.  For example, in Spanish, the consonant sound CH is pronounced slowly, like the AE speech sound SH.

Not only that, the general stiffness or tension of the speech articulator muscles  or the force of contraction (especially tongue, lips, jaw and muscles in the throat attached to the vocal folds) is a recognized feature of speech production (Gracco, 1994).   Based on the articulatory acoustics (the “sound characteristics” of consonants and vowels) our observations and reports from nonnative speakers,  American English has differences  compared to other languages for speech articulator muscle tension and force of contractions  in addition to critical differences for position of the tongue, lips, teeth and jaw.  Muscular features can be inferred from an oral assessment  of speech that tests all of the consonants and vowels in American English and uses sentences designed to control for coarticulation effects. 

The Task Dynamic Model of Speech Production focuses on the dynamics of human speech: speech production, including clear English speech production, is a coordinated action (Kelso & Tuller, 1984; Saltzman et al., 2010; Parrell et al., 2018). Specifically, American English and clear American English speech are examples of a manner, mode, or style of speaking. The Central Nervous System (CNS), and especially the brain, dictates in a complex way the stiffness or tension of the muscles, the force of the muscles, the activation of motor neuron units and slow twitch and fast twitch muscle fibers, the duration of the speech sounds, and the coordination with the voicing at the vocal folds in the larynx. For more detail, see Article 3, “Task Dynamic Model of Speech Production.”

Initial diagnostic assessment tells the student-learner and the instructor/teacher/coach what to focus on for efficient acquisition of clear American English speech.  We’ll come back to more on this later.

To circle back: the 86 sovereign and non-sovereign entities that have English as their official language include India, Australia, Nigeria, and Great Britain. British English most closely matches American English pronunciation, with the notable exceptions of the American English short vowel A, short vowel O, and consonant R. For the other Englishes, there are multiple differences in the duration of the consonants and vowels, the movement of the articulators (tongue, teeth, lips, and jaw), and the volume or loudness of consonants, especially at the ends of words or syllables. These differences put together are called “accented English.” Put simply, the more heavily the English is accented, or the more differences in the speech production features compared to American English, the harder it is for native-born Americans and other internationally born speakers to understand the nonnative-born speaker. That ease of being understood is called intelligibility (understandability).

Back to the topic of “Why bother with oral speech assessments?” Vitally important are mid-course assessments to determine the change in pronunciation of all of the AE consonants and vowels, skill with pronunciation rules and patterns, and manner of talking. Is there an improvement in AE intelligibility (understandability)? Which AE speech sounds have improved and which have not? Is the instruction and practice working for the individual like it works for most people? Thus, at 3 weeks and 6 weeks of the 10-week instruction course, we do another assessment using an equivalent phonetically balanced test (we have 10 different assessment tests). The instruction, the home or direct practice, and the focus on deliberate practice in daily life (taking every opportunity to deliver clear American English) can then be modified. Since on average it takes 70 days of practice every day to change a habit (Frothingham, 2019), in this case from accented English to clear American English speech, the end-of-course assessment (after 10 weeks of coached instruction) is essential to determine intelligibility change. Assessment, especially after 10 weeks, is critical to measuring the efficacy or success of the course and the methodology, and to measuring the speech changes that accompany specific changes in instruction.
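For readers who like to see the calendar, here is a minimal sketch of the assessment schedule described above. It assumes the checkpoints fall exactly at the end of weeks 3, 6, and 10 from the course start; the start date and the names `CHECKPOINT_WEEKS` and `assessment_dates` are made up for illustration and are not part of the Clear Talk Mastery program materials.

```python
from datetime import date, timedelta

# Assumption: assessments fall at the end of weeks 3, 6, and 10 of the course.
CHECKPOINT_WEEKS = {"mid-course 1": 3, "mid-course 2": 6, "end of course": 10}


def assessment_dates(start: date) -> dict[str, date]:
    """Return the date of each assessment, counted from the course start date."""
    return {name: start + timedelta(weeks=weeks) for name, weeks in CHECKPOINT_WEEKS.items()}


start = date(2023, 1, 2)  # hypothetical start date
schedule = assessment_dates(start)
for name, day in schedule.items():
    # Print each checkpoint with the number of days elapsed since the start.
    print(f"{name}: {day.isoformat()} (day {(day - start).days})")
```

With these assumptions, the end-of-course assessment lands on day 70, which lines up with the 70 days of daily practice cited for changing a habit.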

As a sidebar, our initial diagnostic assessment also includes determining the intelligibility of the student-learner when talking with background noise. That’s because all humans, especially those in professional roles that call for extended speaking, such as teachers/professors, supervisors, people in ministry, tech people in collaboration, and leaders, need to be understood in large rooms or where there is background noise.

Sidebar number two: our initial diagnostic assessment includes a segment where we give the student-learner a brief training (about 25 minutes) in the Six Clear Talk Strategies used by American English talkers when they want to be easily understood. The brief training also includes critical enunciation instruction for clear American English, such as where to position or place the tongue for particular consonants, and which AE speech sounds are quick and which have longer durations. Then we assess the student-learner on a different equivalent phonetically balanced test to determine how well they learn the strategies with the added enunciation instruction. That information tells us a great deal about student-learners: How well do they learn from auditory instruction? How do they respond to the (dynamic) task of speaking clearly using these strategies with the added enunciation training for American English? This gives us a leg up in making the coached course instruction for each individual even more efficient.

And the initial diagnostic assessment answers the question of prognosis for the student-learner for the methodology of Clear Talk Mastery. In other words, with that brief training, did the student-learner measure better on the intelligibility test after the brief training compared to before the training?  What speech sounds improved, and what are the likely sources or reasons for  speech sounds and intelligibility not improving for American English after the training?
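The before-and-after comparison can be pictured with a small sketch. The scores below are invented for illustration, and the function name `compare_assessments` is hypothetical; this is not the actual Clear Talk Mastery scoring method, only one simple way to sort speech sounds into those that improved after the brief training and those that did not.

```python
def compare_assessments(before: dict[str, float], after: dict[str, float]) -> dict[str, list[str]]:
    """Sort speech sounds into improved / not improved by comparing accuracy scores."""
    result: dict[str, list[str]] = {"improved": [], "not_improved": []}
    for sound in before:
        key = "improved" if after[sound] > before[sound] else "not_improved"
        result[key].append(sound)
    return result


# Hypothetical accuracy scores (fraction of test items pronounced accurately).
before = {"TH": 0.40, "L": 0.55, "V": 0.60, "R": 0.70}
after = {"TH": 0.65, "L": 0.80, "V": 0.60, "R": 0.75}

report = compare_assessments(before, after)
print(report)
```

The sounds that land in the "not improved" group are the ones whose likely sources of error deserve a closer look.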

To circle back to the initial question, how important is assessment for successful acquisition of clear American English? Our answer: scientifically based English speech assessment is critical for several reasons. Most importantly, initial diagnostic assessment and mid-course assessments make for more efficient learning. Crucial for our instruction is also long-lasting learning; more about that later. Post-course assessment examines the efficacy or success of the learning in our clear American English speech training program. It goes without saying that determining success or efficacy requires comparison to the skills measured before the instruction, in the initial diagnostic assessment. The key questions for post-course assessment are “Does the Clear Talk Mastery program work or not?” and “What are the successes and failures?” That’s part of our Action Research: keep doing what works and change what doesn’t work (after you have tested it on multiple people, not just one person!). Training and instruction improvement is one goal. Discovering what to change or keep for efficient and long-lasting American English is the other target for assessment. Can instruction and learning get better with using assessments and Action Research? We bet our life and work on that.

copyright Clear Talk Mastery, Inc 2023 Antonia L. Johnson

Pronunciation Tactics or Techniques To Speed Up Learning Clear English Speech

Why you should grow tongue muscle fibers using pronunciation tactics or techniques to most efficiently acquire and maintain clear American English speaking.


Understand this: By the time native-born children are 4 to 5 years old, they typically have a 1,500 to 2,200-word expressive vocabulary (Barnes, 2022). They pronounce most sounds correctly but may still have trouble with TH, R, S, L, V, CH, SH, and Z. At 8 years old, native-born children have mastered all speech sounds as well as rate, pitch, and volume, and are capable of carrying on a conversation with an adult (Stewart, 2022). Notice that even for native-born children, the TH and L are not acquired accurately by 4 to 5 year olds who have been talking for 4 years!

Now for the topic at hand. Specific speaking tactics and exercises have speeded up learning and increased accuracy of English speech sounds for our student-learners. How do we know? Not just because we hear that, but because that is measured by assessments.

If you know the “why” you will understand the “how.” 

For skeletal muscles (tongue muscles are skeletal), there are two kinds of muscle fibers: slow twitch muscle fibers and fast twitch muscle fibers. Scientific evidence indicates that the average percentage of slow twitch muscle fibers in the human tongue is 54%, in both two-year-olds and adults (Sanders et al., 2013).

Most English consonant and vowel sounds have an extended duration in time, double or more, compared to the quick consonants or vowels.  Additionally, when  the task is to speak clearly,  English talkers do feature enhancement—they extend the duration of speech sounds (the slow consonants and lengthier duration vowels) and  range of articulator movements (which is congruent with the task-dynamic model of speech production—Kelso & Tuller, 1984). 

Getting the long duration English consonants (16 of 24 total consonant sounds) and vowels (9 of 14 vowel sounds) and mastering a different position of the articulators for clear, easy-to-perceive English speech sounds is challenging, to say the least, for the nonnative speaker. That two-pronged skill is so critical that we teach it right away. Of course, for some speech sounds, the positioning and speed of the articulators (tongue, lips, teeth, jaw, vocal folds/cords) are the same as for other languages. It’s where English is different that makes the challenge.
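As a quick arithmetic check on those counts, the sketch below works out what share of English speech sounds are long in duration. It uses only the numbers given in the paragraph above (16 of 24 consonants, 9 of 14 vowels); the variable names are chosen here for illustration.

```python
# Counts taken from the paragraph above.
long_consonants, total_consonants = 16, 24
long_vowels, total_vowels = 9, 14

consonant_share = long_consonants / total_consonants
vowel_share = long_vowels / total_vowels
combined_share = (long_consonants + long_vowels) / (total_consonants + total_vowels)

# Print each share as a whole-number percentage.
print(f"consonants: {consonant_share:.0%}")
print(f"vowels: {vowel_share:.0%}")
print(f"combined: {combined_share:.0%}")
```

By these counts, roughly two-thirds of the consonant and vowel sounds (25 of 38) call for the longer duration.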

For example, TH (both voiced and unvoiced) and L are high-error speech sounds for nonnative speakers.

To produce clear, easy to understand  TH  or L speech sounds requires the tongue to be extended forward and for the duration of the speech sound to be extended  for at least double or greater duration in time than a quick English sound such as the consonant sound D.   With the eye, humans can’t see the slow twitch muscle fibers in the tongue.  But it stands to reason that slow twitch muscle fibers are activated to push the tongue blade forward and to extend out or stretch out the tongue blade so the tip extends to the front of the mouth.

To systematize the new learning and to simplify  (and because it works!), we teach the position of the tongue tip for the TH sounds and the L consonant sound to be the same.  That is, push forward  the tip of the tongue so it goes between the upper and lower front teeth or, better yet, to touch the lower lip. 

Consonants TH and L are slow in speed, and the duration of the speech sound is longer than for the quick consonants. The action of pushing the tongue tip all the way to the position between the upper and lower teeth, or to touch the lower lip, gives sensory feedback to the brain when the target has been reached. Critically, that tongue action takes time (milliseconds), which adds to the duration of the TH and L English speech sounds.

Thus you as speaker are taking advantage of biomechanical characteristics of movement of the tongue to extend the duration of the speech sound for the slow consonants TH and L.  It stands to reason that your brain processes the task of pushing your tongue forward to  the lower lip or between top and bottom front teeth  and activates exactly the correct slow twitch muscle fibers.  The central nervous system and the slow twitch muscle fibers must learn this pattern for easy to perceive North American English consonants TH and L.  To make that tongue gesture and movement habitual takes much repeated practice.

Take-home message for today: acquiring accurate American English pronunciation requires a tongue-forward position for the consonants TH (voiced and unvoiced) and L (and for the American short vowel A). The same is true for maintaining the accurate pronunciation of these speech sounds and maintaining the strength of the slow muscle fibers in the tongue needed for them. The key to acquiring and maintaining speech sound accuracy is activating the slow muscle fibers to push and stretch the tongue forward. That stretching and lengthening of the tongue blade not only grows the slow twitch muscle fibers but also biomechanically lengthens the duration of the speech sound when coordinated with voicing at the vocal folds.

Seeing and hearing is understanding.

Below is our speech tip 4 for WORLD— see the pronunciation for L. Hmm, picture says “PAPER.” Unfortunate that YouTube made a mistake for the picture– but click on this YouTube video for WORLD and L. You’ll be glad you did!

Copyright 2022 by Clear Talk Mastery, Inc

15 Dimensions For Acquiring Clear English Speech

Why would you want to read this article?  It’s for people who are a little intense about getting the best out of learning. 

Intro to the 15 Dimensions for Acquiring Clear English Speech

Methodology of Clear Talk Mastery Courses

Physics has its M-theory, with eleven dimensions, the explanation and theory behind all things “physics.” We submit that acquisition of clear North American English speech has fifteen dimensions.

For nonnative-born speakers of North American English speech (adults), prior learning of English is typically five to seven years.  Thus they are not newbies with zero knowledge.

The time has come to communicate in detail the methodology of Clear Talk Mastery courses, which is achieving previously unheard-of gains with student-learners. Even for us, the additional consistent gains of student-learners in the last two years have surpassed our expectations and surprised us. In the preceding twenty years, student-learners made great gains, but between 2020 and the beginning of 2023, the gains have significantly surpassed even those.

Why the big leap forward? Actively since 2017, my dream has been to put together everything we have learned and come up with a coherent theory for how best to facilitate acquisition of clear North American English. I’ve done lots and lots of thinking, putting together ideas, our experience, and findings from scientific assessments. The last 18 months especially were spent delving back into the seminal and current research on as many of the 15 dimensions as possible. The great leap came because I was doing all that thinking, researching, and working on Edition 4 of three of our textbooks so as to get all that information down on paper (in the textbooks!).

How to innovate?   That is what we have strived for since 2000—to find as many ways as possible to help nonnative-born adult speakers of English acquire clear English efficiently for long lasting learning.  To paraphrase Confucius:  Reflection is gold, imitation is quickest, and experience the most painful.  Innovation requires all three avenues.  For imitation, humbly remember that we are all standing on the shoulders of many others who have come before us.  For experience,  if you are not making enough mistakes (and feeling bad about it), then you are not innovating.   Reflection takes oodles of time.  Like Einstein said, “If I had an hour to solve a problem, I would spend 55 minutes thinking about the problem and five minutes thinking about the solution.”  In 2017, we had reached 17 years of providing instruction for successful clear speaking mastery of North American English and it was then I started my big thinking  to codify and describe the many dimensions which are described in this article.

For this article, rejoice–  you get the end of the story first.

Below are the 15 dimensions for a person with English as a second language to successfully acquire clear North American English speech communication.  Success is defined as efficient and long lasting learning.

  1. Critical importance of assessment– initial diagnostic assessment to determine needs, then mid course and end of course assessment to monitor progress, redirect goals and methodology for the course(s), and also use the end-of-course assessment to scientifically assess the success for different strategies, tactics,  and methodologies. 
  2. From the beginning of instruction, training for six Clear Talk Strategies and four Tactics.  The six Strategies were derived from decades of previous research about the difference between characteristics of clear compared to casual English, the characteristics of speech sounds accurately perceived as North American English, and clear speaking training used for improving the speech intelligibility of persons speaking English.
  3. Systematic learning for positioning and action of articulators,  and coordination between articulator systems (e.g. vocal folds and voicing coordinated with positioning and action especially of articulators tongue, lips, teeth and jaw), 
  4. Muscle strengthening  for requisite slow and fast muscle fibers (MF)  needed for North American English accurate pronunciation especially for the tongue, lips, jaw and muscles attached to the vocal folds (for voicing).  Muscle strengthening is associated with growth of numbers of slow and fast MFs.  We used both direct articulator exercise and modes of clear talking to specifically grow and strengthen requisite speech muscles for North American English,
  5. Systematic learning for sequencing of English speech communication skills, especially English speech intelligibility  (e.g. speech sound accuracy before adding learning skills for word syllable accent stress and voice inflection of sentences). 
  6. Using the categories of coordinative structures or coordinated modes observed during 20 years of instruction, which we give the terms  Workout Clear Talk Mode, Careful Leveled-up Clear Talk Mode and Leveled- Up Clear Talk Mode– all of which are conscious but get easier with lots of practice, 
  7. Cognitive learning (learning rules and patterns)  to “bootstrap” physical learning and enhance memory for the complex procedural and multidimensional learning needed for English speech intelligibility and speech communication,
  8. Employing mastery learning principles (80 to 90% mastery before graduating to the next module’s learning) which also adheres to the well known principle of  “don’t add too much learning too quickly.”
  9. Distributed learning or spaced learning for the procedural learning associated with the complexity of intelligible North American English and for long lasting learning: “What’s the good of efficient learning if you forget everything within months (or years) of finishing your instruction?”
  10.  Combining skills to level-up communication proficiency including, for example, combining thinking information and talking clearly at the same time or combining voice inflection patterns during sentences/connected speech and high English speech sound articulation accuracy,  
  11.  Speech feedback to the student-learner and standards of speech production which train high level attention and increased duration of time for attention to the task of accuracy–  “It’s not practice makes perfect, but perfect practice makes perfect,”
  12.  Student-learner’s commitment to the program of learning  (e.g. 12 consecutive weeks, etc)  –“It takes 70 days of practice every day to change a habit”, for example,  changing Spanglish, Chinglish,  Indian English,  Vietnamese English, Arabic English, etc., to clear North American English speech,  
  13.  Student-learner’s personal involvement in doing direct practice homework, deliberate practice in daily life, and focused attention during coachings– plus using enhanced tutoring via  24/7 video and audio lessons for direct home practice,
  14.  Student-learner’s belief in the learning principles explained  by the coach/instructor and in the textbook, videos, audios, 
  15.  Student-learner attachment to wanting to go to the next level of communication in English and notably the process of learning, interaction, encouragement with another human person who is coach or instructor.

It is all of these 15 dimensions which contribute to the exceptional progress of  our  student-learners for acquiring clear North American speech communication.   Note especially that without attributes of the student-learner for commitment, involvement, belief, and attachment, exceptional gains for acquisition of clear North American English speech could not be achieved.
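To make dimension 8 (mastery learning) concrete, here is a minimal sketch of a mastery gate. It assumes mastery is measured as the fraction of assessment items produced accurately and uses the lower bound of the 80 to 90% range as the default; the names `MASTERY_THRESHOLD` and `ready_for_next_module` are hypothetical and do not come from the course materials.

```python
MASTERY_THRESHOLD = 0.80  # assumption: lower bound of the 80 to 90% range in dimension 8


def ready_for_next_module(correct: int, total: int, threshold: float = MASTERY_THRESHOLD) -> bool:
    """Return True when the share of accurate items meets the mastery threshold."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total >= threshold


# Hypothetical results: 17 of 20 items accurate passes, 15 of 20 does not.
print(ready_for_next_module(17, 20))
print(ready_for_next_module(15, 20))
```

A stricter course could pass `threshold=0.90` to hold student-learners to the upper end of the range before moving on.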

©Clear Talk Mastery, Inc. 2023

How to Develop Lifetime Customers or Clients

If you have a product or a service you sell, don’t think you are done when you have made the sale.  The sale is actually the beginning of building a client relationship that will lead to a lifetime of repeat business and referrals.

In addition, by listening and responding to their needs, you are adding value by thinking beyond what you can sell them and showing an interest in other aspects of their life and business.  “How are things going with you?” “How can I help?” Talk with emotion and feeling– even on the phone people can hear that. This approach to being of service sets you apart from others in your industry.