# [SOLVED] Find repeated words and highlight them



## ShaileshShinde (Sep 29, 2010)

Hello Team,
I would like to create an macro which will highlight all repeated words [word length is what user needs to enter before running the macro] in full document and highlight all repeated word leaving the first one un-highlighted.

Can you please provide any references or sample for this?

Thanks,
Shailesh


----------



## macropod (Apr 11, 2008)

*Re: Find repeated words and highlight them*

Hi Shailesh,

If you want to find repeated words, I can't see how inputting the word length is especially helpful. A Find/Replace can do as you ask without the need to specify the word length. A spell-check will also find repeated words.


----------



## ShaileshShinde (Sep 29, 2010)

*Re: Find repeated words and highlight them*

Hello,

The word length is required so that the macro should not find the words like [is, the,of,to...]. it should ignore these words and highlight the others which were repeated.

Thanks,
Shail


----------



## macropod (Apr 11, 2008)

*Re: Find repeated words and highlight them*

Hi Shail,

It would be better to have the exlusion words in a list, so that you don't miss other words that should be highlighted. For example, if you set the minimum length to 3, you'd not find & highlight Mr Mr.


----------



## ShaileshShinde (Sep 29, 2010)

*Re: Find repeated words and highlight them*

Ah! yes, you were right. I need to do this using macro or .vbs so that it can be run on either one or many files.


Thanks
Shail


----------



## macropod (Apr 11, 2008)

*Re: Find repeated words and highlight them*

Hi Shail,

Try the following 'HilightDocumentDuplicates' macro. All you should have to do is to point the macro to a folder, then let it run. All documents in the folder will be processed. The exlusions list is defined by the 'StrExcl' variable in the 'ConcordanceBuilder' sub.

```
Option Explicit
Dim SBar As Boolean
Dim TrkStatus As Boolean
Dim StrFnd As String

Sub HilightDocumentDuplicates()
' Store current Status Bar status, then switch on
SBar = Application.DisplayStatusBar
Application.DisplayStatusBar = True
' Store current Track Changes status, then switch off
With ActiveDocument
  TrkStatus = .TrackRevisions
  .TrackRevisions = False
End With
' Turn Off Screen Updating
Application.ScreenUpdating = False
Dim strFolder As String, strFile As String, wdDoc As Document
Dim StrTmp As String, i As Long
'Prompt for the folder to process
strFolder = GetFolder
If strFolder = "" Then Exit Sub
strFile = Dir(strFolder & "\*.doc", vbNormal)
'Process each file in the folder
While strFile <> ""
  Set wdDoc = Documents.Open(FileName:=strFolder & "\" & strFile, _
              AddToRecentFiles:=False, Visible:=False)
  ' Report progress on Status Bar.
  Application.StatusBar = "Processing " & wdDoc.Name
  'Compile the Find concordance
  Call ConcordanceBuilder(wdDoc)
  'Process all words in the concordance
  For i = 1 To UBound(Split(StrFnd, " "))
    StrTmp = Split(StrFnd, " ")(i)
    With wdDoc.Range
      With .Find
        .ClearFormatting
        'Look for duplicated words only
        .Text = StrTmp & " " & StrTmp
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchCase = False
        .MatchWholeWord = True
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
        .Execute
      End With
      Do While .Find.Found
        'Highlight the 2nd word
        .Duplicate.Words.Last.HighlightColorIndex = wdBrightGreen
        .Collapse wdCollapseEnd
        .Find.Execute
      Loop
    End With
  Next
  wdDoc.Close SaveChanges:=True
  strFile = Dir()
Wend
Set wdDoc = Nothing
' Clear the Status Bar
Application.StatusBar = False
' Restore original Status Bar status
Application.DisplayStatusBar = SBar
' Restore original Track Changes status
ActiveDocument.TrackRevisions = TrkStatus
' Restore Screen Updating
Application.ScreenUpdating = True
End Sub

Function GetFolder() As String
Dim oFolder As Object
GetFolder = ""
Set oFolder = CreateObject("Shell.Application").BrowseForFolder(0, "Choose a folder", 0)
If (Not oFolder Is Nothing) Then GetFolder = oFolder.Items.Item.Path
Set oFolder = Nothing
End Function

Sub ConcordanceBuilder(wdDoc As Document)
Dim StrIn As String, StrTmp As String, StrExcl As String
Dim i As Long, j As Long, k As Long
'Define the exlusions list
StrExcl = "a,am,and,are,as,at,be,but,by,can,cm,did,do,does,eg,en,eq,etc," & _
          "for,get,go,got,has,have,he,her,him,how,i,ie,if,in,into,is,it,its," & _
          "me,mi,mm,my,na,nb,no,not,of,off,ok,on,one,or,our,out,re,she,so," & _
          "the,their,them,they,t,to,was,we,were,who,will,would,yd,you,your"
With wdDoc
  'Get the document's text
  StrIn = .Content.Text
  'Strip out unwanted characters
  For i = 1 To 255
    Select Case i
      Case 1 To 38, 40 To 64, 91 To 96, 123 To 144, 147 To 191, 247
      StrIn = Replace(StrIn, Chr(i), " ")
    End Select
  Next
  'Convert smart single quotes to plain single quotes & delete any at the start/end of a word
  StrIn = Replace(Replace(Replace(Replace(StrIn, Chr(145), "'"), Chr(146), "'"), "' ", " "), " '", " ")
  'Convert to lowercase
  StrIn = " " & LCase(Trim(StrIn)) & " "
  'Process the exclusions list
  For i = 0 To UBound(Split(StrExcl, ","))
    StrIn = Replace(StrIn, " " & Split(StrExcl, ",")(i) & " ", " ")
  Next
  'Clean up any duplicate spaces
  While InStr(StrIn, "  ") > 0
    StrIn = Replace(StrIn, "  ", " ")
  Wend
  StrIn = " " & Trim(StrIn) & " "
  j = UBound(Split(StrIn, " "))
  For i = 1 To j
    StrTmp = Split(StrIn, " ")(1)
    'Find how many occurences of each word there are in the document
    While InStr(StrIn, " " & StrTmp & " ") > 0
      StrIn = Replace(StrIn, " " & StrTmp & " ", " ")
    Wend
    k = j - UBound(Split(StrIn, " "))
    'If there's more than one occurence, add the word to our Find list
    If k > 1 Then
      StrFnd = StrFnd & " " & StrTmp
    End If
    j = UBound(Split(StrIn, " "))
  Next
End With
End Sub
```


----------



## ShaileshShinde (Sep 29, 2010)

Hi,
I am getting an error message at below line...

SBar = Application.DisplayStatusBar

Run-time error 4605
This method or property is not available because a document window is not active....for line of code....
Application.DisplayStatusBar = SBar


----------



## macropod (Apr 11, 2008)

*Re: Find repeated words and highlight them*

Where are you running the code from? The error message suggests you may not have any open documents. See: You receive run-time error 4248, 4605 or 5941 when you try to change properties on an unopened document in Word. Although the documents being processed are not made visible, that shouldn't cause an error on either of the lines you mentioned.


----------



## ShaileshShinde (Sep 29, 2010)

*Re: Find repeated words and highlight them*

Hello,

I resolved this issue with the help of above mentioned link. However, when I ran this macro it does not highlight the repeated words and no error message has been thrown.

Thanks,
Shail


----------



## macropod (Apr 11, 2008)

*Re: Find repeated words and highlight them*

Hi Shail,

The code works for me. Did you select a folder to process? Even though the code as posted works fine, here's a slightly improved version of the 'HilightDocumentDuplicates' sub:

```
Sub HilightDocumentDuplicates()
' Store current Status Bar status, then switch on
SBar = Application.DisplayStatusBar
Application.DisplayStatusBar = True
' Turn Off Screen Updating
Application.ScreenUpdating = False
Dim strFolder As String, strFile As String, wdDoc As Document
Dim StrTmp As String, i As Long
'Prompt for the folder to process
strFolder = GetFolder
If strFolder = "" Then Exit Sub
strFile = Dir(strFolder & "\*.doc", vbNormal)
'Process each file in the folder
While strFile <> ""
  Set wdDoc = Documents.Open(FileName:=strFolder & "\" & strFile, _
              AddToRecentFiles:=False, Visible:=False)
  ' Report progress on Status Bar.
  Application.StatusBar = "Processing " & strFile
  ' Store current Track Changes status, then switch off
  TrkStatus = wdDoc.TrackRevisions
  wdDoc.TrackRevisions = False
  'Compile the Find concordance
  Call ConcordanceBuilder(wdDoc)
  'Process all words in the concordance
  For i = 1 To UBound(Split(StrFnd, " "))
    StrTmp = Split(StrFnd, " ")(i)
    With wdDoc.Range
      With .Find
        .ClearFormatting
        'Look for duplicated words only
        .Text = StrTmp & " " & StrTmp
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchCase = False
        .MatchWholeWord = True
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
        .Execute
      End With
      Do While .Find.Found
        'Highlight the 2nd word
        .Duplicate.Words.Last.HighlightColorIndex = wdBrightGreen
        .Collapse wdCollapseEnd
        .Find.Execute
      Loop
    End With
  Next
  ' Restore original Track Changes status
  wdDoc.TrackRevisions = TrkStatus
  wdDoc.Close SaveChanges:=True
  strFile = Dir()
Wend
Set wdDoc = Nothing
' Clear the Status Bar
Application.StatusBar = False
' Restore original Status Bar status
Application.DisplayStatusBar = SBar
' Restore Screen Updating
Application.ScreenUpdating = True
End Sub
```


----------



## ShaileshShinde (Sep 29, 2010)

*Re: Find repeated words and highlight them*

Yes, I have selected the folder. This does not highlight the repeated word and also no error messages.
I open the word app with default blank document and press ALT+F8 run the macro. it ask for folder path selected the folder and the script runs, but when i open the document from the folder. no repeated words gets highlighted.

Thanks,
Shail


----------



## macropod (Apr 11, 2008)

*Re: Find repeated words and highlight them*

Perhaps you could attach one of the documents you're having trouble with to a post here (use the paperclip on the 'Go Advanced' tab).


----------



## ShaileshShinde (Sep 29, 2010)

*Re: Find repeated words and highlight them*

Hi,

I have attached the sample doc file for this.

Thanks,
Shail


----------



## macropod (Apr 11, 2008)

*Re: Find repeated words and highlight them*

I do not see any repeated words (eg EditPlus EditPlus) in that document. Perhaps you could clarify your expectations.


----------



## ShaileshShinde (Sep 29, 2010)

*Re: Find repeated words and highlight them*

Hi,

The repeated words here means it can appear anywhere in the document. for example...[Document template offers] these three words has appears twice in the document. so, these needs to be highlighted.

Sorry, if my question is not clear to you.

Thanks,
Shail


----------



## macropod (Apr 11, 2008)

*Re: Find repeated words and highlight them*

Try this version of the 'HilightDocumentDuplicates' sub:

```
Sub HilightDocumentDuplicates()
' Store current Status Bar status, then switch on
SBar = Application.DisplayStatusBar
Application.DisplayStatusBar = True
' Turn Off Screen Updating
Application.ScreenUpdating = False
Dim strFolder As String, strFile As String, wdDoc As Document
Dim StrTmp As String, i As Long, bFnd As Boolean
'Prompt for the folder to process
strFolder = GetFolder
If strFolder = "" Then Exit Sub
strFile = Dir(strFolder & "\*.doc", vbNormal)
'Process each file in the folder
While strFile <> ""
  Set wdDoc = Documents.Open(FileName:=strFolder & "\" & strFile, _
              AddToRecentFiles:=False, Visible:=False)
  ' Report progress on Status Bar.
  Application.StatusBar = "Processing " & strFile
  ' Store current Track Changes status, then switch off
  TrkStatus = wdDoc.TrackRevisions
  wdDoc.TrackRevisions = False
  'Compile the Find concordance
  Call ConcordanceBuilder(wdDoc)
  'Process all words in the concordance
  For i = 1 To UBound(Split(StrFnd, " "))
    StrTmp = Split(StrFnd, " ")(i)
    bFnd = False
    With wdDoc.Range
      With .Find
        .ClearFormatting
        'Look for duplicated words only
        .Text = StrTmp
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchCase = False
        .MatchWholeWord = True
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
        .Execute
      End With
      Do While .Find.Found
        If bFnd = True Then
          .Duplicate.HighlightColorIndex = wdBrightGreen
        End If
        bFnd = True
        .Collapse wdCollapseEnd
        .Find.Execute
      Loop
    End With
  Next
  ' Restore original Track Changes status
  wdDoc.TrackRevisions = TrkStatus
  wdDoc.Close SaveChanges:=True
  strFile = Dir()
Wend
Set wdDoc = Nothing
' Clear the Status Bar
Application.StatusBar = False
' Restore original Status Bar status
Application.DisplayStatusBar = SBar
' Restore Screen Updating
Application.ScreenUpdating = True
End Sub
```


----------



## ShaileshShinde (Sep 29, 2010)

*Re: Find repeated words and highlight them*

Hi,

I have debug the updated code and still found that repeated words are not getting highlighted. Can't figure out why this is not getting highlighted. I am using office 2007 but that's not an issue i think.

Thanks,
Shail


----------



## macropod (Apr 11, 2008)

*Re: Find repeated words and highlight them*

Hi Shail,

Are any words getting highlighted? If some are and others aren't have you checked whether the missed words are in the exclusions list? In my testing, the code works for any 'normal' word that's not in the exclusions list.

You've got some terms in your document that don't count as 'normal' words, though, such as C/C++. Words like that won't be processed correctly by my code because (a) the pre-processing strips out the / and + characters) and treats the two Cs as separate words. I can code around that, but it would be useful if you could tell me what particular words are causing problems.


----------



## ShaileshShinde (Sep 29, 2010)

*Re: Find repeated words and highlight them*

Hi,
None of the repeated words are getting highlighted. I have cross checked with the exclusion list. For example...the word "document" is appearing many times. However, none of these words get highlighted. Also, no issue if "C/C++" does not get highlighted.

Thanks,
Shail


----------



## macropod (Apr 11, 2008)

*Re: Find repeated words and highlight them*

It does when I run the code. See attached. Below is a revised version of the 'ConcordanceBuilder' sub. I've added an inclusions list for expressions that get munged by the pre-processing.

```
Sub ConcordanceBuilder(wdDoc As Document)
Dim StrIn As String, StrTmp As String, StrIncl As String, StrExcl As String
Dim i As Long, j As Long, k As Long
'Define the exlusions list
StrExcl = "a,am,and,are,as,at,b,be,but,by,c,can,cm,d,did,do,does,e,eg,en,eq,etc,f," & _
          "for,g,get,go,got,h,has,have,he,her,him,how,i,ie,if,in,into,is,it,its,j,k,l," & _
          "m,me,mi,mm,my,n,na,nb,no,not,o,of,off,ok,on,one,or,our,out,p,q,r,re,s,she,so," & _
          "t,the,their,them,they,t,to,u,v,w,was,we,were,who,will,would,x,y,yd,you,your,z"
'Define an inclusions list for terms that otherwise don't survive the initial cleanup
StrIncl = "c/c++,c#"
With wdDoc
  'Get the document's text
  StrIn = .Content.Text
  'Strip out unwanted characters
  For i = 1 To 255
    Select Case i
      Case 1 To 38, 40 To 44, 46 To 64, 91 To 96, 123 To 144, 147 To 149, 152 To 171, 174 To 191, 247
      StrIn = Replace(StrIn, Chr(i), " ")
    End Select
  Next
  'Convert smart single quotes to plain single quotes & delete any at the start/end of a word
  StrIn = Replace(Replace(Replace(Replace(StrIn, Chr(145), "'"), Chr(146), "'"), "' ", " "), " '", " ")
  'Convert to lowercase
  StrIn = " " & LCase(Trim(StrIn)) & " "
  'Process the exclusions list
  For i = 0 To UBound(Split(StrExcl, ","))
    While InStr(StrIn, " " & Split(StrExcl, ",")(i) & " ") > 0
      StrIn = Replace(StrIn, " " & Split(StrExcl, ",")(i) & " ", " ")
    Wend
  Next
  StrIn = Replace(StrIncl, ",", " ") & StrIn
  'Clean up any duplicate spaces
  While InStr(StrIn, "  ") > 0
    StrIn = Replace(StrIn, "  ", " ")
  Wend
  StrIn = " " & Trim(StrIn) & " "
  j = UBound(Split(StrIn, " "))
  For i = 1 To j
    StrTmp = Split(StrIn, " ")(1)
    'Find how many occurences of each word there are in the document
    While InStr(StrIn, " " & StrTmp & " ") > 0
      StrIn = Replace(StrIn, " " & StrTmp & " ", " ")
    Wend
    k = j - UBound(Split(StrIn, " "))
    'If there's more than one occurence, add the word to our Find list
    If k > 1 Then
      strFnd = strFnd & " " & StrTmp
    End If
    j = UBound(Split(StrIn, " "))
  Next
End With
End Sub
```


----------



## macropod (Apr 11, 2008)

*Re: Find repeated words and highlight them*

Attached is a Word document with the working macro code. I've simplified the code a bit, by removing the progress report - it doesn't work when the visibility of the documents being processed is set to False.


----------



## ShaileshShinde (Sep 29, 2010)

*Re: Find repeated words and highlight them*

Thanks a lot!
I have tested this and it works as per the requirement.

Thanks,
Shail


----------

