Sunday, December 14, 2014

UserProfile taxonomy properties not getting crawled ??

I just discovered some pretty strange behaviour, and since I can't find this limitation mentioned anywhere, I thought I would share it.

I had a custom taxonomy user profile property that remained empty after a full search crawl.
Other taxonomy properties in the same profile looked just fine, but my custom taxonomy property stayed empty no matter what.
This happened for only 2 of about 9000 users, so the property itself should work.

So I started removing one field at a time from the profile, rerunning an incremental crawl after each change.
Finally, after removing the 5 terms in the "About me" field, my custom taxonomy property suddenly got crawled.
Then I added the terms back one at a time, and when I added back the 4th term, my property stopped being crawled again.
Maybe that term was corrupt, so I tried removing it and adding a different term instead, but got the same result.

Then I counted the number of terms on the problem profile like this:

// Namespaces needed: System, System.Collections.Generic, Microsoft.SharePoint,
// Microsoft.Office.Server.UserProfiles, Microsoft.SharePoint.Taxonomy
using (SPSite site = new SPSite("https://whatever"))
{
    UserProfileManager manager = new UserProfileManager(SPServiceContext.GetContext(site));
    UserProfile userProfile = manager.GetUserProfile("domain\\whatever");
    int count = 0;
    // Sum the taxonomy terms over every property on the profile.
    foreach (ProfileSubtypeProperty property in userProfile.Properties)
    {
        UserProfileValueCollection values = userProfile[property.Name];
        List<Term> terms = new List<Term>(values.GetTaxonomyTerms());
        count += terms.Count;
    }
    Console.WriteLine("Number of terms: " + count);
}

Number of terms: 31

If I removed one term, my custom taxonomy user profile property got crawled again.
It seems there is a limit of 30 terms per user profile: anything beyond 30 is not picked up
by the crawler.

I have no idea whether this limit can be increased. But if the limit is there for a reason,
then saving the profile should at least give an error.

If you move the user profile property up in Central Administration (CA), it gets higher priority when the profile is crawled.
So if it is important that a property is crawled, move it up.

Here is some code that counts the number of terms in all user profiles, so that you can identify profiles
that are not getting crawled correctly:

// Namespaces needed: System, System.Collections.Generic, Microsoft.SharePoint,
// Microsoft.Office.Server.UserProfiles, Microsoft.SharePoint.Taxonomy
using (SPSite site = new SPSite("https://whatever"))
{
    UserProfileManager manager = new UserProfileManager(SPServiceContext.GetContext(site));
    foreach (UserProfile userProfile in manager)
    {
        int count = 0;
        // Sum the taxonomy terms over every property on this profile.
        foreach (ProfileSubtypeProperty property in userProfile.Properties)
        {
            UserProfileValueCollection values = userProfile[property.Name];
            List<Term> terms = new List<Term>(values.GetTaxonomyTerms());
            count += terms.Count;
        }
        // 30 is the limit observed above; profiles over it are at risk.
        if (count > 30)
        {
            Console.WriteLine("Account: " + userProfile.AccountName + " has " + count + " terms");
        }
    }
}

Sunday, September 21, 2014

Taxonomy Update Scheduler not working ?


Let's try to understand why the Taxonomy Update Scheduler job stops working.


Taxonomy Update Scheduler Job (UpdateHiddenListJobDefinition)
This job is responsible for updating the terms in a hidden list that exists in each site collection.
The reason for the list is speed: it's faster to query an internal list than the Managed Metadata Service.
But of course this complicates things; keeping data in sync is always a challenge.

The job runs every hour, so if you rename a term, it can take up to an hour before the change
shows up in your site collections.

There is one update job per web application.
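
If you don't want to wait for the hourly schedule while testing, the job can be started by hand. This is only a minimal sketch, assuming it runs in the SharePoint Management Shell on a farm server. The URL is a placeholder, and finding the job by its type name (UpdateHiddenListJobDefinition, as in the heading above) is an assumption about how to locate it, not the only way:

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$webAppUrl = "https://yoursite"   # placeholder, use your own web application URL

# There is one job per web application, so scope the lookup to that web application.
$jobs = Get-SPTimerJob -WebApplication $webAppUrl |
    Where-Object { $_.GetType().Name -eq "UpdateHiddenListJobDefinition" }

foreach ($job in $jobs) {
    Write-Host "Starting '$($job.DisplayName)' (last run: $($job.LastRunTime))"
    $job.RunNow()   # queues the job; the timer service runs it shortly afterwards
}
```

RunNow() only queues the job, so give the timer service a moment before checking the hidden list.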

TaxonomyHiddenList
You can find the list here:
[site collection url]/Lists/TaxonomyHiddenList/AllItems.aspx

It's a standard list which contains one row per term.
One term can look like this:
[Screenshot: hidden list item]

Search uses this hidden list too, specifically the fields named CatchAllData and CatchAllDataLabel.
So if the search results are wrong, the hidden list may not be up to date because of trouble with the Taxonomy Update Scheduler job.
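
To check what the hidden list currently holds without clicking through the browser, something like the following could be used. It is a rough sketch: the URL is a placeholder, it assumes the list lives in the root web at the path shown above, and it assumes PowerShell 3 or later (for [PSCustomObject]). Title and CatchAllDataLabel are the only fields it reads:

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$site = Get-SPSite "https://yoursite"   # placeholder site collection URL
try {
    $web  = $site.RootWeb
    # Build the server-relative list URL; works for root and managed-path sites.
    $list = $web.GetList($web.ServerRelativeUrl.TrimEnd('/') + "/Lists/TaxonomyHiddenList")

    # Print the first 20 rows with the label data that search relies on.
    $list.Items | Select-Object -First 20 | ForEach-Object {
        [PSCustomObject]@{
            Id                = $_.ID
            Title             = $_["Title"]
            CatchAllDataLabel = $_["CatchAllDataLabel"]
        }
    } | Format-Table -AutoSize
}
finally {
    $site.Dispose()
}
```

On a large hidden list, $list.Items loads every row, so this is for spot checks only.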


Troubleshooting
Anyway, I had a term that did not update in my site collection no matter what I did, so I had to find out why.

If you Google this problem, you will probably end up with this script:

$url = "https://yoursite"
$site=Get-SPSite $url
[Microsoft.SharePoint.Taxonomy.TaxonomySession]::SyncHiddenList($site)
$site.dispose()

And of course running this script definitely works.
If you rename a term in the term store and run this script, the taxonomy hidden list is updated.
But this method is very CPU intensive: it updates all terms in the hidden list by querying the term store for every term.
And the update job still doesn't work after running it.

Well, we can't really run this update manually every time we make changes to terms.

I even saw someone who had created a custom timer job that calls this sync method instead of using the built-in Taxonomy Update Scheduler job. This is not recommended at all; it's a resource hog.
The built-in update job only updates terms that have changed.


So let's try to find out why it isn't working:

I first checked the ULS after executing the update job and saw this:
  • Entering monitored scope (Timer Job UpdateHiddenListJobDefinition). Parent NoName=Timer Job UpdateHiddenListJobDefinition
  • Site Enumeration Stack:    at Microsoft.SharePoint.Administration.SPSiteCollection.get_Item(Int32 index) ..snip
  • Site Enumeration Stack:    at Microsoft.SharePoint.Administration.SPSiteCollection.get_Count() ..snip
  • Site Enumeration Stack:    at Microsoft.SharePoint.Taxonomy.UpdateHiddenListJobDefinition.Execute(Guid targetInstanceId) ..snip
  • Updating SPPersistedObject PersistedChangeToken Name=086eb3a5-8529-4a57-a9f7-3171294aa483TimeOfLastTaxonomyUpdate5d8ae25a-b6b3-470c-a0a2-513260cc8ce5. Version: 1453655 Ensure: False, HashCode: 20286682, Id: f2c91ae7-27fb-43e0-9d25-217a6dca71ff,    Stack:    at Microsoft.SharePoint.Taxonomy.UpdateHiddenListJobDefinition.SetChangeToken(String databaseKey, PartitionSettings proxySettings, String value) ..snip
  • Leaving Monitored Scope (Timer Job UpdateHiddenListJobDefinition). Execution Time=69.0284659083766
(WCF  calls stripped)

So after wasting several hours pursuing these "errors", let me tell you this: this is NORMAL.
Pfft, another example of Microsoft polluting the ULS with crap to keep consultants worldwide busy.

I tried a lot of things to get this working, checking permissions etc., but nothing worked on the problem term.
Then I decided to create a new term and use it in a site collection. It appeared in the taxonomy hidden list. I renamed it in the term store and executed the update job, and voila, the term was updated.
So new terms work, "old" terms do not.


So I started looking at the timer job to get an idea of what it was doing:

It calls a method named GetChangesForListSync,
which in turn calls the stored procedure proc_ECM_GetChangesForListSync
in the Managed Metadata database.

This proc loads data mainly from a table called ECMChangeLog.
Every time you change a term, the change is recorded in this table.
I checked the table, and both my working and my non-working term were logged there as changed.

Then it joins on a table called ECMUsedTerms.
Here I found that my working term had an entry in this table, while the non-working term did not.
Great, finally getting somewhere!

OK, so I started looking at the procs for this table and found proc_ECM_AddUsedTerm.
It is used in TaxonomyField.AddTaxonomyGuidToWssCore.

TaxonomyField.ValidateTaxonomyFieldValue eventually calls TaxonomyField.AddTaxonomyGuidToWssCore,
but only if the term is new to the site. If the term already exists in the site's hidden list, the WssId of that item is reused and the ECMUsedTerms table is never updated.

Looking some more at the stored procedures for the ECMUsedTerms table, I found this:
proc_ECM_ClearUsedTerms

This proc is called from TermStore.UpdateUsedTermsOnSite(SPSite site).
Okay... I had never seen or heard of this function before :)

Anyway, I made this script and executed it:

$url = "https://yoursite"
$site=Get-SPSite $url
$tax = Get-SPTaxonomySession -Site $url
$ts = $tax.TermStores[0]
$ts.UpdateUsedTermsOnSite($site)
$site.dispose()

Make sure you are a site collection admin when executing this script.
There is a check for this in the function.

When I looked at the ECMUsedTerms table afterwards, it had way more rows than before, and my problematic term finally had an entry.
Now renaming my problem term was picked up just fine by the Taxonomy Update Scheduler job.

OK, so what happened here?

There are at least two scenarios that can cause this:

Scenario 1:
If you recreate the Managed Metadata Service with a clean database and import the terms with the same IDs, then the ECMUsedTerms table is empty and the usage information is lost.
Your solution will still work with taxonomy pickers etc., and you will probably never notice anything until you change a term.
(This was the reason for my troubles: I had to recreate the MMS because of the bug in SharePoint 2013 that sometimes corrupts the MMS.)

Scenario 2:
If you move a content database from one farm to another (maybe from staging to prod), the new farm has its own Managed Metadata Service, so its ECMUsedTerms table is not in sync with the content.

I'm not sure whether a site restore takes care of this sync, but I have my doubts.


And of course there are probably several other ways for the hidden lists to get out of sync with the term store.

Anyway, lesson learned:
If terms are not updating when the Taxonomy Update Scheduler job runs, execute UpdateUsedTermsOnSite for every site collection.
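
Running it for every site collection could look roughly like this. It is a sketch under a few assumptions: the URL is a placeholder for your web application, the first term store in the session is the one you want (as in the single-site script above), and the account running it is site collection admin everywhere, since the function checks for that:

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$webAppUrl = "https://yoursite"   # placeholder web application URL

Get-SPSite -WebApplication $webAppUrl -Limit All | ForEach-Object {
    $site = $_
    try {
        $session   = Get-SPTaxonomySession -Site $site
        $termStore = $session.TermStores[0]   # assumption: first term store is the right one
        $termStore.UpdateUsedTermsOnSite($site)
        Write-Host "Updated used terms for $($site.Url)"
    }
    catch {
        # Typically a permission problem; report it and continue with the next site.
        Write-Warning "Failed for $($site.Url): $($_.Exception.Message)"
    }
    finally {
        $site.Dispose()
    }
}
```

Failures are only logged as warnings, so one site collection without the right permissions does not stop the rest.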