Sunday, December 14, 2014

UserProfile taxonomy properties not getting crawled?

I just discovered some pretty strange behavior. I can't find this limitation documented anywhere, so I thought I would share it.

I had a custom taxonomy user profile property that remained empty after a full search crawl.
Other taxonomy properties in the same profile looked just fine, but my custom taxonomy property stayed empty no matter what.
This only happened on 2 of about 9000 users, so the property itself clearly worked.

So I started removing one field at a time from the profile and reran an incremental crawl on the profile after each change.
After removing the 5 terms in the "About me" field, my custom taxonomy property suddenly got crawled.
Then I kept adding back one term at a time, and when adding back the 4th term my property stopped being crawled again.
Maybe the term was corrupted, so I tried removing it and adding a different term instead, with the same result.

Then I counted the number of terms on the problem profile like this:
// Requires references to Microsoft.SharePoint, Microsoft.Office.Server.UserProfiles
// and Microsoft.SharePoint.Taxonomy.
using (SPSite site = new SPSite("https://whatever"))
{
    UserProfileManager manager = new UserProfileManager(SPServiceContext.GetContext(site));
    UserProfile userProfile = manager.GetUserProfile("domain\\whatever");
    IEnumerator<ProfileSubtypeProperty> e = userProfile.Properties.GetEnumerator();
    int count = 0;
    while (e.MoveNext())
    {
        // Sum the taxonomy terms across every property in the profile.
        UserProfileValueCollection values = userProfile[e.Current.Name];
        List<Term> terms = new List<Term>(values.GetTaxonomyTerms());
        count += terms.Count;
    }
    Console.WriteLine("Number of terms: " + count);
}

Number of terms: 31

If I removed one term, my custom taxonomy user profile property got crawled again.
It seems there is a limit of at most 30 taxonomy terms per user profile; anything
beyond 30 will not be picked up by the crawler.

I have no idea whether it is possible to increase this limit. But if the limit is there for a reason,
then saving the profile should at least give an error.
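
If you bump into the limit, it helps to see which properties consume the term budget for a given user. Here is a minimal sketch of a per-property breakdown, using the same API as the counting code below (the URL and account name are placeholders):

using (SPSite site = new SPSite("https://whatever"))
{
    UserProfileManager manager = new UserProfileManager(SPServiceContext.GetContext(site));
    UserProfile userProfile = manager.GetUserProfile("domain\\whatever");
    IEnumerator<ProfileSubtypeProperty> e = userProfile.Properties.GetEnumerator();
    while (e.MoveNext())
    {
        // Print the term count per property, so you know what to trim.
        List<Term> terms = new List<Term>(userProfile[e.Current.Name].GetTaxonomyTerms());
        if (terms.Count > 0)
        {
            Console.WriteLine(e.Current.Name + ": " + terms.Count + " terms");
        }
    }
}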

If you move the user profile property up in the property order in Central Administration, it gets higher priority when being crawled.
So if it is important that the property gets crawled, move it up.

Here is some code that counts the number of terms in all user profiles, so that you can identify profiles
that are not getting crawled correctly:

using (SPSite site = new SPSite("https://whatever"))
{
    UserProfileManager manager = new UserProfileManager(SPServiceContext.GetContext(site));
    foreach (UserProfile userProfile in manager)
    {
        IEnumerator<ProfileSubtypeProperty> e = userProfile.Properties.GetEnumerator();
        int count = 0;
        while (e.MoveNext())
        {
            UserProfileValueCollection values = userProfile[e.Current.Name];
            List<Term> terms = new List<Term>(values.GetTaxonomyTerms());
            count += terms.Count;
        }
        // Flag every profile that exceeds the observed 30-term limit.
        if (count > 30)
        {
            Console.WriteLine("Account: " + userProfile.AccountName + ", has " + count + " terms");
        }
    }
}

Sunday, September 21, 2014

Taxonomy Update Scheduler not working?


Let's try to understand why the Taxonomy Update Scheduler job stops working.


Taxonomy Update Scheduler Job (UpdateHiddenListJobDefinition)
This job is responsible for updating the terms in a hidden list located in each site collection.
The reason for this is speed: it's faster to query an internal list than the Managed Metadata Service.
But of course this complicates things; keeping the two in sync is always a challenge.

The job runs every hour, so if you rename a term, it can take up to an hour before the change
reaches your site collections.

There is one update job per web application.
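
While troubleshooting, you can queue this job manually instead of waiting for the schedule. Here is a minimal C# sketch, assuming a placeholder web application URL and looking the job up by its type name:

using System;
using System.Linq;
using Microsoft.SharePoint.Administration;

// Find the Taxonomy Update Scheduler job for one web application
// and queue it for immediate execution instead of waiting an hour.
SPWebApplication webApp = SPWebApplication.Lookup(new Uri("https://whatever"));
SPJobDefinition job = webApp.JobDefinitions
    .FirstOrDefault(j => j.GetType().Name == "UpdateHiddenListJobDefinition");
if (job != null)
{
    job.RunNow();
}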

TaxonomyHiddenList
You can find the list here:
[site collection url]/Lists/TaxonomyHiddenList/AllItems.aspx

It's a standard list that contains one row per term.
[screenshot: a hidden list item row for one term]

Search uses this hidden list too, through the fields named CatchAllData and CatchAllDataLabel.
So if the search results are wrong, the hidden list probably isn't up to date because of trouble with the Taxonomy Update Scheduler job.


Troubleshooting
Anyway, I had a term that did not update in my site collection no matter what I did, so I had to find out why.

If you Google this problem you probably end up with this script:

$url = "https://yoursite"
$site = Get-SPSite $url
[Microsoft.SharePoint.Taxonomy.TaxonomySession]::SyncHiddenList($site)
$site.Dispose()

And of course running this script definitely works.
If you rename a term in the term store and run this script, the taxonomy hidden list is updated.
But the update job still doesn't work after running this script.
This method is very CPU intensive; it updates all terms in the hidden list by querying the term store for every single term.

Well, we can't manually run this update every time we change a term.

I even saw someone who had created a custom timer job calling this sync method instead of using the built-in Taxonomy Update Scheduler job. This is not recommended at all; it's a resource hog.
The update job only updates terms that have changed.


So let's try to find out why it isn't working:

I first checked the ULS after executing the update job and saw this:
  • Entering monitored scope (Timer Job UpdateHiddenListJobDefinition). Parent NoName=Timer Job UpdateHiddenListJobDefinition
  • Site Enumeration Stack:    at Microsoft.SharePoint.Administration.SPSiteCollection.get_Item(Int32 index) ..snip
  • Site Enumeration Stack:    at Microsoft.SharePoint.Administration.SPSiteCollection.get_Count() ..snip
  • Site Enumeration Stack:    at Microsoft.SharePoint.Taxonomy.UpdateHiddenListJobDefinition.Execute(Guid targetInstanceId) ..snip
  • Updating SPPersistedObject PersistedChangeToken Name=086eb3a5-8529-4a57-a9f7-3171294aa483TimeOfLastTaxonomyUpdate5d8ae25a-b6b3-470c-a0a2-513260cc8ce5. Version: 1453655 Ensure: False, HashCode: 20286682, Id: f2c91ae7-27fb-43e0-9d25-217a6dca71ff,    Stack:    at Microsoft.SharePoint.Taxonomy.UpdateHiddenListJobDefinition.SetChangeToken(String databaseKey, PartitionSettings proxySettings, String value) ..snip
  • Leaving Monitored Scope (Timer Job UpdateHiddenListJobDefinition). Execution Time=69.0284659083766
(WCF calls stripped)

So after wasting several hours chasing these errors, let me tell you: this is NORMAL.
Pfft, another example of Microsoft polluting the ULS with noise to keep consultants worldwide busy.

I tried a lot of things to get this working, checking rights etc., but nothing worked on the problem term.
Then I decided to create a new term and use it in a site collection. It appeared in the taxonomy hidden list. I renamed it in the term store, executed the update job, and voila, the term was updated!
So new terms work, "old" terms do not.


So I started looking at the timer job to get an idea of what it was doing:

There is a method being called named GetChangesForListSync,
which in turn calls the stored procedure proc_ECM_GetChangesForListSync
in the Managed Metadata database.

This proc loads data mainly from a table called ECMChangeLog.
Every time you change a term, the change is recorded in this table.
I checked this table, and both my working and my non-working term were logged there as changed.

The proc then joins on a table called ECMUsedTerms.
Here I found that my working term had an entry in this table, while the non-working term did not.
Great, finally getting somewhere!

Ok, so I started looking at the stored procedures for this table and found proc_ECM_AddUsedTerm.
It is used in TaxonomyField.AddTaxonomyGuidToWssCore.

Now look at the TaxonomyField.ValidateTaxonomyFieldValue function, which in the end calls TaxonomyField.AddTaxonomyGuidToWssCore.
AddTaxonomyGuidToWssCore is only called if the term is new to the site; if the term already exists in the hidden list, the wssid of that entry is reused and the ECMUsedTerms table is never updated.

Ok, looking some more at the stored procedures for the ECMUsedTerms table, I found this one:
proc_ECM_ClearUsedTerms

This proc is called from TermStore.UpdateUsedTermsOnSite(SPSite site).
Okay... I have never seen or heard of this function before :)

Anyway, I made this script and executed it:

$url = "https://yoursite"
$site = Get-SPSite $url
$tax = Get-SPTaxonomySession -Site $site
$ts = $tax.TermStores[0]
$ts.UpdateUsedTermsOnSite($site)
$site.Dispose()

Make sure you are a site collection administrator when executing this script;
there is a check for this inside the function.

If I now looked at the ECMUsedTerms table, it had way more rows than before, and my problematic term finally had an entry.
Renaming my problem term now got picked up just fine by the Taxonomy Update Scheduler job.

Ok, so what happened here?

There are at least two scenarios that can cause this:

Scenario 1:
If you recreate the Managed Metadata Service with a clean database and import the terms with the same IDs, then of course the ECMUsedTerms table is empty and the information is lost.
Your solution will still work, taxonomy pickers etc. behave normally, and you will probably never notice anything unless you change a term.
(This was the reason for my troubles; I had to recreate the MMS because of the bug in SharePoint 2013 that sometimes corrupts the MMS.)

Scenario 2:
If you move the content database from one farm to another (maybe from staging to prod), which of course has its own Managed Metadata Service, then the ECMUsedTerms table is of course not in sync.

I'm not sure what happens if you do a site restore and whether that takes care of this sync, but I have my doubts.


And there are probably several other ways for the hidden lists to get out of sync with the term store.

Anyway, lesson learned:
If terms are not updating when the Taxonomy Update Scheduler job runs, execute UpdateUsedTermsOnSite for every site collection.
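
Here is a minimal C# sketch of that repair for a whole web application, assuming a single term store and a placeholder URL; run it as a site collection administrator:

using System;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Administration;
using Microsoft.SharePoint.Taxonomy;

// Rebuild the ECMUsedTerms bookkeeping for every site collection in
// one web application, so the update job picks up "old" terms again.
SPWebApplication webApp = SPWebApplication.Lookup(new Uri("https://whatever"));
foreach (SPSite site in webApp.Sites)
{
    using (site) // every SPSite from the enumeration must be disposed
    {
        TaxonomySession session = new TaxonomySession(site);
        TermStore termStore = session.TermStores[0]; // assumes one term store
        termStore.UpdateUsedTermsOnSite(site);
    }
}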

Monday, December 23, 2013

SharePoint 2013, search driven lightweight article scheduling


I'm currently working on a search-driven intranet, and one requirement was no approval of articles,
but still lightweight article scheduling.
That means the "unpublished" articles should still be available in standard search, but removed from all rollups.


Problem 1:
The built-in support for article scheduling requires approval to be turned on.

Problem 2:
The built-in approval logic removes the page entirely from standard search.

Problem 3:
The SharePoint search index does not include null values, so it's not possible to query for articles
with no scheduling (null values).

Problem 4:
KQL ignores the time portion of DateTime fields when filtering results.

So how can we work around these limitations?
We can't unpublish the pages with a custom timer job, because that would remove them from standard search.
So the pages need to stay in a published state even after they have expired.

First of all, approval was turned off, which hides the PublishingStartDate and PublishingExpirationDate fields in the Pages list.
This hides the fields when you edit the properties of an item, but that's fine: we can still use the fields in our custom page layouts even though they are hidden.

As I said, null values in items are excluded from the search index, so there is no way to filter on null values (no start schedule is null, no end schedule is null).
So we need values that can represent null.
And we need fields that are not hidden, because hidden fields are excluded entirely from the search index.
And don't try to unhide PublishingStartDate / PublishingExpirationDate; it simply won't work.

So what we need to do is create two "hidden" fields that look like this:

<Field ID="{SOMEGUID}"
     Name="MySchedulingStartDate"
     DisplayName="$Resources:My,field_schedulingstart;"
     StaticName="MySchedulingStartDate"
     Type="DateTime"
     Format="DateTime"
     Group="My Intranett"
     ShowInDisplayForm="FALSE"
     ShowInEditForm="FALSE"
     ShowInFileDlg="FALSE"
     ShowInNewForm="FALSE"
     ShowInVersionHistory="FALSE"
     ShowInViewForms="FALSE"
     ShowInListSettings="TRUE"
     Hidden="FALSE"
     Required="FALSE">
</Field>

<Field ID="{SOMEGUID}"
     Name="MySchedulingEndDate"
     DisplayName="$Resources:My,field_schedulingend;"
     StaticName="MySchedulingEndDate"
     Type="DateTime"
     Format="DateTime"
     Group="My Intranett"
     ShowInDisplayForm="FALSE"
     ShowInEditForm="FALSE"
     ShowInFileDlg="FALSE"
     ShowInNewForm="FALSE"
     ShowInVersionHistory="FALSE"
     ShowInViewForms="FALSE"
     ShowInListSettings="TRUE"
     Hidden="FALSE"
     Required="FALSE">
</Field>


So these fields are not hidden, but they are invisible in edit forms etc., and therefore they are picked up
by the search index.


Ok, one problem solved. Next we need to assign these fields values that can represent
no scheduling start and no expiration date, so that the search index will include them.

So we need an event receiver on the Pages list that looks like this:

public class MyEventReceiver : SPItemEventReceiver
{
    public override void ItemUpdating(SPItemEventProperties properties)
    {
        try
        {
            base.ItemUpdating(properties);

            // Fields is our own helper holding the internal field names.
            // If the real scheduling field is empty, stamp the sentinel
            // "null" value so the item still gets a crawlable DateTime.
            object startValue = properties.AfterProperties[Fields.PublishingStartDateName];
            if (startValue == null)
            {
                properties.AfterProperties[Fields.MySchedulingStartDateName] = SharePointConstants.MinDateIso8601String;
            }
            else
            {
                properties.AfterProperties[Fields.MySchedulingStartDateName] = startValue;
            }

            object endValue = properties.AfterProperties[Fields.PublishingExpirationDateName];
            if (endValue == null)
            {
                properties.AfterProperties[Fields.MySchedulingEndDateName] = SharePointConstants.MaxDateIso8601String;
            }
            else
            {
                properties.AfterProperties[Fields.MySchedulingEndDateName] = endValue;
            }
        }
        catch (Exception exception)
        {
            // Log to ULS
        }
    }
}

So in the event receiver we assign these values to represent "null":
No start schedule = "1900-01-01T00:00:00Z"
No end schedule = "8900-12-31T00:00:00Z"

DO NOT use DateTime.MinValue or DateTime.MaxValue; they are not compatible with
the DateTime field, so you need to use the values above.
DateTime.MinValue = "0001-01-01 00:00:00"
DateTime.MaxValue = "9999-12-31 23:59:59"
If you try to assign these values, the field value is not updated; the assignment is silently ignored.

DO NOT assign a DateTime object to the after properties; you need to assign a string value in the correct datetime format.
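
For completeness, here is a minimal sketch of the constants class the event receiver uses, holding the two sentinel values:

// Sentinel values representing "no schedule", since the search index
// drops null DateTime values entirely.
public static class SharePointConstants
{
    // "No start schedule": far enough in the past.
    public const string MinDateIso8601String = "1900-01-01T00:00:00Z";

    // "No end schedule": far enough in the future. Do not use
    // DateTime.MinValue/MaxValue; the field silently rejects them.
    public const string MaxDateIso8601String = "8900-12-31T00:00:00Z";
}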

So every time the item is updated, we update the hidden fields accordingly.
The hidden fields will then always have a DateTime value and will be indexed by search.
You also need to create managed properties for the "hidden" fields, so that we can use them in our search query.

Ok, so how can we use this in our queries?

using (KeywordQuery keywordQuery = new KeywordQuery(web))
{
    keywordQuery.TrimDuplicates = false;
    keywordQuery.StartRow = 0;
    keywordQuery.SelectProperties.Add(SearchFields.Title);
    keywordQuery.SelectProperties.Add(SearchFields.MySchedulingStartDate);
    keywordQuery.SelectProperties.Add(SearchFields.MySchedulingEndDate);
    keywordQuery.EnableSorting = true;
    keywordQuery.SortList.Add(SearchFields.ArticleDate, SortDirection.Descending);

    string contentType = "<MYContentTypeId>";
    string friendlyUrl = "<MYFriendlyUrl>";
    string now = SPUtility.CreateISO8601DateTimeFromSystemDateTime(DateTime.Now);
    // Only include articles where start <= now <= end.
    string dateFilter = string.Format(" AND {0}<={2} AND {1}>={2}", SearchFields.MySchedulingStartDate, SearchFields.MySchedulingEndDate, now);
    string query = string.Format("Path:\"{0}\" AND ContentTypeId:\"{1}*\"", friendlyUrl, contentType) + dateFilter;
    keywordQuery.QueryText = query;

    SearchExecutor exec = new SearchExecutor();
    ResultTableCollection resultsCollection = exec.ExecuteQuery(keywordQuery);
    ResultTable resultsTable = resultsCollection.Filter("TableType", KnownTableTypes.RelevantResults).FirstOrDefault();
    return resultsTable;
}

Since all items now have a valid DateTime value, the query stays simple: start <= now and end >= now.

As I said, the time portion of DateTime fields is simply ignored when filtering in KQL,
and there is no way to handle this in KQL itself.
Another "By Design" from Microsoft, which basically means it was too hard to implement at the time.

The strange thing is that the time portion is returned properly in the results. That's a good thing, because it means we can just refilter the rows while iterating the search results, like this:

DateTime now = DateTime.Now;
foreach (DataRow result in results.ResultRows)
{
    DateTime startDate = Convert.ToDateTime(result[SearchFields.MySchedulingStartDate]).ToLocalTime();
    DateTime endDate = Convert.ToDateTime(result[SearchFields.MySchedulingEndDate]).ToLocalTime();
    if (startDate <= now && endDate >= now)
    {
        // Still within schedule: keep this row.
    }
}

Not a big problem: you only get false positives around the start and end dates themselves, so there won't be many results to refilter.

Let's say you have an article that expires at "2013-12-01 10:00:00".

This article will appear in the search results for that entire day and only needs refiltering that day.
So between 2013-12-01 10:00:00 and 2013-12-01 23:59:59 it will occur in the search results and be removed by the refiltering.
The next day the search filters it out itself, so no refiltering is needed.

And of course, if you still want to unpublish an article and remove it from the search index entirely, you can do that manually.

So that's just one way of doing it.

Welcome

Welcome to my brand new blog; hopefully I will update this one more frequently than my previous ones.

Since I'm working full-time with SharePoint now, the blog will be about SharePoint only.

I'm discovering strange things in SharePoint daily, and I need to share my findings with everyone.

Regards
Bjørn