Monday 7 August 2017

Topics to study for Data Architecture and Management Designer Exam

Hi folks,

I passed my Data Architecture and Management Designer exam a couple of months back and was keen to share my learnings, but thanks to my procrastination the post kept getting delayed. Finally, I have found the time to put something together in a structured way that might help you pass this credential.

Data Architect is a domain exam that falls under the Application Architect tree, which in turn leads to Technical Architect (it’s a long way, but I’m getting there). I won’t go into the details of the credential overview, since all the instructions are mentioned here; instead, I’ll highlight the key areas you need to focus on to fully understand large data volume considerations and become an expert in data management. Roll up your sleeves and let’s dig in:

Multi-tenancy & Metadata Overview:

Multitenancy is a means of providing a single application to multiple organizations such as different companies or departments within a company, from a single hardware-software stack.

To ensure that tenant-specific customizations do not breach the security of other tenants or affect their performance, Salesforce uses a runtime engine that generates application components from those customizations. By maintaining boundaries between the architecture of the underlying application and that of each tenant, Salesforce protects the integrity of each tenant’s data and operations.

Whenever an organization is customized, the platform tracks those customizations as metadata. Salesforce stores the application data for all virtual tables in a few large database tables, which are partitioned by tenant (organization) and serve as heap storage. The platform’s engine then materializes virtual table data at runtime by considering the corresponding metadata. Details here



Search Architecture:

Search is the capability to query records based on free-form text. For data to be searched, it must first be indexed. The indexes are created by the search indexing servers, which also generate and asynchronously process queue entries for newly created or modified data. After a searchable object’s record is created or updated, it can take about 15 minutes or more for the updated text to become searchable.

Salesforce performs indexed searches by first searching the indexes for appropriate records, then narrowing down the results based on access permissions, search limits, and other filters. This process creates a result set, which typically contains the most relevant results. After the result set reaches a predetermined size, the remaining records are discarded. The result set is then used to query the records from the database to retrieve the fields that a user sees. Details here

Force.com query optimizer:

The Force.com query optimizer helps the database system’s optimizer produce effective execution plans for Salesforce queries. It works on the queries that are automatically generated to handle reports, list views, and SOQL queries, as well as queries derived from them. Details here

Skinny Tables:

Salesforce creates skinny tables to contain frequently used fields and to avoid joins, and it keeps the skinny tables in sync with their source tables when the source tables are modified. To enable skinny tables, contact Salesforce Customer Support. For each object table, Salesforce maintains other, separate tables at the database level for standard and custom fields. This separation ordinarily necessitates a join when a query contains both kinds of fields. A skinny table contains both kinds of fields and does not include soft-deleted records.

The linked documentation shows an example Account view, the corresponding database tables, and a skinny table that would speed up Account queries. Details here
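
For example, a query that mixes standard and custom Account fields ordinarily needs a join between the standard-field and custom-field tables; a skinny table holding all of those columns can answer it from a single table. A minimal sketch, assuming a hypothetical custom field Region__c:

    // Without a skinny table, mixing standard fields (Name, Industry) with a
    // custom field (Region__c) forces a join at the database level; a skinny
    // table containing all three columns can serve the query in one pass.
    List<Account> accounts = [
        SELECT Name, Industry, Region__c
        FROM Account
        WHERE Industry = 'Banking'
    ];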



Indexes:

Salesforce supports custom indexes to speed up queries, and you can create custom indexes by contacting Salesforce Customer Support. Note that filter conditions comparing against null normally prevent the use of an index. Details here
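
A quick illustration, assuming a hypothetical custom field External_Key__c that is indexed (for example, an External ID field, which Salesforce indexes automatically):

    // A selective filter on the indexed field can use the index table.
    List<Account> indexed = [SELECT Id, Name FROM Account WHERE External_Key__c = 'A-1001'];

    // Comparing against null normally defeats the index and forces a full scan.
    List<Account> unindexed = [SELECT Id, Name FROM Account WHERE External_Key__c = null];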

Index Tables

The Salesforce multitenant architecture makes the underlying data table for custom fields unsuitable for indexing. To overcome this limitation, the platform creates an index table that contains a copy of the data, along with information about the data types. By default, the index tables do not include records that are null (records with empty values).


Standard and Custom Indexed Fields:

The Force.com query optimizer maintains a table containing statistics about the distribution of data in each index. It uses this table to perform pre-queries to determine whether using the index can speed up the query.

Standard Indexed Fields:

Used if the filter matches less than 30% of the total records, up to one million records.
For example, a standard index is used if:
• A query is executed against a table with two million records, and the filter matches 600,000 or fewer records.
• A query is executed against a table with five million records, and the filter matches one million or fewer records.

Custom Indexed Fields

Used if the filter matches less than 10% of the total records, up to 333,333 records.
For example, a custom index is used if:
• A query is executed against a table with 500,000 records, and the filter matches 50,000 or fewer records.
• A query is executed against a table with five million records, and the filter matches 333,333 or fewer records.
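
To make the arithmetic concrete, here is a small sketch (a hypothetical helper, not a Salesforce API) that computes the record-count thresholds described above:

    // Hypothetical helper, not a Salesforce API: thresholds under which the
    // query optimizer considers an index selective enough to use.
    public class IndexSelectivity {
        // Standard index: filter must match fewer than 30% of records, capped at 1,000,000.
        public static Long standardThreshold(Long totalRecords) {
            Decimal cap = 0.30 * totalRecords;
            return Math.min(cap.longValue(), 1000000L);
        }
        // Custom index: filter must match fewer than 10% of records, capped at 333,333.
        public static Long customThreshold(Long totalRecords) {
            Decimal cap = 0.10 * totalRecords;
            return Math.min(cap.longValue(), 333333L);
        }
    }
    // IndexSelectivity.standardThreshold(2000000) -> 600,000
    // IndexSelectivity.customThreshold(5000000)   -> 333,333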

Two-column Custom Indexes 

Two-column indexes are subject to the same restrictions as single-column indexes, with one exception. Two-column indexes can have nulls in the second column by default, whereas single-column indexes cannot, unless Salesforce Customer Support has explicitly enabled the option to include nulls.

PK Chunking: 

Use the PK Chunking request header to enable automatic primary key (PK) chunking for a bulk query job. PK chunking splits bulk queries on very large tables into chunks based on the record IDs, or primary keys, of the queried records. Each chunk is processed as a separate batch that counts toward your daily batch limit, and you must download each batch’s results separately. Details here
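
As a rough sketch of what enabling it looks like, here is a Bulk API query job created with the header set (the API version, chunk size, and use of Apex for the callout are assumptions; normally your ETL tool or Data Loader sets this for you):

    // Sketch: create a Bulk API query job with PK chunking enabled.
    // Sforce-Enable-PKChunking splits the query into Id-range chunks
    // (up to 250,000 records per chunk here).
    HttpRequest req = new HttpRequest();
    req.setEndpoint(URL.getSalesforceBaseUrl().toExternalForm() + '/services/async/41.0/job');
    req.setMethod('POST');
    req.setHeader('X-SFDC-Session', UserInfo.getSessionId());
    req.setHeader('Content-Type', 'application/xml; charset=UTF-8');
    req.setHeader('Sforce-Enable-PKChunking', 'chunkSize=250000');
    req.setBody('<?xml version="1.0" encoding="UTF-8"?>'
        + '<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">'
        + '<operation>query</operation><object>Account</object>'
        + '<contentType>CSV</contentType></jobInfo>');
    HttpResponse res = new Http().send(req);
    System.debug(res.getBody()); // jobInfo XML containing the new job Id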

Big Objects for data archiving:

Big Objects is a capability that lets you store and manage data at scale on the Salesforce platform. It helps you engage directly with customers by preserving all your historical customer event data. If you have a large amount of data stored in standard or custom objects in Salesforce, use Big Objects to archive the historical portion of it.

Best Practices for Data archiving

Big Objects 


Data Skew: 

“Data skew” refers to an out-of-balance ratio of records to a single parent record or to a single owner. As a basic rule of thumb, keep any one owner or parent record from “owning” more than 10,000 records. If you find that happening a lot, revisit the idea of archiving your data, or create a plan to spread the wealth by assigning the records to more owners or more parent records.
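
A quick way to spot ownership skew is an aggregate query per owner. A sketch for Account (the rows scanned count toward query limits, so on very large tables run it via the API or asynchronously):

    // Find owners holding more than 10,000 Account records.
    List<AggregateResult> skewedOwners = [
        SELECT OwnerId, COUNT(Id) recordCount
        FROM Account
        GROUP BY OwnerId
        HAVING COUNT(Id) > 10000
    ];
    for (AggregateResult ar : skewedOwners) {
        System.debug(ar); // OwnerId and record count for each skewed owner
    }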

Data Governance

Data governance refers to the overall management of the availability, usability, integrity, and security of the data employed in an enterprise. A sound data governance program includes a governing body or council, a defined set of procedures, and a plan to execute those procedures.

Salesforce governance simplified

Best Practice for Data Governance

Data Stewardship

Data stewardship is the management and oversight of an organization’s data assets to help provide business users with high-quality data that is easily accessible in a consistent manner.

Data Stewardship/ Data governance

Techniques for Optimizing Performance

1) Mashups - external websites and callouts: link to data hosted outside Salesforce and present it at runtime, rather than copying it into the org.

2) Using SOQL and SOSL
SOQL - Use it when you know which objects or fields the data resides in (typically when the objects involved are related to each other).

SOSL - Use it when you don’t know which object or field the data resides in and you want to find it in the most efficient way possible (typically across multiple objects that aren’t related to each other). It is generally faster than SOQL if the search expression uses a CONTAINS term.
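
A minimal side-by-side sketch (the object and field choices are only for illustration):

    // SOQL: you already know the data lives in Contact.LastName.
    List<Contact> byName = [SELECT Id, Name, Email FROM Contact WHERE LastName = 'Michael'];

    // SOSL: you don't know which object holds the text, so search the indexes
    // across several unrelated objects in one call.
    List<List<SObject>> results = [
        FIND 'Michael' IN NAME FIELDS
        RETURNING Account(Id, Name), Contact(Id, Name), Lead(Id, Name)
    ];
    List<Account> accounts = (List<Account>) results[0];
    List<Contact> contacts = (List<Contact>) results[1];
    List<Lead> leads = (List<Lead>) results[2];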

3) Deleting Data:

While the data is soft deleted, it still affects database performance because the data is still resident, and deleted records have to be excluded from any queries.

We recommend that you use the Bulk API’s hard delete function to delete large data volumes.
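
The Bulk API hard delete skips the Recycle Bin entirely. For deletes issued from Apex, a rough equivalent (a sketch; the filter and volume are placeholders) is to empty the Recycle Bin right after the delete so the soft-deleted rows don’t linger:

    // Delete a batch of stale records and purge them from the Recycle Bin so the
    // soft-deleted rows don't weigh on later queries.
    List<Account> stale = [
        SELECT Id FROM Account
        WHERE LastActivityDate < LAST_N_DAYS:730
        LIMIT 10000
    ];
    delete stale;
    Database.emptyRecycleBin(stale);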

4) Defer Sharing Rules: 

It allows an administrator to defer the processing of sharing rules until after new users, rules, and other content have been loaded.

Best Practice for achieving good performance in deployments

1) Loading Data from the API
  • Use the fastest operation possible: insert() is fastest, update() is next, and upsert() is slowest. If possible, break upsert() into two operations: insert() and update().
  • When updating, send only the fields that have changed (delta-only loads).
  • For custom integrations, authenticate once per load, not on each record.
  • Use Public Read/Write security during the initial load to avoid sharing-calculation overhead.
  • When changing child records, group them by parent: keep records with the same ParentId value in the same batch to minimize locking conflicts (see the sketch after this list).

2) Extracting Data from the API
  • Use the getUpdated() and getDeleted() SOAP API calls to sync an external system with Salesforce at intervals greater than 5 minutes.

3) Searching
  • Keep searches specific and avoid wildcards where possible. For example, search for Michael instead of Mi*.
  • Use single-object searches for greater speed and accuracy.

4) Deleting
  • When deleting records that have many children, delete the children first.
  • When deleting large volumes of data (one million or more records), use the hard delete option of the Bulk API.
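
To illustrate the "group child records by parent" tip from item 1, here is a sketch using hypothetical custom objects Invoice__c and Invoice_Line__c: ordering the children by their parent lookup keeps each 200-record DML chunk focused on as few parents as possible, which minimizes lock contention on the parent rows.

    // Update child records ordered by their parent lookup so each DML chunk
    // touches as few distinct parents as possible (hypothetical objects/fields).
    List<Invoice_Line__c> lines = [
        SELECT Id, Amount__c, Invoice__c
        FROM Invoice_Line__c
        WHERE Needs_Reprice__c = true
        ORDER BY Invoice__c
    ];
    for (Invoice_Line__c invoiceLine : lines) {
        invoiceLine.Amount__c = invoiceLine.Amount__c * 1.05;
    }
    update lines;
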
Event Monitoring

Event monitoring is one of many tools that Salesforce provides to help keep your data secure. It lets you see the granular details of user activity in your organization. We refer to these user activities as events. You can view information about individual events or track trends in events to swiftly identify abnormal behavior and safeguard your company’s data. Details here
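
Event data is exposed through the EventLogFile object. A sketch that pulls yesterday’s login event logs (Login and Logout logs are generally available; the full set of event types requires the Event Monitoring add-on):

    // Fetch yesterday's Login event log files.
    List<EventLogFile> loginLogs = [
        SELECT Id, EventType, LogDate, LogFileLength
        FROM EventLogFile
        WHERE EventType = 'Login' AND LogDate = YESTERDAY
    ];
    for (EventLogFile elf : loginLogs) {
        System.debug(elf.EventType + ' log, ' + elf.LogFileLength + ' bytes');
    }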

Field Audit Trail

Field Audit Trail lets you define a policy to retain archived field history data for up to ten years, independent of field history tracking. Details here
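
The archived values land in the FieldHistoryArchive big object, which you can query with SOQL as long as the filters follow its index order (FieldHistoryType, then ParentId, then CreatedDate). A sketch, assuming Field Audit Trail is enabled:

    // Read archived field history for a single Account.
    Id accountId = [SELECT Id FROM Account LIMIT 1].Id;
    List<FieldHistoryArchive> archivedHistory = [
        SELECT Field, OldValue, NewValue, CreatedDate
        FROM FieldHistoryArchive
        WHERE FieldHistoryType = 'Account' AND ParentId = :accountId
    ];
    System.debug('Archived changes found: ' + archivedHistory.size());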

Setup Audit Trail 

Setup Audit Trail tracks the configuration and metadata changes that you and other admins have made to your org. Details here

Merge accounts 

Merging multiple accounts into one helps you keep your data clean so you can focus on closing deals. Details here 

I would highly suggest going through the guide on best practices for deployments with large data volumes.

Best of luck with your exam. I hope this information helps. Feel free to comment below if there’s anything you need explained in more detail.

Cheers ears