Technical SEO standards

Standard scope

This standard applies to:

pennmedicine.org
All Penn Medicine websites
Penn Medicine mobile applications

Overview

Technical SEO techniques help improve search visibility, enhance user trust and experience, and increase site performance and efficiency. At Penn Medicine, we focus on the following elements of technical SEO work to improve crawlability and site performance:

HTTPS encryption - Ensures site security and builds trust (also a ranking signal).
Canonical URLs - Prevent duplicate content issues and clarify preferred versions of pages.
XML sitemaps - Help search engines crawl and index pages efficiently.
Robots.txt file - Guides search engine crawlers by allowing access to important site areas and blocking non-essential or duplicate content.
Schema markup - Enhances search visibility with structured data.

HTTPS encryption

HTTPS encryption secures your site, protects user data, and signals trust to Google. It helps to:

Avoid “Not Secure” browser warnings
Improve ranking signals and user trust

At Penn Medicine, our standard is that all pages use HTTPS encryption. We maintain a valid SSL certificate for our domains to ensure secure, encrypted communication.
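
One common way to enforce an all-HTTPS standard is a permanent redirect from HTTP plus an HSTS header on secure responses. The following is a minimal sketch of that decision logic, not our server configuration: the function name `https_response_headers` and the one-year `max-age` value are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Assumed policy value (one year); this standard does not specify a max-age.
HSTS_HEADER = ("Strict-Transport-Security", "max-age=31536000; includeSubDomains")


def https_response_headers(request_url: str):
    """Return (status_code, headers) a server might send for request_url.

    Plain-HTTP requests get a 301 redirect to the HTTPS version of the same
    URL. HTTPS requests get status 200 plus an HSTS header, so browsers use
    HTTPS automatically on repeat visits.
    """
    parts = urlsplit(request_url)
    if parts.scheme == "http":
        target = parts._replace(scheme="https").geturl()
        return 301, [("Location", target)]
    return 200, [HSTS_HEADER]
```

The HSTS header is only attached to HTTPS responses because browsers ignore it when it arrives over plain HTTP.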

Canonical URLs

Canonical URLs tell Google which version of a page is preferred when duplicates exist. They help to:

Prevent duplicate content issues
Consolidate ranking strength to one URL

At Penn Medicine, we ensure that every URL includes a self-referencing canonical. All canonicals are placed in the <head> of the page's HTML.

For example, on our home page, https://www.pennmedicine.org/, we include the following tag in the <head>:

<link rel="canonical" href="https://www.pennmedicine.org/">

We also set canonicals on parameterized variations of URLs so that they point back to the core URL. An exception to this standard is a similar page whose URL variation is intentionally meant to rank on its own. In this case, the canonical references the preferred URL.
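
The rule for parameterized URLs can be sketched in Python as follows. This is an illustration under simple assumptions, not our production code: the helper names `canonical_url` and `canonical_tag` are hypothetical, and the sketch treats every query string and fragment as non-canonical, so a page that is intentionally meant to rank on its own would need an explicit override.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Drop the query string and fragment so parameterized variants share one core URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path or "/", "", ""))


def canonical_tag(url: str) -> str:
    """Render the <link rel="canonical"> tag that belongs in the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


# A tracking parameter on the home page still resolves to the core URL.
print(canonical_tag("https://www.pennmedicine.org/?utm_source=newsletter"))
```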

XML sitemaps

XML sitemaps help Google discover and index website pages efficiently. They are used to:

Ensure all important pages are found and indexed
Speed up discovery of new or updated content

At Penn Medicine, we maintain a clean sitemap that lists only the pages we want indexed.

Our sitemap contains only:

Canonical, indexable URLs
High-value pages intended for search discovery
Pages that are not blocked by robots.txt

Our sitemap excludes:

Non-canonical URLs
Parameterized URLs
Redirects (301, 307, 308), 404s, or staging environment URLs
Duplicate or thin content pages
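
The inclusion and exclusion rules above can be expressed as a filter that runs before the sitemap is written. The sketch below is illustrative only: the `Page` record and its field names are hypothetical, and the `indexable` flag stands in for checks (noindex, robots.txt blocking, thin or duplicate content) that a real pipeline would compute elsewhere.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Page:
    # Hypothetical record; a real CMS export would have its own field names.
    url: str
    canonical: str
    status: int = 200
    indexable: bool = True


def sitemap_eligible(page: Page) -> bool:
    """Keep only live, indexable, self-canonical URLs without parameters."""
    return (
        page.status == 200
        and page.indexable
        and page.url == page.canonical
        and "?" not in page.url
    )


def build_sitemap(pages) -> str:
    """Return sitemap XML containing one <url><loc> entry per eligible page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        if sitemap_eligible(page):
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = page.url
    return ET.tostring(urlset, encoding="unicode")
```

Filtering before serialization keeps redirects, error pages, and parameterized variants out of the file, rather than relying on search engines to ignore them.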

Robots.txt files

A robots.txt file guides search engine crawlers by allowing access to important site areas and blocking non-essential or duplicate content.

Although this is a simple text file, mishandling or altering its rules can have significant consequences. Pages and files meant for indexing should never be blocked, and all Disallow rules must be carefully reviewed to ensure they do not prevent critical content from being crawled.

At Penn Medicine, our robots.txt file currently does not use Disallow to block any content, ensuring full crawlability of public pages.

User-agent: *
Allow: /
Sitemap: https://www.pennmedicine.org/sitemap.xml

Any future Disallow rules will be added only when necessary and reviewed to prevent accidental blocking of indexable content.
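
One way to review robots.txt changes before release is to parse the proposed rules and confirm that public pages remain crawlable. The sketch below uses Python's standard `urllib.robotparser` module (the `site_maps()` method requires Python 3.8 or later). The helper names `load_rules` and `disallow_rules` and the `/providers` path are hypothetical and used only for illustration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Allow: /
Sitemap: https://www.pennmedicine.org/sitemap.xml
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def disallow_rules(text: str) -> list:
    """List every non-empty Disallow path so each one can be reviewed."""
    found = []
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "disallow" and value.strip():
            found.append(value.strip())
    return found


rules = load_rules(ROBOTS_TXT)
# Hypothetical public URL used as a crawlability spot check.
print(rules.can_fetch("*", "https://www.pennmedicine.org/providers"))
print(rules.site_maps())
print(disallow_rules(ROBOTS_TXT))
```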

Schema markup

Schema markup helps Google understand your content and enhances your listings with rich results. It is used to:

Qualify for enhanced search results (reviews, FAQs, events, etc.)
Improve visibility and click-through rates in both search engines and AI search tools
Strengthen a site’s authority and discoverability across search and AI platforms

At Penn Medicine, we improve our visibility by adding Schema markup to our content to help inform search engines and AI tools about:

Who we are (organization, physicians, authors)
What the page is about (conditions, treatments, services, locations)
How pieces of content relate to one another (condition → treatment → provider → location)

Each page type has schema markup tailored to its specific purpose, implemented in JSON-LD. For example, a condition page uses MedicalCondition schema, while a treatment page uses MedicalProcedure schema to indicate that the content is about a procedure rather than a condition and vice versa.
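
As an illustration of page-type-specific JSON-LD, the sketch below builds markup for a condition page as a MedicalWebPage whose subject is a MedicalCondition. It is a minimal example under stated assumptions: the function names, the choice of the `about` property to link the two types, and any sample values passed in are hypothetical and do not reproduce our actual templates.

```python
import json


def condition_page_jsonld(name: str, url: str, description: str) -> str:
    """Build JSON-LD for a condition page: a MedicalWebPage about a MedicalCondition."""
    data = {
        "@context": "https://schema.org",
        "@type": "MedicalWebPage",
        "url": url,
        "name": name,
        "description": description,
        # Assumed linkage: the page is "about" the condition it describes.
        "about": {"@type": "MedicalCondition", "name": name},
    }
    return json.dumps(data, indent=2)


def script_tag(jsonld: str) -> str:
    """Wrap JSON-LD in the script element that is placed in the page's HTML."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'
```

A treatment page would follow the same shape with MedicalProcedure in place of MedicalCondition, which keeps the markup consistent with what the page actually covers.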

The following are our current schema types by page type:

Page type - Current schema
Conditions - MedicalCondition, MedicalWebPage
Form - WebPage
Homepage - WebPage
Location - Select only the most relevant: Hospital, MedicalClinic, Pharmacy
News Hub - NewsArticle, WebPage (for patient stories)
Physician resources - MedicalWebPage
Physician - NewsArticle
Provider - Physician, MedicalWebPage
Services - MedicalWebPage
Specialty - MedicalWebPage
Treatment - MedicalProcedure, MedicalWebPage

Do’s and Don’ts

Do

Keep the sitemap updated and refresh when pages are added, removed or moved
Validate the sitemap formatting with Google XML Sitemap Validator
Host the sitemap at the root domain for easy access
Only include canonical URLs in the sitemap
Secure the entire site using a valid SSL certificate
Redirect all HTTP traffic to HTTPS using 301 redirects
Update internal links and canonical tags to use HTTPS URLs
Use HSTS headers to enforce HTTPS connections for repeat visitors
Use JSON-LD format, as recommended by Google
Implement only relevant schema types
Validate markup with Google’s Rich Results Test or the Schema.org validator
Keep markup content consistent with on-page content to reflect what users see
Update schema when page content or metadata changes
Use the <link rel="canonical"> tag to specify your preferred page version
Use self-referencing canonicals on each primary page to prevent ambiguity
Ensure consistency between canonical tags, sitemaps and internal links
Use absolute URLs with the complete address instead of relative URLs, which specify only the path

Don't

List noindex, 404, duplicate or redirect (3XX) pages in the sitemap
Exceed 50,000 URLs in a single sitemap file
Mix HTTP and HTTPS content - this causes “mixed content” errors that hurt both UX and rankings
Let SSL certificates expire, as Google will flag the site as unsafe
Use self-signed certificates for public-facing sites
Use fake or misleading structured data for schema
Overuse nested schemas that make your code unreadable and redundant
Canonicalize to redirected, broken or non-indexed URLs
Have multiple canonicals per page - only one canonical tag should exist
Rely on canonicals to fix large-scale duplicate content issues
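
Several of the canonical rules above can be checked automatically. The following sketch, written with Python's standard `html.parser` module, flags pages that have zero or multiple canonical tags or a canonical that is not an absolute HTTPS URL. The names `CanonicalCollector` and `canonical_problems` are hypothetical, and the sketch does not check whether a canonical target redirects or is broken, since that would require a network request.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalCollector(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel:
            self.hrefs.append(attributes.get("href") or "")


def canonical_problems(html: str) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    collector = CanonicalCollector()
    collector.feed(html)
    problems = []
    if len(collector.hrefs) != 1:
        problems.append(f"expected 1 canonical tag, found {len(collector.hrefs)}")
    for href in collector.hrefs:
        parts = urlsplit(href)
        if parts.scheme != "https" or not parts.netloc:
            problems.append(f"canonical is not an absolute HTTPS URL: {href!r}")
    return problems
```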

Related resources

Penn Medicine SEO Standards
Google Search Essentials (formerly Webmaster Guidelines)
Google Technical SEO Techniques: Technical SEO Techniques and Strategies

Contact

For assistance, please contact web-standards@pennmedicine.upenn.edu

Last updated

Date - 12/22/2025
Version - 1.0.0
Description - Initial Release