Standard scope
This standard applies to:
- pennmedicine.org
- All Penn Medicine websites
- Penn Medicine mobile applications
Overview
Technical SEO techniques help improve search visibility, enhance user trust and experience, and increase site performance and efficiency. At Penn Medicine, we focus on the following elements of technical SEO work to improve crawlability and site performance:
- HTTPS encryption - Ensures site security and builds trust (also a ranking signal).
- Canonical URLs - Prevent duplicate content issues and clarify preferred versions of pages.
- XML sitemaps - Help search engines crawl and index pages efficiently.
- Robots.txt file - Guides search engine crawlers by allowing or blocking access to content and sections of a website.
- Schema markup - Enhances search visibility with structured data.
HTTPS encryption
HTTPS encryption secures your site and protects user data while signaling trust to Google. It helps to:
- Avoid “Not Secure” browser warnings
- Improve ranking signals and user trust
At Penn Medicine, our standard is that all pages use HTTPS encryption. We maintain a valid SSL certificate for our domains to ensure secure, encrypted communication.
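Keeping a certificate valid is easier when its expiry date is checked routinely. The following is a minimal Python sketch, not a description of Penn Medicine's actual monitoring. The function names are hypothetical, and `fetch_not_after` needs network access to the host being checked.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now=None):
    """Return whole days left before a certificate's notAfter timestamp.

    `not_after` uses the format returned by ssl.getpeercert(),
    for example "Jan  1 00:00:00 2030 GMT".
    """
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).days


def fetch_not_after(hostname, port=443, timeout=10.0):
    """Open a verified TLS connection and return the certificate's notAfter string."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]
```

A scheduled job could call `days_until_expiry(fetch_not_after("www.pennmedicine.org"))` and raise an alert when the result drops below a chosen threshold; the threshold itself is left as a team decision.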
Canonical URLs
Canonical URLs tell Google which version of a page is preferred when duplicates exist. They help to:
- Prevent duplicate content issues
- Consolidate ranking strength to one URL
At Penn Medicine, we ensure that URLs include a self-referencing canonical. All our canonicals are placed in the <head> of our HTML code.
For example, on our home page https://www.pennmedicine.org/ we include the following tag in the <head>: <link rel="canonical" href="https://www.pennmedicine.org/">
We also set canonicals on parameterized variations of a URL so that they point back to the core URL. The exception is a similar page whose URL variation is intentionally meant to rank on its own. In that case, the canonical references whichever URL is preferred for that page.
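The rule for parameterized variations can be sketched in a few lines of Python. This is an illustration under simple assumptions, not our production implementation: the function names and the `utm_source` parameter are hypothetical, and every query string is treated as a variation of the core URL.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """Drop the query string and fragment so variations map to the core URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, "", ""))


def canonical_tag(url):
    """Build the <link> element that belongs in the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


print(canonical_tag("https://www.pennmedicine.org/?utm_source=newsletter"))
# prints <link rel="canonical" href="https://www.pennmedicine.org/">
```

Pages covered by the exception above would bypass `canonical_url` and set their preferred URL explicitly.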
XML sitemaps
XML sitemaps help Google discover and index website pages efficiently. They help to:
- Ensure all important pages are found and indexed
- Speed up discovery of new or updated content
At Penn Medicine, we maintain a clean sitemap with only the pages we want indexed.
Our sitemap contains only:
- Canonical, indexable URLs
- High-value pages intended for search discovery
- Pages that are not blocked by robots.txt
Our sitemap excludes:
- Non-canonical URLs
- Parameterized URLs
- Redirects (301, 307, 308), 404 pages, and staging environment URLs
- Duplicate or thin content pages
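The inclusion and exclusion rules above can be partly automated. The Python sketch below covers only the checks that can be made from the URL itself (HTTPS, no query parameters, no fragment, no duplicates). Response status, noindex directives, and robots.txt rules would need separate checks. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def is_sitemap_candidate(url):
    """Keep only HTTPS URLs without query parameters or fragments."""
    parts = urlsplit(url)
    return parts.scheme == "https" and not parts.query and not parts.fragment


def build_sitemap(urls):
    """Return sitemap XML as a string for the URLs that pass the filter."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in sorted(set(filter(is_sitemap_candidate, urls))):
        # One <url><loc>...</loc></url> entry per unique URL.
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True).decode("utf-8")
```

Calling `build_sitemap` with a crawl export drops parameterized and HTTP URLs before the file is written, which keeps the sitemap consistent with the canonical tags.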
Robots.txt files
A robots.txt file is used to guide search engine crawlers by allowing access to important site areas and blocking non-essential or duplicate content.
This simple text file plays a crucial role because mishandling or altering the rules can have significant consequences. Pages and files meant for indexing should never be blocked, and all Disallow rules must be carefully reviewed to ensure they do not prevent critical content from being crawled.
At Penn Medicine, our robots.txt file currently does not use Disallow to block any content, ensuring full crawlability of public pages.
User-agent: *
Allow: /
Sitemap: https://www.pennmedicine.org/sitemap.xml
Any future Disallow rules will be added only when necessary and reviewed to prevent accidental blocking of indexable content.
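One way to review a proposed rule change is to test it against a list of pages that must stay crawlable before it is deployed. The sketch below uses Python's standard `urllib.robotparser`. The helper name `blocked_urls` is hypothetical, and the `/internal/` rule in the usage note is an invented example rather than a planned rule.

```python
from urllib.robotparser import RobotFileParser

# The current rules, as listed above.
ROBOTS_TXT = """\
User-agent: *
Allow: /
Sitemap: https://www.pennmedicine.org/sitemap.xml
"""


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that the given robots.txt rules would block."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


must_stay_crawlable = ["https://www.pennmedicine.org/"]
print(blocked_urls(ROBOTS_TXT, must_stay_crawlable))  # prints []
```

Running the same check with a draft file such as `"User-agent: *\nDisallow: /internal/\n"` shows at a glance whether any indexable page falls under the new rule. Python's parser may interpret edge cases differently from Google's crawler, so treat the result as a first pass.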
Schema markup
Schema markup helps Google understand your content and enhances your listings with rich results. It helps to:
- Qualify for enhanced search results (reviews, FAQs, events, etc.)
- Improve visibility and click-through rates in both search engines and AI search tools
- Strengthen a site’s authority and discoverability across search and AI platforms
At Penn Medicine, we improve our visibility by adding Schema markup to our content to help inform search engines and AI tools about:
- Who we are (organization, physicians, authors)
- What the page is about (conditions, treatments, services, locations)
- How pieces of content relate to one another (condition → treatment → provider → location)
Each page type has schema markup tailored to its specific purpose, implemented in JSON-LD. For example, a condition page uses MedicalCondition schema to indicate that the content is about a condition, while a treatment page uses MedicalProcedure schema to indicate that it is about a procedure.
The following schema types are currently in use, by page type:
- MedicalWebPage
- Hospital
- MedicalClinic
- Pharmacy
- WebPage (for patient stories)
- MedicalWebPage
- MedicalWebPage
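To illustrate what JSON-LD markup for a condition page might look like, the Python sketch below generates a minimal MedicalCondition block. It is a hedged example, not our template: the function name is hypothetical, only the generic `name`, `url`, and `description` properties are set, and the values in the usage note are placeholders.

```python
import json


def medical_condition_jsonld(name, url, description):
    """Return a <script> element carrying MedicalCondition structured data."""
    data = {
        "@context": "https://schema.org",
        "@type": "MedicalCondition",
        "name": name,
        "url": url,
        "description": description,
    }
    # Escape "<" so that text such as "</script>" cannot end the element early.
    payload = json.dumps(data, indent=2).replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

For example, `medical_condition_jsonld("Example condition", "https://www.pennmedicine.org/example-condition", "Placeholder description.")` returns markup that can be pasted into Google's Rich Results Test before it is added to a page.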
Do’s and Don’ts
Do
- Keep the sitemap updated and refresh it when pages are added, removed or moved
- Validate the sitemap formatting with an XML sitemap validator or the Sitemaps report in Google Search Console
- Host the sitemap at the root domain for easy access
- Only include canonical URLs in the sitemap
- Secure the entire site using a valid SSL certificate
- Redirect all HTTP traffic to HTTPS using 301 redirects
- Update internal links and canonical tags to use HTTPS URLs
- Use HSTS headers to enforce HTTPS connections for repeat visitors
- Use JSON-LD format, as recommended by Google
- Implement only relevant schema types
- Validate markup with Google’s Rich Results Test or the Schema.org validator
- Keep markup content consistent with on-page content to reflect what users see
- Update schema when page content or metadata changes
- Use the <link rel="canonical"> tag to specify your preferred page version
- Point canonicals to self-referencing URLs on each primary page to prevent ambiguity
- Ensure consistency between canonical tags, sitemaps and internal links
- Use absolute URLs with the complete address instead of relative URLs, which specify only the path
Don’t
- List noindex, 404, duplicate or redirect (3XX) pages in the sitemap
- Exceed 50,000 URLs in a single sitemap file
- Mix HTTP and HTTPS content - this causes “mixed content” errors that hurt both UX and rankings
- Let SSL certificates expire, as Google will flag the site as unsafe
- Use self-signed certificates for public-facing sites
- Use fake or misleading structured data for schema
- Overuse nested schemas that make your code unreadable and redundant
- Canonicalize to redirected, broken or non-indexed URLs
- Have multiple canonicals per page - only one canonical tag should exist
- Rely on canonicals to fix large-scale duplicate content issues
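Several of these checks can be run automatically against a page's HTML. The Python sketch below flags a missing canonical, multiple canonicals, a canonical that is not an absolute HTTPS URL, and resources loaded over HTTP. It is a simplified illustration: the class and function names are hypothetical, and only `src` attributes and stylesheet links are treated as mixed content.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class PageAudit(HTMLParser):
    """Collect canonical hrefs and resources requested over plain HTTP."""

    def __init__(self):
        super().__init__()
        self.canonicals = []
        self.insecure_resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        href = attrs.get("href") or ""
        src = attrs.get("src") or ""
        if tag == "link" and "canonical" in rel:
            self.canonicals.append(href)
        if tag == "link" and "stylesheet" in rel and href.lower().startswith("http://"):
            self.insecure_resources.append(href)
        if src.lower().startswith("http://"):
            self.insecure_resources.append(src)


def audit_page(html):
    """Return a list of human-readable issues found in the HTML."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    issues = []
    if not parser.canonicals:
        issues.append("missing canonical tag")
    elif len(parser.canonicals) > 1:
        issues.append(f"multiple canonical tags ({len(parser.canonicals)})")
    for href in parser.canonicals:
        parts = urlsplit(href)
        if parts.scheme != "https" or not parts.netloc:
            issues.append(f"canonical is not an absolute HTTPS URL: {href}")
    issues.extend(f"mixed content: {url}" for url in parser.insecure_resources)
    return issues
```

An empty list from `audit_page` means none of these specific problems were detected. It does not confirm that the canonical target is indexable or free of redirects.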
Related resources
- Penn Medicine SEO Standards
- Google Search Essentials: Google Search Essentials (formerly Webmaster Guidelines)
- Google Technical SEO Techniques: Technical SEO Techniques and Strategies
Contact
For assistance, please contact web-standards@pennmedicine.upenn.edu