Apache POI Word: Extracting Paragraphs

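To extract paragraph text, use the getParagraphs() method of the XWPFDocument class. It returns a list of the paragraphs in the document body (text inside tables, headers, and footers is not part of this list). You can store the result in a List variable and iterate over it with a loop.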

Let's look at a Java program that extracts paragraphs.

Apache POI Word: Paragraph Extraction Example

package com.yiidian;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

import java.io.FileInputStream;
import java.io.IOException;
import java.util.List;

public class ReadParagraphExample {
    public static void main(String[] args) {
        // try-with-resources closes both the input stream and the document
        try (FileInputStream fis = new FileInputStream("yiidian.docx");
             XWPFDocument doc = new XWPFDocument(fis)) {
            // getParagraphs() returns the body paragraphs in document order
            List<XWPFParagraph> paragraphs = doc.getParagraphs();
            for (XWPFParagraph paragraph : paragraphs) {
                // getText() returns the paragraph's text as a single string
                System.out.println(paragraph.getText());
            }
        } catch (IOException e) {
            System.err.println("Could not read yiidian.docx: " + e.getMessage());
        }
    }
}
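
To try getParagraphs() without a .docx file on disk, the following is a minimal sketch that builds a small document in memory and collects the text of its non-blank paragraphs into a list. The class name ParagraphTextCollector and the helpers buildSampleDocument() and extractParagraphTexts() are hypothetical names chosen for illustration. Skipping blank paragraphs is an assumption about what a caller might want, not something the example above does.

```java
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ParagraphTextCollector {

    // Hypothetical helper: builds a three-paragraph document in memory,
    // so the sketch does not depend on an existing .docx file.
    static XWPFDocument buildSampleDocument() {
        XWPFDocument doc = new XWPFDocument();
        doc.createParagraph().createRun().setText("First paragraph.");
        doc.createParagraph(); // empty paragraph with no runs
        doc.createParagraph().createRun().setText("Second paragraph.");
        return doc;
    }

    // Hypothetical helper: returns the text of every non-blank body paragraph,
    // in document order.
    static List<String> extractParagraphTexts(XWPFDocument doc) {
        List<String> texts = new ArrayList<>();
        for (XWPFParagraph paragraph : doc.getParagraphs()) {
            String text = paragraph.getText();
            if (!text.trim().isEmpty()) {
                texts.add(text);
            }
        }
        return texts;
    }

    public static void main(String[] args) throws IOException {
        // try-with-resources closes the in-memory document when done
        try (XWPFDocument doc = buildSampleDocument()) {
            for (String text : extractParagraphTexts(doc)) {
                System.out.println(text);
            }
        }
    }
}
```

Returning a List<String> rather than printing inside the loop keeps the extraction step reusable, for example when the text is passed on to a search index or a test.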

Running the program against yiidian.docx prints the document's paragraph text to the console:

Apache POI (Poor Obfuscation Implementation) is a project design and developed by Apache Software
Foundation. It is a collection of pure Java libraries, used to read and write Microsoft office 
files such as Word, PowerPoint etc. The purpose was to design a cross-platform API that can 
manipulate various file formats of Microsoft Office and Open Office Documents.

 
